A G.729- and G.723.1-Based Multi-Channel Speech Mixing Method for Multi-Point Conferencing Systems

Abstract

A multi-point conference offers users a substitute for a face-to-face meeting within the economic constraints of the available technology. For such a meeting to succeed, an audio mixing scheme is needed. Audio mixing creates a full-duplex conversation environment in which any user can speak at any moment. It can also be used in entertainment applications, such as audio chat rooms and online games.

The full decoding method is the intuitive and traditional approach to audio mixing, but it has high computational complexity and a long processing time. In this work, we propose a partial decoding method based on the CELP (Code-Excited Linear Prediction) coding architecture. This method selects one target frame from all incoming frames as the mixed output, so no full decoding or re-encoding is required. The partial decoding method can be applied directly to CELP-based speech coders, such as the G.729 and G.723.1 standards. It achieves voice quality as good as that of the full decoding method while requiring only 5% to 8% of its computational load.
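To make the contrast with full decoding concrete, here is a minimal Python sketch of the frame-selection idea. It assumes each incoming frame already carries an activity score obtained by partial decoding. The `Frame` type, the `energy_estimate` field, the `select_target_frame` function, and the loudest-frame rule are all hypothetical illustrations, not the selection criterion defined in this work. The selected frame's encoded payload is forwarded unchanged, which is why no re-encoding step appears.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Frame:
    """One encoded frame received from a conference participant (hypothetical type)."""
    channel: int            # index of the incoming channel (participant)
    payload: bytes          # encoded G.729 / G.723.1 bitstream, forwarded as-is
    energy_estimate: float  # hypothetical activity score from partial decoding


def select_target_frame(frames: Sequence[Frame]) -> Optional[Frame]:
    """Return the frame to forward as the 'mixed' output for one frame period.

    Assumed selection rule, for illustration only: pick the frame with the
    largest energy estimate. On a tie, max() keeps whichever maximal frame
    appears first in the input sequence.
    """
    if not frames:
        return None
    return max(frames, key=lambda f: f.energy_estimate)


# Usage: three participants send one frame each in the same frame period.
frames = [
    Frame(channel=0, payload=b"\x01", energy_estimate=1.0),
    Frame(channel=1, payload=b"\x02", energy_estimate=5.0),
    Frame(channel=2, payload=b"\x03", energy_estimate=3.0),
]
target = select_target_frame(frames)
print(target.channel)  # 1
```

Because the output is one of the inputs, the per-period cost is a single comparison pass over the incoming frames plus whatever partial decoding produces the scores, rather than decoding every stream, summing samples, and encoding the result.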