A Multi-Channel Speech Mixing Method Applying G.729 and G.723.1 to Multi-Point Conferencing Systems

Abstract

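    Audio mixing is an indispensable mechanism in network audio conferencing. It provides a speaking environment like that of a real meeting, letting every participant converse over the network in full-duplex mode, and it can also be applied to entertainment uses such as online games and voice chat rooms. Traditionally, the best speech mixing method is full decoding, which must decompress and recompress the speech and therefore suffers from excessive computational complexity and long delay.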

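    To address this, this thesis proposes a speech mixing method based on partial decoding. Exploiting the characteristics of speech signals, the method takes the compressed audio signals to be mixed, analyzes frame by frame the audio parameters that represent each signal, and selects one set of target audio parameters as the mixed output. This target parameter set conforms to the format of the original compression method, and the method can mix multiple input signals. It can be applied to the G.729 and G.723.1 speech compression standards, reduces computational complexity to 5% to 8% of that of the full decoding method, and achieves the same mixing quality as full decoding.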
  

A G.729 and G.723.1 Based Multi-Channel Speech Mixing Method for Multi-Point Conferencing System

Abstract

    In a multi-point conference, users are offered a substitute for a face-to-face meeting within the economic constraints of the available technology. In this situation, an audio mixing scheme is needed to make the meeting successful. Audio mixing can create a full-duplex conversation environment in which users can speak at any moment. Furthermore, it can be used in entertainment applications such as audio chat rooms and online games.
    The full decoding method is an intuitive and traditional audio mixing method, but it incurs high computational complexity and long processing time. In this work, we propose a partial decoding method based on the CELP coding architecture. This method selects a target frame from all incoming frames as the mixed output. No encoding or decoding process is required. The partial decoding method can be applied directly to CELP-based speech coders, such as the G.729 and G.723.1 speech standards. It achieves the same voice quality as the full decoding method while requiring only 5% to 8% of its computational load.
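
    The frame-selection idea can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the sketch assumes, purely for illustration, that each compressed frame exposes a gain-like parameter and that the frame with the largest gain is chosen. The names CelpFrame, gain, and select_target_frame are hypothetical and do not come from the G.729 or G.723.1 specifications.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class CelpFrame:
    """Hypothetical view of one compressed frame from one channel."""
    channel: int     # index of the incoming channel
    payload: bytes   # compressed frame bits, forwarded unchanged
    gain: float      # assumed activity measure read from the bitstream


def select_target_frame(
    frames: Sequence[CelpFrame], silence_gain: float = 0.0
) -> Optional[CelpFrame]:
    """Pick one incoming frame as the mixed output for this frame period.

    Assumption: the frame with the largest gain wins. Frames whose gain
    does not exceed silence_gain are ignored; None means all were silent.
    """
    active = [f for f in frames if f.gain > silence_gain]
    if not active:
        return None
    # max() returns the first frame among those with equal largest gain.
    return max(active, key=lambda f: f.gain)


if __name__ == "__main__":
    incoming = [
        CelpFrame(channel=0, payload=b"\x01", gain=0.2),
        CelpFrame(channel=1, payload=b"\x02", gain=0.9),
        CelpFrame(channel=2, payload=b"\x03", gain=0.5),
    ]
    target = select_target_frame(incoming)
    print(target.channel if target else "silence")
```

    Because the selected payload is forwarded as-is, the output stays in the original compressed format and no re-encoding step appears in the sketch; a real mixer would replace the assumed gain comparison with the parameter analysis described in the thesis.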