Spatial characteristic based scalable audio coding structure

Abstract

MPEG-1 Audio Layer III (MP3) and MPEG-4 Advanced Audio Coding (AAC) achieve high coding efficiency by using a psychoacoustic model to remove masked frequency components. However, the psychoacoustic model analyzes each audio channel as an independent single-channel signal and does not exploit the correlation between channels. As a result, the total transmission bit rate grows approximately linearly with the number of encoded channels. Spatial Audio Coding (SAC) instead exploits the human ability to localize sound in space. The encoder extracts and encodes parameters that describe the spatial characteristics of the sound field, and the decoder reconstructs the sound field from a smaller number of transmitted audio channels together with these spatial parameters.
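To make the encoder/decoder split concrete, the following is a minimal parametric-stereo sketch in Python with NumPy. It is not the method of this work, only an illustration under simplifying assumptions: the function names `encode_stereo` and `decode_stereo` are hypothetical, a single FFT frame with equal-width bands stands in for a real filter bank with perceptual bands, and only two common parametric-stereo cues are computed, the inter-channel level difference (ILD) and the inter-channel coherence (ICC). Phase cues and the decorrelator that real decoders use to restore ICC are omitted.

```python
import numpy as np


def encode_stereo(left, right, n_bands=8, eps=1e-12):
    """Downmix one stereo frame to mono and extract per-band spatial cues.

    Simplified sketch (assumption): one FFT frame, equal-width bands,
    no phase cues. Returns (downmix, ild_db, icc).
    """
    L = np.fft.rfft(left)
    R = np.fft.rfft(right)
    downmix = 0.5 * (left + right)  # the only audio channel that is transmitted
    ild_db, icc = [], []
    for b in np.array_split(np.arange(len(L)), n_bands):
        pl = np.sum(np.abs(L[b]) ** 2)          # left-channel band energy
        pr = np.sum(np.abs(R[b]) ** 2)          # right-channel band energy
        cross = np.abs(np.sum(L[b] * np.conj(R[b])))
        ild_db.append(10 * np.log10((pl + eps) / (pr + eps)))  # level difference in dB
        icc.append(cross / np.sqrt(pl * pr + eps))             # coherence in [0, 1]
    return downmix, np.array(ild_db), np.array(icc)


def decode_stereo(downmix, ild_db, n_bands=8):
    """Rebuild left/right from the mono downmix and the per-band ILD.

    With M = (L + R) / 2 and amplitude ratio g = |L| / |R|, an in-phase,
    fully correlated pair satisfies R = 2M / (1 + g) and L = 2gM / (1 + g).
    ICC is ignored here (no decorrelator).
    """
    M = np.fft.rfft(downmix)
    L = np.zeros_like(M)
    R = np.zeros_like(M)
    for b, ild in zip(np.array_split(np.arange(len(M)), n_bands), ild_db):
        g = 10 ** (ild / 20)  # convert dB level difference to amplitude ratio
        L[b] = 2 * g / (1 + g) * M[b]
        R[b] = 2 / (1 + g) * M[b]
    n = len(downmix)
    return np.fft.irfft(L, n), np.fft.irfft(R, n)
```

Because the decoder rebuilds each band as a scaled copy of the mono downmix, it restores a fully correlated, in-phase pair exactly but cannot restore content that exists in only one channel. In this sketch such content leaks into both outputs, and a low ICC value in the affected band is the indication that this is happening.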

In this work, we propose a scalable audio coding scheme for transmitting multichannel audio over networks, built on spatial audio coding techniques including parametric stereo and MPEG Surround. Each additional enhancement layer received yields more audio channels and higher quality. We also observe that when a multichannel signal contains uncorrelated components, such as dialogue, the reconstructed audio suffers from severe interference. In this case, we apply inter-channel interference processing, which encodes the uncorrelated part separately. Experimental results show clear improvements in both subjective and objective quality.
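The idea of isolating an uncorrelated part can likewise be sketched, under loudly stated assumptions. The function `split_uncorrelated` and the threshold `ICC_THRESHOLD = 0.6` below are hypothetical and are not taken from this work. Per band, the sketch measures the coherence between two channels and, where it falls below the threshold, subtracts the least-squares prediction of one channel from the other, leaving a residual that a codec could encode separately.

```python
import numpy as np

ICC_THRESHOLD = 0.6  # hypothetical decision threshold; the source does not give one


def split_uncorrelated(ch_a, ch_b, n_bands=8, threshold=ICC_THRESHOLD, eps=1e-12):
    """Flag low-coherence bands of ch_a relative to ch_b and return the part
    of ch_a's spectrum that ch_b cannot predict in those bands.

    Returns (flags, residual): flags[k] is True when band k's coherence is
    below the threshold; residual holds A - alpha * B in flagged bands and
    zeros elsewhere (alpha is the per-band least-squares complex gain).
    """
    A = np.fft.rfft(ch_a)
    B = np.fft.rfft(ch_b)
    residual = np.zeros_like(A)
    flags = []
    for b in np.array_split(np.arange(len(A)), n_bands):
        pa = np.sum(np.abs(A[b]) ** 2)
        pb = np.sum(np.abs(B[b]) ** 2)
        cross = np.sum(A[b] * np.conj(B[b]))
        icc = np.abs(cross) / np.sqrt(pa * pb + eps)  # inter-channel coherence
        low = bool(icc < threshold)
        flags.append(low)
        if low:
            alpha = cross / (pb + eps)         # gain that best predicts A from B
            residual[b] = A[b] - alpha * B[b]  # uncorrelated part, coded separately
    return np.array(flags), residual
```

Projection onto the other channel is only one simple way to isolate an uncorrelated component; by construction, the residual in each flagged band is orthogonal to `ch_b`'s spectrum in that band.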