Spatial characteristic based scalable audio coding structure
Abstract
MPEG Layer-3 (MP3) and MPEG-4 Advanced Audio Coding (AAC) achieve high coding
efficiency by using a psychoacoustic model to remove masked frequency
components. However, the psychoacoustic model analyzes single-channel audio
signals and does not consider the correlation between audio channels. As a
result, each additional audio channel increases the total required
transmission bit-rate approximately linearly. Spatial Audio Coding (SAC)
exploits the human perceptual ability to localize sound in space. The encoder
captures and encodes spatial characteristic parameters, and the decoder
reconstructs the sound field from fewer audio channels together with these
parameters.
In this work, we propose a scalable audio coding scheme, based on spatial
audio coding techniques including parametric stereo and MPEG Surround, for
transmitting multi-channel audio over networks. The more enhancement layers
are received, the more audio channels and the better the quality that can be
obtained. We also observe that when uncorrelated signals, such as dialogue,
are present in the multi-channel signal, the reconstructed audio suffers from
serious interference. In this case, we apply inter-channel interference
processing to encode the uncorrelated part separately. The experimental
results show excellent improvements in both subjective and objective quality.
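
To illustrate the spatial-parameter idea summarized above, here is a minimal
sketch in Python. It is not the scheme proposed in this work: the function
names (encode_frame, decode_frame) and the choice of two full-band parameters,
an inter-channel level difference (ILD) and an inter-channel correlation
(ICC), are assumptions made for illustration. Actual parametric stereo and
MPEG Surround codecs estimate such parameters per frequency band and use
decorrelators at the decoder, which this sketch omits.

```python
import math


def encode_frame(left, right, eps=1e-12):
    """Downmix a stereo frame to mono and extract two spatial parameters.

    Returns (mono, ild_db, icc). Hypothetical helper for illustration only.
    """
    mono = [(l + r) / 2.0 for l, r in zip(left, right)]
    e_l = sum(l * l for l in left) + eps   # left-channel energy
    e_r = sum(r * r for r in right) + eps  # right-channel energy
    ild_db = 10.0 * math.log10(e_l / e_r)  # level difference in dB
    icc = sum(l * r for l, r in zip(left, right)) / math.sqrt(e_l * e_r)
    return mono, ild_db, icc


def decode_frame(mono, ild_db):
    """Rebuild two channels from the mono downmix and the ILD alone.

    The gains are exact only when both channels are in-phase scaled
    copies of one source (ICC close to 1).
    """
    c = 10.0 ** (ild_db / 20.0)  # amplitude ratio left/right
    g_l = 2.0 * c / (1.0 + c)
    g_r = 2.0 / (1.0 + c)
    return [g_l * m for m in mono], [g_r * m for m in mono]


# Correlated case: one source panned toward the left channel.
source = [math.sin(2.0 * math.pi * 5 * n / 64) for n in range(64)]
left = [0.8 * s for s in source]
right = [0.4 * s for s in source]
mono, ild_db, icc = encode_frame(left, right)
left_hat, right_hat = decode_frame(mono, ild_db)
err = max(abs(a - b) for a, b in zip(left, left_hat))
print("correlated:   ICC =", round(icc, 3), " max error =", round(err, 6))

# Uncorrelated case: a different signal in each channel.
other = [math.cos(2.0 * math.pi * 5 * n / 64) for n in range(64)]
mono_u, ild_u, icc_u = encode_frame(source, other)
left_u, _ = decode_frame(mono_u, ild_u)
err_u = max(abs(a - b) for a, b in zip(source, left_u))
print("uncorrelated: ICC =", round(icc_u, 3), " max error =", round(err_u, 6))
```

In the correlated case one downmix channel plus a single parameter is enough
to recover both channels. In the uncorrelated case the ICC is near zero and
each decoded channel contains the other channel's content, which is a simple
picture of the interference that motivates encoding the uncorrelated part
separately.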