An Audio Compression System Using Wavelet Subband Decomposition and Zero-Tree Coding with a Psychoacoustic Model

Abstract

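Wavelet-based subband signal compression has been widely applied in audio and video coding systems, and embedded zero-tree coding has proven successful for still-image compression. This work studies audio compression and proposes the M-EZWP (Masking-Embedded Zero-tree Wavelet Packet) system. Using wavelet packet decomposition, a filter bank splits the audio signal into 29 subbands whose bandwidth distribution approximates the 26 critical bands of human hearing, so that the minimum masking threshold of the psychoacoustic model can be determined. This threshold is fed into the zero-tree coding block, which encodes and transmits the coefficients of each subband in order of significance, substantially reducing the bitrate. Because the coding is embedded, the system supports variable-bitrate transmission matched to channel conditions and to different quality requirements. CD-quality monophonic audio can be coded at 40 Kbps, and the decoded audio sounds perceptually better than MPEG audio Layer II.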

Audio Compression Using Wavelet Packets and a Zero-Tree Coder with Psychoacoustic Modeling

Abstract

Wavelet filter bank analysis-synthesis techniques are widely used in digital signal processing, including audio and video coding, and embedded zero-tree wavelet (EZW) coding has shown strong performance in progressive image coding. In this work, we focus on high-quality audio coding that delivers perceptually transparent quality. Each audio segment is divided into 29 subbands by wavelet packet analysis and then coded by a zero-tree coder whose algorithm is modified to use the minimum masking thresholds generated by a psychoacoustic model. Subjective listening tests show that the proposed Masking-Embedded Zero-tree Wavelet Packet (M-EZWP) system outperforms the MPEG audio Layer II standard, especially at very low bitrates. Perceptually transparent quality for monophonic audio is achieved at about 40 Kbps. Furthermore, because the bitstream is embedded, the M-EZWP system can adapt to different network conditions and supports both VBR and CBR transmission.
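
The idea of coding coefficients in order of significance and stopping at the masking threshold can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the M-EZWP algorithm: it performs plain significance passes on a single subband, with no zero-tree symbols, no refinement passes, and no entropy coding. The function names, the sample coefficients, and the use of a masking threshold expressed in the same amplitude units as the coefficients are all hypothetical.

```python
import math


def significance_passes(coeffs, masking_threshold):
    """Return a list of (threshold, [(index, sign), ...]) passes for one subband.

    The threshold starts at the largest power of two not exceeding the peak
    magnitude and halves after each pass. Coding stops once the threshold
    drops below masking_threshold (assumed to be in coefficient units), so
    coefficients that are entirely masked are never transmitted.
    """
    peak = max((abs(c) for c in coeffs), default=0.0)
    if peak == 0.0:
        return []
    threshold = 2.0 ** math.floor(math.log2(peak))
    already_coded = set()
    passes = []
    while threshold >= masking_threshold:
        newly_significant = []
        for i, c in enumerate(coeffs):
            if i not in already_coded and abs(c) >= threshold:
                newly_significant.append((i, 1 if c >= 0 else -1))
                already_coded.add(i)
        passes.append((threshold, newly_significant))
        threshold /= 2.0
    return passes


def decode(passes, length):
    """Reconstruct coefficients from any prefix of the pass list.

    A coefficient found at threshold T lies in [T, 2T), so it is
    reconstructed at the interval midpoint 1.5 * T.
    """
    out = [0.0] * length
    for threshold, entries in passes:
        for i, sign in entries:
            out[i] = sign * 1.5 * threshold
    return out


if __name__ == "__main__":
    band = [9.0, -3.0, 0.4, 5.5]          # hypothetical subband coefficients
    passes = significance_passes(band, masking_threshold=1.0)
    print(decode(passes, len(band)))      # all passes received
    print(decode(passes[:2], len(band)))  # stream truncated after two passes
```

Decoding a prefix of the pass list still yields a valid, coarser reconstruction, which is the embedded property that makes variable-bitrate transmission possible. The coefficient 0.4 is never coded because it lies below the assumed masking threshold.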