Multi-Layer Embedded Zerotree Wavelet Subband Audio Compression

 

Abstract

 

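Multimedia applications are set to become the largest source of data accessed over networks. Different multimedia applications impose different network requirements, such as transmission bandwidth and real-time delivery, so multimedia signals must be suitably compressed and processed before they can be transmitted efficiently. This thesis therefore develops a multi-layer music coding technique that adapts to different networks and their bandwidths. The codec architecture is based on the wavelet transform and wavelet packets, which provide multiresolution analysis, and yields music compression with multiple resolution layers. Overlapping frames are used to remove blocking artifacts; each frame is decomposed by wavelet packets, and the significant wavelet subband coefficients are extracted first, quantized, and entropy coded. In addition, embedded zerotree coding is applied, with a zerotree search guided by a psychoacoustic model of human hearing, followed by further entropy coding. The compression is organized into three layers, so that a bit rate suited to the available network bandwidth can be selected for transmission; the bit rates are 16 Kbps, 32 Kbps, and 64 Kbps. In terms of perceived quality, the coder maintains acceptable music quality even at low bit rates and outperforms compression methods currently used for network transmission at the same bit rate. The proposed method performs particularly well at very low bit rates.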

 

 

Scalable Audio Compression Using Wavelet Packet Decomposition and Embedded Zero Tree Coding

 

Abstract

 

Multimedia transmission over the Internet is increasingly popular and important. In particular, scalable coding is desirable for heterogeneous networks with varying bandwidths. In this work, we propose an embedded zero-tree wavelet packet (EZWP) audio coding system, a scalable audio compression system based on wavelet packet decomposition and embedded zero-tree coding. We focus on multi-layer, low-bit-rate coding that delivers high perceptual quality. In the base layer, each overlapped audio segment is first transformed by wavelet packet decomposition; the locally significant coefficients are then extracted, quantized, and coded by variable-length coding. In the enhancement layer and the full-band layer, the residual signal, i.e., the difference between the original and the output of the previous layer, is coded via embedded zero-tree wavelet (EZW) coding with a psychoacoustic model, followed by arithmetic coding. The target bit rates for the three layers are 16, 32, and 64 Kbps, respectively. The performance of the proposed coding system is only slightly inferior to that of MPEG-1 Layer 3, while it additionally provides bit-rate scalability. It is therefore suitable for multimedia distribution over the Internet, which is composed of heterogeneous networks.
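
The layering described above can be illustrated with a minimal sketch. It is not the EZWP coder itself: it assumes a Haar filter, a single non-overlapped frame, and plain uniform quantization in place of zero-tree, psychoacoustic, and entropy coding, none of which are specified in the abstract. All function names and parameters (`encode_layers`, `decode_layers`, `keep`, `steps`) are hypothetical. The sketch only shows the structure: a base layer built from the largest wavelet packet coefficients, followed by layers that code the residual left by the previous layer.

```python
import math

SQRT2 = math.sqrt(2.0)

def haar_split(x):
    """One orthonormal Haar analysis step: returns (low band, high band)."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def haar_merge(low, high):
    """Inverse of haar_split."""
    out = []
    for a, d in zip(low, high):
        out += [(a + d) / SQRT2, (a - d) / SQRT2]
    return out

def wavelet_packet(x, depth):
    """Full wavelet packet tree: both low and high bands are split at each level."""
    bands = [list(x)]
    for _ in range(depth):
        bands = [half for band in bands for half in haar_split(band)]
    return bands

def wavelet_packet_inverse(bands):
    """Merge sibling bands pairwise until a single frame remains."""
    while len(bands) > 1:
        bands = [haar_merge(bands[i], bands[i + 1]) for i in range(0, len(bands), 2)]
    return bands[0]

def quantize(values, step):
    """Uniform quantizer: nearest multiple of step."""
    return [step * round(v / step) for v in values]

def encode_layers(frame, depth=3, keep=8, steps=(0.5, 0.25, 0.0625)):
    """Return one list of quantized coefficients per layer (assumed parameters)."""
    flat = [c for band in wavelet_packet(frame, depth) for c in band]
    # Base layer: only the `keep` largest-magnitude coefficients, coarsely quantized.
    order = sorted(range(len(flat)), key=lambda i: abs(flat[i]), reverse=True)
    significant = set(order[:keep])
    coarse = quantize(flat, steps[0])
    base = [coarse[i] if i in significant else 0.0 for i in range(len(flat))]
    layers, approx = [base], base
    # Later layers: quantize the residual left by the previous layers with a finer step.
    for step in steps[1:]:
        residual = [c - a for c, a in zip(flat, approx)]
        layer = quantize(residual, step)
        layers.append(layer)
        approx = [a + l for a, l in zip(approx, layer)]
    return layers

def decode_layers(layers, n_layers, depth=3):
    """Reconstruct a frame from the first n_layers layers."""
    total = [sum(vals) for vals in zip(*layers[:n_layers])]
    band_len = len(total) // (2 ** depth)
    bands = [total[i:i + band_len] for i in range(0, len(total), band_len)]
    return wavelet_packet_inverse(bands)

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)

if __name__ == "__main__":
    frame = [math.sin(2 * math.pi * 5 * n / 64) + 0.5 * math.sin(2 * math.pi * 12 * n / 64)
             for n in range(64)]
    layers = encode_layers(frame)
    for n in (1, 2, 3):
        print(n, "layer(s): MSE =", mse(frame, decode_layers(layers, n)))
```

A decoder that receives only the first layer still produces a usable frame, and each additional layer can only reduce the reconstruction error, which is the property that makes a layered bitstream suitable for receivers with different bandwidths.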