Dual-Mask Wind Noise Reduction Based on a Permutation Invariant Model

Abstract

 

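Everyday life is full of sounds of every kind. Those that carry meaning for us and need to be captured are signals; those that are unwanted or cause interference are noise. Wind is present everywhere in nature, which makes it an unavoidable source of interference in outdoor recording.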

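The method proposed in this paper exploits a property of speech separation models: it applies two different masks at once to enhance the signal, which in turn removes wind noise interference more effectively. Training uses a recurrent neural network (RNN) architecture with spectral features as input. RNNs work well on time-varying functions and therefore handle the features of continuous audio signals well. Wind noise is non-stationary and non-periodic, so it is hard to process directly. Within the RNN family, we use a bidirectional gated recurrent unit (GRU) network to train the masks. From the mixed signal, one mask is trained for the wind and one for the speech; the weight ratio of each is adjusted to estimate the ideal ratio mask (IRM), and the weights of the two masks are then used to derive a loss function suited to dual masks. Unlike conventional noise reduction, this approach separates the signals by preserving the wanted component while attenuating the noise, and it uses the noise mask in reverse to reinforce the enhancement and remove the noise. Combining the two masks gives better results than using a single mask alone.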

Keywords: deep learning, wind noise, noise reduction, speech separation, dual mask

 

Dual-Masking Wind Noise Reduction System Based on Permutation Invariant Training Model

Abstract

 

Our daily environment is full of sounds of every kind. Those that are meaningful to us and need to be captured are signals, while those that are unwanted or cause interference are noise. Wind is present everywhere in nature, so it is an unavoidable source of interference in outdoor recording.

In this paper, we propose a method that exploits a property of speech separation models and combines two different masks to enhance the signal, which in turn removes wind noise interference more effectively. We adopt a recurrent neural network (RNN) architecture trained on spectral features. RNNs perform well on time-varying functions and therefore handle the features of continuous audio signals well. Because wind noise is non-stationary and non-periodic, it is hard to process directly. Within the RNN family, we use a bidirectional gated recurrent unit (BGRU) network to train the masks. From the mixed signal, one mask is trained for the wind and one for the speech; the weight ratio of each is adjusted to estimate the Ideal Ratio Mask (IRM), and the weights of the two masks are then used to derive a loss function suited to dual masks. Unlike conventional noise reduction methods, this method separates the signals by preserving the wanted component while attenuating the noise, and it uses the noise mask in reverse to reinforce the enhancement and remove the noise. Combining the two masks in this way gives better results than using a single mask alone.

Keywords: deep learning, wind noise, noise reduction, speech separation, dual mask
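
The dual-mask idea summarized in the abstract can be illustrated with a small NumPy sketch. It is not the model described in this work: there is no BGRU here, and the function names, the weights `w_speech` and `w_wind`, the power-ratio form of the IRM (a square-root variant is also widely used), the weighted mean-squared-error loss, and the rule for combining the speech mask with the inverted wind mask are all assumptions chosen only to make the idea concrete.

```python
import numpy as np

def ideal_ratio_masks(speech_mag, wind_mag, eps=1e-8):
    """Return (speech_mask, wind_mask) as power ratios in [0, 1].

    Assumed IRM form: |S|^2 / (|S|^2 + |N|^2) for speech, and the
    complementary ratio for wind.
    """
    speech_pow = speech_mag ** 2
    wind_pow = wind_mag ** 2
    total = speech_pow + wind_pow + eps  # eps avoids division by zero
    return speech_pow / total, wind_pow / total

def dual_mask_loss(est_speech_mask, est_wind_mask,
                   speech_mask, wind_mask, w_speech=0.5, w_wind=0.5):
    """Hypothetical dual-mask loss: weighted sum of per-mask MSE terms."""
    speech_err = np.mean((est_speech_mask - speech_mask) ** 2)
    wind_err = np.mean((est_wind_mask - wind_mask) ** 2)
    return w_speech * speech_err + w_wind * wind_err

def combine_masks(mix_mag, est_speech_mask, est_wind_mask, w_speech=0.5):
    """Hypothetical combination: blend the speech mask with the inverted
    wind mask, then apply the result to the mixture magnitude."""
    combined = (w_speech * est_speech_mask
                + (1.0 - w_speech) * (1.0 - est_wind_mask))
    return np.clip(combined, 0.0, 1.0) * mix_mag

# Toy magnitude spectrograms (frequency bins x frames), random for the demo.
rng = np.random.default_rng(0)
speech = rng.random((4, 6))
wind = rng.random((4, 6))
mix = speech + wind  # crude stand-in for the mixture magnitude (ignores phase)

m_speech, m_wind = ideal_ratio_masks(speech, wind)
loss_at_ideal = dual_mask_loss(m_speech, m_wind, m_speech, m_wind)
enhanced = combine_masks(mix, m_speech, m_wind)
```

In a trained system the estimated masks would come from the network rather than from the clean sources; the ideal masks above serve only as training targets. With this power-ratio form the two ideal masks sum to approximately one, so the inverted wind mask agrees with the speech mask, which is the intuition behind using the noise mask in reverse to support the speech estimate.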