High Quality Speech Transformation Based on Linear Prediction Coding and Pitch Synchronization

Abstract

Speech transformation, which changes the tone and speaking rate of a speaker's voice, is useful for privacy protection and entertainment. Intuitively, the analysis-synthesis method is the most suitable approach, because once the speech parameters have been extracted, an arbitrary voice can be synthesized from them. However, in most frame-based speech analysis-synthesis processes, changing the pitch or the frame length causes the pitch periods to fall out of synchronization, which is a serious problem.
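
A frame-based LPC analysis-synthesis loop can be sketched as below. This is a minimal illustration only, assuming the autocorrelation method with a rectangular window and the Levinson-Durbin recursion; the function names, the prediction order of 10, and the synthetic 240-sample frame are hypothetical choices, not taken from the paper. Inverse filtering yields a residual (the excitation), and passing that residual through the all-pole synthesis filter restores the frame. A transformation system modifies the excitation or the frame timing between these two steps, which is where pitch synchronization can be lost.

```python
import math


def autocorrelation(x, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (rectangular window)."""
    return [sum(x[n] * x[n + k] for n in range(len(x) - k)) for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LPC normal equations with the Levinson-Durbin recursion.

    Returns (a, err): a[k-1] is the predictor coefficient for lag k, so that
    x[n] is predicted as sum_k a[k-1] * x[n-k]; err is the final
    prediction-error energy.
    """
    a = [0.0] * (order + 1)  # a[0] is unused padding
    err = r[0]
    for i in range(1, order + 1):
        if err <= 0.0:
            break  # silent or perfectly predictable frame: stop early
        k = (r[i] - sum(a[j] * r[i - j] for j in range(1, i))) / err  # reflection coefficient
        prev = a[:]
        a[i] = k
        for j in range(1, i):
            a[j] = prev[j] - k * prev[i - j]
        err *= 1.0 - k * k
    return a[1:], err


def inverse_filter(x, a):
    """Analysis filter A(z): e[n] = x[n] - sum_k a_k x[n-k], zero initial state."""
    p = len(a)
    return [x[n] - sum(a[k - 1] * x[n - k] for k in range(1, p + 1) if n - k >= 0)
            for n in range(len(x))]


def synthesis_filter(e, a):
    """All-pole synthesis 1/A(z): y[n] = e[n] + sum_k a_k y[n-k], zero initial state."""
    p = len(a)
    y = []
    for n, en in enumerate(e):
        y.append(en + sum(a[k - 1] * y[n - k] for k in range(1, p + 1) if n - k >= 0))
    return y


def energy(s):
    return sum(v * v for v in s)


# Hypothetical test frame: two sinusoids, 240 samples (30 ms at an assumed 8 kHz).
frame = [math.sin(2 * math.pi * 0.05 * n) + 0.5 * math.sin(2 * math.pi * 0.13 * n)
         for n in range(240)]
order = 10
a, pred_err = levinson_durbin(autocorrelation(frame, order), order)
residual = inverse_filter(frame, a)      # excitation that a transformer would modify
rebuilt = synthesis_filter(residual, a)  # unmodified excitation -> original frame

print("residual/frame energy:", energy(residual) / energy(frame))
print("max reconstruction error:", max(abs(p - q) for p, q in zip(frame, rebuilt)))
```

The residual carries much less energy than the frame because the predictor coefficients capture the spectral envelope; resynthesizing the unmodified residual reproduces the frame up to floating-point rounding.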

In this paper, we propose a speech transformation method based on linear prediction coding (LPC) with pitch synchronization to solve this problem. The transformation is performed only on voiced frames, because they are relatively stable and have high energy. Cross-correlation is computed to locate the synchronization points, i.e., the pitch marks, after which arbitrary pitch scaling and time scaling can be performed. Simulations show that speech can be transformed to a different timbre and tone, or to a different speaking rate, with high quality, especially when transforming a female voice into a male one.
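
The voiced-frame selection, cross-correlation pitch marking, and time scaling described above can be sketched roughly as follows. This is a simplified, hypothetical illustration and not the authors' algorithm: the energy threshold used as the voicing decision, the search window of ±10 samples, the helper names, and the synthetic 100 Hz test signal (a period of 80 samples at an assumed 8 kHz rate) are all assumptions. Time scaling is shown as a generic pitch-synchronous repetition or removal of whole periods between consecutive marks; pitch scaling, which in an LPC system would act on the excitation, is omitted.

```python
import math


def is_voiced(frame, energy_threshold):
    """Crude, hypothetical voicing decision: mean energy above a fixed threshold."""
    return sum(s * s for s in frame) / len(frame) > energy_threshold


def normalized_xcorr(u, v):
    """Normalized cross-correlation of two equal-length segments, in [-1, 1]."""
    num = sum(p * q for p, q in zip(u, v))
    den = math.sqrt(sum(p * p for p in u) * sum(q * q for q in v))
    return num / den if den > 0 else 0.0


def find_pitch_marks(x, first_mark, period_guess, search):
    """Propagate pitch marks through a voiced region.

    The previous pitch period serves as a template; it is compared with
    candidate segments at lags period +/- search, and the lag with the highest
    normalized cross-correlation gives the next mark and the new period estimate.
    """
    marks, period = [first_mark], period_guess
    while True:
        prev = marks[-1]
        template = x[prev:prev + period]
        best_lag, best_score = None, -2.0
        for lag in range(max(1, period - search), period + search + 1):
            start = prev + lag
            if start + period > len(x):
                break  # candidate would run past the end of the signal
            score = normalized_xcorr(template, x[start:start + period])
            if score > best_score:
                best_lag, best_score = lag, score
        if best_lag is None:
            return marks
        marks.append(prev + best_lag)
        period = best_lag  # track slow pitch changes


def time_scale(x, marks, factor):
    """Repeat (factor > 1) or drop (factor < 1) whole periods between marks,
    so that every splice point falls on a pitch mark."""
    periods = [x[s:e] for s, e in zip(marks, marks[1:])]
    out = []
    for i in range(round(len(periods) * factor)):
        out.extend(periods[min(int(i / factor), len(periods) - 1)])
    return out


# Synthetic "voiced" signal: three harmonics with a period of 80 samples.
P = 80
x = [math.sin(2 * math.pi * n / P)
     + 0.6 * math.sin(4 * math.pi * n / P + 0.3)
     + 0.3 * math.sin(6 * math.pi * n / P + 1.0) for n in range(400)]

voiced = is_voiced(x, energy_threshold=0.1)
marks = find_pitch_marks(x, first_mark=0, period_guess=75, search=10) if voiced else []
stretched = time_scale(x, marks, factor=1.5)
print(marks)           # [0, 80, 160, 240, 320]
print(len(stretched))  # 480
```

Because every splice point falls on a pitch mark, the stretched signal continues the waveform without phase jumps; cutting at fixed frame boundaries instead would generally split periods mid-cycle, which is the synchronization problem described above.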

Experimental results are also presented.