Voice Feature Extraction Method for a Mimicking Voice Synthesizer

This paper presents a method of extracting a new speaker’s voice features for the purpose of synthetically mimicking the voice of the donor speaker in a text-to-speech system. The voice characteristics we intend to extract are of two kinds: segmental (phonetic) and suprasegmental (prosodic). The feature extraction method proposed here is designed to be simple and effective by making the most of the processing components of the text-to-speech engine, and the extracted features can be directly used by the synthesizer as appropriate kinds of unit inventories. We investigated the performance of the feature extraction using three speech corpora of different types. The method worked quite successfully for read-out speech, although there is still room for improvement in its handling of emotional speech.

By: Takashi Saito, Masaharu Sakamoto

Published in: 2nd International Conference on Information and Communication Security (ICICS'99), 1999

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.