Use of F0 Features in Automatic Segmentation for Speech Synthesis

This paper presents a method for automatically dividing speech utterances into phonemic segments, which are used to construct unit inventories for speech synthesis. We propose a new segmentation parameter called “dynamics of fundamental frequency” (DF0). The fine structure of F0 contours contains phonemic events that appear as local dips in phonemic transition regions, especially around voiced consonants, and we apply this observation to speech segmentation. The DF0 parameter is used in the final stage of the segmentation procedure to refine the rough phonemic boundaries obtained by dynamic programming (DP) alignment. We conducted experiments applying the proposed method to a speech database prepared for unit inventory construction, and compared the resulting boundaries with those obtained by manual segmentation to show the effectiveness of the method. We also discuss the effects of boundary refinement on synthesized speech.
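
The refinement idea in the abstract can be illustrated with a minimal sketch. The abstract does not give the formula for DF0, so the code below assumes a simple stand-in: the absolute central difference of the F0 contour per frame. The function names `f0_dynamics` and `refine_boundary`, the `window` parameter, and the sample contour are all hypothetical and are not taken from the report. The sketch moves a rough boundary (such as one produced by DP alignment) to the nearby frame where F0 changes fastest, which is where the edge of a local dip would be expected.

```python
from typing import List, Sequence


def f0_dynamics(f0: Sequence[float]) -> List[float]:
    """Absolute central difference of F0 per frame.

    This is an assumed stand-in for the report's DF0 parameter,
    whose actual definition is not given in the abstract.
    """
    d = [0.0] * len(f0)
    for i in range(1, len(f0) - 1):
        d[i] = abs(f0[i + 1] - f0[i - 1]) / 2.0
    return d


def refine_boundary(f0: Sequence[float], rough: int, window: int) -> int:
    """Move a rough boundary to the frame with the largest F0 dynamics
    within +/- window frames; ties go to the frame closest to `rough`."""
    d = f0_dynamics(f0)
    lo = max(0, rough - window)
    hi = min(len(f0) - 1, rough + window)
    return max(range(lo, hi + 1), key=lambda i: (d[i], -abs(i - rough)))


# Hypothetical voiced F0 contour (Hz per frame) with a local dip
# of the kind described around a voiced consonant.
f0 = [120, 120, 120, 119, 112, 104, 103, 104, 111, 119, 120, 120]

# Rough boundaries at frames 3 and 9, refined within a 2-frame window.
onset = refine_boundary(f0, rough=3, window=2)
offset = refine_boundary(f0, rough=9, window=2)
print(onset, offset)
```

This sketch assumes every frame is voiced; a real F0 track would also need handling for unvoiced frames, where F0 is undefined.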

By: Takashi Saito

Published in: RT0293 in 2002

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

