Frame-Based Phonotactic Language Identification

This paper describes a frame-based phonotactic Language Identification (LID) system, which was used for the LID evaluation of the Robust Automatic Transcription of Speech (RATS) program by the Defense Advanced Research Projects Agency (DARPA). The proposed approach utilizes features derived from frame-level phone log-likelihoods from a phone recognizer. It is an attempt to capture not only phone sequence information but also short-term timing information for phone N-gram events, which is lacking in conventional phonotactic LID systems that simply count phone N-gram events. Based on this new method, we achieved 26% relative improvement in terms of Cavg for the RATS LID evaluation data compared to phone N-gram counts modeling. We also observed that it had a significant impact on score combination with our best acoustic system based on Mel-Frequency Cepstral Coefficients (MFCCs).

By: Kyu J. Han, Jason Pelecanos

Published in: Source: 2012 IEEE Spoken Language Technology Workshop (SLT 2012). ProceedingsPiscataway, NJ,IEEE, , p.303-6 in 2012

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc25336.pdf

Questions about this service can be mailed to reports@us.ibm.com .