Composite Acoustic Models for Improved Performance on Specific Tasks

We describe the construction of a "composite" acoustic model for speech recognition systems. The goal of the construction process is to improve performance on an application-specific small-vocabulary task while retaining the capability of dealing with large vocabularies. The composite model comprises two parts: a generic acoustic model and a task-specific acoustic model, which is generally much smaller than the generic model. The task-specific model is constructed so as to optimize performance on the small-vocabulary task (by using word models, discriminative estimation of model parameters, etc.). The generic model comprises phone (rather than word) models and can be used for dealing with words outside the small vocabulary. The improved performance on the small-vocabulary task usually comes at the expense of performance on a more general task; however, the degradation in performance on the general task is quite small. Our experimental results show an 11% (relative) improvement on a digits task, at the expense of a 3% (relative) degradation on a general task. Further, as the task-specific part is much smaller than the generic model, it is relatively easy to construct new task-specific models. Hence, the composite model approach offers the possibility of easily tailoring the acoustic models for new applications, because it involves replacing only the task-specific part.
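The two-part structure described above can be pictured with a minimal sketch. It is illustrative only and is not taken from the report: the class name, the method names, the toy lexicon, and the use of plain strings as stand-ins for trained models are all assumptions. It shows only the lookup idea (whole-word models for the small vocabulary, phone models for everything else) and the fact that the task-specific part can be swapped while the generic part is left untouched.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CompositeAcousticModel:
    """Hypothetical container; strings stand in for trained acoustic models."""

    # Generic part: one model per phone, plus a pronunciation lexicon (word -> phones).
    phone_models: Dict[str, str]
    lexicon: Dict[str, List[str]]
    # Task-specific part: one whole-word model per small-vocabulary word.
    word_models: Dict[str, str] = field(default_factory=dict)

    def units_for(self, word: str) -> List[str]:
        """Return the model units that would be used to score `word`."""
        if word in self.word_models:
            # Small-vocabulary word: use its dedicated word model.
            return [self.word_models[word]]
        # Any other word: concatenate generic phone models from the lexicon.
        return [self.phone_models[p] for p in self.lexicon[word]]

    def replace_task_part(self, new_word_models: Dict[str, str]) -> None:
        """Swap in a new task-specific part; the generic part is unchanged."""
        self.word_models = dict(new_word_models)


# Toy data (assumed): a digits task layered on a tiny generic model.
model = CompositeAcousticModel(
    phone_models={"HH": "phone:HH", "AH": "phone:AH", "L": "phone:L", "OW": "phone:OW"},
    lexicon={"hello": ["HH", "AH", "L", "OW"]},
    word_models={"seven": "word:seven"},
)

digit_units = model.units_for("seven")    # served by the task-specific part
general_units = model.units_for("hello")  # served by the generic part

# Tailoring to a new application replaces only the task-specific part.
model.replace_task_part({"yes": "word:yes"})
new_task_units = model.units_for("yes")
```

In this sketch a new application needs only a new `word_models` dictionary, which mirrors the abstract's point that only the small task-specific part has to be rebuilt.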

By: Mukund Padmanabhan

Published in: RC21280 in 1999

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com.