Model M Lite: A Fast Class-Based Language Model

While advanced language models such as neural network language models and Model M have achieved superior performance compared to word n-gram models, they are generally much more computationally expensive to train and to decode with. As a result, word n-gram models remain dominant in real-world speech recognition applications. In this paper, we investigate whether it is possible to design a language model with performance similar to Model M that can be trained and applied at a cost similar to that of a word n-gram model. We propose Model M Lite, an ensemble of class-based back-off n-gram models containing a set of n-gram features similar to that of Model M. The ensemble can be stored compactly and is only slightly larger than a comparable Model M. We evaluate several schemes for dynamically choosing interpolation weights for the ensemble members. On a Wall Street Journal task, our new model achieves 73% to 92% of the absolute word-error rate gain of a 4-gram Model M over a word 4-gram model, translating to an absolute gain of up to 2% over the baseline.
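To make the central idea in the abstract concrete, the following is a minimal sketch, not the report's implementation, of linearly interpolating an ensemble of class-based n-gram models with dynamically chosen weights. The toy class-based bigram uses add-one smoothing in place of true back-off, and the history-dependent weighting scheme, class names, and data are purely illustrative assumptions.

```python
# Illustrative sketch of an ensemble of class-based n-gram models combined by
# dynamic linear interpolation. Not the Model M Lite implementation.
from collections import defaultdict


class ClassBasedBigram:
    """Toy class-based bigram: p(w | v) = p(class(w) | class(v)) * p(w | class(w))."""

    def __init__(self, sentences, word2class):
        self.word2class = word2class
        self.class_bigram = defaultdict(lambda: defaultdict(int))
        self.class_context = defaultdict(int)
        self.word_emission = defaultdict(lambda: defaultdict(int))
        self.class_total = defaultdict(int)
        self.vocab = set()
        self.classes = set(word2class.values()) | {"<s>"}
        for sent in sentences:
            prev_class = "<s>"
            for w in sent:
                c = word2class[w]
                self.class_bigram[prev_class][c] += 1
                self.class_context[prev_class] += 1
                self.word_emission[c][w] += 1
                self.class_total[c] += 1
                self.vocab.add(w)
                prev_class = c

    def prob(self, word, history):
        prev = history[-1] if history else None
        prev_class = self.word2class.get(prev, "<s>")
        c = self.word2class[word]
        # Add-one smoothed class transition and word emission probabilities
        # (a stand-in for the back-off smoothing used in real n-gram models).
        p_class = (self.class_bigram[prev_class][c] + 1) / (
            self.class_context[prev_class] + len(self.classes))
        p_word = (self.word_emission[c][word] + 1) / (
            self.class_total[c] + len(self.vocab))
        return p_class * p_word


def dynamic_weights(models, history):
    """One possible dynamic weighting scheme (an assumption, not the report's):
    weight each member by how often it has seen the class of the previous word
    as a context, then normalize."""
    prev = history[-1] if history else None
    scores = [m.class_context[m.word2class.get(prev, "<s>")] + 1.0 for m in models]
    total = sum(scores)
    return [s / total for s in scores]


def ensemble_prob(models, weights, word, history):
    """Linear interpolation of the ensemble members' probabilities."""
    return sum(w * m.prob(word, history) for w, m in zip(weights, models))


if __name__ == "__main__":
    word2class = {"the": "DET", "a": "DET", "dog": "N", "cat": "N",
                  "runs": "V", "sleeps": "V"}
    models = [
        ClassBasedBigram([["the", "dog", "runs"], ["a", "cat", "sleeps"]], word2class),
        ClassBasedBigram([["the", "cat", "runs"], ["a", "dog", "sleeps"]], word2class),
    ]
    history = ["the"]
    weights = dynamic_weights(models, history)
    print("p(dog | the) =", ensemble_prob(models, weights, "dog", history))
```

The per-history weights here are recomputed at query time, which mirrors the abstract's notion of dynamically choosing interpolation weights, while the actual schemes evaluated in the report may differ.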

By: Stanley F. Chen

Published in: IBM Research Report RC25631, 2016
