Extending a Bilingual MT Lexicon Using Word Formation Rules

In the IBM LMT Machine Translation (MT) system, a built-in strategy provides lexical coverage of a particular subset of words that are not listed in its bilingual lexicons. The recognition and coding of these words and their transfer generation is based on a set of derivational morphological rules. A new utility extends unfound words of this type in an LMT-compatible format in an auxiliary bilingual lexical file to be subsequently merged into the core lexicons. What characterizes this approach is the use of morphological, semantic, and syntactic features for both analysis and transfer. The auxiliary lexical file
(ALF) has to be revised before a merge into the core lexicons. This utility integrates a linguistics-based analysis and transfer rules with a corpus-based method of verifying or falsifying linguistic hypotheses against extensive document translation, which in addition yields statistics on frequencies of occurrence as well as local context.

Published under title "Using Word Formation Rules to Extend MT Lexicons"

By: Claudia Gdaniec, Esmeralda Manandise

Published in: Lecture Notes in Artificial Intelligence, volume 2499, (no ), pages 64-73 in 2002

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.

Questions about this service can be mailed to reports@us.ibm.com .