Effective Use of TimeBank for TimeML Analysis

TimeML is an expressive language for temporal information, but its rich representational properties raise the bar for traditional information extraction methods when applied to the task of text-to-TimeML analysis. We analyse the extent to which TimeBank, the reference corpus for TimeML, supports development of TimeML-compliant analytics. The first release of the corpus exhibits challenging characteristics: small size and some noise. Nonetheless, a particular design of a time annotator trained on TimeBank is able to exploit the data in an implementation which deploys a hybrid analytical strategy of mixing aggressive finite-state processing over linguistic annotations with a state-of-the-art machine learning technique capable of leveraging large amounts of unannotated data. We present our design, in light of encouraging performance results; we also interpret these results in relation to a close analysis of TimeBank’s annotation ‘profile’. We conclude that even the first release of the corpus is invaluable; we further argue for more infrastructure work needed to create a larger and more robust reference corpus.

By: Branimir K. Boguraev; Rie Kubota Ando

Published in: Annotating, Extracting and Reasoning about Time and Events, Berlin, Germany, Springer, Lecture Notes in Computer Science, vol. 4795, pp. 41-58, 2005
