TimeML is an expressive language for temporal information, but its rich representational properties raise the bar for traditional information extraction methods when applied to the task of text-to-TimeML analysis. We analyse the extent to which TimeBank, the reference corpus for TimeML, supports development of TimeML-compliant analytics. The first release of the corpus exhibits challenging characteristics: small size and some noise. Nonetheless, a particular design of a time annotator trained on TimeBank is able to exploit the data in an implementation which deploys a hybrid analytical strategy of mixing aggressive finite-state processing over linguistic annotations with a state-of-the-art machine learning technique capable of leveraging large amounts of unannotated data. We present our design, in light of encouraging performance results; we also interpret these results in relation to a close analysis of TimeBank’s annotation ‘profile’. We conclude that even the first release of the corpus is invaluable; we further argue for more infrastructure work needed to create a larger and more robust reference corpus.
By: Branimir K. Boguraev; Rie Kubota Ando
Published in: Annotating, Extracting and Reasoning about Time and Events, Berlin, Germany, Springer, Lecture Notes in Computer Science, vol. 4795, p. 41-58, 2005