Extraction of Temporal Information from Text Documents

Detailed analysis of time information in documents is a complex problem; the payoffs for advanced applications capable of temporal reasoning, however, are huge. This brief note argues that the graph-like representation typically maintained by temporal reasoners can be derived from an emerging standard for rich and robust annotation of temporal information in text.
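As a rough illustration of how such a graph might be derived, the following is a minimal sketch, not the report's own mapping. It assumes TimeML-style TLINK annotations given as `(x_id, relType, y_id)` triples, treats each annotated event or time expression as an interval with a start point (`id.s`) and an end point (`id.e`), and turns each relation into "strictly before" edges or "same point" merges. Only a subset of relType values is covered. Reading INCLUDES and IS_INCLUDED strictly (Allen's "contains"), and assuming every interval has start strictly before end, are both simplifying assumptions. The names `RULES`, `build_point_graph` and `precedes` are hypothetical.

```python
from collections import defaultdict

# Constraints between endpoints of interval x and interval y for each relType.
# ("x", "e") is the end point of x; "<" is strict precedence, "=" identity.
# Assumption: INCLUDES / IS_INCLUDED are read strictly (Allen's "contains").
RULES = {
    "BEFORE":       [(("x", "e"), "<", ("y", "s"))],
    "AFTER":        [(("y", "e"), "<", ("x", "s"))],
    "IBEFORE":      [(("x", "e"), "=", ("y", "s"))],
    "IAFTER":       [(("y", "e"), "=", ("x", "s"))],
    "INCLUDES":     [(("x", "s"), "<", ("y", "s")), (("y", "e"), "<", ("x", "e"))],
    "IS_INCLUDED":  [(("y", "s"), "<", ("x", "s")), (("x", "e"), "<", ("y", "e"))],
    "SIMULTANEOUS": [(("x", "s"), "=", ("y", "s")), (("x", "e"), "=", ("y", "e"))],
    "BEGINS":       [(("x", "s"), "=", ("y", "s")), (("x", "e"), "<", ("y", "e"))],
    "ENDS":         [(("y", "s"), "<", ("x", "s")), (("x", "e"), "=", ("y", "e"))],
}


def build_point_graph(tlinks):
    """Map (x_id, relType, y_id) triples to a graph over time points.

    Returns (find, succ): find(p) gives the representative of point p after
    merging "=" points; succ maps a representative to the points it precedes.
    """
    parent = {}

    def find(p):
        parent.setdefault(p, p)
        while parent[p] != p:
            p = parent[p]
        return p

    less = []
    for x, rel, y in tlinks:
        names = {"x": x, "y": y}
        for interval in (x, y):
            # Assumption: every interval is proper (start strictly before end).
            less.append((f"{interval}.s", f"{interval}.e"))
        for (ia, ea), op, (ib, eb) in RULES[rel]:
            a, b = f"{names[ia]}.{ea}", f"{names[ib]}.{eb}"
            if op == "=":
                parent[find(a)] = find(b)  # merge identical points
            else:
                less.append((a, b))

    # Build edges only after all merges, so every edge uses final representatives.
    succ = defaultdict(set)
    for a, b in less:
        succ[find(a)].add(find(b))
    return find, succ


def precedes(graph, p, q):
    """True if the graph entails that point p is strictly before point q."""
    find, succ = graph
    goal = find(q)
    stack, seen = list(succ[find(p)]), set()
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(succ[node])
    return False


g = build_point_graph([("e1", "BEFORE", "e2"), ("e2", "IBEFORE", "e3"),
                       ("t1", "INCLUDES", "e3")])
print(precedes(g, "e1.e", "e3.s"))  # True
print(precedes(g, "e3.s", "e1.e"))  # False
```

Merging "=" points before adding edges keeps the graph a plain precedence relation. Reachability then answers ordering queries that no single TLINK states directly, such as e1 ending before e3 starts.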

We highlight some of the main features of TimeML, a temporal annotation language, and outline a mapping process which derives, from a TimeML-compliant representation, an isomorphic set of time points and intervals. Automatically analysing a document into TimeML is still too complex a problem to tackle in full; however, a non-trivial fragment of TimeML analysis can be carried out by a finite-state-based temporal expression recogniser running concurrently with a shallow syntactic parser. Broadly, we focus on strategies for identifying and temporally anchoring events. We also present an evaluation of some of these recognition capabilities as they apply to identifying fragments of temporal information. The results are encouraging: an independent evaluation shows that a temporal parser can serve as the basis for high-accuracy recognition of key TimeML components. This, in turn, points to the viability of practical end-to-end natural language analysis and reasoning systems for advanced information management.
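To make the finite-state recognition step concrete, here is a minimal sketch under heavy assumptions. The pattern inventory is hypothetical and tiny. Each match is tagged with a TimeML TIMEX3-style type (DATE, TIME, DURATION). Overlapping matches are resolved leftmost-longest. Every pattern is regular and so could be compiled to a finite-state automaton; Python's backtracking `re` engine is used here only for brevity. The sketch does not model the report's coupling with a shallow parser, and `TIMEX_PATTERNS` and `find_timexes` are illustrative names, not the report's recogniser.

```python
import re

MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")
WEEKDAYS = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"

# Hypothetical pattern inventory, each tagged with a TIMEX3-style type.
# All patterns are regular, so each could be compiled to a finite-state automaton.
TIMEX_PATTERNS = [
    ("DATE", rf"\b{MONTHS}\s+\d{{1,2}},\s+\d{{4}}\b"),       # March 5, 2003
    ("DATE", rf"\b{MONTHS}\s+\d{{4}}\b"),                    # March 2003
    ("DATE", rf"\b{WEEKDAYS}\b"),                            # Tuesday
    ("DATE", r"\b(?i:yesterday|today|tomorrow)\b"),
    ("DATE", r"\b(?i:last|next|this)\s+(?:week|month|year)\b"),
    ("TIME", r"\b\d{1,2}:\d{2}(?:\s*[ap]\.m\.)?"),           # 10:30 a.m.
    ("DURATION", r"\b\d+\s+(?:days?|weeks?|months?|years?)\b"),
]


def find_timexes(text):
    """Return non-overlapping (start, end, type, surface) spans in text order."""
    candidates = []
    for ttype, pattern in TIMEX_PATTERNS:
        for m in re.finditer(pattern, text):
            candidates.append((m.start(), m.end(), ttype, m.group()))
    # Leftmost-longest: sort by start, then by decreasing length, and keep a
    # candidate only if it starts at or after the end of the last kept span.
    candidates.sort(key=lambda c: (c[0], c[0] - c[1]))
    spans, last_end = [], 0
    for cand in candidates:
        if cand[0] >= last_end:
            spans.append(cand)
            last_end = cand[1]
    return spans


text = "The merger, announced on March 5, 2003, closed last week after 30 days of talks."
for start, end, ttype, surface in find_timexes(text):
    print(ttype, surface)
# DATE March 5, 2003
# DATE last week
# DURATION 30 days
```

Leftmost-longest selection is one common way to arbitrate between competing patterns. A fuller recogniser would also normalise each span to a TIMEX3 value and anchor relative expressions such as "last week" to the document date.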

By: Branimir K. Boguraev

Published as: IBM Research Report RC22974, 2003

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).

