Online Analytical Processing of Text Data

Although demand for integrating structured and unstructured information, and for advanced analytics over it, has recently become visible, conventional database technology has not yet delivered a robust and practical implementation of a truly integrated architecture for these purposes. After working on several industrial applications (in particular, in the healthcare and life sciences area), we have identified the fundamental issues and technical approaches to tackle them. In this paper, we propose data representations and algebraic operations for integrating semantic information (e.g., ontologies) into OLAP systems, which allow us to analyze a huge set of textual documents together with their underlying semantic information. The proposed method is implemented on a persistent store using the preorder and postorder ranks of nodes in a hierarchy. Its performance has been evaluated on real-world datasets, and the results confirm the high scalability and flexibility of our approach with respect to computation time.
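
As an illustration of the preorder/postorder idea mentioned in the abstract, the following is a minimal sketch and not the authors' implementation. It assumes a tree-shaped concept hierarchy and uses the standard property that a node is a descendant of another exactly when its preorder rank is larger and its postorder rank is smaller. The concept names, the document annotations, and the functions number_hierarchy, is_descendant, and roll_up are all hypothetical, and an in-memory dictionary stands in for the persistent store.

```python
def number_hierarchy(children, root):
    """Assign preorder and postorder ranks to every node by depth-first traversal."""
    pre, post = {}, {}
    pre_counter = 0
    post_counter = 0
    pre[root] = pre_counter
    pre_counter += 1
    # Each stack entry holds a node and an iterator over its remaining children.
    stack = [(root, iter(children.get(root, ())))]
    while stack:
        node, remaining = stack[-1]
        child = next(remaining, None)
        if child is None:
            # All children visited: the node receives its postorder rank.
            post[node] = post_counter
            post_counter += 1
            stack.pop()
        else:
            pre[child] = pre_counter
            pre_counter += 1
            stack.append((child, iter(children.get(child, ()))))
    return pre, post


def is_descendant(pre, post, node, ancestor):
    """True if node lies strictly below ancestor in the hierarchy."""
    return pre[ancestor] < pre[node] and post[node] < post[ancestor]


def roll_up(pre, post, doc_concepts, concept):
    """Count documents annotated with concept or with any of its descendants."""
    return sum(
        1
        for concepts in doc_concepts.values()
        if any(c == concept or is_descendant(pre, post, c, concept) for c in concepts)
    )


# Hypothetical ontology fragment and document annotations (illustrative only).
children = {
    "disease": ["infection", "cancer"],
    "infection": ["influenza", "pneumonia"],
}
doc_concepts = {
    "d1": {"influenza"},
    "d2": {"cancer"},
    "d3": {"pneumonia", "influenza"},
}

pre, post = number_hierarchy(children, "disease")
infection_docs = roll_up(pre, post, doc_concepts, "infection")
disease_docs = roll_up(pre, post, doc_concepts, "disease")
```

Because the descendant test reduces to two integer comparisons, a roll-up along the hierarchy can be expressed as a range condition on stored ranks rather than as a recursive traversal, which is one plausible reason such numbering suits a persistent store.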

By: Akihiro Inokuchi, Koichi Takeda

Published in: RT0703 in 2007

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.
