MALM: A Framework for Mining Sequence Database at Multiple Abstraction Levels

Similarity searches of sequences are usually performed in the time domain using techniques such as template matching, or in appropriate feature spaces where feature vectors are pre-extracted. In this paper, we propose a framework, MALM, in which similarity search can be performed at multiple abstraction levels extracted from the time series. In the proposed approach, the time series in the database are first segmented with a regression algorithm. Features such as regression coefficients, mean square error, and higher-order statistics based on the histogram of the regression residuals are then extracted from each segment. A neural network clustering algorithm is used to assign a label to each segment. The clustered time series segments can then be queried at the symbol level, the feature level, or the sequence level. This framework gives the user a mechanism for generalizing the query template for similarity and all-pair searches. Numerical results are obtained based on the 20-year Dow Jones sequence database.
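
The pipeline described above (segment, extract features, label, query) can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a greedy linear-regression segmentation with a hypothetical error threshold `max_mse`. It replaces the neural network clustering with a simple slope threshold `flat_tol`, which yields three symbols: U (up), D (down), and F (flat). It omits the higher-order residual statistics. All function names and parameters are illustrative.

```python
def fit_line(y):
    """Least-squares fit y ~ a*t + b over t = 0..n-1; returns (a, b, mse)."""
    n = len(y)
    if n == 1:
        return 0.0, float(y[0]), 0.0
    t_mean = (n - 1) / 2.0
    y_mean = sum(y) / n
    sxx = sum((t - t_mean) ** 2 for t in range(n))
    sxy = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    a = sxy / sxx
    b = y_mean - a * t_mean
    mse = sum((v - (a * t + b)) ** 2 for t, v in enumerate(y)) / n
    return a, b, mse


def segment(series, max_mse):
    """Greedy segmentation (an assumption, not the paper's algorithm):
    extend each segment while a single line still fits within max_mse."""
    segments = []
    start, n = 0, len(series)
    while start < n:
        end = min(start + 2, n)  # exclusive end; start with two points
        while end < n and fit_line(series[start:end + 1])[2] <= max_mse:
            end += 1
        a, b, mse = fit_line(series[start:end])
        # Feature level: regression coefficients and mean square error.
        segments.append({"start": start, "end": end,
                         "slope": a, "intercept": b, "mse": mse})
        start = end
    return segments


def label(seg, flat_tol):
    """Stand-in for the clustering step: quantize the slope into a symbol."""
    if seg["slope"] > flat_tol:
        return "U"
    if seg["slope"] < -flat_tol:
        return "D"
    return "F"


def symbol_query(symbols, pattern):
    """Symbol-level query: start indices where the label pattern occurs."""
    k = len(pattern)
    return [i for i in range(len(symbols) - k + 1) if symbols[i:i + k] == pattern]


# Toy series (made up for illustration): a rise, a fall, then a plateau.
series = [0, 1, 2, 3, 4, 3, 2, 1, 0, 0, 0, 0]
segments = segment(series, max_mse=0.01)
symbols = "".join(label(s, flat_tol=0.1) for s in segments)
matches = symbol_query(symbols, "UD")  # a rise followed by a fall
```

In this sketch, a feature-level query would filter `segments` on `slope` or `mse` directly, while a sequence-level query would compare raw values within the segment boundaries.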

By: Chung-Sheng Li, Philip S. Yu, and Vittorio Castelli

Published in: Proceedings of the 7th International Conference on Information and Knowledge Management, New York, ACM, pp. 267-72, 1998

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.
