We propose a new paradigm for string matching, namely structural matching. In structural matching, the text and pattern contents are not important. Rather, some areas in the text and patterns are singled out, say intervals. A "match" is a text location where a specified relation between the text and pattern areas is satisfied. In particular we define the structural matching problem of Overlap (Parity) Matching. We seek the text locations where all overlaps of the given pattern and text intervals have even length. We show that this problem can be solved in time O(n log m), where the text length is n and the pattern length is m. As an application of overlap matching, we show how to reduce the String Matching with
Swaps problem to the overlap matching problem. The String Matching with Swaps problem is the problem of string matching in the presence of local swaps. The best known deterministic upper bound for this problem was O(nm1/3 log m log sigma) for a general alphabet Sigma, where sigma = min(m, |Sigma|). Our reduction provides a solution to the pattern matching with swaps problem in time O(n log m log sigma).

By: Amihod Amir, Richard Cole, Ramesh Hariharan, Moshe Lewenstein, Ely Porat

Published in: RC22139 in 2001

LIMITED DISTRIBUTION NOTICE:

This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.

rc22139.pdf

Questions about this service can be mailed to reports@us.ibm.com .