Efficient Algorithms for Finding Submasses in Weighted Strings

We present new algorithms for different versions of the Submass Finding Problem: Given a string s over a weighted alphabet, i.e., an alphabet with a mass function answer efficiently questions of the type, whether, for a set of input masses {M₁, . . . , M_k}, s has substrings with masses M₁, . . . , M_k ; and for each M_i where s has a substring with mass M_i, find one or all occurrences of such substrings. Furthermore, our algorithms allow us to compute efficiently the number of different submasses of s (masses of substrings). We encode submasses in polynomials and use Fast Fourier Transform for polynomial multiplications. Our main algorithm runs in time where us the total mass of string s. Since methods for compressing sparse polynomials are know, this can be viewed as where denotes the number of different submasses of s. This makes the runtime independent of the size of the individual masses of characters, and instead only dependent on hence optimal in a sense. We give two variations of this algorithm for the different problems, a Las Vegas randomized algorithm, and a deterministic algorithm, which are both considerably faster than naive solutions if is not too large.

By: Nikhil Bansal, Mark Cieliebak, Zsuzsanna Lipták

Published in: Lecture Notes in Computer Science, volume 3109, (no ), pages 194-204 in 2004

Please obtain a copy of this paper from your local library. IBM cannot distribute this paper externally.

Questions about this service can be mailed to reports@us.ibm.com .