To Parallelize or Not to Parallelize: XPath Queries on Multi-core Systems

With wide availability of commodity multi-core systems, it is imperative to understand what, if any, changes are needed to existing software systems to harness the newly available computational power. In this context, this work explores acceleration of XML processing systems. Specifically, we investigate parallelization of individual XPath queries over shared-address space multi-core processors. Unlike past approaches that have considered a distributed setting or ad hoc parallel solutions, ours is the first methodical end-to-end proposal. Our solution first identifies if a particular XPath query should be parallelized and then determines the optimal way of parallelizing that query. This decision is based on a cost-base approach that relies both on the query specifics and data statistics. At each stage of the parallelization process, we evaluate three alternative approaches, namely, data-, query-, and hybrid-partitioning. For a given XPath query, our parallel cost model uses selectivity and cardinality estimates to compute costs for these different alternatives. The costs are then fed to parallel query optimizer that generates an optimal parallel execution plan. We have implemented a prototype end-to-end Parallel XPath processing system that integrates the XPath parser, cost estimator, query optimizer, and a parallel runtime library . We use this system to evaluate efficacy of our proposal by an extensive set of experiments using well-known XML documents. These results conclusively validate our parallel cost model and optimization framework, and demonstrate that it is possible to accelerate XPath processing using commodity multi-core systems.

By: Rajesh Bordawekar; Lipyeow Lim; Anastasios Kementsietsidis; Bryant Wei-Lun Kok

Published in: RC24926 in 2009


This Research Report is available. This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties). I have read and understand this notice and am a member of the scientific community outside or inside of IBM seeking a single copy only.


Questions about this service can be mailed to .