Extended Path Expressions for XML

Query languages for XML often use path expressions to locate elements
in XML documents. Path expressions are regular expressions whose
underlying alphabets represent conditions on nodes. They express
conditions on paths from the root, but not conditions on siblings,
siblings of superiors (ancestors), or descendants of such
siblings. To capture such conditions, we propose extending the
underlying alphabets. Each symbol in an extended alphabet is a
triplet (e1, a, e2), where a is a condition on nodes, and e1 (resp. e2)
is a condition on elder (resp. younger) siblings and their descendants;
e1 and e2 are represented by hedge regular expressions, which are as
expressive as hedge automata (hedges are ordered sequences of trees).
Such an extended path expression can be evaluated for every element by
traversing the XML document three times. Furthermore, given an input
schema and a query operation controlled by an extended path
expression, it is possible to construct an output schema. This is
done by identifying where in the input schema the given pointed hedge
representation is satisfied.
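
As an illustration of the triplet (e1, a, e2), the following is a minimal sketch and not the algorithm of the report. It makes several simplifying assumptions: plain Python predicates stand in for hedge regular expressions, a fixed sequence of triplets stands in for a regular expression over the extended alphabet, and matching uses a single top-down traversal rather than the three traversals mentioned above. All names (Node, select, contains, named, any_hedge) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)


Hedge = List[Node]                  # a hedge is an ordered sequence of trees
HedgeCond = Callable[[Hedge], bool]  # stand-in for a hedge regular expression
NodeCond = Callable[[Node], bool]
Step = Tuple[HedgeCond, NodeCond, HedgeCond]  # the triplet (e1, a, e2)


def any_hedge(hedge: Hedge) -> bool:
    """Condition satisfied by every hedge, including the empty one."""
    return True


def contains(name: str) -> HedgeCond:
    """Condition: some tree in the hedge has a node with this name."""
    def check(hedge: Hedge) -> bool:
        return any(n.name == name or check(n.children) for n in hedge)
    return check


def named(name: str) -> NodeCond:
    return lambda node: node.name == name


def select(root: Node, steps: List[Step]) -> List[Node]:
    """Return nodes whose root-to-node path satisfies the steps one by one."""
    results: List[Node] = []

    def walk(siblings: Hedge, index: int, depth: int) -> None:
        node = siblings[index]
        e1, a, e2 = steps[depth]
        elder, younger = siblings[:index], siblings[index + 1:]
        if not (e1(elder) and a(node) and e2(younger)):
            return
        if depth == len(steps) - 1:
            results.append(node)
            return
        for i in range(len(node.children)):
            walk(node.children, i, depth + 1)

    if steps:
        walk([root], 0, 0)
    return results


doc = Node("doc", [
    Node("section", [Node("title"), Node("para"), Node("figure")]),
    Node("section", [Node("title"), Node("para")]),
])

# Condition on siblings: a para with a figure among its younger siblings.
steps = [
    (any_hedge, named("doc"), any_hedge),
    (any_hedge, named("section"), any_hedge),
    (any_hedge, named("para"), contains("figure")),
]
hits = select(doc, steps)

# Condition on siblings of a superior: a para whose parent section has
# an elder sibling (or a descendant of one) named figure.
steps2 = [
    (any_hedge, named("doc"), any_hedge),
    (contains("figure"), named("section"), any_hedge),
    (any_hedge, named("para"), any_hedge),
]
hits2 = select(doc, steps2)
```

In this sketch the first query selects only the para of the first section, and the second selects only the para of the second section; neither condition can be stated by looking at the labels on the root-to-node path alone.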

By: MURATA Makoto

Published in: RT0389 in 2002

LIMITED DISTRIBUTION NOTICE:

This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g., payment of royalties).
