biXid: A Bidirectional Transformation Language for MXL

Often, independent organizations define and advocate different
XML formats for a similar purpose and, as a result, application
programs need to mutually convert between such formats. Existing
XML transformation languages, such as XSLT and XDuce, are
unsatisfactory for this purpose since we would have to write, e.g.,
two programs for the forward and the backward transformations in
case of two formats, incur high developing and maintenance costs.
This paper proposes the bidirectional XML transformation language
biXid, allowing us to write only one program for both directions
of conversion. Our language adopts a common paradigm
programming-by-relation, where a program defines a relation over
documents and transforms a document to another in a way satisfying
this relation. Our contributions here are specific language
features for facilitating realistic conversions whose target formats
are loosely in parallel but have many discrepancies in details. Concretely,
we (1) adopt XDuce-style regular expression patterns for
describing and analyzing XML structures, (2) fully permit ambiguity
for treating formats that do not have equivalent expressivenesses,
and (3) allow non-linear pattern variables for expressing
non-trivial transformations that cannot be written only with linear
patterns, such as conversion between unordered and ordered data.
We further develop an efficient evaluation algorithm for biXid,
consisting of the “parsing” phase that transforms the input document
to an intermediate “parse tree” structure and the “unparsing”
phase that transforms it to an output document. Both phases use
a variant of finite tree automata for performing a one-pass scan on
the input or the parse tree by using a standard technique that “maintains
the set of all transitable states.” However, the construction of
the “unparsing” phase is challenging since ambiguity causes different
ways of consuming the parse tree and thus results in multiple
possible outputs that may have different structures.
We have implemented a prototype system of the biXid language
and confirmed that it has enough expressiveness and a lineartime
performance, through experiments with several realistic bidirectional
transformations including one between vCard-XML and
ContactXML.

By: Shinya KAWANAKA, Haruo HOSOYA

Published in: RT0673 in 2007

This Research Report is not available electronically. Please request a copy from the contact listed below. IBM employees should contact ITIRC for a copy.

Questions about this service can be mailed to reports@us.ibm.com .