Sponsored by BMBF

A Transformation from XML to RDF via XSLT



Introduction

Astronomy, with its modern telescopes, satellites and simulations, is an example of a science in which progress produces ever larger amounts of data. Semantic astronomy tries to improve metadata management by applying semantic technologies. Recent developments in semantic computing have focused on the Resource Description Framework (RDF) as a metadata format. AstroGrid-D also uses RDF in its information service Stellaris. RDF is a new data model that describes data through a hierarchical structure of resources, which are accessible through uniform resource identifiers (URIs). Complex resources are composed of simpler ones, in analogy to the real-world objects they describe. For example, a telescope is composed of a camera, which has a filter wheel, which has filters, and so on. This concept makes RDF an interesting choice for metadata management in heterogeneous software environments where automated interaction between different components is desirable.

However, data is generally not available in RDF and must therefore be transformed. Developing an individual solution for each data format takes considerable effort and is therefore often not feasible.

This page presents a generic XSLT transformation that converts arbitrary XML dialects into RDF.

Design Goals

There are different ways to represent XML in RDF. Several solutions are listed in the Version History section below. The latest transformation achieves the following design goals:

  1. avoidance of blank nodes,
  2. one-to-one mapping for bidirectional extension,
  3. independence of XML schema.

Blank nodes are resources without a name (URI). Access to them is therefore more difficult, and some operations, such as the direct replacement of nodes, cannot be performed. Avoiding blank nodes avoids these complications.
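As an illustration of how every node can receive a name, the following Python sketch derives one URI per element from its position in the XML hierarchy. The URI scheme (base URI plus a path of element names, each with a sibling index) and the function name element_uris are assumptions made for this example; the naming actually used by xml2rdf3.xsl may differ.

```python
import xml.etree.ElementTree as ET

# Reduced fragment of the STELLA-I description (namespaces omitted for brevity).
XML = """<Telescope>
  <Camera>
    <FilterWheel>
      <Filter type="Johnson_U" name="U"/>
      <Filter type="Johnson_B" name="B"/>
    </FilterWheel>
  </Camera>
</Telescope>"""


def element_uris(root, base):
    """Return one URI per element, built from its path in the XML tree.

    Hypothetical scheme: each path step is '<tag>_<index>', where the index
    counts same-named siblings starting at 1.
    """
    uris = []

    def walk(elem, parent_uri, index):
        uri = f"{parent_uri}/{elem.tag}_{index}"
        uris.append(uri)
        counts = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, uri, counts[child.tag])

    walk(root, base, 1)
    return uris


uris = element_uris(ET.fromstring(XML), "rtml://www.opentel.net/STELLA-I")
for uri in uris:
    print(uri)
```

Because the two Filter elements receive different indices, each one can be addressed directly, which is not possible when they are represented as blank nodes.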

A one-to-one mapping is necessary for the inverse transformation. A bidirectional transformation can be important, for example in a robotic telescope network where information about scheduled observations is stored in RDF but rescheduling requires the original RTML observation request. For this reason the RTML observation requests in AstroGrid-D were also stored along with the RDF. The inverse transformation could make this additional service unnecessary. A unique reconstruction of the original XML requires, for example, that the distinction between attributes and elements is preserved. As shown below, this is accomplished by transforming attributes and elements differently.
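Why the distinction matters can be seen in a small Python sketch. It maps two different XML fragments to triples, once without and once with a marker for attributes. The '@' prefix and the function name to_triples are hypothetical conventions chosen for this example only; they are not the representation used by the stylesheet.

```python
import xml.etree.ElementTree as ET

AS_ATTRIBUTE = '<Filter name="U"/>'
AS_ELEMENT = "<Filter><name>U</name></Filter>"


def to_triples(xml_text, mark_attributes):
    """Map a one-level XML fragment to (subject, predicate, object) triples."""
    root = ET.fromstring(xml_text)
    triples = set()
    for key, value in root.attrib.items():
        # Hypothetical convention: an '@' prefix marks attribute predicates.
        predicate = "@" + key if mark_attributes else key
        triples.add((root.tag, predicate, value))
    for child in root:
        triples.add((root.tag, child.tag, child.text))
    return triples


# Without a marker both documents collapse to the same triples ...
naive_collision = to_triples(AS_ATTRIBUTE, False) == to_triples(AS_ELEMENT, False)
# ... with a marker they stay distinguishable, so the XML could be rebuilt.
marked_collision = to_triples(AS_ATTRIBUTE, True) == to_triples(AS_ELEMENT, True)

print(naive_collision, marked_collision)  # prints: True False
```

When the two fragments yield identical triples, no inverse transformation can decide which XML form to restore, so the mapping is not one-to-one.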

The last point makes the transformation independent of the underlying XML schema, so that the structure of the RDF is completely determined by the XML. It requires that the order of elements be preserved.
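The need to record order explicitly can be sketched in Python as follows. A set of statements has no inherent order, much like the statements of an RDF graph, so the position of each child is stored as part of the statement. The tuple layout used here is an assumption for illustration and does not reproduce the order predicates of xml2rdf3.xsl.

```python
import xml.etree.ElementTree as ET

XML = """<FilterWheel>
  <Filter type="Johnson_U" name="U"/>
  <Filter type="Johnson_B" name="B"/>
</FilterWheel>"""

root = ET.fromstring(XML)
original = [child.get("name") for child in root]

# A set has no order, so each statement carries the child's position.
statements = {
    (position, child.tag, child.get("name"))
    for position, child in enumerate(root, start=1)
}

# Sorting by the stored position restores the document order without a schema.
recovered = [name for _, _, name in sorted(statements)]
print(original, recovered)
```

Without the stored position, the two Filter elements could be restored in either order, and the schema would be needed to decide which one is correct.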

Transformation

The transformation is accomplished via XSLT. The latest stylesheet (xml2rdf3.xsl) and some earlier versions are listed under Version History below. As an example we show the transformation of a reduced description of the robotic telescope STELLA-I from its XML dialect RTML (STELLA-I.rtml) into RDF (STELLA-I_3.rdf, STELLA-I_3.ttl). The description of STELLA-I is shown below.

<?xml version="1.0" encoding="UTF-8"?>
<RTML version="3.2" mode="resource" uid="rtml://www.opentel.net/STELLA-I"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xmlns="http://www.rtml.org/v3.2" xsi:schemaLocation="RTML-v3.2.xsd">
  <!-- This is only a fragment -->
  <Telescope>
    <FocalLength units="meters">9.6</FocalLength>
    <Camera>
      <FilterWheel>
        <Filter type="Johnson_U" name="U"/>
        <Filter type="Johnson_B" name="B"/>
      </FilterWheel>
    </Camera>
  </Telescope>
</RTML>

The transformation is executed with an XSLT processor like xsltproc as follows:

  xsltproc xml2rdf3.xsl STELLA-I.rtml > STELLA-I_3.rdf

The resulting RDF has the structure shown in the graphic below, which was produced with the RDF visualization tool RDF Gravity.




Applications

This transformation can be applied wherever XML data is to be converted to RDF.

In AstroGrid-D this transformation is used for monitoring with the information service Stellaris. More specifically it is used for the transformation of

- metadata of robotic telescopes which is specified in RTML

- Monitoring & Discovery System (MDS) information of the Globus Toolkit

- job information provided by audit logging of the Globus Toolkit and converted to Usage Record format

Version History

The table below lists the different versions of the transformation. The STELLA-I.rtml input was slightly different for older versions.
A graphical overview can be found here.

Release       | Version / Date   | Changes | Graph | RDF/XML
--------------|------------------|---------|-------|--------
xml2rdf3.xsl  | 3.0 / 2009-05-19 | rdf:value for every text, no attribute triples, order predicates, comments as triples | STELLA-I_3.png | STELLA-I_3.rdf
xml2rdf25.xsl | 2.5 / 2009-05-19 | added BaseURI variable, keep comments as comments | STELLA-I_2.5.png | STELLA-I_2.5.rdf
xml2rdf24.xsl | 2.4 / 2008-09-30 | no rdf:type information used (simpler); attributes are distinguished from elements by an additional xs:attribute triple | STELLA-I_2.4.png |
xml2rdf23.xsl | 2.3 / 2008-09-25 | distinction of elements from attributes by an rdf:type xsl:element | |
xml2rdf22.xsl | 2.2 / 2008-09-23 | distinction of attributes from elements by an rdf:type xsl:attribute | STELLA-I_2.2.png |
xml2rdf21.xsl | 2.1 / 2008-03-14 | resources have an rdf:type information | STELLA-I_2.1.png |
xml2rdf2.xsl  | 2.0 / 2007-11-05 | blank nodes are replaced by URIs constructed from the hierarchy of XML elements | STELLA-I_2.0.png | STELLA-I_2.0.rdf
xml2rdf1.xsl  | 1.0 / 2007-03-26 | elements and attributes become literals connected by blank nodes, similar to the Java tool OwlMap | STELLA-I_1.0.png |

Contact

Frank Breitling
fbreitling (at) aip.de
http://www.aip.de/People/fbreitling/