A Transformation from XML to RDF via XSLT
[Figure: diagram with the labels XML and XSLT]
Introduction
Astronomy, with its modern telescopes, satellites and simulations, is an example of a science in which progress produces ever larger amounts of data. Semantic astronomy tries to improve metadata management through the application of semantic technologies. Recent developments in semantic computing have focused on the use of the Resource Description Framework (RDF) as a metadata format. In AstroGrid-D, RDF is also used by the information service Stellaris. RDF is a new data model which describes data through a hierarchical structure of resources, which are accessible through Uniform Resource Identifiers (URIs). Complex resources are composed of simpler ones, in analogy to the real-world objects they describe. For example, a telescope is composed of a camera, which has a filter wheel, which has filters, etc. This concept makes RDF an interesting choice for metadata management in heterogeneous software environments, where automated interaction between different components is desirable.
However, data is generally not available in RDF and therefore has to be transformed. Developing an individual solution for each data format requires a lot of effort and is therefore often not feasible.
Here a generic XSLT transformation is presented which transforms arbitrary XML dialects into RDF.
Design Goals
There are different ways to represent XML in RDF. Several solutions are shown in the Version History section below. The latest transformation achieves the following design goals:
- avoidance of blank nodes,
- one-to-one mapping for bidirectional extension,
- independence of XML schema.
Blank nodes are nodes without a name (URI). Access to them is therefore more difficult, and some operations, such as the direct replacement of a node, cannot be performed. Avoiding blank nodes avoids these complications.
A one-to-one mapping is necessary for the inverse transformation. A bidirectional transformation can be important, e.g. in a robotic telescope network where information about scheduled observations is stored in RDF but where rescheduling requires the original RTML observation request. For this reason, AstroGrid-D also stored the RTML observation requests along with the RDF. The inverse transformation could make this additional service unnecessary. A unique reconstruction of the original XML requires, for example, that the distinction between attributes and elements is preserved. As shown below, this is accomplished by transforming attributes and elements differently.
The last point makes the transformation independent of the underlying XML schema, so that the structure of the RDF is completely determined by the XML. It requires that the order of elements is preserved.
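One way to give every node a name is to derive it from the element's position in the XML hierarchy, which is what the version history describes for release 2.0. The following Python sketch illustrates the idea only; it is not the stylesheet's actual naming scheme. The base URI `http://example.org`, the function name `element_uris` and the `_1`, `_2` suffixes for repeated siblings are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET


def element_uris(xml_text, base_uri):
    """Return one URI per XML element, built from its path in the hierarchy.

    Hypothetical naming scheme: each path step is the element name plus a
    running number, so that repeated siblings still get distinct URIs.
    """
    root = ET.fromstring(xml_text)
    uris = []

    def walk(element, uri):
        uris.append(uri)
        counts = {}  # how often each tag has occurred among the children so far
        for child in element:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            walk(child, f"{uri}/{child.tag}_{counts[child.tag]}")

    walk(root, f"{base_uri}/{root.tag}")
    return uris


# Toy document modelled on the telescope example from the introduction.
EXAMPLE = (
    "<Telescope><Camera><FilterWheel>"
    "<Filter/><Filter/>"
    "</FilterWheel></Camera></Telescope>"
)

for uri in element_uris(EXAMPLE, "http://example.org"):
    print(uri)
```

Because every element receives its own URI, each node can be addressed or replaced directly, which is not possible for a blank node.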
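The two requirements above, a distinct treatment of attributes and elements and a preserved element order, can be illustrated with a small round trip in Python. This is a minimal sketch under stated assumptions, not the mapping used by xml2rdf3.xsl: the predicate prefixes `attr:` and `child:` and the numbered child URIs are hypothetical, and only `rdf:value` is a standard RDF property. Namespaces, comments and mixed content are ignored.

```python
import xml.etree.ElementTree as ET

RDF_VALUE = "rdf:value"  # standard RDF property, used here for element text
ATTR = "attr:"           # hypothetical prefix marking XML attributes
CHILD = "child:"         # hypothetical prefix marking child elements


def to_triples(element, uri):
    """Map an element to (subject, predicate, object) triples."""
    triples = [(uri, ATTR + name, value) for name, value in element.attrib.items()]
    text = (element.text or "").strip()
    if text:
        triples.append((uri, RDF_VALUE, text))
    for position, child in enumerate(element, start=1):
        child_uri = f"{uri}/{position}"
        # The predicate records both the position and the element name.
        triples.append((uri, f"{CHILD}{position}:{child.tag}", child_uri))
        triples.extend(to_triples(child, child_uri))
    return triples


def from_triples(triples, uri, tag):
    """Rebuild the element named by uri from an unordered list of triples."""
    element = ET.Element(tag)
    children = []
    for subject, predicate, obj in triples:
        if subject != uri:
            continue
        if predicate.startswith(ATTR):
            element.set(predicate[len(ATTR):], obj)
        elif predicate == RDF_VALUE:
            element.text = obj
        elif predicate.startswith(CHILD):
            position, child_tag = predicate[len(CHILD):].split(":", 1)
            children.append((int(position), child_tag, obj))
    for _, child_tag, child_uri in sorted(children):  # restore document order
        element.append(from_triples(triples, child_uri, child_tag))
    return element


SOURCE = (
    '<Telescope name="STELLA-I"><Camera>'
    "<Filter>U</Filter><Filter>B</Filter>"
    "</Camera></Telescope>"
)
ROOT_URI = "http://example.org/Telescope"  # hypothetical base URI

triples = to_triples(ET.fromstring(SOURCE), ROOT_URI)
# RDF triples form an unordered set, so the list is reversed before rebuilding.
rebuilt = from_triples(list(reversed(triples)), ROOT_URI, "Telescope")
print(ET.tostring(rebuilt, encoding="unicode") == SOURCE)
```

If the position were dropped from the `child:` predicate, the two `Filter` elements could no longer be put back in their original order, and if attributes and text shared one predicate form, `name` could not be told apart from a child element.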
Transformation
The transformation is accomplished via XSLT. The latest stylesheet (xml2rdf3.xsl) and some earlier versions are found below. As an example we show the transformation of a reduced description of the robotic telescope STELLA-I from the XML dialect RTML (STELLA-I.rtml) into RDF (STELLA-I_3.rdf, STELLA-I_3.ttl). The description of STELLA-I is shown below.
<?xml version="1.0" encoding="UTF-8"?>
The transformation is executed with an XSLT processor like xsltproc as follows:
xsltproc xml2rdf3.xsl STELLA-I.rtml > STELLA-I_3.rdf
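When the transformation has to run inside a larger program rather than in a shell, the same call can be issued from Python. The sketch below assumes that `xsltproc` is installed and on the search path; the function names `xsltproc_command` and `transform` are made up for this example.

```python
import subprocess


def xsltproc_command(stylesheet, source):
    """Build the argument list for: xsltproc <stylesheet> <source>."""
    return ["xsltproc", stylesheet, source]


def transform(stylesheet, source, target):
    """Run xsltproc and write its standard output to the file target.

    Raises subprocess.CalledProcessError if xsltproc reports an error.
    """
    result = subprocess.run(
        xsltproc_command(stylesheet, source),
        check=True,
        capture_output=True,
    )
    with open(target, "wb") as handle:
        handle.write(result.stdout)
```

Calling `transform("xml2rdf3.xsl", "STELLA-I.rtml", "STELLA-I_3.rdf")` would then correspond to the shell command above.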
The resulting RDF has the structure shown in the graphic below. It is obtained using the RDF visualization tool RDF Gravity.
Applications
This transformation can be applied wherever XML data is to be converted to RDF.
In AstroGrid-D this transformation is used for monitoring with the information service Stellaris. More specifically it is used for the transformation of
- metadata of robotic telescopes which is specified in RTML
- Monitoring & Discovery System (MDS) information of the Globus Toolkit
- job information provided by the audit logging of the Globus Toolkit and converted to the Usage Record format
Version History
The table below lists the different versions of the transformation. The STELLA-I.rtml file was slightly different for older versions. A graphical overview can be found here.
| Release | Version/Date | Changes | Graph | RDF/XML |
|---|---|---|---|---|
| xml2rdf3.xsl | 3.0 / 2009-05-19 | rdf:value for every text, no attribute triples, order predicates, comments as triples | STELLA-I_3.png | STELLA-I_3.rdf |
| xml2rdf25.xsl | 2.5 / 2009-05-19 | added BaseURI variable, keep comments as comments | STELLA-I_2.5.png | STELLA-I_2.5.rdf |
| xml2rdf24.xsl | 2.4 / 2008-09-30 | no rdf:type information used (simpler); attributes are distinguished from elements by an additional xs:attribute triple | STELLA-I_2.4.png | |
| xml2rdf23.xsl | 2.3 / 2008-09-25 | distinction of elements from attributes by an rdf:type xsl:element | | |
| xml2rdf22.xsl | 2.2 / 2008-09-23 | distinction of attributes from elements by an rdf:type xsl:attribute | STELLA-I_2.2.png | |
| xml2rdf21.xsl | 2.1 / 2008-03-14 | resources have an rdf:type information | STELLA-I_2.1.png | |
| xml2rdf2.xsl | 2.0 / 2007-11-05 | blank nodes are replaced by URIs constructed from the hierarchy of XML element | STELLA-I_2.0.png | STELLA-I_2.0.rdf |
| xml2rdf1.xsl | 1.0 / 2007-03-26 | elements and attributes become literals connected by blank nodes similar to the Java tool OwlMap. | STELLA-I_1.0.png | |
References
- Breitling, F., 2009: "A standard transformation from XML to RDF via XSLT", Astronomical Notes, Volume 330, Issue 7, 755-760, http://arxiv.org/abs/0906.2291 .
- Grid Integration of Robotic Telescopes (talk and contribution to the proceedings of Hot-wiring the Transient Universe: A Joint VOEvent & HTN Workshop, 4-7 June 2007)
- Providing Remote Access to Robotic Telescopes by Adopting Grid Technology (Proceedings of the German e-Science Conference 2007, 2-5 May)
- Grid Integration of Robotic Telescopes (Workshop on Scientific Instruments and Sensors on the Grid, ICTP Trieste, 23-28 April 2007)
Contact
Frank Breitling, fbreitling (at) aip.de
http://www.aip.de/People/fbreitling/



