Sponsored by BMBF

AstroGrid-D: SED

Spectral Energy Distribution (SED) Classification

(This use case is adapted from the Diploma Thesis by Tobias Scholl [p2pstreams].)

Workflow overview

The classification of spectral energy distributions is a key research area for the astrophysics community. It gives clues about the physical characteristics of celestial bodies, for example, whether they are active galactic nuclei (AGN), galaxies, or quasi-stellar objects (quasars). A panchromatic approach, which combines photometric measurements from different frequency bands, possibly together with gravitational measurements, promises a better classification.

The researcher provides an input list with the regions of interest (giving the coordinates of the observed celestial objects) and a list of the catalogs to be queried (with all the necessary metadata). The result of the matching process is the set of the n best-matching classifications.

Two kinds of spectra are used in this process:

  • observational spectra
  • theoretical spectra

The process of generating observational SEDs can be subdivided into SED matching and SED assembly. First, potentially useful information is collected from each catalog individually. The retrieved results are then assembled into the observational SEDs, that is, the information from the different catalogs is fused together.
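The assembly step can be pictured with a minimal sketch. It assumes that the matching step has already produced, for each catalog, a list of `(region_id, wavelength, flux)` tuples; this layout, the function name `assemble_seds`, and the catalog names and numbers are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def assemble_seds(matches_per_catalog):
    """Fuse per-catalog matches into one observational SED per region.

    matches_per_catalog maps a catalog name to a list of
    (region_id, wavelength, flux) tuples (hypothetical layout).
    Returns a dict: region_id -> list of (wavelength, flux, catalog),
    sorted by wavelength.
    """
    seds = defaultdict(list)
    for catalog, matches in matches_per_catalog.items():
        for region_id, wavelength, flux in matches:
            seds[region_id].append((wavelength, flux, catalog))
    return {region_id: sorted(points) for region_id, points in seds.items()}


# Made-up values, only to show the shape of the data.
matches = {
    "catalogA": [("r1", 0.5, 1.2), ("r2", 0.5, 0.7)],
    "catalogB": [("r1", 0.1, 3.4)],
}
seds = assemble_seds(matches)
```

Each region ends up with a single list of measurements drawn from all catalogs that returned a match for it.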

Theoretical spectra are designed to represent a celestial body with certain properties (such as temperature or gravitation); the observational spectra are matched against them. Before the classification can take place, these theoretical spectra have to be downsampled and modified to simulate an observation.
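A minimal sketch of downsampling and matching follows. The source does not specify how the spectra are downsampled or compared, so two assumptions are made here: downsampling is approximated by averaging the flux inside each observed wavelength band, and the match quality is a plain chi-squared score. All function names and numbers are hypothetical.

```python
import heapq


def downsample(wavelengths, fluxes, bands):
    """Average a finely sampled spectrum over each (lo, hi) band (assumed method)."""
    result = []
    for lo, hi in bands:
        inside = [f for w, f in zip(wavelengths, fluxes) if lo <= w < hi]
        if not inside:
            raise ValueError(f"no samples in band ({lo}, {hi})")
        result.append(sum(inside) / len(inside))
    return result


def chi_squared(observed, errors, model):
    """Sum of squared, error-scaled residuals (assumed match score)."""
    return sum(((o - m) / e) ** 2 for o, m, e in zip(observed, model, errors))


def n_best(observed, errors, templates, n):
    """Return the labels of the n templates with the smallest chi-squared."""
    scored = ((chi_squared(observed, errors, model), label)
              for label, model in templates.items())
    return [label for _, label in heapq.nsmallest(n, scored)]


# Made-up template spectra, already reduced to two bands.
bands = [(1, 3), (3, 5)]
star = downsample([1, 2, 3, 4], [10, 20, 30, 40], bands)
templates = {"star": star, "galaxy": [10, 10], "quasar": [16, 30]}
best = n_best(observed=[15, 34], errors=[1, 1], templates=templates, n=2)
```

Here `best` lists the two template labels that fit the toy observation most closely, best first.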

StarGlobe Example Scenario

In our prototype, we implemented some of the sub-functionality needed for SED classification. To conduct a (simple) cone search on a catalog, you specify a center (a point on the sphere) and a search radius. Evaluating the search in Cartesian coordinates is much faster, so a conversion step from spherical to Cartesian coordinates is necessary. To measure the distance between the selected catalog sources and the center of the cone, the Mahalanobis distance (a multi-dimensional, variance-scaled distance) is calculated.
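The three operations can be sketched as follows. This is not the prototype's code: it assumes positions given as right ascension and declination in degrees, a diagonal covariance matrix for the Mahalanobis distance, and small-angle offsets from the cone center. The function names and the sample values are hypothetical.

```python
import math


def to_cartesian(ra_deg, dec_deg):
    """Convert spherical coordinates (degrees) to a unit vector (x, y, z)."""
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (math.cos(dec) * math.cos(ra),
            math.cos(dec) * math.sin(ra),
            math.sin(dec))


def cone_search(sources, center, radius_deg):
    """Yield sources (dicts with 'ra' and 'dec') that lie inside the cone.

    In Cartesian form the test is a single dot product compared against
    a precomputed cos(radius), with no trigonometry on the center per row.
    """
    center_xyz = to_cartesian(*center)
    cos_radius = math.cos(math.radians(radius_deg))
    for src in sources:
        xyz = to_cartesian(src["ra"], src["dec"])
        if sum(a * b for a, b in zip(xyz, center_xyz)) >= cos_radius:
            yield src


def mahalanobis_diagonal(offsets, variances):
    """Mahalanobis distance for a diagonal covariance (assumption)."""
    return math.sqrt(sum(d * d / v for d, v in zip(offsets, variances)))


def distance_to_center(src, center, variances):
    """Variance-scaled distance of a source from the cone center.

    Uses small-angle offsets and ignores wrap-around at RA 0/360 degrees.
    """
    ra0, dec0 = center
    d_ra = (src["ra"] - ra0) * math.cos(math.radians(dec0))
    d_dec = src["dec"] - dec0
    return mahalanobis_diagonal((d_ra, d_dec), variances)


# Made-up sources and variances, for illustration only.
sources = [{"id": "a", "ra": 10.2, "dec": 20.1},
           {"id": "b", "ra": 15.0, "dec": 20.0}]
center = (10.0, 20.0)
hits = list(cone_search(sources, center, radius_deg=1.0))
distances = {s["id"]: distance_to_center(s, center, (0.01, 0.01)) for s in hits}
```

With a full covariance matrix, the distance would instead be the square root of the offset vector multiplied by the inverse covariance on both sides; the diagonal case above is the simplest instance.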

Figure 1

Figure 1 shows a possible realization of these steps in a distributed fashion.
Our network contains four peers (Peer 0 through Peer 3) and a single catalog (a sample of the ROSAT catalog) located at Peer 0. The user, situated at Peer 3, wants to query the ROSAT catalog using custom implementations of the operations described above (coordinate conversion, cone search, Mahalanobis distance).

The user creates an XML-based query evaluation plan that specifies the operators and the peers on which they should be installed. The plan is then injected into the network (step 1).
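To give a rough idea of what such a plan might contain, the sketch below builds a small XML document with Python's standard `xml.etree.ElementTree` module. The element names (`plan`, `operator`), the attribute names, the operator names, and the assignment of operators to peers are all hypothetical; the actual plan schema of the prototype is not described here.

```python
import xml.etree.ElementTree as ET


def build_plan(operators, sink_peer):
    """Build a hypothetical query evaluation plan.

    operators is an ordered list of (operator_name, peer) pairs;
    sink_peer is the peer where the result should be delivered.
    """
    plan = ET.Element("plan", {"sink": sink_peer})
    for name, peer in operators:
        ET.SubElement(plan, "operator", {"name": name, "peer": peer})
    return plan


# Illustrative placement only: catalog at peer0, result at peer3.
plan = build_plan(
    [("scan-rosat", "peer0"),
     ("to-cartesian", "peer1"),
     ("cone-search", "peer2"),
     ("mahalanobis", "peer3")],
    sink_peer="peer3",
)
plan_xml = ET.tostring(plan, encoding="unicode")
```

The resulting `plan_xml` string names each operator together with the peer that should install it.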

Each peer that receives the plan installs the specified operators and propagates the sub-plans to its neighbors (steps 2 to 4).

The catalog data is then processed as it passes through the operator chain, and the result is finally displayed at Peer 3 (step 5).
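The pipelined flow through an operator chain can be imitated on a single machine with Python generators. This sketch only mirrors the idea: in the prototype the operators run on different peers, whereas here they are chained in one process. The helper names, the toy rows, and the selection criterion are hypothetical.

```python
def scan(catalog, log):
    """Source operator: emit catalog rows one at a time, recording each read."""
    for row in catalog:
        log.append(("scan", row["id"]))
        yield row


def select(predicate):
    """Build an operator that passes on only the rows matching predicate."""
    def operator(stream):
        return (row for row in stream if predicate(row))
    return operator


def transform(fn):
    """Build an operator that applies fn to every row."""
    def operator(stream):
        return (fn(row) for row in stream)
    return operator


def chain(source, operators):
    """Connect the operators so that each consumes the previous one's output."""
    stream = source
    for operator in operators:
        stream = operator(stream)
    return stream


# Made-up rows, for illustration only.
catalog = [{"id": 1, "dec": 15.0}, {"id": 2, "dec": 20.1},
           {"id": 3, "dec": 25.0}, {"id": 4, "dec": 20.3}]
log = []
pipeline = chain(scan(catalog, log), [
    select(lambda r: abs(r["dec"] - 20.0) <= 1.0),
    transform(lambda r: {**r, "offset": round(r["dec"] - 20.0, 3)}),
])

first = next(pipeline)            # first result is available early
scanned_before_first = len(log)   # rows read from the catalog so far
rest = list(pipeline)             # drain the remaining rows
```

The first result is produced before the whole catalog has been read, which is the property that makes streaming attractive for large catalogs.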

Even this simple example shows the potential for distributed parallelism and pipelining in the SED classification process and how it would benefit from using data streams.