Installation Instructions for Ganglia and MDS4 (monitoring)
- Build Ganglia
- Install Ganglia
- Test Ganglia
- Configure System Service
- Configure Globus for Ganglia
- Configure Globus MDS
- Possible Problems
- Start Globus and Test
A general overview of Ganglia and its combination with Globus can be found at IBM:
Maximize your grid potential, Part 1: Ganglia.
In the following, only the monitoring daemon gmond is used. The Ganglia Meta Daemon gmetad is better suited to observing entire cluster complexes, but is not considered further here.
MDS4 uses
First, we need the current Ganglia sources. The installation archives are hosted on SourceForge and can be reached via http://ganglia.info/ (ganglia-3.0.7.tar.gz). These instructions have been tested with Ganglia 3.0.7.
If your host is running a firewall, note that Ganglia can only gather hardware statistics if the default port 8649 (both TCP and UDP) is open on the host. The port does not need to be reachable from the Internet, however.
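As a minimal sketch of such a firewall setup, the shell function below prints iptables rules that would accept gmond traffic on port 8649 from one local subnet only. It assumes the host uses iptables; the function name ganglia_fw_rules and the subnet 192.168.1.0/24 are illustrative and not part of the original instructions. The rules are only printed, so they can be reviewed before root applies them.

```shell
#!/bin/sh
# Print (do not apply) iptables rules that accept gmond traffic
# on port 8649, TCP and UDP, from a single source subnet.
# The subnet argument is a placeholder for your own local network.
ganglia_fw_rules() {
    subnet="$1"
    port="${2:-8649}"   # gmond default port
    for proto in tcp udp; do
        echo "iptables -A INPUT -p $proto -s $subnet --dport $port -j ACCEPT"
    done
}

# Example: show the rules for a hypothetical local subnet.
ganglia_fw_rules 192.168.1.0/24
```

Restricting the source subnet keeps the port closed to the Internet while still letting local machines reach gmond.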
To build the software, start in the Globus directory. As user globus:
- cd /work1/globus/
- tar xvfz /tmp/ganglia-3.0.x.tar.gz
This will unpack the archive into a directory ganglia-3.0.x/. We now change to this directory:
- cd ganglia-3.0.x/
All commands that follow are assumed to be executed from within this directory.
The Globus helper package contains a script to configure Ganglia. Suppose that the package has already been unpacked into a subdirectory globus-helper in the globus user's home directory:
- cp ~/globus-helper/globus-install/ganglia.cfg .
- sh -x ganglia.cfg
Edit the file
Replace the line
Now build Ganglia:
The installation of Ganglia is done as user root:
- make install
This will install files under /usr/local/globus/ganglia/: libraries under lib/libganglia*, and
should now exist there.
Create a configuration file as user root:
- /usr/local/globus/ganglia/sbin/gmond -t > /etc/gmond.conf
Edit the file /etc/gmond.conf. Fill out the fields "
owner", and "
4. Test Ganglia
It should now be possible to execute gmond.
gmond should be listening at port 8649:
- telnet localhost 8649
The XML output may look somewhat messy, but it is easy for machines to read. If gmond is already running on other computers in the local network segment, expect to see output about those machines in addition to the local machine. This may cause problems later, when MDS4 processes the output data, and may require changes to the gmond configuration. See below.
<?xml version="1.0" encoding="ISO-8859-1" standalone="yes"?>
<!DOCTYPE GANGLIA_XML [
<!ELEMENT GANGLIA_XML (GRID|CLUSTER|HOST)*>
<!ATTLIST GANGLIA_XML VERSION CDATA #REQUIRED>
<!ATTLIST GANGLIA_XML SOURCE CDATA #REQUIRED>
<GANGLIA_XML VERSION="3.0.5" SOURCE="gmond">
<CLUSTER NAME="AIP workstation cashmere" LOCALTIME="1193928615" OWNER="AIP" LATLONG="N52.4040 E13.1022" URL="unspecified">
<HOST NAME="cashmere.aip.de" IP="18.104.22.168" REPORTED="1193928599" TN="16" TMAX="20" DMAX="0" LOCATION="unspecified" GMOND_STARTED="1193928579">
<METRIC NAME="disk_total" VAL="339.425" TYPE="double" UNITS="GB" TN="27" TMAX="1200" DMAX="0" SLOPE="both" SOURCE="gmond"/>
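To make output like the above easier to scan, a small filter can reduce each METRIC element to its name, value, and unit. This is only a sketch: it assumes that each METRIC element sits on one line with the attribute order shown above (NAME, VAL, then UNITS further along), and the function name gmond_metrics is made up for illustration. For anything beyond a quick look, a real XML parser is more robust.

```shell
#!/bin/sh
# Read gmond XML on stdin and print "name value units" for every
# line that holds a METRIC element; all other lines are dropped.
gmond_metrics() {
    sed -n 's/.*<METRIC NAME="\([^"]*\)" VAL="\([^"]*\)".* UNITS="\([^"]*\)".*/\1 \2 \3/p'
}
```

It can be used in the same way as the other pipelines in these instructions, for example `telnet localhost 8649 | gmond_metrics`.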
It is recommended to stop any running
gmond (as user
- gmond/gmond.init stop
and to install it permanently as a service (MDS4 works best with
gmond being installed as a service):
- cp gmond/gmond.init /etc/rc.d/init.d/gmond
- /sbin/chkconfig --add gmond
- /sbin/chkconfig --list gmond
- /etc/rc.d/init.d/gmond start
We now configure MDS4 to analyze the
The configuration of the Globus Toolkit for Ganglia depends on the version installed.
a) For Globus Toolkit version < 4.0.5
- Edit the file
and replace the "
defaultProvider" line with
b) For Globus Toolkit version ≥ 4.0.5
Globus 4.0.5+ uses the Resource Property Provider component of the UsefulRP subsystem to publish information about the grid services available on the resource over MDS. It comes with a tool, mds-gluerp-configure, that writes the settings files correctly. The basic configuration for providing Ganglia information and 'fork' job submission is generated with two separate commands (each is followed here by its output):
mds-gluerp-configure none ganglia $GLOBUS_LOCATION/etc/globus_wsrf_mds_index/ganglia-config.xml
Successfuly wrote configuration output file to: /usr/local/globus/gtk/etc/globus_wsrf_mds_index/ganglia-config.xml
mds-gluerp-configure fork ganglia $GLOBUS_LOCATION/etc/gram-service-Fork/gluerp-config.xml
Successfuly wrote configuration output file to: /usr/local/globus/gtk/etc/gram-service-Fork/gluerp-config.xml
To configure Globus for MDS, the file
is edited. The following lines are inserted in the section "<globalConfiguration>":
<parameter name="logicalHost" value="myhost.domain.de"/>
Here, myhost.domain.de is to be replaced by the Internet address of the machine that will run gmond.
For the MDS upload, in the file
un-comment the existing commented-out section "
<upstream>" and substitute its contents by
Ganglia's normal behavior is to exchange information between machines running Ganglia via UDP ports, which are set up in the /etc/gmond.conf sections
If another machine is running
This automatic exchange of information with other computers over the UDP channel is sufficient for simple Ganglia usage, but not for MDS4. The machines should instead be grouped later using MDS4.
As an example, to prevent communication with other gmond instances when monitoring with MDS4 runs into problems, edit the /etc/gmond.conf sections
udp_recv_channel can be modified by changing the mcast_join addresses and the bind address to a value that differs from the value in the other
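A sketch of such a change: the filter below rewrites the address on every mcast_join and bind line of a gmond configuration read from stdin. It assumes the stock layout written by gmond -t, where these settings hold a dotted IPv4 multicast address (239.2.11.71 by default). The function name set_gmond_mcast and the replacement address 239.2.11.72 are illustrative choices, not values from the original instructions.

```shell
#!/bin/sh
# Replace the address on mcast_join and bind lines of a gmond.conf
# read from stdin; other lines (e.g. "port = 8649") pass through.
# Note: a "bind" line added to tcp_accept_channel would be
# rewritten as well.
set_gmond_mcast() {
    new_addr="$1"
    sed -E "s/^([[:space:]]*(mcast_join|bind)[[:space:]]*=[[:space:]]*)[0-9.]+/\\1${new_addr}/"
}
```

The result goes to stdout, so it can be inspected first, for example with `set_gmond_mcast 239.2.11.72 < /etc/gmond.conf > /tmp/gmond.conf.new`. After root installs the new file, gmond has to be restarted for the change to take effect.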
In the future, more advanced methods will be developed for monitoring real clusters, so that complex gmond output can be processed by MDS4 without error.
For now, more than one HOST entry may cause problems. The number printed by
- telnet localhost 8649 | grep "HOST NAME=" | wc -l
should ideally be equal to 1.
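The same check can be wrapped in a small script that reports the result explicitly. This is a sketch that assumes each HOST element starts on its own line, as in the sample output shown earlier; the function names are hypothetical.

```shell
#!/bin/sh
# Count HOST elements in gmond XML read from stdin.
count_gmond_hosts() {
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so mask the status with "|| true".
    grep -c '<HOST NAME=' || true
}

# Succeed only if exactly one HOST entry is present.
check_single_host() {
    n=$(count_gmond_hosts)
    if [ "$n" -eq 1 ]; then
        echo "OK: 1 HOST entry"
    else
        echo "WARNING: $n HOST entries; MDS4 may have problems" >&2
        return 1
    fi
}
```

For example, `telnet localhost 8649 | check_single_host` prints a confirmation or a warning, and its exit status can be tested in other scripts.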
Start the Globus container:
- /etc/init.d/globus restart
Here is an example of the contents of the log file $GLOBUS_LOCATION/var/container.log after a correct setup of Globus for Ganglia.
MDS4 and Ganglia should be communicating. We can verify this with the following query:
- wsrf-query -a -z none -s https://127.0.0.1:8443/wsrf/services/DefaultIndexService
The answer may take a few seconds. If MDS4 can analyze the Ganglia output correctly, we should receive the names of the computers and many details about processor, main memory, disk space, operating system, load, etc. If information is missing from the output, MDS4 has a problem, possibly one of those described above.
As an example, this is a fragment of the correct output of MDS4:
Note especially the strings
FileSystem. These are derived from Ganglia information read by MDS.
Finally, in the $GLOBUS_LOCATION/var/container.log file, it is normal to see, after the SOAP services listing, lines like