Dynamo: Task farming of Atomic Grid Jobs in AstroGrid-D
- Documentation -
Download Dynamo from the Dynamo Web Page (http://www.gac-grid.org/project-products/Applications/DynamoIntro.html) or check it out from the dynamo part of the AstroGrid-D Software SVN (svn://svn.gac-grid.org/gacg_software/dynamo/).
This documentation will explain how the demo submission works, and give pointers as to how you might adapt the demo scripts to your needs.
When you extract the package, you see the following directory structure:
- a directory doc/, containing
- this documentation
- other help
- a shockwave flash file with a demonstration run of the Use Case Dynamo.
- a directory submission/, containing
- submit.sh: The main job submission script.
- jsdl.template: Describes all common elements of the job submission in the "Job Submission Description Language" (JSDL), an XML format
- machines: A list of the resources selected
- update_machinefile.sh: A script to retrieve available compute resources from Stellaris, AGD's information server (also uses the file mds.xsl)
- progress.sh: An example for an intermediate result retrieval script
- cleanall: a clean script to tidy up the submission directory
- Visualisation example: A pre-compiled IDL program "grid_dynamo.sav", which is used for real-time visualisation of the Dynamo Use Case.
- a directory input/, containing
- Binary files of the executable, in this case the compiled and linked "dynamo" software (dynamo.x). To deploy your own binary, you must copy it here and specify it in the submit.sh script.
- Input data, parameter files and starting conditions in subdirectories input0, input1, .... For the prepared example of the Use Case "Dynamo" you will find four different model settings (subdirectories input0/ to input3/). For your own programs, you must of course supply your own directories.
- a directory etc/, containing
- a src/ directory with the different sources of the Dynamo Use Case and the IDL visualisation.
- a (somewhat aged) presentation.
How to run the Grid submission
The Dynamo demo contained in the package is an example of an "atomic use case", one in which independent processes are run on multiple compute resources on the grid. However, with small adaptations the scripts can submit many kinds of scientific code. To run the example, follow these steps:
- Change into the submission/ subdirectory. This is where everything important is handled.
- If restarting, run the cleanall script, just in case.
- Check the machines file and specify the grid resources you want to use. Run update_machinefile.sh to retrieve all resources available within AstroGrid-D. Note that you may be able to access additional resources on D-Grid. It is possible to specify the same resource several times, which makes sense for clusters.
- If necessary, edit the submit.sh script and change the settings at the beginning of the file.
- Run the submit.sh script. This will upload the dynamo files onto the selected machines and start the Globus jobs. Then all you have to do is wait: if everything runs fine, the output should appear in a <project>_result_<,,,> subdirectory upon completion.
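The steps above can be sketched as a short shell session. This is only an illustration: the script names come from the package listing, but calling them as ./name without arguments is an assumption, and the `run` wrapper and `DRY_RUN` switch are hypothetical helpers that by default only print what would be executed.

```shell
#!/bin/sh
# Sketch of the run sequence. Assumes the scripts are executable and need no arguments.
DRY_RUN=${DRY_RUN:-1}            # 1 = only print the commands, 0 = really run them

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

run_demo() {
    run cd submission              # everything important is handled here
    run ./cleanall                 # only needed when restarting
    run ./update_machinefile.sh    # retrieve the available resources from Stellaris
    # ... now edit the machines file and the settings at the top of submit.sh ...
    run ./submit.sh                # upload the files and start the Globus jobs
}

run_demo
```

Set DRY_RUN=0 only after checking the machines file and the settings in submit.sh.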
Workflow Introduction
The main flow of control in the dynamo.sh script consists of a loop over a set of input files, running a process on a different compute resource for each input. The particular input files are specific to dynamo, but such a loop is common to many applications. To adapt the script, you will want to change the content of the input subdirectories (input/input0 ... /input<n>). Also change the name of the main executable as specified at the beginning of submit.sh. If necessary, you can supply a shell script here, as long as all other necessary files are present in each "input" subdirectory.
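A minimal sketch of such a loop is shown here. It is not the code from the package: the function name `assign_inputs` is hypothetical, the machines file is assumed to hold one resource per line, and cycling through the list when there are more inputs than machines is an assumption as well.

```shell
#!/bin/sh
# Sketch: pair each input directory with one entry of the machines file.
# Assumes one resource per line; empty lines are ignored.

assign_inputs() {
    machinefile=$1
    inputroot=$2
    nmach=$(grep -c . "$machinefile")              # number of non-empty lines
    [ "$nmach" -gt 0 ] || { echo "no machines listed" >&2; return 1; }
    i=0
    for dir in "$inputroot"/input*/; do
        [ -d "$dir" ] || continue                  # the glob matched nothing
        line=$(( i % nmach + 1 ))                  # wrap around the machine list
        host=$(grep . "$machinefile" | sed -n "${line}p")
        echo "$(basename "$dir") -> $host"         # a real script would submit here
        i=$(( i + 1 ))
    done
}
```

Called as `assign_inputs machines ../input`, it prints one "input -> resource" pair per input directory. Listing a cluster several times in the machines file gives it several inputs.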
submit.sh
We will now explain a few details of the submit.sh script. First, look at the line beginning with
mkdir -m 777 ...
The permissions had to be specified because the processes run in a grid account, whereas the job may be submitted from a non-grid user account. If you submit everything from a grid account, setting the permissions may be unnecessary.
The files to be uploaded ("staged in") to the compute resource are all put in a temporary directory tree, which is tarred and zipped. This is an easy way to create a consistent directory structure on the compute resources.
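The packing step might look roughly like the following sketch. The function name `make_stagein`, the top-level directory name run/ and the argument order are illustrative assumptions, not taken from submit.sh.

```shell
#!/bin/sh
# Sketch: collect everything to be staged in under one temporary tree,
# then tar and gzip that tree. All names here are illustrative.

make_stagein() {
    exe=$1         # executable to ship, e.g. ../input/dynamo.x
    inputdir=$2    # one model setting, e.g. ../input/input0
    tarball=$3     # archive to create
    stage=$(mktemp -d) || return 1
    mkdir -m 777 "$stage/run"          # world-writable, like the mkdir line discussed above
    cp "$exe" "$stage/run/" &&
    cp -R "$inputdir" "$stage/run/" &&
    tar -czf "$tarball" -C "$stage" run
    status=$?
    rm -rf "$stage"                    # the temporary tree is no longer needed
    return $status
}
```

Unpacking the archive on the compute resource then recreates run/ with the executable and the input directory inside it, whatever the local directory layout was.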
The complicated line beginning with
eval echo ...
processes the input jsdl.template file, expanding any environment variables it contains, and outputs an RSL file, which is used by Globus for job submission.
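The idea behind this kind of template expansion can be sketched as follows. This is not the line from submit.sh: `expand_template` is a hypothetical helper, and the variable names TARBALL and STAGE_URL as well as the host name are made up for the example. Note that eval also executes any $(...) command substitution in the template, so the template must be trusted.

```shell
#!/bin/sh
# Sketch: expand $VAR references in a template file with "eval echo".

expand_template() {
    # Escape backslash, double quote and backquote so that they survive the
    # double-quoted echo; $VAR and ${VAR} references are left for eval to expand.
    escaped=$(sed -e 's/[\\"`]/\\&/g' "$1")
    eval "echo \"$escaped\""
}

# Small demonstration with a made-up JSDL fragment.
tmpl=$(mktemp)
cat > "$tmpl" <<'EOF'
<jsdl:DataStaging>
  <jsdl:FileName>${TARBALL}</jsdl:FileName>
  <jsdl:CreationFlag>overwrite</jsdl:CreationFlag>
  <jsdl:Source><jsdl:URI>${STAGE_URL}/${TARBALL}</jsdl:URI></jsdl:Source>
</jsdl:DataStaging>
EOF
TARBALL=dynamo_in.tgz
STAGE_URL=gsiftp://submit.example.org/tmp
expand_template "$tmpl"
rm -f "$tmpl"
```

Keeping all variable parts in shell variables means the template itself stays free of logic, which matches how jsdl.template is described below.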
The script then does the actual submission with the call to
globusrun-ws
using the RSL file just created. It writes EPR (End Point Reference) files to the local directory ~/.epr/, which may be used for monitoring the state of the job.
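As a rough sketch, a batch submission and a later status query with globusrun-ws could look like this. The exact options used in submit.sh may differ (for example, additional options for credential delegation when files are staged). The function names, the `DRY_RUN` switch and the EPR file naming are assumptions for illustration.

```shell
#!/bin/sh
# Sketch: submit a job in batch mode and keep its EPR file for later queries.
# With DRY_RUN=1 (the default here) the commands are only printed.
DRY_RUN=${DRY_RUN:-1}

submit_job() {
    host=$1       # contact of the compute resource, taken from the machines file
    jobfile=$2    # job description produced from jsdl.template
    epr=$3        # where to store the End Point Reference, e.g. ~/.epr/job0.epr
    set -- globusrun-ws -submit -batch -F "$host" -f "$jobfile" -o "$epr"
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

job_status() {
    # Query the state of a previously submitted job through its EPR file.
    set -- globusrun-ws -status -j "$1"
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}
```

Because the job is submitted in batch mode, the command returns immediately and the EPR file is the only handle to the running job, so it should be kept until the results have been retrieved.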
Finally, if IDL is installed, it is used for graphical visualisation of the output. You could replace this by whatever post-processing you like.
jsdl.template
The JSDL template file doesn't contain logic, but rather a description of which files to upload to and download from the compute resources, and what to execute there. The main reference for the JSDL language is the JSDL Specification, http://www.gridforum.org/documents/GFD.56.pdf
Note especially the section
jsdl-posix:POSIXApplication
This section executes a small script (in the "Arguments" subsection) inside a shell (here the "Executable"). You might want to change this, for example, to build source code on the compute resource. The section also specifies "Output" and "Error" streams, which may be set to files.
The first section
jsdl:DataStaging
specifies which "Source" files to upload to the compute resource, here specified as a URI. In the demo, this is just the tar file of inputs created by the dynamo.sh script. You may want to use it for other purposes. Note especially the elements "CreationFlag" and "DeleteOnTermination".
In many simple cases, it is possible to stage in source code, which is to be compiled by the script on the compute resource.
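A script executed on the compute resource could follow the pattern sketched here: unpack the staged-in archive, build if sources were shipped, then run the program. All names (`remote_job`, run/, Makefile, the log file names) are hypothetical and not taken from the demo.

```shell
#!/bin/sh
# Sketch of a remote-side job script. The body runs in a subshell (note the
# parentheses), so the cd does not affect the caller.

remote_job() (
    tarball=$1    # archive that was staged in
    exe=$2        # program to start inside the unpacked tree
    tar -xzf "$tarball" || exit 1
    cd run || exit 1
    if [ -f Makefile ]; then
        make || exit 1                 # compile staged-in sources on the compute resource
    fi
    ./"$exe" > stdout.log 2> stderr.log
)
```

Building on the compute resource avoids binary compatibility problems between machines, at the cost of requiring a compiler there.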
The next section
jsdl:DataStaging
specifies the "Target" files (or directory trees) that constitute output to be saved. In the demo there is just one such tree; for each separate file or directory tree you will need a separate "Target" section. Note also that if it is a directory tree, the URI section must end in a slash ("/").
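If the target URI is assembled from shell variables before the template is expanded, a small helper can guard against a forgotten trailing slash. The helper name `dir_uri` and the example URI are hypothetical.

```shell
#!/bin/sh
# Sketch: make sure a directory URI ends in exactly the form "…/".

dir_uri() {
    case $1 in
        */) printf '%s\n' "$1" ;;      # already ends in a slash
        *)  printf '%s\n' "$1/" ;;     # append the missing slash
    esac
}

# Example with a made-up host and path:
dir_uri gsiftp://submit.example.org/home/user/dynamo_result
```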
The basic workflow of the script package is once more shown in this graph:
Job monitoring
Monitoring allows you to see the progress of jobs while they are running. It is not an essential part of the package and was developed separately from the dynamo distribution. If the AGD monitoring of 2008 is already installed on the resources used, the additional AGD monitoring package described below is not necessary, but it may complement the automatic monitoring.
The AGD monitoring can be found in the AGD SVN: svn.gac-grid.org/gacg_software/monitoring/
The directory in SVN consists of a directory monitoring/, containing:
- the monitoring script monitor.pl (needs Perl)
- a few other files needed for the execution
With monitoring enabled, you can retrieve information about the job status from the AstroGrid-D information service Stellaris and also use the Grid Status pages of AstroGrid-D (e.g. timeline, see http://www.gac-grid.org/project-products/grid-status.html).
Visualisation
Included in the package is an IDL visualisation example, specific to the Use Case Dynamo. To run it you will need
A) EITHER the (free) IDL virtual machine installed.
The submit.sh script attempts to run "idl -vm=grid_dynamo_visualise.sav" at the end. You can get the virtual machine from http://www.ittvis.com/idlvm.
B) OR an installed IDL V6.0 or higher and an IDL license.
Make sure the libraries are included by entering
IDL> !path=expand_path("!path;+/work/dynamo/visualisation")
(or wherever dynamo is installed), or copy the visualisation directory into an IDL library directory included in your IDL !path.
If IDL is installed, submit.sh will automatically attempt to start the visualisation upon submission, via the virtual machine ("idl -vm=").
If you have not installed that VM, you can alternatively start the visualisation program manually as soon as dynamo has finished. Enter "grid_dynamo" at your IDL prompt. The program also has to run in the submission directory and will call the "progress.sh" to retrieve new results whenever possible. It may take a few minutes before the first data is displayed. Machines in your local network should respond much faster.
Notes
Remember that this package is not yet a fully robust application that catches all potential errors. Test it before you present it. If it does not work, the reason may be a path error. Look into submit.sh, try running it in trace mode ($ sh -x submit.sh) and check all the path settings and directories for the different programs. Or talk to someone from the AIP team (see http://www.aip.de/groups/escience), who might be able to help you if you can catch them. Good luck!