Defining a job#
The EQ/SQL system is set up to be largely agnostic of the process it runs, and a number of generic task types can be scheduled on the cluster. Any arguments or data the user wants to pass to the task are supplied via the input column in the database; this can be any Python object that can be transmitted via JSON serialisation.
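For example, a plain dictionary of strings and numbers survives the required JSON round trip. The field names below (other than run-id, which appears later in this page) are purely illustrative:

```python
import json

# Any Python object that survives a JSON round trip can be used as task
# input; a dict of strings, numbers and lists is the common case.
task_input = {"run-id": "MyStudy_001", "iterations": 10}

payload = json.dumps(task_input)          # serialised into the input column
assert json.loads(payload) == task_input  # recovered on the worker node
```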
The most common type of job to run is a Python script with some helper .py files. The script (and helpers) are saved onto the fileserver (generally under ShareAndFileTransfer) and a reference to this directory becomes part of the task definition that is submitted to the queue.
The full set of currently supported task types is:

- control-task: EQ_ABORT, EQ_GIT_PULL
- python-script: A single Python script that can be run directly
- python-module: A package of Python modules with an entry-point script that can be run directly
- bash-script: A bash script (useful for seeing disk free space across the cluster)
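Conceptually, a task definition pairs one of these type strings with the fileserver directory holding the user's code and the JSON-serialised input. The dictionary below is a hypothetical illustration only; the real schema is whatever the EQ/SQL database tables define:

```python
import json

# Hypothetical task definition (field names are illustrative only).
task = {
    "task-type": "python-module",  # one of the types listed above
    "code-dir": "/mnt/p/ShareAndFileTransfer/For Me/MyStudy/run_convergence",
    "input": json.dumps({"run-id": "MyStudy_001"}),
}
```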
Examples#
The main way to define a custom job for the cluster is by modifying one of the given examples (see the examples directory). A copy would be made and customized to meet the project requirements; this would be stored with other project-specific datasets on a fileserver location that is accessible from the worker nodes. Here we will examine the run_convergence example in some detail to discuss the concepts.
Important
The reader is assumed to be somewhat familiar with the polaris-studio Convergence Runner methodology.
This is an example of the python-module task type and comprises two Python files in a single directory: a main.py containing the entry point, and some helper methods in a separate custom_callbacks.py.
main.py is responsible for controlling the main logic of running a convergence run on a remote node. It will (see the sketch after this list):

- Take the given JSON serialized input data and use it to determine where the base model files are (on the fileserver); these will be downloaded to a local directory for faster on-node processing.
- Apply any customization specified in the input to the run config and then run the local model.
- Hand control to the post_loop_fn and end_of_loop_fn callbacks (defined in custom_callbacks.py), which are responsible for copying the various iterations of the model run back to a central fileserver location (again specified in the JSON input by the user).
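Putting those steps together, the overall flow of main.py might look something like the sketch below. This is only a sketch: run_model(), the overrides key, and the local /tmp layout are illustrative stand-ins, not the actual names used by the example.

```python
import json
import shutil
from pathlib import Path


def run_model(model_dir: Path, overrides: dict) -> None:
    """Placeholder for the actual polaris-studio Convergence Runner call."""
    ...


def main(payload: str) -> None:
    # Decode the JSON input passed through the task's input column.
    config = json.loads(payload)

    # Copy the base model from the fileserver to fast local storage
    # for quicker on-node processing.
    local_model = Path("/tmp") / config["run-id"]
    shutil.copytree(config["where-is-base-model"], local_model, dirs_exist_ok=True)

    # Apply any customization given in the input, then run the model locally.
    run_model(local_model, overrides=config.get("overrides", {}))

    # Copy logs back to the central location named in the payload; in the
    # real example the per-iteration copy-back is handled by post_loop_fn
    # and end_of_loop_fn in custom_callbacks.py.
    shutil.copytree(local_model / "log", Path(config["put-logs-here"]), dirs_exist_ok=True)
```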
Currently the JSON payload is set up like so:
{
    "run-id": "MyStudy_001",
    "where-is-polaris-exe": "/mnt/p/VMS_Software/15-CI-CD-Artifacts/Polaris/polaris-linux/develop/latest/ubuntu-20.04/Integrated_Model",
    "where-is-base-model": "/mnt/p/VMS_Software/15-CI-CD-Artifacts/Polaris/models/RUN/Grid",
    "put-logs-here": "/mnt/p/ShareAndFileTransfer/For Me/MyStudy/logs/",
    "put-results-here": "/mnt/p/ShareAndFileTransfer/For Me/MyStudy/results/"
}
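On the worker side, the entry point simply decodes this payload and reads the fields it needs. A minimal parsing sketch follows; the parse_input helper is illustrative and not part of the example code itself:

```python
import json


def parse_input(payload: str) -> dict:
    """Decode the task payload and check the expected keys are present."""
    config = json.loads(payload)
    required = ["run-id", "where-is-polaris-exe", "where-is-base-model",
                "put-logs-here", "put-results-here"]
    missing = [key for key in required if key not in config]
    if missing:
        raise ValueError(f"input payload is missing keys: {missing}")
    return config
```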
Going further#
The above is a fairly simple model-running setup, designed to be useful in exploring the concepts introduced by EQ/SQL. Real studies are likely to be more complicated and require greater customization. An example of this is the GPRA22 studies; the final versions of the actual run scripts for these studies have been provided in the examples/gpra22 folder to provide a template for a more realistic study setup.