Worker Nodes on LCRC#
Worker nodes can be launched on LCRC HPCs such as Improv, Bebop, or Crossover. The following steps ensure a worker node is set up for use with a study.
One-Time Environment Setup For A New Study#
The example below uses Crossover; other LCRC HPCs are accessed the same way.
Step 1: SSH into crossover.lcrc.anl.gov on the terminal. Options include:
Using the PuTTY terminal shortcut in WinSCP (which uses the configuration already set up in WinSCP for Crossover access)
Directly in PuTTY or an alternative terminal
Step 2: Load the relevant modules for the HPC so that Python and GCC are available. For a list of modules for LCRC, see Bebop and Crossover. Note: This step is needed every time the terminal is reopened or restarted.
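As a rough sketch, loading the modules might look like the following; the module names here are assumptions, so check the module list for your HPC first:
module avail
module load gcc
module load anaconda3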
Step 3: Create a Conda environment for use with the worker. Workers look for the HPC name in the environment name and select the first match, so be sure to maintain only one such environment per HPC.
conda create -n polarislib_xover_eqsql
conda activate polarislib_xover_eqsql
Note: If this is your first Conda environment in this shell, you may need to run
conda init bash
and restart the terminal for
conda activate
to work.
Step 4: Set up your Conda environment to use a supported version of Python.
conda install python=3.10
Note: Supported versions of Python can be found at Installation.
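As a quick sanity check (assuming the environment name from Step 3), confirm that the environment is active and uses the expected interpreter:
conda activate polarislib_xover_eqsql
python --version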
Step 5: Install polaris-studio into your Python environment.
Navigate to the POLARIS project, which is a shared space for everyone at TSM:
cd /lcrc/project/POLARIS
Note: If you do not have access to the POLARIS project, please contact Griffin White.
Clone the polaris-studio repository into a STUDY folder
cd /lcrc/project/POLARIS/STUDY
git clone https://git-out.gss.anl.gov/polaris/code/polarislib.git
Install polaris-studio from the cloned repository.
python -m pip install -e .[dev]
Note: The hpc configuration is sufficient for workers, but if the study requires additional dependencies they need to be installed into the Conda environment. The example above uses the dev option, which is comprehensive. Building models from CSVs requires the builder option. For a complete set of options, visit Installation Options.
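For example, assuming the same editable install as above, the hpc or builder options mentioned in the note would be installed with:
python -m pip install -e .[hpc]
python -m pip install -e .[builder]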
Step 6: Set up Globus authentication so that magic_copy can copy files between the VMS fileservers and the LCRC filesystem.
cd /lcrc/project/POLARIS/STUDY/polarislib
python bin/authenticate_globus.py
Note: Follow the script's instructions, which include logging into Globus and pasting the token back into the prompt to complete authentication.
Step 7: Log in to www.globus.org with your Argonne credentials.
Check that you are a member of the POLARIS Group. If not, contact Jamie, Griffin, Josh, or Murthy to be added to this group.
A collection/endpoint for the study needs to be set up for moving files between specific folders on the VMS file server and the LCRC filesystem.
Launching a worker#
From the polaris-studio folder that was cloned, navigate to the bin/hpc directory to launch a worker.
cd /lcrc/project/POLARIS/STUDY/polarislib/bin/hpc
Schedule a job using the pre-prepared shell script for LCRC. The following example launches a worker using 32 of the 128 threads on a Crossover node with a runtime of 7 days.
sbatch --time=7-00:00:00 --partition=TPS --account=TPS --ntasks-per-node=32 worker_loop_lcrc.sh
Note: Keep in mind that there is a built-in idle timeout, so if no jobs are currently running, these worker jobs will cancel themselves automatically after 30 minutes.
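Once submitted, the worker job can be monitored or cancelled with the standard Slurm commands; the job ID below is a placeholder taken from the sbatch or squeue output:
squeue -u $USER
scancel <job_id>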
For details about scheduling a job on LCRC, please visit: Schedule job
Helpful commands#
The project allocations available to an individual account for running POLARIS studies can be checked with:
lcrc-sbank -q balance
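It can also be useful to check node availability before submitting a worker job; sinfo is a standard Slurm command, and TPS is the partition used in the launch example above (adjust as needed):
sinfo -p TPS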