Worker Nodes on LCRC#
Worker nodes can be launched on LCRC HPCs such as Improv, Bebop, or Crossover. The following steps will ensure a worker node can be set up for use with a study.
There is a fully scripted version of the procedure below in ./bin/hpc/setup_conda_env.sh.
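Once the repository is available locally (Step 4 below), the script can typically be run directly with bash; any arguments it takes are documented in the script itself.
bash ./bin/hpc/setup_conda_env.sh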
Setup your environment#
An example for Crossover is shown below; other LCRC HPCs are accessed the same way.
Step 1: SSH into crossover.lcrc.anl.gov from a terminal. You can use PuTTY, VS Code, or your terminal of choice.
Step 2: Load the conda module for the HPC to have access to Python. For a list of modules for LCRC, go to Bebop and Crossover. This step is needed every time the terminal is reopened or restarted, but it can be automated in your .bashrc (see the sketch after the command below).
module load miniforge3
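If you want this to happen automatically, a minimal sketch of the line to add to your .bashrc (assuming the module system is available in login shells) is:
# in ~/.bashrc -- load the conda module in every new shell
module load miniforge3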
Step 3: Create a conda environment using a supported Python version (3.11+) for use with the worker. Workers look for the HPC name in the environment name and select the first match, so be sure to maintain only one such environment per HPC cluster.
conda create -y -n polaris_xover -c conda-forge python=3.12
conda activate polaris_xover
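To confirm that only one environment carries the HPC name (here, xover for Crossover), list your conda environments:
conda env list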
Tip
You may need to run conda init bash and then restart your shell for conda activate to work if it's your first conda environment on this shell.
Step 4: Get polaris-studio installed into your Python environment.
Clone the polaris-studio repository into a YOUR_STUDY_NAME folder in the project's shared space. If you do not have access to the POLARIS project, please contact Griffin White.
mkdir /lcrc/project/POLARIS/YOUR_STUDY_NAME
git clone https://git-out.gss.anl.gov/polaris/code/polarislib.git /lcrc/project/POLARIS/YOUR_STUDY_NAME/polaris-studio
Note: When beginning a new study, it is up to the user whether to clone a fresh copy or reuse a previously cloned polaris-studio directory. Please ensure you are using a compatible version of polaris-studio for your study. The helpful commands below can be used to confirm which polaris-studio will be used by the worker that is launched.
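For example, a quick way to confirm which version an existing clone is on (assuming git is available on the login node) is:
git -C /lcrc/project/POLARIS/YOUR_STUDY_NAME/polaris-studio log -1 --oneline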
Install polaris-studio from the cloned repository into the named conda environment.
python -m pip install -e /lcrc/project/POLARIS/YOUR_STUDY_NAME/polaris-studio[dev]
Note
The hpc configuration is sufficient for workers, but if the study requires additional dependencies they need to be installed into the conda environment. The example above uses the dev option, which is comprehensive. Building models from CSVs requires the builder option. For a complete set of options, visit Installation Options.
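To verify the editable install is visible in the environment, you can query pip (assuming the distribution name matches the repository, polaris-studio):
python -m pip show polaris-studio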
Step 5: Set up Globus authentication so that magic_copy can copy files between VMS file servers and the LCRC filesystem.
python /lcrc/project/POLARIS/YOUR_STUDY_NAME/polaris-studio/bin/authenticate_globus.py
Follow the on-screen instructions to authenticate Globus (log in to Globus using the link provided and then paste the token).
Tip
When using any terminal, Ctrl+C is generally the CANCEL command, not copy. Pressing Ctrl+C will cancel the currently running process that is waiting for your auth token. Most terminals automatically copy text when you select it, or you may need to right-click to copy; refer to the documentation for your chosen terminal for details.
Step 6: Log in to www.globus.org with your Argonne credentials.
Check if you are part of the POLARIS Group. If not, contact Jamie, Griffin, Josh, or Murthy to be added to this group.
A collection/endpoint for the study needs to be set up for moving files between specific folders on the VMS file server and the LCRC filesystem.
Launching a worker#
From the polaris-studio folder that was cloned, navigate to the bin/hpc directory to launch a worker.
cd /lcrc/project/POLARIS/YOUR_STUDY_NAME/polaris-studio/bin/hpc
Schedule a job using the pre-prepared shell script for use with LCRC. The following example launches a worker using 32 (of 128) threads on a node in Crossover with a runtime of 7 days. The task-type argument is optional but recommended; it ensures that the worker only accepts jobs that match the same task-type, which can be used to prevent incompatible jobs from running on the worker that is launched.
sbatch --time=7-00:00:00 --partition=TPS --account=TPS --ntasks-per-node=32 worker_loop_lcrc.sh --task-type 151
Note: Keep in mind that there is an idle timeout built in, so if no jobs are currently running, these workers will cancel out automatically after 5 minutes.
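The job can be monitored or cancelled early with the standard Slurm commands, for example:
squeue -u $USER
scancel JOBID
(replace JOBID with the job id reported by sbatch)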
For details about scheduling a job on LCRC, please visit: Schedule job
Helpful commands#
Project allocations available for running POLARIS studies can be confirmed for individual accounts.
lcrc-sbank -q balance
Ensure that the correct version of Python is being used within your conda environment. The Python path should refer to the one within polaris_xover but may sometimes refer to the Python from the loaded module. These instances occur when the module is not loaded correctly or the conda init in your .bashrc is initializing an incorrect version of conda.
conda activate polaris_xover
which python