Worker Nodes on LCRC#

Worker nodes can be launched on LCRC HPCs like Improv, Bebop, or Crossover. The following steps set up a worker node for use with a study.

A fully scripted version of the procedure below is available in ./bin/hpc/setup_conda_env.sh
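
A sketch of running that script (this assumes a polaris-studio clone is already available, per Step 5 below):

bash ./bin/hpc/setup_conda_env.sh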

Set up your environment#

An example for Crossover is shown below; other LCRC HPCs are accessed the same way.

  • Step 1: SSH into crossover.lcrc.anl.gov from a terminal.

    You can use PuTTY, VS Code, or any terminal of your choice; a minimal example is shown below.
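
    For example (your_anl_username is a placeholder for your LCRC username):

    ssh your_anl_username@crossover.lcrc.anl.gov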

  • Step 2: Load the relevant modules for the HPC to gain access to Python and GCC. For a list of modules for LCRC, see Bebop and Crossover. Note: This step is needed every time the terminal is reopened or restarted. A sketch of typical commands follows.
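
    A sketch only (the module names here are assumptions; run module avail to find the exact names on your system, and note that module save/restore requires Lmod):

    module avail                      # list the modules available on this HPC
    module load gcc anaconda3         # hypothetical names: a compiler and a conda-capable Python
    module save polaris_worker        # optionally save this set for later sessions
    module restore polaris_worker     # restore the saved set after a terminal restart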

  • Step 3: Create a Conda environment for use with the worker. Workers look for the HPC name among your conda environments and select the first match, so be sure to maintain only one such environment per HPC.

    conda create -n polaris_xover
    conda activate polaris_xover
    

    Note: You may need to restart the terminal after running conda init bash for conda activate to work, if this is your first conda environment in this shell; a sketch of that one-time step follows.
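
    Assuming bash as your shell:

    conda init bash    # writes the conda shell hook into ~/.bashrc (one-time)
    exec bash          # reload the shell, or close and reopen the terminal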

    (Screen capture: loading HPC-specific modules from a saved list and activating the conda environment.)

  • Step 4: Set up your conda environment to use a supported version of Python.

    conda install python=3.12
    

    Note: Supported versions of Python can be found at Installation

  • Step 5: Install polaris-studio into your Python environment.

    • Navigate to the POLARIS project, which is a shared space for everyone at TSM:

      cd /lcrc/project/POLARIS
      

      Note: If you do not have access to the POLARIS project, please contact Griffin White.

    • Clone the polaris-studio repository into a STUDY folder

      cd /lcrc/project/POLARIS/STUDY
      git clone https://git-out.gss.anl.gov/polaris/code/polarislib.git
      

      Note: When beginning a new study, it is up to the user whether to reuse a previously cloned polaris-studio directory. Please ensure you are using a compatible version of polaris-studio for your study; the helpful commands below can be used to confirm which polaris-studio will be used by the worker that is launched.
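
      For example, to point an existing clone at a specific version (vX.Y.Z is a hypothetical placeholder; substitute the tag or branch your study requires):

      cd /lcrc/project/POLARIS/STUDY/polarislib
      git fetch --tags        # ensure release tags are available locally
      git checkout vX.Y.Z     # hypothetical tag; use the version compatible with your study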

    • Install polaris-studio from the cloned repository. Change directory with cd polarislib, then install with the following command:

      python -m pip install -e .[dev]
      

      Note: The hpc configuration is sufficient for workers, but if the study requires additional dependencies, they need to be installed into the conda environment. The example above uses the dev option, which is comprehensive; building models from CSVs requires the builder option. For a complete set of options, visit Installation Options
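
      For example, a study that builds models from CSVs could install the builder option instead (quoting the extras spec avoids glob expansion in some shells):

      python -m pip install -e ".[builder]"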

  • Step 6: Set up Globus authentication so that magic_copy can copy files between the VMS fileservers and the LCRC filesystem.

    cd /lcrc/project/POLARIS/STUDY/polarislib
    python bin/authenticate_globus.py
    

    Follow the on-screen instructions to authenticate Globus (log in to Globus using the link provided, then paste the token).

    Note: In any terminal, Ctrl+C is the CANCEL command, not copy. Pressing Ctrl+C will end the process that is waiting for your auth token. For this reason, most terminals automatically copy text when you select it, or you may need to right-click to copy.

  • Step 7: Log in to www.globus.org with your Argonne credentials.

    • Check whether you are part of the POLARIS Group. If not, contact Jamie, Griffin, Josh, or Murthy to be added to this group.

    • A collection/endpoint for the study needs to be set up for moving files between specific folders on the VMS file server and the LCRC filesystem.

Launching a worker#

From the polaris-studio folder that was cloned, navigate to the bin/hpc directory to launch a worker.

cd /lcrc/project/POLARIS/STUDY/polarislib/bin/hpc

Schedule a job using the pre-prepared shell script for use with LCRC. The following example launches a worker using 32 of the 128 threads on a Crossover node, with a runtime of 7 days. The task-type argument is optional but recommended: it ensures that the worker only accepts jobs matching the same task-type, preventing incompatible jobs from running on the worker that is launched.

sbatch --time=7-00:00:00 --partition=TPS --account=TPS --ntasks-per-node=32 worker_loop_lcrc.sh --task-type 151

Note: Keep in mind that there is a built-in idle timeout: if no jobs are currently running, the worker will cancel automatically after 5 minutes.
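
A couple of standard Slurm commands are handy for checking on the worker job (the job ID is printed by sbatch):

squeue -u $USER    # list your queued and running jobs
scancel <jobid>    # cancel a worker manually before its time limit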

For details about scheduling a job on LCRC, please visit: Schedule job

Helpful commands#

  • Confirm the project allocations available to your account for running POLARIS studies:

    lcrc-sbank -q balance
    
  • Ensure that the correct version of Python is being used within your conda environment. The Python path should refer to the one inside polaris_xover, but it may sometimes refer to the Python from a loaded module; this happens when a module is not loaded correctly or when the conda init in your .bashrc initializes an incorrect version of conda.

    conda activate polaris_xover
    which python
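
  • Confirm which polaris-studio clone the worker will use. A sketch, assuming the installed distribution is named polaris-studio (use pip list to check); for an editable install, pip show reports the cloned directory as its location.

    python -m pip list | grep -i polaris
    python -m pip show polaris-studio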