BEBOP and Crossover#
Bebop is the Argonne HPC cluster on which we often run our simulations. It is operated by the LCRC division within Argonne. In addition to the shared computation nodes that LCRC operates for general ANL use, they also maintain a dedicated set of POLARIS-specific nodes that are only available to staff within the POLARIS group. These nodes are hosted on a separate partition and accessed via a separate set of login nodes called “crossover”. Recently, the new HPC cluster “Improv” has also come online and can be utilised for testing purposes.
Cluster | Login URL
---|---
Bebop | username@bebop.lcrc.anl.gov
Crossover | username@crossover.lcrc.anl.gov
Improv | username@improv.lcrc.anl.gov
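For example, to open an interactive shell on Bebop (replace username with your LCRC username):
ssh username@bebop.lcrc.anl.gov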
LCRC Account#
You will need an LCRC account in order to log in to Bebop and to charge jobs against the group's compute-time allocation.
You will then need to add your ssh key to your LCRC account.
You must activate your account before you can be added to the POLARIS project.
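If you do not already have an ssh key pair, the snippet below shows one common way to create one; the ed25519 key type, the default file path and the comment string are just illustrative choices, and the printed public key is what you add to your LCRC account.
ssh-keygen -t ed25519 -C "your-email@anl.gov"   # generate a new key pair (accept the default location), placeholder email
cat ~/.ssh/id_ed25519.pub                       # display the public key so it can be copied into your LCRC account profile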
Build POLARIS on LCRC cluster#
Bebop, Improv and Crossover all manage software through environment modules (built with the Spack package manager), which allows many potentially conflicting sets of software to be “installed” side by side. When you log in to the cluster you can choose which of the many available modules to load into your current environment.
For example, to load the modules that are required to build POLARIS you would run the following snippet:
module load gcc/11 anaconda3 hdf5 # Bebop
module load gcc/10.4 hdf5/1.12 anaconda3 # Crossover
# module load ninja/1.10.2-s5ukqre
conda activate polaris-<insert-cluster-name>
python -m pip install patchelf
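Available module versions differ between clusters and change over time, so it is worth checking what is currently installed before picking versions, for example:
module avail gcc     # list the gcc versions installed on this cluster
module avail hdf5    # likewise for hdf5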
Important
Note that the default version of Python can be quite old and we generally replace it with a more recent one via conda create, being careful to keep a separate environment for each cluster. See the Anaconda section below.
Module snapshots#
After loading in a set of modules you can save these for later use.
module save polaris_build_bebop # Save a snapshot of currently loaded modules
module restore polaris_build_bebop # Corresponding loading of module snapshot
To have the POLARIS build requirements available each time you login, just add the appropriate restore lines (e.g. module restore polaris_build_bebop and optionally conda activate polaris-bebop, adjusting the names for each cluster) to your ~/.bashrc or ~/.bash_profile file. For example:
if hostname | grep xover > /dev/null; then
module restore polaris_build_xover
elif hostname | grep bebop > /dev/null; then
module restore polaris_build_bebop
export PATH="${PATH}:~/bin/bebop"
fi
Anaconda#
When using Python, the best practice we have found is to use Anaconda to set up Python in your home directory and then create a project-specific virtual environment for each cluster that builds on that Python version and adds the required dependencies. For example, the following snippet creates a named conda environment called polaris-bebop based on Python 3.11 and installs the requirements of polaris-studio.
conda create -y -n polaris-bebop python=3.11 # Note use of bebop here, replace with polaris-xover on that cluster
conda activate polaris-bebop # Also here
conda install --channel conda-forge libspatialite
python -m pip install -e ~/git/polaris-studio[hpc] # installs the HPC dependencies of polaris-studio (assumed to be in ~/git)
If conda isn’t an option, the following achieves essentially the same effect using a plain Python virtual environment.
python -m pip install virtualenv
python -m virtualenv ~/venvs/polarislib-bebop
. ~/venvs/polarislib-bebop/bin/activate
python -m pip install -e ~/git/polaris-studio[hpc] # installs the HPC dependencies of polaris-studio (assumed to be in ~/git)
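To confirm that the editable install worked, you can ask pip for the package details (the distribution name polaris-studio is assumed here from the checkout directory used above):
python -m pip show polaris-studio   # should report the editable location under ~/git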
You can also have conda automatically activate your environment, but care must be taken with the cluster environment. The following is a modified version of the code inserted into a user's ~/.bashrc file by the conda init bash command.
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
if hostname | egrep "xover|crossover" > /dev/null; then
__conda_basedir=/gpfs/fs1/soft/xover/manual/anaconda3/2021.05
__conda_env=polaris-xover
elif hostname | egrep "bebop|bdw" > /dev/null; then
__conda_basedir=/gpfs/fs1/soft/bebop/software/custom-built/anaconda3/2024.06
__conda_env=polaris-bebop
else
__conda_basedir=/soft/anaconda3/2024.2
__conda_env=polaris-improv
fi
__conda_setup="$(${__conda_basedir}/bin/conda 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
eval "$__conda_setup"
else
if [ -f "${__conda_basedir}/etc/profile.d/conda.sh" ]; then
. "${__conda_basedir}/etc/profile.d/conda.sh"
else
export PATH="${__conda_basedir}/bin:$PATH"
fi
fi
conda activate ${__conda_env}
unset __conda_setup
unset __conda_basedir
# <<< conda initialize <<<
Clone and build code#
The following commands clone the repository, then configure using CMake (-c), build (-b) and install the binaries to the given location (--install).
git clone https://git-out.gss.anl.gov/polaris/code/polaris-linux.git
cd polaris-linux
python ./build.py --cb -d <deps_dir> --install <install_dir>
The default dependencies directory is /lcrc/project/POLARIS/bebop/opt/polaris/deps.
If you supply the install path and point it to your project, the executable and all required SO/DLL files will be copied there. A common pattern is to use --install /lcrc/project/POLARIS/bebop/MyProject/bin.
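Putting it together, a typical invocation that uses the shared dependencies directory and installs into a project bin directory (both paths and the flag taken from the examples above) would look like:
python ./build.py --cb -d /lcrc/project/POLARIS/bebop/opt/polaris/deps --install /lcrc/project/POLARIS/bebop/MyProject/bin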
Important
Historically it has been common to compile POLARIS on the login nodes of the clusters. However, LCRC appears to be increasingly proactive about shutting down computation on login nodes, so it may be necessary to request an interactive compute node for compilation. On PBS this can be accomplished with the command qsub -A POLARIS -I -l select=1:mem=64gb,walltime=00:15:00 -j oe
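On the Slurm-managed clusters, a roughly equivalent interactive request is sketched below; the partition name is taken from the batch example later in this section and may need adjusting.
srun --account=POLARIS --partition=bdwall --nodes=1 --mem=64G --time=00:15:00 --pty bash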
Copy model data#
Watch demo on using WinSCP
Schedule job#
After you have compiled your software changes, you can run an individual POLARIS simulation by wrapping the execution in a small script that adds the relevant Slurm or PBS options (account, partition, number of nodes, etc.). Full instructions for LCRC runs can be found at: https://docs.lcrc.anl.gov/
Any line in the job script starting with the exact string #SBATCH or #PBS (called a “magic cookie”) will be interpreted by the scheduler as a special processing directive.
Important
On Crossover, use --account=TPS --partition=TPS to target the dedicated nodes.
As we allow multiple jobs per node on Crossover, you will also have to make sure that you request more than a single core with the --ntasks-per-node=64 option, or use --exclusive to disable node sharing entirely.
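For example, the Crossover-specific options can be passed directly on the sbatch command line when submitting the srun_polaris.sh script shown below (positional arguments as described for that script):
sbatch --account=TPS --partition=TPS --ntasks-per-node=64 srun_polaris.sh <data_dir> <exe> <scenario.json> <threads>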
$ cat srun_polaris.sh
#!/bin/bash
#
# Slurm options - note these can be overridden when running this script through sbatch, i.e.
# $ sbatch --time=01:00:00 srun_polaris.sh $data_dir $exe_name $scenario $threads
#
#SBATCH --job-name=polaris_base
#SBATCH --account=POLARIS        # <= important
#SBATCH --partition=bdwall
#SBATCH --nodes=1
#SBATCH --time=04:00:00
#
# Corresponding setup for the PBS scheduler
# $ qsub srun_polaris.sh $data_dir $exe_name $scenario $threads
#PBS -N polaris_base
#PBS -A POLARIS                  # <= important
#PBS -l select=4:mpiprocs=128
#PBS -l walltime=04:00:00

# Scheduler directives must appear before the first executable line, so set shell options here
set -eu

data=$1
program=$2
scenario=$3
threads=$4

cd "$data"
"$program" "$scenario" "$threads"
This can then be invoked like this:
$ export project_dir=/lcrc/project/POLARIS/bebop/MyProject
$ sbatch srun_polaris.sh ${project_dir}/data/grid-model ${project_dir}/bin/Integrated_Model scenario_init.json 12
Submitted batch job 2153400
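You can check that the job is queued or running with squeue (see the useful LCRC commands at the end of this page):
squeue -u $USER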
The console output from your run will be captured by Slurm and redirected into a file in the current directory called slurm-<job_id>.out. You can tail this file to watch progress:
$ tail -f slurm-2153400.out
LOG4CPP : [NOTICE] Successfully initialized logging utility.
LOG4CPP : [NOTICE] There are 3 arguments:
LOG4CPP : [NOTICE] /lcrc/project/POLARIS/bebop/polaris-bebop-test/data/integrated
LOG4CPP : [NOTICE] scenario_init.json
LOG4CPP : [NOTICE] 12
LOG4CPP : [ERROR] Failed to copy file into output folder.
0
LOG4CPP : [NOTICE] Initializing network skims...
LOG4CPP : [NOTICE] Network skims done.
LOG4CPP : [NOTICE] Starting simulation...
You can also follow the more detailed logging that will also be written in the model run output directory.
$ tail -f ${project_dir}/data/grid-model/grid_ABM3/log/polaris_progress.log
23/06/2021 13:17:19,630 | Thread: 47944819926784 [NOTICE] 23:59:54,
23/06/2021 13:17:19,636 | Thread: 47944819926784 [NOTICE] departed= 0, arrived= 0, in_network= 0, VMT= 0.00, VHT= 0.00
23/06/2021 13:17:19,636 | Thread: 47944819926784 [INFO] PRINTING ALL THE STUCK CARS... Time: 86399
23/06/2021 13:17:19,637 | Thread: 47944819926784 [INFO] 23:59:54,
23/06/2021 13:17:19,637 | Thread: 47944819926784 [INFO] departed= 0, arrived= 0, in_network= 0, VMT= 0.00, VHT= 0.00
23/06/2021 13:17:19,638 | Thread: 47944819926784 [INFO] 0 cars are printed!
23/06/2021 13:17:19,726 | Thread: 47944783320896 [NOTICE] Finished!
Scheduling a Python script#
When running Python scripts on the cluster, we want to make sure that the appropriate modules and conda environments are available on the compute node at run time. An example setup is shown in the run script below, which would be invoked as:
$ sbatch run_land_use_study.sh --data-dir foo --exe-name ${project_dir}/bin/Integrated_Model
Because the arguments to the script are passed down to the Python file, all of the argument parsing can be handled in Python with a library like optparse.
$ cat run_land_use_study.sh
#!/bin/bash
#SBATCH --job-name=land_use
#SBATCH --account=TPS --partition=TPS --nodes=1
#SBATCH --time=60:00:00 --ntasks-per-node=64 --exclusive
#SBATCH --mail-user=<your_email> --mail-type=ALL    # replace with your email address

set -eu

main() {
    cd /lcrc/project/POLARIS/bebop/SMART_FY22_LAND_USE/PILATES-SRC-JA
    load_modules
    setup_venv
    python3 ./run.py "$@"
}

# Activate the conda virtual environment that matches the current cluster
setup_venv() {
    if which conda; then
        set +u
        eval "$(conda shell.bash hook)"
        set -u
        local pattern=""
        if hostname | egrep "xover|crossover"; then
            pattern="xover|crossover"
        elif hostname | egrep "bebop|bdw"; then
            pattern="bebop"
        elif hostname | egrep "improv|ilogin|i0|i1|i2"; then
            pattern="improv"
        fi
        conda_env=$(conda env list | cut -d' ' -f1 | grep "^[a-z]" | grep polaris | egrep "${pattern}" | head -n1)
        [[ -z "${conda_env}" ]] && echo "Couldn't find a conda environment for this host" && exit 1
        conda activate ${conda_env}
    fi
}

load_modules() {
    if hostname | egrep "crossover|xover"; then
        module load polarislib_xover
    elif hostname | egrep "bebop|bdw"; then
        module load polarislib_bebop
    elif hostname | egrep "improv|ilogin|i0|i1|i2"; then
        module load polarislib_improv
    else
        echo "not bebop or crossover or improv? EXITING!"
        exit 1
    fi
}

main "$@"
Some useful LCRC commands#
james.cook@beboplogin2 $ squeue -u rweimer
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
james.cook@beboplogin2 $ qstat -u rweimer
Job id Name User Time Use S Queue
---------------- ---------------- ---------------- -------- - -----
james.cook@beboplogin2 $ sbank-list-allocations -p POLARIS
Allocation Suballocation Start End Resource Project Jobs Charged Available Balance
---------- ------------- ---------- ---------- -------- ------- ---- ------- -----------------
900 847 2024-07-01 2024-10-01 bebop POLARIS 6 3.5 5,552.5
Totals:
Rows: 1
Bebop:
Available Balance: 5,552.5 node hours
Charged : 3.5 node hours
Jobs : 6
All transactions displayed are in Node Hours. 1 Improv Node Hour is 128 Core Hours. 1 Bebop Node Hour is 36 Core Hours. Balances and transactions displayed will update every 5 minutes.
james.cook@beboplogin2 $ lcrc-quota
----------------------------------------------------------------------------------------
Home Current Usage Space Avail Quota Limit Grace Time
----------------------------------------------------------------------------------------
james.cook 56 GB 43 GB 100 GB
----------------------------------------------------------------------------------------
Project Current Usage Space Avail Quota Limit Grace Time
----------------------------------------------------------------------------------------
POLARIS 1456 GB 2637 GB 4096 GB
POLARIS_TCF 815 GB 209 GB 1024 GB
TPS 0 GB 1024 GB 1024 GB