Slurm launchers are provided under scripts/ to facilitate software builds. For convenience on iris, a GNU screen configuration file config/screenrc is provided to quickly bootstrap the appropriate tabs:
# iris
screen -c config/screenrc
# 'SW' tab meant for git / sync operations. To enable the ssh agent:
# eval "$(ssh-agent)"
# ssh-add ~/.ssh/id_rsa
# make up
# make fork-easyconfigs-update
# 'broadwell' tab for associated build. Ex interactive job:
# ./scripts/get-interactive-job
# 'skylake' tab for associated build. Ex interactive job:
# ./scripts/get-interactive-job-skylake
# 'gpu' tab for associated build. Ex interactive job:
# ./scripts/get-interactive-job-gpu
# 'epyc' tab for aion builds
# ssh aion
# ./scripts/get-interactive-job
For aion, you should use Tmux instead.
Don't forget to kill your ssh agent when you are finished: eval "$(ssh-agent -k)"
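For reference, a typical sequence in the 'SW' tab thus looks as follows (the key path is the one from the screenrc comments above -- adapt to your own key):
# start the agent and load your key
eval "$(ssh-agent)"
ssh-add ~/.ssh/id_rsa
# ... git / sync operations: make up, make fork-easyconfigs-update ...
# kill the agent once you are done
eval "$(ssh-agent -k)"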
You are encouraged to submit/run your builds in each appropriate <arch> tab, where you can follow the logs of a given passive build with:
tail -s1 -f logs/<JOB-NAME>-<jobid>.out # -s1 to force refresh every second
In general, the following launcher scripts are provided, either for testing purposes (under project sw), or for production builds (as the resif user).
Script | Cluster | Arch. | Description | Partition | Settings |
---|---|---|---|---|---|
launcher-{test,prod}-build-cpu.sh | aion | epyc | AMD CPU default builds | batch | settings/[...]/aion.sh |
launcher-{test,prod}-build-cpu.sh | iris | broadwell | Intel CPU default builds | batch | settings/[...]/iris.sh |
launcher-{test,prod}-build-cpu-skylake.sh | iris | skylake | Intel CPU Skylake Builds | batch | settings/[...]/iris.sh |
launcher-{test,prod}-build-gpu.sh | iris | skylake | Nvidia GPU Builds | gpu | settings/[...]/iris-gpu.sh |
In case you wish to extend an existing software set, see extend.md for instructions -- you should prefer home/project-based builds facilitated by the resif-load-{home,project}-swset-{prod,devel} utilities.
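As a hypothetical sketch only (the exact options of these utilities are documented in extend.md), loading a home-based software set on top of the production one might look like:
# assumption: the utility is in your PATH and can be called without arguments;
# check extend.md for the actual usage
resif-load-home-swset-prod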
For a given arch, you MUST use the appropriate scripts/get-interactive-job* script to get the corresponding compute node for 2 hours:
$ ./scripts/get-interactive-job -h
get-interactive-job:
Get an interactive job for '<arch>' node to test a RESIF build.
By default, this will reserve one **full** node (i.e. 1 task over <c> threads)
USAGE
./scripts/get-interactive-job [-n] [-c 1]
OPTIONS:
-n --dry-run: Dry run mode
-c <N>: ask <N> core (thread) instead of <c> (default) -- may be scheduled faster
# Get an interactive job for default arch (broadwell on iris, epyc on aion)
$ ./scripts/get-interactive-job
## OR, for Intel skylake on iris
$ ./scripts/get-interactive-job-skylake
## OR, for GPU nodes:
$ ./scripts/get-interactive-job-gpu
Use the interactive jobs to install/update EB to the latest version using scripts/setup.sh, AFTER sourcing the appropriate settings.
You probably want to enforce the software set release.
### Iris broadwell or Aion epyc
$ ./scripts/get-interactive-job -c1
$ source settings/[prod/][version/]${ULHPC_CLUSTER}.sh
$ ./scripts/setup.sh -h # check $EASYBUILD_PREFIX
$ ./scripts/setup.sh -n
$ ./scripts/setup.sh
### Iris skylake
$ ./scripts/get-interactive-job-skylake -c1
$ source settings/[prod/][<version>/]${ULHPC_CLUSTER}.sh
$ ./scripts/setup.sh -h # check $EASYBUILD_PREFIX
$ ./scripts/setup.sh -n
$ ./scripts/setup.sh
### Iris GPU
$ ./scripts/get-interactive-job-gpu -c1
$ source settings/[prod/][<version>/]${ULHPC_CLUSTER}-gpu.sh
$ ./scripts/setup.sh -h # check $EASYBUILD_PREFIX
$ ./scripts/setup.sh -n
$ ./scripts/setup.sh
You can test builds under /work/projects/sw/resif ($LOCAL_RESIF_ROOT_DIR) under your ULHPC account (assuming you belong to the group sw) with the launcher scripts scripts/launcher-test-build-*.
Software and modules will be installed under the local software set ($LOCAL_RESIF_ENVIRONMENT), i.e. under /work/projects/sw/resif/iris/local/<arch>.
Note: the build command is sg sw -c "eb [...] -r --rebuild" (to enforce group ownership for the sw project -- see docs).
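Concretely, a test launcher thus ends up running something like the following (easyconfig name borrowed from the QuantumESPRESSO example further below):
# build under the 'sw' group so installed files get the expected group ownership
sg sw -c "eb QuantumESPRESSO-6.7-foss-2019b.eb -r --rebuild"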
Mode | Arch. | Launcher | Settings |
---|---|---|---|
Local Tests | epyc | ./scripts/launcher-test-build-amd.sh | source settings/aion.sh |
Local Tests | broadwell | ./scripts/launcher-test-build-cpu.sh | source settings/iris.sh |
Local Tests | skylake | ./scripts/launcher-test-build-cpu-skylake.sh | source settings/iris.sh |
Local Tests | gpu | ./scripts/launcher-test-build-gpu.sh | source settings/iris-gpu.sh |
Example of help usage:
$ ./scripts/launcher-test-build-cpu.sh -h
NAME
launcher-test-build-cpu.sh: RESIF 3.0 launcher for ULHPC User Software building
Based on UL HPC Module Bundles ULHPC-[<topic>-]<version>.eb
Default CPU Builds compliant with all Intel architectures in place
USAGE
[sbatch] ./scripts/launcher-test-build-cpu.sh [-n] [-D] [ toolchains | bd | bio | cs | dl | math | perf | tools | visu ]
[sbatch] ./scripts/launcher-test-build-cpu.sh [-n] [-D] path/to/file.eb
Default bundle version: - see easyconfigs/u/ULHPC-*
List of ULHPC Module Bundles: toolchains bd bio cs dl math perf tools visu
OPTIONS:
-d --debug: Debug mode
-n --dry-run: Dry run mode
-D --dry-run-short Print build overview incl. dependencies but do not build
-e <ENV>: Set **local** RESIF environment to <ENV> (Default: )"
-u --use <version>: use the modules from the *prod* environment <version>
-v <version>: Set Bundle version (default: '')
Each of these scripts supports two main usage modes (use -n for dry runs beforehand to echo each command instead of executing them), as illustrated below:
- passing as argument a bundle name (Ex: toolchains, bio, cs, etc.) to launch the build of the associated ULHPC bundle, i.e. ULHPC-<name>-<version>.eb
- passing as argument a path to an easyconfig you want to build
- For interactive runs, you need to source the appropriate settings beforehand
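For example, with the broadwell test launcher (dry runs in both cases):
# dry-run the build of the ULHPC 'toolchains' bundle
./scripts/launcher-test-build-cpu.sh -n toolchains
# dry-run the build of a single easyconfig
./scripts/launcher-test-build-cpu.sh -n QuantumESPRESSO-6.7-foss-2019b.eb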
These launchers are simple wrappers around the following commands:
source settings/<cluster>[...].sh
eb <file>.eb -r --rebuild -D # Dry-run, report dependencies
sg sw -c "eb <file>.eb -r --rebuild"
On iris, the commands are embedded via sg to force the execution within the group sw, i.e. as follows: sg sw -c "eb [...]"
# Interactive tests (broadwell)
(access)$> ./scripts/get-interactive-job
(node)$> source settings/iris.sh
(node)$> eb --version
(node)$> eb -S [...] # OR ./scripts/suggest-easyconfigs -v <version> [...]
(node)$> ./scripts/launcher-test-build-cpu.sh [toolchains | bio | ...]
# Passive jobs - (default) builds for Intel (broadwell)
(access)$> sbatch ./scripts/launcher-test-build-cpu.sh [toolchains | bio | ...]
# AND/OR Skylake optimized builds
(access)$> sbatch ./scripts/launcher-test-build-cpu-skylake.sh [toolchains | bio | ...]
# AND/OR GPU optimized builds
(access)$> sbatch ./scripts/launcher-test-build-gpu.sh [gpu]
Equivalent on aion:
# Interactive tests (epyc)
(access)$> ./scripts/get-interactive-job
(node)$> source settings/aion.sh
(node)$> eb --version
(node)$> eb -S [...] # OR ./scripts/suggest-easyconfigs -v <version> [...]
(node)$> ./scripts/launcher-test-build-amd.sh [toolchains | bio | ...]
# Passive jobs - (default) builds for AMD (epyc)
(access)$> sbatch ./scripts/launcher-test-build-amd.sh [toolchains | bio | ...]
Slurm logs are located under the logs/ directory -- use tail -s1 -f logs/[...] to follow them live when tied to a running passive job.
EasyBuild-related logs are placed under logs/easybuild/:
(access)$> tail -s1 -f logs/<jobname>-<jobid>.out # Slurm logs, EB logs under logs/easybuild
You may want at some point to clean the logs:
make clean
Once a given version of the ULHPC software set is released to production (Ex: 2020b), it is very likely that some users will request the addition of a missing software package for which an easyconfig exists.
Instead of trying the builds directly in production, you MUST first test them under the sw project, prefixing the production modulepath.
To facilitate that work, you MUST either use the -v <version> option, OR use the predefined launcher scripts ./scripts/<version>/launcher-test-build-*.
# aion
[sbatch] ./scripts/<version>/launcher-test-build-amd.sh [...]
# iris
[sbatch] ./scripts/<version>/launcher-test-build-{cpu,cpu-skylake,gpu}.sh [...]
# Alternative:
[sbatch] ./scripts/launcher-test-build-amd.sh -v <version> [...] # Aion
[sbatch] ./scripts/launcher-test-build-{cpu,cpu-skylake,gpu}.sh -v <version> [...] # Iris
You can also decide to target a private environment with -e <name> (Ex: 2020b) to ensure the builds go into a dedicated (separate) directory; note however that you then need to install EB into this directory (see below).
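For example, a minimal dry-run sketch using the 2020b environment name from the example above:
# dry-run a bundle build targeting the dedicated '2020b' local environment
./scripts/launcher-test-build-cpu.sh -n -e 2020b toolchains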
To facilitate such tests, dedicated launcher scripts have been created under ./scripts/<version>/launcher-test-build-*:
Mode | Arch. | Launcher | Settings |
---|---|---|---|
Local Tests <version> release | epyc | ./scripts/<version>/launcher-test-build-amd.sh | source settings/<version>/aion.sh |
Local Tests <version> release | broadwell | ./scripts/<version>/launcher-test-build-cpu.sh | source settings/<version>/iris.sh |
Local Tests <version> release | skylake | ./scripts/<version>/launcher-test-build-cpu-skylake.sh | source settings/<version>/iris.sh |
Local Tests <version> release | gpu | ./scripts/<version>/launcher-test-build-gpu.sh | source settings/<version>/iris-gpu.sh |
Example: testing QuantumESPRESSO:
### Ex: broadwell
# SW tabs: update to latest easyconfigs and settings
$ make up # ULHPC/sw repo
$ make fork-easyconfigs-update # local fork of easyconfigs
### broadwell tab
$ ./scripts/get-interactive-job
$ source settings/2019b/iris.sh
# check suggested EB
$ ./scripts/suggest-easyconfigs Quantum
#[...]
# ... potential exact match for 2019b toolchain
# QuantumESPRESSO-6.5-intel-2019b.eb
# QuantumESPRESSO-6.6-foss-2019b.eb
# QuantumESPRESSO-6.6-intel-2019b.eb
# QuantumESPRESSO-6.7-foss-2019b.eb
# QuantumESPRESSO-6.7-intel-2019b.eb
# QuantumESPRESSO-6.7-iomkl-2019b.eb
# check existing (missing) dependencies **against** 2019b: use scripts/2019b/launcher-test-build*
$ ./scripts/2019b/launcher-test-build-cpu.sh -D QuantumESPRESSO-6.7-foss-2019b.eb
#[...]
# ==> using/prepend (production) modules from '2019b' release for resif arch 'broadwell'
# Updated MODULEPATH=/opt/apps/resif/iris/2019b/broadwell/modules/all:/work/projects/sw/resif/iris/2019b/broadwell/modules/all
#
# Due to MODULEPATH changes, the following have been reloaded:
# 1) tools/EasyBuild/4.3.3
# #[...]
# * [x] /opt/apps/resif/iris/2019b/broadwell/software/EasyBuild/4.3.3/easybuild/easyconfigs/f/foss/foss-2019b.eb (module: toolchain/foss/2019b)
# * [ ] /opt/apps/resif/iris/2019b/broadwell/software/EasyBuild/4.3.3/easybuild/easyconfigs/e/ELPA/ELPA-2019.11.001-foss-2019b.eb (module: math/ELPA/2019.11.001-foss-2019b)
# * [ ] /opt/apps/resif/iris/2019b/broadwell/software/EasyBuild/4.3.3/easybuild/easyconfigs/q/QuantumESPRESSO/QuantumESPRESSO-6.7-foss-2019b.eb (module: chem/QuantumESPRESSO/6.7-foss-2019b)
# Repeat to test the build
$ ./scripts/2019b/launcher-test-build-cpu.sh QuantumESPRESSO-6.7-foss-2019b.eb
You probably want to repeat in passive mode for skylake:
### skylake tabs
# adapt expected run time
$ sbatch -t 01:00:00 ./scripts/2019b/launcher-test-build-cpu-skylake.sh QuantumESPRESSO-6.7-foss-2019b.eb
If it builds successfully, you probably want to:
- complete and commit the appropriate ULHPC bundle, Ex: easyconfigs/u/ULHPC-<type>/ULHPC-<type>-2019b.eb
- repeat the prod build as resif: after pulling, use the script ./scripts/prod/launcher-prod-build-[...] -v 2019b <type> -- see the sketch just below and the production builds section further down
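A sketch of that workflow, assuming (hypothetically) that QuantumESPRESSO belongs in the cs bundle:
# as your regular user: update and commit the bundle easyconfig
git add easyconfigs/u/ULHPC-cs/ULHPC-cs-2019b.eb
git commit -m "ULHPC-cs-2019b: add QuantumESPRESSO 6.7 (foss toolchain)"
git push
# as resif: pull the changes, then launch the production build (see below)
git pull
sbatch ./scripts/prod/launcher-prod-build-cpu.sh -v 2019b cs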
You should install/update EB for the target environment in interactive jobs, by sourcing the appropriate settings and running ./scripts/setup.sh [-h] [-n]:
### Ex: Testing environment (project sw) against the 2019b release
# Broadwell tab
$ ./scripts/get-interactive-job -c1
$ source settings/2019b/iris.sh
----------------------------------------
LOCAL_RESIF_ENVIRONMENT = 2019b
LOCAL_RESIF_SYSTEM_NAME = iris
SYS_ARCH = broadwell
LOCAL_RESIF_ARCH = broadwell
LOCAL_RESIF_ROOT_DIR = /work/projects/sw/resif
----------------------------------------
EASYBUILD_PREFIX = /work/projects/sw/resif/iris/2019b/broadwell
[...]
/!\ WARNING: Module tools/EasyBuild NOT FOUND
# double-check $EASYBUILD_PREFIX
$ ./scripts/setup.sh -h # check CAREFULLY EASYBUILD_PREFIX
$ ./scripts/setup.sh -n # dry-run
$ ./scripts/setup.sh
$ exit
### /!\ IMPORTANT: repeat for each architecture: skylake and gpu
# Skylake tabs
$ ./scripts/get-interactive-job-skylake -c1
$ source settings/2019b/iris.sh
TOP_DIR = /mnt/irisgpfs/users/svarrette/git/github.com/ULHPC/sw
----------------------------------------
LOCAL_RESIF_ENVIRONMENT = 2019b
LOCAL_RESIF_SYSTEM_NAME = iris
SYS_ARCH = skylake
LOCAL_RESIF_ARCH = skylake
LOCAL_RESIF_ROOT_DIR = /work/projects/sw/resif
----------------------------------------
EASYBUILD_PREFIX = /work/projects/sw/resif/iris/2019b/skylake
[...]
/!\ WARNING: Module tools/EasyBuild NOT FOUND
# double-check $EASYBUILD_PREFIX
$ ./scripts/setup.sh -h # check CAREFULLY EASYBUILD_PREFIX
$ ./scripts/setup.sh -n # dry-run
$ ./scripts/setup.sh
$ exit
# GPU tab
$ ./scripts/get-interactive-job-gpu -c1
$ source settings/2019b/iris-gpu.sh
TOP_DIR = /mnt/irisgpfs/users/svarrette/git/github.com/ULHPC/sw
----------------------------------------
LOCAL_RESIF_ENVIRONMENT = 2019b
LOCAL_RESIF_SYSTEM_NAME = iris
SYS_ARCH = skylake
LOCAL_RESIF_ARCH = gpu
LOCAL_RESIF_ROOT_DIR = /work/projects/sw/resif
----------------------------------------
EASYBUILD_PREFIX = /work/projects/sw/resif/iris/2019b/gpu
[...]
/!\ WARNING: Module tools/EasyBuild NOT FOUND
# double-check $EASYBUILD_PREFIX
$ ./scripts/setup.sh -h # check CAREFULLY EASYBUILD_PREFIX
$ ./scripts/setup.sh -n # dry-run
$ ./scripts/setup.sh
$ exit
Production builds MUST be run as resif, using the launcher scripts under scripts/prod/*.
In particular, to generate builds for the <version> software set:
# /!\ ADAPT <version>
[sbatch] ./scripts/prod/launcher-resif-prod-build-{cpu,cpu-skylake,gpu}.sh -v <version> [...]
Software and modules will be installed under /opt/apps/resif ($LOCAL_RESIF_ROOT_DIR) -- see the Technical Docs.
You MUST BE VERY CAREFUL when running these scripts as they alter the production environment. This step is restricted to the ULHPC team.
Mode | Arch. | Launcher | Settings |
---|---|---|---|
Prod build aion | epyc | ./scripts/prod/launcher-prod-build-amd.sh -v <version> | source settings/prod/<version>/aion.sh |
Prod build iris | broadwell | ./scripts/prod/launcher-prod-build-cpu.sh -v <version> | source settings/prod/<version>/iris.sh |
Prod build iris | skylake | ./scripts/prod/launcher-prod-build-cpu-skylake.sh -v <version> | source settings/prod/<version>/iris.sh |
Prod build iris | gpu | ./scripts/prod/launcher-prod-build-gpu.sh -v <version> | source settings/prod/<version>/iris-gpu.sh |
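After a production job completes, a quick sanity check that modules landed under the expected prefix can help (path layout as reported by the settings scripts -- adjust <version> and the architecture directory):
# list the modules installed by the production build (example: iris, broadwell)
ls /opt/apps/resif/iris/<version>/broadwell/modules/all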
For a production release, it is necessary to configure the ULHPC/sw repository under the resif user:
- Update the resif ULHPC user (also on gitlab/github) and configure access to the repo on git (NOT with a deploy key, but as a user)
- Follow the setup instructions:
  - install direnv
  - create the virtualenv: python3 -m venv ~/venv/resif3
  - clone the ULHPC/sw repo and set up bash and the local fork of the Easyconfigs repository
ssh resif@iris-cluster
mkdir git/github.com/ULHPC
cd git/github.com/ULHPC
git clone ssh://[email protected]:8022/ULHPC/sw.git
cd sw
python3 -m venv ~/venv/resif3
make setup
make setup-direnv
direnv allow .
make setup-python
Post-check on iris: you also need to prepare /opt/apps/resif/licenses_keys.yaml as per the hook configuration.
In the appropriate screen/tmux tabs:
sbatch ./scripts/prod/launcher-prod-build-amd.sh -v <version> [ toolchains | bd | bio | cs | dl | math | tools ]
tail -s1 -f logs/RESIF-Prod-CPU-epyc-<jobid>.out
In the appropriate screen/tmux tabs:
# Broadwell
sbatch ./scripts/prod/launcher-prod-build-cpu.sh -v <version> [ toolchains | bd | bio | cs | dl | math | tools ]
tail -s1 -f logs/RESIF-Prod-CPU-broadwell-<jobid>.out
# Skylake
sbatch ./scripts/prod/launcher-prod-build-cpu-skylake.sh -v <version> [ toolchains | bd | bio | cs | dl | math | tools ]
tail -s1 -f logs/RESIF-Prod-CPU-skylake-<jobid>.out
# GPU
sbatch ./scripts/prod/launcher-prod-build-gpu.sh -v <version>
tail -s1 -f logs/RESIF-Prod-GPU-<jobid>.out
Some software requires a manual post-install step run as root -- see ./scripts/post-install.sh.
Details are provided in the debugging notes.