
# ULHPC Software/Modules Environment


The UL HPC facilities provide a large variety of scientific applications to their user community, including domain-specific codes and general-purpose development tools for a wide range of applications.[^1] An environment module system, Lmod, is used to manage the shell environment and provide access to the installed software.

The main advantages of using an environment module system are the following:

  1. Many different versions and/or installations of a single software package can be provided on a given machine, including a default version as well as several older and newer versions.
  2. Users can easily switch to different versions or installations of a software package without having to explicitly modify their shell environment.

!!! important "Most UL HPC modules are automatically generated by Easybuild."

## Environment modules

Environment module systems are a standard set of tools deployed at most HPC sites to allow dynamic modification of user environments. The environment module framework was first implemented in Environment Modules in Tcl, and later in other tools such as Lmod, which is written in Lua. All implementations provide the `module` command to

- manage the `PATH`, `LD_LIBRARY_PATH`, `MANPATH`, and other shell environment variables,
- define shell functions, and
- call other environment-modifying tools, like Conda and Python virtual environments.

By automatically modifying the shell environment, modules can load and unload an application together with any profile files and libraries on which it depends. This enables the

- automatic management of complex linking dependencies in libraries used by scientific software, and
- provision of multiple versions of software packages that can co-exist independently in different module environments, as the short example below illustrates.
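For instance, loading a module transparently prepends the relevant directories to `PATH`, `LD_LIBRARY_PATH`, and similar variables, and unloading it reverts the change. A minimal sketch (the module name is an assumption; check `module avail` for what is actually installed):

```console
$ module load lang/Python/3.11.5-GCCcore-13.2.0     # hypothetical module name
$ which python3                                      # now resolves inside the module's bin/ directory
$ echo "$LD_LIBRARY_PATH" | tr ':' '\n' | head       # the module's lib/ directories were prepended
$ module unload lang/Python/3.11.5-GCCcore-13.2.0    # reverts all of these changes
```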

!!! danger "The module command is only available on the compute nodes"

There is no environment module system installed on the login nodes. This is a deliberate choice to prevent users from running large jobs on the login nodes. You need to be within a job (interactive or not) to load modules provided by UL HPC or private modules.

??? info "Inner working of environment modules systems"

When users log in to a Linux system, they get a login shell, and the [shell](index.md#shell-and-dotfiles) uses environment variables to run commands and applications. The most common ones are:

* [`PATH`](https://en.wikipedia.org/wiki/PATH_(variable)):  colon-separated list of directories in which your system looks for executable files;
* [`MANPATH`](https://man7.org/linux/man-pages/man1/manpath.1.html): colon-separated list of directories in which [`man`](https://man7.org/linux/man-pages/man1/man.1.html) searches for the man pages;
* [`LD_LIBRARY_PATH`](https://man7.org/linux/man-pages/man8/ld.so.8.html): colon-separated list of directories in which your system looks for ELF / `*.so` libraries needed by applications at execution time.

There are also _application specific_ environment variables such as `CPATH`, `LIBRARY_PATH`, `JAVA_HOME`, `LM_LICENSE_FILE`, `MKLROOT` etc.

A _traditional_ way to set these environment variables is by customizing the [shell initialization files](index.md#customizing-shell-environment): _i.e._ `/etc/profile`, `.bash_profile`, and `.bashrc`. This proves to be very impractical on multi-user systems with various applications and multiple application versions installed, as is the case on an HPC facility.

To overcome the difficulty of setting and changing the environment variables, the Tcl/C Environment Modules were introduced over two decades ago. The [Environment Modules](https://lmod.readthedocs.io/) package is a tool that simplifies shell initialization and lets users easily modify their environment during the session with [modulefiles](https://lmod.readthedocs.io/en/latest/015_writing_modules.html).

- Each modulefile contains the information needed to configure the shell for an application. Once the Modules package is initialized, the environment can be modified on a per-module basis using the `module` command which interprets modulefiles. Typically modulefiles instruct the `module` command to alter or set shell environment variables such as `PATH`, `MANPATH`, etc.
- [Modulefiles](https://lmod.readthedocs.io/en/latest/015_writing_modules.html) may be shared by many users on a system (as done on the ULHPC clusters) and users may have their own collection to supplement or replace the shared modulefiles.

Modules can be loaded and unloaded dynamically and atomically, in a clean fashion. All popular shells are supported, including `bash`, `ksh`, `zsh`, `sh`, `csh`, `tcsh`, `fish`, as well as some scripting languages such as `perl`, `ruby`, `tcl`, `python`, `cmake` and `R`. Modules are useful for managing different versions of applications. Modules can also be bundled into metamodules that load an entire suite of different applications: this is precisely the way the [UL HPC Software Set](../software/swsets.md) is managed.

??? info "Tcl/C Environment Modules vs. Tcl Environment Modules vs. Lmod"

There exist several implementations of the `module` tool:

* [Tcl/C Environment Modules](http://modules.sourceforge.net/) (3.2.10 $\leq$ version < 4), also known as `Tmod`: the _seminal_ (old) implementation
* [Environment modules](https://modules.readthedocs.io/) (version $\geq$ 4), previously called `Modules-Tcl`: Tcl-only variant of Environment modules
* (**recommended**) [Lmod](https://lmod.readthedocs.io/): a Lua based Environment Module system
    - Lmod ("L" stands for [Lua](http://www.lua.org/)) provides all of the functionality of Tcl/C Environment Modules plus more features:
        * support for _hierarchical_ module file structure
        * `MODULEPATH` is dynamically updated when modules are loaded.
        * makes loaded modules inactive and active to provide a sane environment
        * support for _hidden_ modules
        * support for _optional_ usage tracking (implemented on ULHPC facilities)
    - In particular, Lmod enforces the following [safety features](https://lmod.readthedocs.io/en/latest/010_user.html#safety-features), which were only recently provided by [Environment Modules](https://modules.readthedocs.io/):
      1. The _One Name Rule_: Users can only have one version active
      2. Users can only load one compiler or MPI stack at a time (through the `family(...)` directive)

The ULHPC Facility relies on [Lmod](https://lmod.readthedocs.io/), a Lua-based environment module system that easily handles the hierarchical `MODULEPATH` problem.

## Working with environment modules

On the UL HPC systems, Lmod provides the environment management system. The associated modulefiles are almost exclusively generated automatically by EasyBuild. The `module` command supports the following subcommands, demonstrated in the short session after the table:

| Command | Description |
|:--------|:------------|
| `module avail` | Lists all the modules which are available to be loaded |
| `module spider <pattern>` | Search for `<pattern>` among available modules (Lmod only) |
| `module load <mod1> [mod2...]` | Load a module |
| `module unload <module>` | Unload a module |
| `module list` | List loaded modules |
| `module purge` | Unload all modules (purge) |
| `module display <module>` | Display what a module does |
| `module use <path>` | Prepend the directory to the `MODULEPATH` environment variable |
| `module unuse <path>` | Remove the directory from the `MODULEPATH` environment variable |
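For instance, a typical session combining these subcommands could look as follows (the module names are examples taken from the `foss` toolchain shown further below; adapt them to what `module avail` reports):

```console
$ module spider OpenMPI            # search all software sets for OpenMPI
$ module load mpi/OpenMPI          # load the default OpenMPI version of the active set
$ module list                      # confirm what is currently loaded
$ module display mpi/OpenMPI       # show how the module modifies the environment
$ module purge                     # unload all (non-sticky) modules
```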

At the heart of the interaction with environment modules reside the following components:

- the `MODULEPATH` environment variable, which defines a colon-separated list of directories to search for modulefiles, and
- a modulefile (see the example below) associated with each available software package.

??? example "Example of ULHPC toolchain/foss (auto-generated) module file"

```console
$ module show toolchain/foss
-----------------------------------------------------------------------------------------------------
/opt/apps/easybuild/systems/aion/rhel810-20250216/2023b/epyc/modules/all/toolchain/foss/2023b.lua:
-----------------------------------------------------------------------------------------------------
help([[
Description
===========
GNU Compiler Collection (GCC) based compiler toolchain, including OpenMPI for MPI support,
OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.

More information
================
 - Homepage: https://easybuild.readthedocs.io/en/master/Common-toolchains.html#foss-toolchain
]])
whatis("Description: GNU Compiler Collection (GCC) based compiler toolchain, including
 OpenMPI for MPI support, OpenBLAS (BLAS and LAPACK support), FFTW and ScaLAPACK.")
whatis("Homepage: https://easybuild.readthedocs.io/en/master/Common-toolchains.html#foss-toolchain")
whatis("URL: https://easybuild.readthedocs.io/en/master/Common-toolchains.html#foss-toolchain")
conflict("toolchain/foss")
depends_on("compiler/GCC/13.2.0")
depends_on("mpi/OpenMPI/4.1.6-GCC-13.2.0")
depends_on("lib/FlexiBLAS/3.3.1-GCC-13.2.0")
depends_on("numlib/FFTW/3.3.10-GCC-13.2.0")
depends_on("numlib/FFTW.MPI/3.3.10-gompi-2023b")
depends_on("numlib/ScaLAPACK/2.2.0-gompi-2023b-fb")
setenv("EBROOTFOSS","/opt/apps/easybuild/systems/aion/rhel810-20250216/2023b/epyc/software/foss/2023b")
setenv("EBVERSIONFOSS","2023b")
setenv("EBDEVELFOSS","/opt/apps/easybuild/systems/aion/rhel810-20250216/2023b/epyc/software/foss/2023b/easybuild/toolchain-foss-2023b-easybuild-devel")
```

## Modules for software sets in UL HPC

In UL HPC we use environment modules to modify the set of available modules. This is done to prevent accidental mixing of modules from different software sets. To load a set of software modules, simply load the appropriate module that modifies the available software set. There are two types of software sets.

- Modules under `env`: a set of sticky modules natively optimized for the UL HPC systems. These modules are designed to load a different, natively optimized set of modules for each UL HPC system.
- Modules under `EESSI`: these modules load the EESSI software sets. The software sets distributed with EESSI are generically optimized for a number of architectures, are designed to support reproducibility, and are available across multiple HPC centers.

!!! important "When to use EESSI"

While the performance of EESSI modules is slightly lower than that of the natively optimized software set (`env`), it is easier to move your computations to new systems that support EESSI. If you plan to also run your computations in a centre where EESSI is available, you should consider using EESSI. Otherwise use the local modules.

When you log in to a compute node, the default software set is loaded automatically. You can change that by loading another software set module, as shown below.
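A quick way to check which software set is active, and which ones you could switch to (the exact module names differ per system and release), is:

```console
$ module list env EESSI            # software set modules currently loaded
$ module avail env/ EESSI          # software set modules available for loading
```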

### Loading a natively optimized software set under `env`

Each software set under `env` corresponds to a different software set release on UL HPC. The local software sets under `env` are mutually exclusive: you cannot have two of them loaded at the same time. This is done to prevent accidental mixing of modules from different software sets. To load, for instance, `env/development/2023b`, use the command:

```console
$ module load env/development/2023b

Lmod is automatically replacing "env/release/default" with "env/development/2023b".
```

The command informs us that the default software set was replaced with the 2023b software set from the development category. The modules setting the environment are sticky modules, meaning that you can only unload/purge them with the `--force` flag:

```console
$ module unload env/development/2023b
The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) env/development/2023b
$ module --force unload env/development/2023b
```

This saves you from having to reload the environment-setting module every time you purge your environment modules during normal operations.

### Loading an EESSI software set

The EESSI software set is distributed in a flat manner, meaning that all modules from all releases are available in the same software set. While there is no inherent danger in using such a software set, the sheer number of available alternatives can lead to mistakes if one is not careful. Furthermore, the EESSI modules that load the EESSI software set are not sticky, so when you purge your modules you have to reload your EESSI software environment module. For instance, consider loading `EESSI/2023.06`:

```console
$ module load EESSI/2023.06
```

Then, when you list the available modules,

```console
$ module avail
---- /cvmfs/software.eessi.io/versions/2023.06/software/linux/x86_64/amd/zen2/modules/all ----
...
   Abseil/20240116.1-GCCcore-13.2.0         (D)
   Archive-Zip/1.68-GCCcore-12.2.0
   Armadillo/11.4.3-foss-2022b
   Armadillo/12.6.2-foss-2023a
   Armadillo/12.8.0-foss-2023b              (D)
   Arrow/11.0.0-gfbf-2022b
...
```

you will see multiple versions of the same software, built with multiple toolchains (`foss-2023a`, `foss-2023b`). Loading a module and then purging the loaded modules also removes the `EESSI/2023.06` module:

```console
$ module load foss/2023b
$ module list EESSI

Currently Loaded Modules Matching: EESSI
  1) EESSI/2023.06

$ module purge
The following modules were not unloaded:
  (Use "module --force purge" to unload all):

  1) env/development/2023b

$ module list EESSI

Currently Loaded Modules Matching: EESSI
  None found.
```

## UL HPC toolchains and software set versioning

Our centre offers a yearly release of the UL HPC software set based on the corresponding release of EasyBuild toolchains.[^2] Expect at least 6 months of validation and testing after an EasyBuild release before the corresponding UL HPC release.

!!! info "Tool chains and software releases"

The idea behind toolchains is that a core set of modules is _fixed per release_ and the rest of the software in the release is built around the core set. Only one version of the toolchain modules is present in the software set, whereas multiple versions of other software can be present.

For an exhaustive list of component versions fixed per release, have a look at the [foss](https://docs.easybuild.io/common-toolchains/#common_toolchains_overview_foss) and [intel](https://docs.easybuild.io/common-toolchains/#common_toolchains_overview_intel) toolchains.

An overview of the currently available core toolchain component versions in the UL HPC releases is depicted below:

| Name     | Type      | 2019b (legacy)     | 2020b (release) | 2023b (development) | (testing) |
|:---------|:----------|:-------------------|:----------------|:--------------------|:----------|
| GCCCore  | compiler  | 8.3.0              | 10.2.0          | 13.2.0              |           |
| foss     | toolchain | 2019b              | 2020b           | 2023b               |           |
| intel    | toolchain | 2019b              | 2020b           | 2023b               |           |
| binutils |           | 2.32               | 2.35            | 2.40                |           |
| Python   |           | 3.7.4 (and 2.7.16) | 3.8.6           | 3.11.5              |           |
| Clang    | compiler  | 9.0.1              | 11.0.0          | 17.0.6              |           |
| OpenMPI  | MPI       | 3.1.4              | 4.0.5           | 4.1.6               |           |

When using the natively optimized software sets loaded with the modules under `env`, you should always have a single core component version available. With the flat layout used in EESSI, multiple core component versions will be available when you load the modules providing the EESSI software sets, as the example below illustrates.
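A quick way to observe this difference, assuming the `env/development/2023b` and `EESSI/2023.06` modules used in the examples above, is to compare how many `foss` toolchain versions each software set exposes:

```console
$ module --force purge                 # sticky env modules require --force
$ module load env/development/2023b
$ module avail toolchain/foss          # a single foss version for this release
$ module --force purge
$ module load EESSI/2023.06
$ module avail foss                    # several foss versions from different releases
```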

## Architecture of the software set

By default, the environment module system uses the contents of the `${MODULEPATH}` environment variable as the path where it will look for modules. On UL HPC, this environment variable contains the following paths by default.

- `/opt/apps/easybuild/environment/modules`: location of the sticky modules under `env` that provide the natively optimized software modules.
- `/cvmfs/software.eessi.io/init/modules`: location of the modules under `EESSI`, which provide the EESSI software sets.

There is also a default version of a natively optimized `env` module loaded. The natively optimized modules under `env` append the following paths to `${MODULEPATH}`, in the order described below; you can inspect the resulting value as shown after the list.

- On all nodes except for the `gpu` partition of Iris:
    - `/opt/apps/easybuild/systems/<cluster name>/<build version>/<software set version>/<target architecture>/modules/all`: location of natively optimized modules.
    - `/opt/apps/easybuild/systems/binary/<build version>/<software set version>/generic/modules/all`: location of software distributed as binaries that cannot be optimized for any target architecture.
- On nodes of the `gpu` partition of Iris:
    - `/opt/apps/easybuild/systems/iris/<build version>/<software set version>/gpu/modules/all`: location of natively optimized modules that use the GPU.
    - `/opt/apps/easybuild/systems/iris/<build version>/<software set version>/skylake/modules/all`: location of natively optimized modules.
    - `/opt/apps/easybuild/systems/binary/<build version>/<software set version>/generic/modules/all`: location of software distributed as binaries that cannot be optimized for any target architecture.

Note that the GPU-optimized modules still need the CPU modules (for instance the MPI modules) to function, and the GPU nodes use Skylake CPUs.
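You can verify which directories are currently searched, and how loading an `env` module extends the list, by inspecting `${MODULEPATH}` directly (the exact paths depend on the cluster, build version, and architecture):

```console
$ echo "${MODULEPATH}" | tr ':' '\n'   # directories searched before loading a software set
$ module load env/development/2023b
$ echo "${MODULEPATH}" | tr ':' '\n'   # now also lists the natively optimized and binary module paths
```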

??? info "Parameters in the software set directory paths"

The following parameters are used in the paths to the software set directories:

- `<cluster name>`: the name of the cluster (`iris` or `aion`), as set in the environment variable `${ULHPC_CLUSTER}`.
- `<build version>`: the version of the software build, determined by the operating system (OS) and the date of the build as `<OS version>-<ISO date squashed>`, where for instance
    - RHEL 8.10 becomes `<OS version>`=`rhel810`, and
    - 2025-02-16 becomes `<ISO date squashed>`=`20250216`.
- `<software set version>`: the ULHPC software set release, aligned with the [EasyBuild toolchain releases](https://easybuild.readthedocs.io/en/master/Common-toolchains.html#component-versions-in-foss-toolchain).
- `<target architecture>`: the architecture for which the software set has been optimized, as set in the `${RESIF_ARCH}` environment variable.

### Organization of software set files and manual selection of the software set

There are nodes with Broadwell CPUs and nodes with Skylake CPUs in the CPU partitions (`batch` and `interactive`) of Iris. To ensure that a compatible binary is used on all CPUs of the partition, the modules loading software sets are configured to load binaries that are compatible with Broadwell, the older of the two architectures (note that the binary is selected on the primary node of an allocation).

The RESIF_ARCH environment variable is used to load the software set for the appropriate architecture in the natively optimized software sets under env. The ${RESIF_ARCH} value used for all nodes in the CPU partitions of Iris is broadwell.

Similarly to `RESIF_ARCH`, EESSI provides the `EESSI_ARCHDETECT_OPTIONS_OVERRIDE` environment variable to enforce an architecture; when `EESSI_ARCHDETECT_OPTIONS_OVERRIDE` is unset, the EESSI module selects an appropriate architecture for the software set (as the name suggests). On UL HPC, the `EESSI_ARCHDETECT_OPTIONS_OVERRIDE` variable is set to `x86_64/intel/haswell` by default in the CPU partitions of Iris and is unset in every other partition. Note that architectural support in EESSI is relatively limited. The available CPU architectures in EESSI for Iris nodes are

- `x86_64/intel/haswell` for `broadwell` CPUs, and
- `x86_64/intel/skylake` for `skylake` CPUs.

EESSI does not provide builds optimized for all architectures, so the older `haswell` target was chosen as the best alternative for `broadwell`, which is missing.
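If you are certain that your job will only run on Skylake nodes, one possible approach (a sketch using the override value listed above) is to set the override explicitly before loading the EESSI module:

```console
$ export EESSI_ARCHDETECT_OPTIONS_OVERRIDE=x86_64/intel/skylake
$ module load EESSI/2023.06            # loads the Skylake-optimized EESSI software set
```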

!!! info "Values for the RESIF_ARCH and EESSI_ARCHDETECT_OPTIONS_OVERRIDE environment variables in UL HPC systems"

| Cluster                          | Partition (`--partition=`) | Architecture (`${RESIF_ARCH}`) | EESSI Architecture (`EESSI_ARCHDETECT_OPTIONS_OVERRIDE`) |
|:---------------------------------|:--------------------------|:-------------------------------|:---------------------------------------------------------|
| [Iris](../systems/iris/index.md) | `batch`                   | `broadwell`                    | `x86_64/intel/haswell`                                   |
| [Iris](../systems/iris/index.md) | `interactive`             | `broadwell`                    | `x86_64/intel/haswell`                                   |
| [Iris](../systems/iris/index.md) | `bigmem`                  | `skylake`                      |                                                          |
| [Iris](../systems/iris/index.md) | `gpu`                     | `gpu`                          |                                                          |
| [Aion](../systems/aion/index.md) | `batch`                   | `epyc`                         |                                                          |
| [Aion](../systems/aion/index.md) | `interactive`             | `epyc`                         |                                                          |

Note that all nodes of the `bigmem` and `gpu` partitions use Skylake CPUs.

There are occasions where a user may want to set the software set manually. For instance, a job can be constrained to run on a single kind of CPU, using for instance the `--constraint=skylake` flag of `sbatch` or `salloc` to force the job to run only on Skylake nodes of the `batch` partition of Iris. In this case it makes sense to use a software set optimized for Skylake.
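For example, an interactive allocation restricted to Skylake nodes of the Iris `batch` partition could be requested as follows (a sketch; adapt the resources and time limit to your job):

```console
$ salloc -p batch --constraint=skylake -N 1 -n 4 -t 01:00:00
```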

??? info "Selecting a natively optimized software set for Skylake CPUs in the Iris CPU partitions"

The `${RESIF_ARCH}` value used for all nodes in the CPU partitions of Iris is `broadwell`. To use the more optimized `skylake` software set, first purge any loaded natively optimized software sets:

```console
$ module --force purge
```

Then there are two options to select the natively optimized software set:

- Set the `RESIF_ARCH` variable manually and load the software set you require with a `module`:
  ```console
  $ export RESIF_ARCH=skylake
  $ module load env/development/2023b
  ```
- Edit the `MODULEPATH` variable with the `use` option of the `module` command:
  ```console
  $ module use /opt/apps/easybuild/systems/iris/rhel810-20250216/2023b/skylake/modules/all
  ```

??? info "Use an optimal EESSI software sets for Skylake CPUs in the Iris CPU partitions"

The EESSI module loading the software set is configured to load modules for a CPU architecture that is compatible with all CPUs in the CPU partitions of Iris. If you are sure that your program will only run on a single type of CPU architecture, simply unset the `EESSI_ARCHDETECT_OPTIONS_OVERRIDE` variable and load the EESSI software modules:

```console
$ unset EESSI_ARCHDETECT_OPTIONS_OVERRIDE
$ module load EESSI
```

The EESSI module then automatically detects the architecture and loads the appropriate modules.

You can always add a software set manually to `MODULEPATH` using the `use` option of the `module` command. To facilitate the organization of the natively optimized software sets, the values of `RESIF_ARCH` are used to determine the storage path of each software set. These locations are summarized in the following table.

!!! info "Location of natively optimized software set"

| Cluster                          | Arch. `${RESIF_ARCH}` | `${MODULEPATH}` Environment variable                                                            |
|:---------------------------------|:----------------------|:------------------------------------------------------------------------------------------------|
| [Iris](../systems/iris/index.md) | `broadwell`           | `/opt/apps/easybuild/systems/iris/<build version>/<software set version>/broadwell/modules/all` |
| [Iris](../systems/iris/index.md) | `skylake`             | `/opt/apps/easybuild/systems/iris/<build version>/<software set version>/skylake/modules/all`   |
| [Iris](../systems/iris/index.md) | `gpu`                 | `/opt/apps/easybuild/systems/iris/<build version>/<software set version>/gpu/modules/all`       |
| [Aion](../systems/aion/index.md) | `epyc`                | `/opt/apps/easybuild/systems/aion/<build version>/<software set version>/epyc/modules/all`      |

## Module Naming Schemes

!!! important "Module Naming Schemes on ULHPC system"

ULHPC modules are organised through the Categorized Naming Scheme.

Format: `<category>/<name>/<version>-<toolchain><versionsuffix>`

This means that the typical module hierarchy has as prefix a category level, taken from one of the supported software categories or module classes:

```console
$ eb --show-default-moduleclasses
Default available module classes:

        base:      Default module class
        ai:        Artificial Intelligence (incl. Machine Learning)
        astro:     Astronomy, Astrophysics and Cosmology
        bio:       Bioinformatics, biology and biomedical
        cae:       Computer Aided Engineering (incl. CFD)
        chem:      Chemistry, Computational Chemistry and Quantum Chemistry
        compiler:  Compilers
        data:      Data management & processing tools
        debugger:  Debuggers
        devel:     Development tools
        geo:       Earth Sciences
        ide:       Integrated Development Environments (e.g. editors)
        lang:      Languages and programming aids
        lib:       General purpose libraries
        math:      High-level mathematical software
        mpi:       MPI stacks
        numlib:    Numerical Libraries
        perf:      Performance tools
        quantum:   Quantum Computing
        phys:      Physics and physical systems simulations
        system:    System utilities (e.g. highly depending on system OS and hardware)
        toolchain: EasyBuild toolchains
        tools:     General purpose tools
        vis:       Visualization, plotting, documentation and typesetting
```

It follows that the ULHPC software modules are structured accordingly.
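As a consequence of the categorized scheme, searches can be narrowed to a single category. For instance, with the 2023b software set from the examples above loaded, something like the following should list and load an MPI stack (the version string is taken from the `foss/2023b` module file shown earlier):

```console
$ module avail mpi/                          # list only the MPI stacks of the active software set
$ module load mpi/OpenMPI/4.1.6-GCC-13.2.0   # full categorized name: <category>/<name>/<version>-<toolchain>
```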

## Using Easybuild to Create Custom Modules

You may want to use EasyBuild to supplement the existing software set with your own modules and software builds. See the Building Custom (or missing) software documentation for more details.

## Creating a Custom Module Environment

You can modify your environment so that certain modules are loaded whenever you log in. Use `module save [<name>]` and `module restore [<name>]` for that purpose; see the Lmod documentation on User collections for more details.
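As a brief sketch (the collection name `mysetup` is arbitrary, and the module names are those used in the examples above):

```console
$ module load env/development/2023b toolchain/foss/2023b   # modules you routinely need
$ module save mysetup                                       # store them as a named collection
$ module --force purge
$ module restore mysetup                                    # reload the whole collection later on
```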

You can also create and install your own modules for your convenience or for sharing software among collaborators. See the modulefile documentation for details of the required format and available commands. These custom modulefiles can be made visible to the module command by:

```console
$ module use /path/to/the/custom/modulefiles
```
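For instance, a minimal personal modulefile could look like the following sketch; the directory layout, module name, and install path are assumptions, and the file uses standard Lmod Lua modulefile functions (`help`, `prepend_path`, `setenv`, `pathJoin`):

```console
$ mkdir -p ~/modulefiles/mytool
$ cat > ~/modulefiles/mytool/1.0.lua <<'EOF'
-- Minimal Lua modulefile for a hypothetical tool installed under ~/software/mytool/1.0
help([[My privately installed tool, version 1.0]])
local root = pathJoin(os.getenv("HOME"), "software/mytool/1.0")
prepend_path("PATH", pathJoin(root, "bin"))
setenv("MYTOOL_HOME", root)
EOF
$ module use ~/modulefiles               # make the personal modulefiles visible
$ module load mytool/1.0
```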

!!! warning

1. Make sure the UNIX file permissions grant access to all users who want to use the software.
2. Do not give write permissions to your home directory to anyone else.

!!! note

The `module use` command adds new directories before other module search paths (defined in `${MODULEPATH}`), so modules defined in a custom directory will take precedence over other modules with the same name in the module search paths. If you prefer to have the new directory added at the end of `${MODULEPATH}`, use `module use -a` instead of `module use`.

## Module FAQ

??? question "What is module?"

`module` is a shell _function_ that modifies the user's shell environment upon loading a modulefile.
It is defined as follows:
```console
$ type module
module is a function
module ()
{
    eval $($LMOD_CMD bash "$@") && eval $(${LMOD_SETTARG_CMD:-:} -s sh)
}
```
In particular, `module` is NOT a program.

??? question "Is there an environment variable that captures loaded modules?"

Yes, the active modules can be retrieved via `$LOADEDMODULES`; this environment variable is automatically updated to reflect the loaded modules, matching what `module list` reports. If you want the paths to the modulefiles of the loaded modules, you can retrieve them via `$_LMFILES_`.
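For example, both variables are colon-separated and can be inspected directly (the module name is the one used in the examples above):

```console
$ module load toolchain/foss/2023b
$ echo "$LOADEDMODULES" | tr ':' '\n'    # one entry per loaded module
$ echo "$_LMFILES_" | tr ':' '\n'        # path of the modulefile behind each loaded module
```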

??? question "What is a Module Naming Scheme?"

The full software and module install paths for a particular software package are determined by the _active_ module naming scheme along with the general software and modules install paths specified by the EasyBuild configuration.

You can list the supported module naming schemes of [Easybuild](https://easybuilders.github.io/easybuild-tutorial/hmns/) using:
```console
$ eb --avail-module-naming-schemes
List of supported module naming schemes:
    EasyBuildMNS
    CategorizedHMNS
    MigrateFromEBToHMNS
    HierarchicalMNS
    CategorizedModuleNamingScheme
```
See [Flat vs. Hierarchical module naming scheme](https://easybuilders.github.io/easybuild-tutorial/hmns/#flat-vs-hierarchical) for an illustrated explanation of the difference between two extreme cases: flat or 3-level hierarchical. On ULHPC systems, we selected an intermediate scheme called `CategorizedModuleNamingScheme`.

## Footnotes

[^1]: See our software list for a detailed list of available applications.

[^2]: See the basic info section for the terminology related to toolchains.