
Commit 6b3a742

Rename preempt to preemptable
1 parent dc546d1 commit 6b3a742

File tree

- docs/general/news.md
- docs/slurm/gpus.md
- docs/slurm/partitions.md

3 files changed: +59 −5 lines

docs/general/news.md

+54 −0
@@ -1,5 +1,59 @@
# News

22.05.2024:

- UBELIX went through a major upgrade:
    - The operating system was upgraded to Rocky Linux 9.3
    - The supported software stack was updated. Supported toolchain versions are now 2021a through 2023a
    - The scheduler accounting hierarchy was restructured and simplified
    - The monitoring system was upgraded to Grafana
    - The user documentation was refactored

**SSH Key Switch**

Please be aware that the sshd configuration of UBELIX has changed. Consequently, only ED25519 host keys are supported. You will receive a warning when connecting to UBELIX for the first time after the change. You will need to remove the old host keys from your known hosts:

- ssh-keygen -R submit.unibe.ch
- ssh-keygen -R submit01.unibe.ch
- ssh-keygen -R submit02.unibe.ch
- ssh-keygen -R submit03.unibe.ch
- ssh-keygen -R submit04.unibe.ch

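For convenience, the removals above can also be run in one go; a minimal sketch using exactly the hostnames listed here:

```Bash
# Remove the outdated host keys for each UBELIX login host from ~/.ssh/known_hosts
for host in submit.unibe.ch submit01.unibe.ch submit02.unibe.ch submit03.unibe.ch submit04.unibe.ch; do
    ssh-keygen -R "$host"
done
```
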
The new ED25519 host key fingerprints are:

- submit01.unibe.ch (130.92.250.231) - SHA256:qmMfIbwyosfLUsY8BMCTgj6HjQ3Im6bAdhCWK9nSiDs
- submit02.unibe.ch (130.92.250.232) - SHA256:eRTZGWp2bvlEbl8O1pigcONsFZAVKT+hp+5lSQ8lq/A
- submit03.unibe.ch (130.92.250.233) - SHA256:PUkldwWf86h26PSFHCkEGKsrYlXv668LeSnbHBrMoCQ
- submit04.unibe.ch (130.92.250.234) - SHA256:D3cmfXkb40P7W935J2Un8sBUd4Sv2MNLkvz9isJOnu0

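To check what a login node actually presents before accepting its key, a sketch like the following can be compared against the fingerprints above (this ssh-keyscan pipeline is an editorial example, not part of the original announcement):

```Bash
# Fetch the ED25519 host key of a login node and print its SHA256 fingerprint
ssh-keyscan -t ed25519 submit01.unibe.ch 2>/dev/null | ssh-keygen -lf -
```
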
**Software Stack**

Since all software modules have changed, we advise recompiling all custom software against the new toolchains!

Where available, the supported software was rebuilt in its most recent stable version with the foss/2023a toolchain. You can search for packages or modules containing a specific string with "module spider", and list all currently available packages with "module avail". Beware, the latter list is very long, so "module spider" is usually the more useful option.

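For example, with a placeholder package name (replace it with the software you are actually looking for):

```Bash
# Search all module trees for anything matching the given name (case-insensitive)
module spider gromacs

# List every currently available module; the output is very long
module avail
```
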
In case you're missing software, please follow these steps:

- Check whether a newer version of the software is already available (module spider <software>) and use that version. If this isn't possible, you will need to install it yourself; see our documentation for more information.
- Check whether the easybuilders/easybuild-easyconfigs repository provides an easyconfig for your tool/version with the foss/2023a or intel/2023a toolchain.
- Follow the instructions in this documentation to install the software into your personal stack (see the sketch after this list). If the software is useful to a larger group of users, please open a ticket. Note that software from unsupported toolchains will not be installed centrally.

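If the EasyBuild `eb` command is available in your environment, a personal installation could look roughly like the sketch below; the package and easyconfig names are placeholders, not taken from the announcement:

```Bash
# Search the easyconfig repository for build recipes matching the desired toolchain
eb --search 'GROMACS.*foss-2023a'

# Build and install the chosen easyconfig into your personal stack,
# resolving missing dependencies automatically
eb GROMACS-2023.3-foss-2023a.eb --robot
```
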
**SLURM changes**

Slurm associations are no longer set on partitions. This means it is now possible to submit a job to both the epyc2 and the bdw partition, e.g. --partition=epyc2,bdw. When no partition is specified in the job script, partition=epyc2,bdw is the default. The scheduler will then try to start your job as early as possible on either of the two partitions, prioritizing the partition listed first.

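A minimal job script illustrating this multi-partition submission (the resource requests below are placeholder values):

```Bash
#!/bin/bash
#SBATCH --job-name=multi-partition-example
#SBATCH --partition=epyc2,bdw    # whichever partition can start the job first will be used
#SBATCH --cpus-per-task=1
#SBATCH --time=01:00:00

srun hostname
```
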
To eliminate confusion, the QoS "job_gpu_preempt" has been renamed to "job_gpu_preemptable" to indicate that jobs submitted with this QoS are in fact preemptable by investor jobs.

Also, there are no longer personal and workspace accounts. This means you don't have to specify an account when submitting jobs.

Finally, the per-user CPU core limit that previously capped the resources a user could use at the same time has been replaced by a maximum CPU hours limit per user. This should improve the overall scheduling performance.

**Monitoring**

The status web page is now available at https://ubelix.hpc.unibe.ch. Please note that user jobs are no longer displayed on the status page. Use the "squeue --me" command to get a high-level overview of all your active (running and pending) jobs in the cluster.

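For example (only `squeue --me` is mentioned in the announcement; `--long` is a standard squeue option added here for illustration):

```Bash
# Compact overview of all your running and pending jobs
squeue --me

# Same information in the more detailed long format
squeue --me --long
```
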
12.01.2024:

- The user documentation has been streamlined and updated with recent information

docs/slurm/gpus.md

+3 −3
@@ -37,11 +37,11 @@ default job QoS:
```


-## QoS `job_gpu_preempt`
+## QoS `job_gpu_preemptable`

For investors we provide the `gpu-invest` investor partitions with a specific
QoS per investor that guarantees instant access to the purchased resources.
-Nevertheless, to efficiently use all resources, the QoS `job_gpu_preempt` exists
+Nevertheless, to efficiently use all resources, the QoS `job_gpu_preemptable` exists
in the `gpu` partition. Jobs, submitted with this QoS have access to all GPU
resources, but may be interrupted if resources are required for investor jobs.
Short jobs, and jobs that make use of checkpointing will benefit from these
@@ -51,7 +51,7 @@ Example: Requesting any four RTX3090 from the resource pool in the `gpu`
partition:
```Bash
#SBATCH --partition=gpu
-#SBATCH --qos=job_gpu_preempt
+#SBATCH --qos=job_gpu_preemptable
#SBATCH --gres=gpu:rtx3090:4
## Use the following option to ensure that the job, if preempted,
## won't be re-queued but canceled instead:

docs/slurm/partitions.md

+2 −2
@@ -45,6 +45,6 @@ module load Workspace # use the Workspace account
sbatch --partition=gpu-invest job.sh
```

-!!! note "Preempt"
+!!! note "Preemptable"
    The resources dedicated to investors can be used by non-investing users too.
-    A certain amount of CPUs/GPUs are "reserved" in the investor partitions. But if not used, jobs with the QOS `job_gpu_preempt` can run on these resources. But beware that preemptable jobs may be terminated by investor jobs at any time! Therefore use the qos `job_gpu_preempt` only if your job supports checkpointing or restarts.
+    A certain amount of CPUs/GPUs is "reserved" in the investor partitions. If not used, jobs with the QoS `job_gpu_preemptable` can run on these resources. But beware that preemptable jobs may be terminated by investor jobs at any time! Therefore, use the QoS `job_gpu_preemptable` only if your job supports checkpointing or restarts.

0 commit comments
