
Commit ca0a574

update for Fall 2024; tf 2.17
1 parent a5f3893 commit ca0a574

File tree: 1 file changed, +26 −25 lines

  • content/courses/containers-for-hpc/using.md
content/courses/containers-for-hpc/using.md

+26 −25
@@ -115,7 +115,7 @@ $ apptainer exec lolcow_latest.sif which fortune
 
 - Apptainer bind mounts these host directories at runtime:
 - Personal directories: `/home`, `/scratch`
-- Leased storage shared by your research group: `/project`, `/standard`, `/nv`
+- Leased storage shared by your research group: `/project`, `/standard`
 - Your current working directory
 - To bind mount additional host directories/files, use `--bind`/`-B`:
 
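The context above introduces `--bind`/`-B`, but the example itself falls outside this hunk. As a hedged illustration only (the host paths below are placeholders, not taken from the course page), an extra bind mount typically looks like:

```bash
# Bind a host directory into the container at the same path
apptainer exec --bind /nv/my-volume lolcow_latest.sif ls /nv/my-volume

# Short form, remapping the host path to a different path inside the container
apptainer exec -B /nv/my-volume:/data lolcow_latest.sif ls /data
```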
@@ -170,11 +170,11 @@ The corresponding `run` command is displayed upon loading a module.
 ```bash
 $ module load tensorflow
 To execute the default application inside the container, run:
-apptainer run --nv $CONTAINERDIR/tensorflow-2.10.0.sif
+apptainer run --nv $CONTAINERDIR/tensorflow-2.13.0.sif
 
 $ module list
 Currently Loaded Modules:
-  1) apptainer/1.2.2   2) tensorflow/2.10.0
+  1) apptainer/1.2.2   2) tensorflow/2.13.0
 ```
 
 - `$CONTAINERDIR` is an environment variable. It is the directory where containers are stored.
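Since the hunk ends on the `$CONTAINERDIR` note, here is a small hedged sketch of how one might inspect it after loading the module (the exact contents listed will vary by site and is not shown in this commit):

```bash
module load apptainer tensorflow/2.13.0
# Show where the site stores its prebuilt images, then list them
echo $CONTAINERDIR
ls $CONTAINERDIR/*.sif
```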
@@ -204,7 +204,7 @@ Currently Loaded Modules:
 Copy these files:
 
 ```bash
-cp /share/resources/tutorials/apptainer_ws/tensorflow-2.10.0.slurm .
+cp /share/resources/tutorials/apptainer_ws/tensorflow-2.13.0.slurm .
 cp /share/resources/tutorials/apptainer_ws/mnist_example.{ipynb,py} .
 ```
 
@@ -213,36 +213,37 @@ Examine Slurm script:
 ```bash
 #!/bin/bash
 #SBATCH -A hpc_training        # account name
-#SBATCH -p gpu                 # partition/queue
-#SBATCH --gres=gpu:1           # request 1 gpu
-#SBATCH -c 1                   # request 1 cpu core
-#SBATCH -t 00:05:00            # time limit: 5 min
-#SBATCH -J tftest              # job name
-#SBATCH -o tftest-%A.out       # output file
-#SBATCH -e tftest-%A.err       # error file
-
+#SBATCH -p gpu                 # partition/queue
+#SBATCH --gres=gpu:1           # request 1 gpu
+#SBATCH -c 1                   # request 1 cpu core
+#SBATCH -t 00:05:00            # time limit: 5 min
+#SBATCH -J tftest              # job name
+#SBATCH -o tftest-%A.out       # output file
+#SBATCH -e tftest-%A.err       # error file
+
+VERSION=2.13.0
 # start with clean environment
 module purge
-module load apptainer tensorflow/2.10.0
+module load apptainer tensorflow/$VERSION
 
-apptainer run --nv $CONTAINERDIR/tensorflow-2.10.0.sif mnist_example.py
+apptainer run --nv $CONTAINERDIR/tensorflow-$VERSION.sif mnist_example.py
 ```
 
 Submit job:
 
 ```bash
-sbatch tensorflow-2.10.0.slurm
+sbatch tensorflow-2.13.0.slurm
 ```
 
 #### What does `--nv` do?
 
 See [Apptainer GPU user guide](https://apptainer.org/user-docs/master/gpu.html#nvidia-gpus-cuda-standard)
 
 ```bash
-$ apptainer shell $CONTAINERDIR/tensorflow-2.10.0.sif
+$ apptainer shell $CONTAINERDIR/tensorflow-2.13.0.sif
 Apptainer> ls /.singularity.d/libs
 
-$ apptainer shell --nv $CONTAINERDIR/tensorflow-2.10.0.sif
+$ apptainer shell --nv $CONTAINERDIR/tensorflow-2.13.0.sif
 Apptainer> ls /.singularity.d/libs
 libEGL.so     libGLX.so.0          libnvidia-cfg.so     libnvidia-ifr.so
 libEGL.so.1   libGLX_nvidia.so.0   libnvidia-cfg.so.1   libnvidia-ifr.so.1
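Not part of the commit, but a common sanity check once `--nv` is in play: confirm that the driver libraries and the GPU are actually visible inside the container. A hedged sketch, assuming the module above is loaded on a GPU node:

```bash
# nvidia-smi is bind-mounted in by --nv; it should report the allocated GPU
apptainer exec --nv $CONTAINERDIR/tensorflow-2.13.0.sif nvidia-smi

# TensorFlow should list at least one GPU device
apptainer exec --nv $CONTAINERDIR/tensorflow-2.13.0.sif \
    python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```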
@@ -255,21 +256,21 @@ libEGL.so.1 libGLX_nvidia.so.0 libnvidia-cfg.so.1 libnvidia-ifr.so.1
 
 ### "Can I use my own container on JupyterLab?"
 
-Suppose you need to use TensorFlow 2.11.0 on JupyterLab. First, note we do not have `tensorflow/2.11.0` as a module:
+Suppose you need to use TensorFlow 2.17.0 on JupyterLab. First, note we do not have `tensorflow/2.17.0` as a module:
 
 ```bash
 module spider tensorflow
 ```
 
-Go to [TensorFlow's Docker Hub page](https://hub.docker.com/r/tensorflow/tensorflow/tags?page=1&name=2.11.0) and search for the tag (i.e. version). You'll want to use one that has the `-gpu-jupyter` suffix. Pull the container in your account.
+Go to [TensorFlow's Docker Hub page](https://hub.docker.com/r/tensorflow/tensorflow) and search for the tag (i.e. version). You'll want to use one that has the `-gpu-jupyter` suffix. Pull the container in your account.
 
 ### Installation
 
 #### Manual
 1. Create kernel directory
 
 ```bash
-DIR=~/.local/share/jupyter/kernels/tensorflow-2.11.0
+DIR=~/.local/share/jupyter/kernels/tensorflow-2.17.0
 mkdir -p $DIR
 cd $DIR
 ```
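The "pull the container in your account" step is referenced but not shown in this hunk. Assuming a `2.17.0-gpu-jupyter` tag is the one chosen (an assumption based on the text's `-gpu-jupyter` advice), the pull would look roughly like:

```bash
# Produces tensorflow_2.17.0-gpu-jupyter.sif in the current directory
apptainer pull docker://tensorflow/tensorflow:2.17.0-gpu-jupyter
```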
@@ -279,11 +280,11 @@ cd $DIR
 ```
 {
   "argv": [
-    "/home/<user>/.local/share/jupyter/kernels/tensorflow-2.11.0/init.sh",
+    "/home/<user>/.local/share/jupyter/kernels/tensorflow-2.17.0/init.sh",
     "-f",
     "{connection_file}"
   ],
-  "display_name": "Tensorflow 2.11",
+  "display_name": "Tensorflow 2.17",
   "language": "python"
 }
 ```
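The `kernel.json` above points at an `init.sh` wrapper that this commit does not touch. Purely as a hypothetical sketch (the sif path is a placeholder), such a wrapper usually launches `ipykernel` inside the pulled container and forwards Jupyter's connection-file arguments:

```bash
#!/bin/bash
# Hypothetical init.sh: run ipykernel inside the pulled container,
# passing through the arguments listed in kernel.json ("-f", "{connection_file}")
module load apptainer
apptainer exec --nv ~/tensorflow_2.17.0-gpu-jupyter.sif python -m ipykernel "$@"
```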
@@ -315,7 +316,7 @@ Usage: jkrollout sif display_name [gpu]
 ```
 
 ```bash
-jkrollout /path/to/sif "Tensorflow 2.11" gpu
+jkrollout /path/to/sif "Tensorflow 2.17" gpu
 ```
 
 ### Test your new kernel
@@ -325,13 +326,13 @@ jkrollout /path/to/sif "Tensorflow 2.11" gpu
 - Partition: GPU
 - Work Directory: (location of your `mnist_example.ipynb`)
 - Allocation: `hpc_training`
-- Select the new "TensorFlow 2.11" kernel
+- Select the new "TensorFlow 2.17" kernel
 - Run `mnist_example.ipynb`
 
 ### Remove a custom kernel
 
 ```bash
-rm -rf ~/.local/share/jupyter/kernels/tensorflow-2.11.0
+rm -rf ~/.local/share/jupyter/kernels/tensorflow-2.17.0
 ```
 
 ---
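One last hedged aside on the removal step: listing the kernel directories first helps avoid deleting the wrong one (path per the hunk above):

```bash
# See which custom kernels are currently installed before removing one
ls ~/.local/share/jupyter/kernels
```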
