
Commit 11bc500

Merge pull request #109 from uvarc/staging
Replacing most refs to Anaconda with Miniforge
2 parents: e43840d + 2982169


49 files changed (+482, -114 lines)

content/courses/containers-for-hpc/using.md

+30, -29
@@ -13,7 +13,7 @@ Log on to our HPC cluster
 - Run `hdquota`
 - Make sure you have a few GBs of free space
 - Run `allocations`
-- Check if you have `rivanna-training`
+- Check if you have `hpc_training`
 
 ---
 
@@ -115,7 +115,7 @@ $ apptainer exec lolcow_latest.sif which fortune
 
 - Apptainer bind mounts these host directories at runtime:
 - Personal directories: `/home`, `/scratch`
-- Leased storage shared by your research group: `/project`, `/standard`, `/nv`
+- Leased storage shared by your research group: `/project`, `/standard`
 - Your current working directory
 - To bind mount additional host directories/files, use `--bind`/`-B`:
 
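For readers following along outside the course, `--bind` takes `source:destination` pairs; a minimal sketch, with a purely hypothetical host path:

```bash
# Bind a hypothetical host directory /mydata to /data inside the container,
# then list it to confirm the mount is visible from within the image
apptainer exec --bind /mydata:/data lolcow_latest.sif ls /data
```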
@@ -170,11 +170,11 @@ The corresponding `run` command is displayed upon loading a module.
 ```bash
 $ module load tensorflow
 To execute the default application inside the container, run:
-apptainer run --nv $CONTAINERDIR/tensorflow-2.10.0.sif
+apptainer run --nv $CONTAINERDIR/tensorflow-2.13.0.sif
 
 $ module list
 Currently Loaded Modules:
-1) apptainer/1.2.2 2) tensorflow/2.10.0
+1) apptainer/1.2.2 2) tensorflow/2.13.0
 ```
 
 - `$CONTAINERDIR` is an environment variable. It is the directory where containers are stored.
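To see what the module actually sets, you can inspect the variable directly; a minimal check (the listing will vary by cluster):

```bash
# Show where the site stores its prebuilt containers and confirm the TensorFlow image is there
echo $CONTAINERDIR
ls $CONTAINERDIR | grep tensorflow
```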
@@ -204,45 +204,46 @@ Currently Loaded Modules:
 Copy these files:
 
 ```bash
-cp /share/resources/tutorials/apptainer_ws/tensorflow-2.10.0.slurm .
+cp /share/resources/tutorials/apptainer_ws/tensorflow-2.13.0.slurm .
 cp /share/resources/tutorials/apptainer_ws/mnist_example.{ipynb,py} .
 ```
 
 Examine Slurm script:
 
 ```bash
 #!/bin/bash
-#SBATCH -A rivanna-training # account name
-#SBATCH -p gpu # partition/queue
-#SBATCH --gres=gpu:1 # request 1 gpu
-#SBATCH -c 1 # request 1 cpu core
-#SBATCH -t 00:05:00 # time limit: 5 min
-#SBATCH -J tftest # job name
-#SBATCH -o tftest-%A.out # output file
-#SBATCH -e tftest-%A.err # error file
-
+#SBATCH -A hpc_training # account name
+#SBATCH -p gpu # partition/queue
+#SBATCH --gres=gpu:1 # request 1 gpu
+#SBATCH -c 1 # request 1 cpu core
+#SBATCH -t 00:05:00 # time limit: 5 min
+#SBATCH -J tftest # job name
+#SBATCH -o tftest-%A.out # output file
+#SBATCH -e tftest-%A.err # error file
+
+VERSION=2.13.0
 # start with clean environment
 module purge
-module load apptainer tensorflow/2.10.0
+module load apptainer tensorflow/$VERSION
 
-apptainer run --nv $CONTAINERDIR/tensorflow-2.10.0.sif mnist_example.py
+apptainer run --nv $CONTAINERDIR/tensorflow-$VERSION.sif mnist_example.py
 ```
 
 Submit job:
 
 ```bash
-sbatch tensorflow-2.10.0.slurm
+sbatch tensorflow-2.13.0.slurm
 ```
 
 #### What does `--nv` do?
 
 See [Apptainer GPU user guide](https://apptainer.org/user-docs/master/gpu.html#nvidia-gpus-cuda-standard)
 
 ```bash
-$ apptainer shell $CONTAINERDIR/tensorflow-2.10.0.sif
+$ apptainer shell $CONTAINERDIR/tensorflow-2.13.0.sif
 Apptainer> ls /.singularity.d/libs
 
-$ apptainer shell --nv $CONTAINERDIR/tensorflow-2.10.0.sif
+$ apptainer shell --nv $CONTAINERDIR/tensorflow-2.13.0.sif
 Apptainer> ls /.singularity.d/libs
 libEGL.so libGLX.so.0 libnvidia-cfg.so libnvidia-ifr.so
 libEGL.so.1 libGLX_nvidia.so.0 libnvidia-cfg.so.1 libnvidia-ifr.so.1
@@ -255,21 +256,21 @@ libEGL.so.1 libGLX_nvidia.so.0 libnvidia-cfg.so.1 libnvidia-ifr.so.
 
 ### "Can I use my own container on JupyterLab?"
 
-Suppose you need to use TensorFlow 2.11.0 on JupyterLab. First, note we do not have `tensorflow/2.11.0` as a module:
+Suppose you need to use TensorFlow 2.17.0 on JupyterLab. First, note we do not have `tensorflow/2.17.0` as a module:
 
 ```bash
 module spider tensorflow
 ```
 
-Go to [TensorFlow's Docker Hub page](https://hub.docker.com/r/tensorflow/tensorflow/tags?page=1&name=2.11.0) and search for the tag (i.e. version). You'll want to use one that has the `-gpu-jupyter` suffix. Pull the container in your account.
+Go to [TensorFlow's Docker Hub page](https://hub.docker.com/r/tensorflow/tensorflow) and search for the tag (i.e. version). You'll want to use one that has the `-gpu-jupyter` suffix. Pull the container in your account.
 
 ### Installation
 
 #### Manual
 1. Create kernel directory
 
 ```bash
-DIR=~/.local/share/jupyter/kernels/tensorflow-2.11.0
+DIR=~/.local/share/jupyter/kernels/tensorflow-2.17.0
 mkdir -p $DIR
 cd $DIR
 ```
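As a concrete illustration of the "pull the container" step above, one possible pull command — assuming a `2.17.0-gpu-jupyter` tag is published on Docker Hub — looks like this:

```bash
# Pull the tagged GPU+Jupyter image from Docker Hub and convert it to a SIF file
apptainer pull tensorflow-2.17.0.sif docker://tensorflow/tensorflow:2.17.0-gpu-jupyter
```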
@@ -279,11 +280,11 @@ cd $DIR
 ```
 {
 "argv": [
-"/home/<user>/.local/share/jupyter/kernels/tensorflow-2.11.0/init.sh",
+"/home/<user>/.local/share/jupyter/kernels/tensorflow-2.17.0/init.sh",
 "-f",
 "{connection_file}"
 ],
-"display_name": "Tensorflow 2.11",
+"display_name": "Tensorflow 2.17",
 "language": "python"
 }
 ```
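The `argv` above points at an `init.sh` wrapper that the course supplies separately; it is not shown in this diff. For orientation only, a hypothetical wrapper of this kind does little more than launch the kernel inside the container:

```bash
#!/bin/bash
# Hypothetical sketch only -- the course's actual init.sh may differ.
# Forward the "-f {connection_file}" arguments to ipykernel running inside the SIF.
module load apptainer
SIF=/path/to/tensorflow-2.17.0.sif   # placeholder path to the pulled image
exec apptainer exec --nv "$SIF" python -m ipykernel_launcher "$@"
```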
@@ -315,23 +316,23 @@ Usage: jkrollout sif display_name [gpu]
 ```
 
 ```bash
-jkrollout /path/to/sif "Tensorflow 2.11" gpu
+jkrollout /path/to/sif "Tensorflow 2.17" gpu
 ```
 
 ### Test your new kernel
 
-- Go to https://rivanna-portal.hpc.virginia.edu
+- Go to https://ood.hpc.virginia.edu
 - Select JupyterLab
 - Partition: GPU
 - Work Directory: (location of your `mnist_example.ipynb`)
-- Allocation: `rivanna-training`
-- Select the new "TensorFlow 2.11" kernel
+- Allocation: `hpc_training`
+- Select the new "TensorFlow 2.17" kernel
 - Run `mnist_example.ipynb`
 
 ### Remove a custom kernel
 
 ```bash
-rm -rf ~/.local/share/jupyter/kernels/tensorflow-2.11.0
+rm -rf ~/.local/share/jupyter/kernels/tensorflow-2.17.0
 ```
 
 ---

content/courses/fortran-introduction/array_intrinsics.md

+3
@@ -25,6 +25,7 @@ These create new arrays from old. `PACK` and `UNPACK` can be used to "flatten" m
 
 ```fortran
 ! Convert an array from one shape to another (total size must match)
+! SHAPE must be a rank-one array whose elements are sizes in each dimension
 RESHAPE(SOURCE,SHAPE[,PAD][,ORDER])
 ! Combine two arrays of same shape and size according to MASK
 ! Take from ARR1 where MASK is .true., ARR2 where it is .false.
@@ -37,8 +38,10 @@ SPREAD(SOURCE,DIM,NCOPIES)
 ```
 **Example**
 ```fortran
+!Array and mask are of size NxM
 mask=A<0
 merge(A,0,mask)
+B=reshape(A,(/M,N/))
 ! for C=1, D=[1,2]
 print *, spread(C, 1, 2) ! "1 1"
 print *, spread(D, 1, 2) ! "1 1 2 2"
Binary file not shown.
@@ -0,0 +1,86 @@
+program sendrows
+  use mpi_f08
+
+  double precision, allocatable, dimension(:,:) :: u,w
+  integer :: N
+  integer :: i,j
+
+  integer :: nr, nc
+  integer :: rank, nprocs, tag=0
+  integer :: err, errcode
+  integer :: ncount, blocklength, stride
+  type(MPI_Status), dimension(:), allocatable :: mpi_status_arr
+  type(MPI_Request), dimension(:), allocatable :: mpi_requests
+  type(MPI_Datatype) :: rows
+
+  integer, parameter :: root=0
+  integer :: src, dest
+
+  !Initialize MPI, get the local number of columns
+  call MPI_INIT()
+  call MPI_COMM_SIZE(MPI_COMM_WORLD,nprocs)
+  call MPI_COMM_RANK(MPI_COMM_WORLD,rank)
+
+  !We will make the matrix scale with number of processes for simplicity
+  nr=nprocs
+  nc=nprocs
+
+  allocate(u(nr,nc),w(nr,nc))
+  allocate(mpi_requests(2*nprocs),mpi_status_arr(2*nprocs))
+  u=0.0d0
+  w=0.0d0
+
+  !Cyclic sending
+  if (rank == nprocs-1) then
+     src=rank-1
+     dest=0
+  else if (rank==0) then
+     src=nprocs-1
+     dest=rank+1
+  else
+     src=rank-1
+     dest=rank+1
+  endif
+
+  ncount=1
+  blocklength=nc
+  stride=nr
+
+  call MPI_Type_vector(ncount,blocklength,stride,MPI_DOUBLE_PRECISION,rows)
+
+  call MPI_TYPE_COMMIT(rows)
+
+  do i=0,nprocs-1
+     if (rank==i) then
+        tag=i
+        print *, i,i+1,i+nprocs+1
+        if (i==0) then
+           call MPI_Irecv(w(nprocs,1),1,rows,src,tag,MPI_COMM_WORLD,mpi_requests(i+1))
+           call MPI_Isend(u(i+1,1),1,rows,dest,tag,MPI_COMM_WORLD,mpi_requests(i+nprocs+1))
+        else if (i==nprocs-1) then
+           call MPI_Irecv(w(1,1),1,rows,src,tag,MPI_COMM_WORLD,mpi_requests(i+1))
+           call MPI_Isend(u(nprocs,1),1,rows,dest,tag,MPI_COMM_WORLD,mpi_requests(i+nprocs+1))
+        else
+           call MPI_Irecv(w(i+2,1),1,rows,src,tag,MPI_COMM_WORLD,mpi_requests(i+1))
+           call MPI_Isend(u(i+1,1),1,rows,dest,tag,MPI_COMM_WORLD,mpi_requests(i+nprocs+1))
+        endif
+     endif
+  enddo
+
+  call MPI_Waitall(size(mpi_requests),mpi_requests,mpi_status_arr)
+
+  call MPI_TYPE_FREE(rows)
+
+  !Print neatly
+  do i=1,nr
+     write(*,*) "|",u(i,:),"|"," |",w(i,:),"|"
+  enddo
+
+  call MPI_Finalize()
+
+end program

content/courses/parallel-computing-introduction/distributed_mpi_setup.md

+36, -8
@@ -9,15 +9,44 @@ menu:
 parent: Distributed-Memory Programming
 ---
 
+Using MPI requires access to a computer with at least one node with multiple cores. The Message Passing Interface is a standard and there are multiple implementations of it, so a choice of distribution must be made. Popular implementations include [MPICH](https://www.mpich.org/), [OpenMPI](https://www.open-mpi.org/), [MVAPICH2](https://mvapich.cse.ohio-state.edu/), and [IntelMPI](https://www.intel.com/content/www/us/en/developer/tools/oneapi/mpi-library.html#gs.gdkhva). MPICH, OpenMPI, and MVAPICH2 must be built for a system, so a compiler must be chosen as well. IntelMPI is typically used with the Intel compiler and is provided by the vendor as part of their [HPC Toolkit](https://www.intel.com/content/www/us/en/developer/tools/oneapi/hpc-toolkit.html#gs.gdkm8y). MVAPICH2 is a version of MPICH that is specialized for high-speed [Infiniband](https://en.wikipedia.org/wiki/InfiniBand) networks on high-performance clusters, so would generally not be appropriate for installation on individual computers.
+
 ### On a Remote Cluster
 
-Refer to the instructions from your site, for example [UVA Research Computing](https://www.rc.virginia.edu/userinfo/howtos/rivanna/mpi-howto/) for our local environment. Nearly always, you will be required to prepare your code and run it through a _resource manager_ such as [Slurm](https://www.rc.virginia.edu/userinfo/rivanna/slurm/).
+Refer to the instructions from your site, for example [UVA Research Computing](https://www.rc.virginia.edu/userinfo/howtos/rivanna/mpi-howto/) for our local environment. Nearly always, you will be required to prepare your code and run it through a _resource manager_ such as [Slurm](https://www.rc.virginia.edu/userinfo/rivanna/slurm/). Most HPC sites use a _modules_ system, so generally you will need to load modules for an MPI version and usually the corresponding compiler. It is important to be sure that you use a version of MPI that can communicate correctly with your resource manager.
+```bash
+module load gcc
+module load openmpi
+```
+is an example setup for compiled-language users.
+
+For Python, the mpi4py package is most widely available. It is generally preferable, and may be required, that mpi4py be installed from the conda-forge repository. On a cluster, mpi4py will need to link to a locally-built version of MPI that can communicate with the resource manager. The conda-forge maintainers provide instructions for this [here](https://conda-forge.org/docs/user/tipsandtricks/#using-external-message-passing-interface-mpi-libraries). In our example, we will use openmpi. First we must load the modules for the compiler and MPI version:
 
-For Python, you will need to install mpi4py. You may wish to create a conda environment for it. On the UVA system you must use `pip` rather than conda.
 ```bash
 module load gcc openmpi
-module load anaconda
-pip install --user mpi4py
+```
+We must not install OpenMPI directly from conda-forge; rather we make use of the "hooks" they have provided.
+```bash
+module list openmpi
+```
+In our example, the module list returns
+```bash
+Currently Loaded Modules Matching: openmpi
+1) openmpi/4.1.4
+```
+Now we check that our version of OpenMPI is available
+```bash
+conda search -f openmpi -c conda-forge
+```
+Most versions are there, so we can install the one we need
+```bash
+conda install -c conda-forge "openmpi=4.1.4=external_*"
+```
+Be sure to include the `external_*` string.
+
+After this completes, we can install mpi4py
+```bash
+conda install -c conda-forge mpi4py
 ```
 
 ### On a Local Computer
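As a quick sanity check after the two `conda install` steps above, you can confirm which MPI the resulting mpi4py was built against and that it runs across ranks; a minimal sketch, assuming you are inside a Slurm allocation:

```bash
# Print mpi4py's build configuration (should point at the external OpenMPI)
python -c "import mpi4py; print(mpi4py.get_config())"
# Launch a tiny 4-rank job; each rank reports its rank and the communicator size
srun -n 4 python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), 'of', c.Get_size())"
```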
@@ -32,7 +61,7 @@ The author of mpi4py [recommends](https://mpi4py.readthedocs.io/en/stable/instal
 ```no-highlight
 python -m pip install mpi4py
 ```
-This may avoid some issues that occasionally arise in prebuilt mpi4py packages. Be sure that an appropriate `mpicc` executable is in the path. Alternatively, use the `conda-forge` channel (recommended in general for most scientific software).
+This may avoid some issues that occasionally arise in prebuilt mpi4py packages. Be sure that an appropriate `mpicc` executable is in the path. Alternatively, use the `conda-forge` channel (recommended in general for most scientific software). Most of the time, if you are installing mpi4py from conda-forge, you can simply install the package. MPICH is the default when installed as a prerequisite for conda-forge.
 
 #### Linux
 
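To expand on the `mpicc` remark in the changed paragraph above, a minimal pre-flight check before building mpi4py from source might look like this (standard shell commands, not part of the course text):

```bash
# Confirm an MPI compiler wrapper is on the PATH before building from source
command -v mpicc || echo "mpicc not found -- install or load an MPI first"
# Build mpi4py against that MPI
python -m pip install mpi4py
```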
@@ -47,7 +76,7 @@ Installing the HPC Toolkit will also install IntelMPI.
 _NVIDIA HPC SDK_
 The NVIDIA software ships with a precompiled version of OpenMPI.
 
-The headers and libraries for MPI _must_ match. Using a header from one MPI and libraries from another, or using headers from a version from one compiler and libraries from a different compiler, usually results in some difficult-to-interpret bugs. Moreover, the process manager must be compatible with the MPI used to compile the code. Because of this, if more than one compiler and especially more than one MPI version is installed, the use of _modules_ ([environment modules](http://modules.sourceforge.net/) or [lmod](https://lmod.readthedocs.io/en/latest/)) becomes particularly beneficial. Both Intel and NVIDIA provide scripts for the environment modules package (lmod can also read these), with possibly some setup required. If you plan to use mpi4py as well as compiled-language versions, creating a module for your Python distribution would also be advisable.
+The headers and libraries for MPI _must_ match. Using a header from one MPI and libraries from another, or using headers from a version from one compiler and libraries from a different compiler, usually results in some difficult-to-interpret bugs. Moreover, the process manager must be compatible with the MPI used to compile the code. Because of this, if more than one compiler and especially more than one MPI version is installed, the use of _modules_ ([environment modules](http://modules.sourceforge.net/) or [lmod](https://lmod.readthedocs.io/en/latest/)) becomes particularly beneficial. Both Intel and NVIDIA provide scripts for the environment modules package (lmod can also read these), with possibly some setup required. If you plan to use mpi4py as well as compiled-language versions, creating a module for your Python distribution would also be advisable. Installation of a module system on an individual Linux system is straightforward for an administrator with some experience.
 
 #### Mac OS
 
@@ -63,7 +92,7 @@ The NVIDIA suite is not available for Mac OS.
 #### Windows
 
 _GCC_
-The simplest way to use OpenMPI on Windows is through [Cygwin](https://www.cygwin.com/). In this case, the gcc compiler suite would first be installed, with g++ and/or gfortran added. Then the openmpi package could also be installed through the cygwin package manager.
+The easiest way to use OpenMPI on Windows is through [Cygwin](https://www.cygwin.com/). In this case, the gcc compiler suite would first be installed, with g++ and/or gfortran added. Then the openmpi package could also be installed through the cygwin package manager.
 
 _Intel oneAPI_
 Install the HPC Toolkit.
@@ -74,4 +103,3 @@ Download the package when it is available.
 MPI codes must generally be compiled and run through a command line on Windows. Cygwin users can find a variety of tutorials online, for example [here](https://www.youtube.com/watch?v=ENH70zSaztM).
 
 The Intel oneAPI Basic Toolkit includes a customized command prompt in its folder in the Apps menu.
-