Commit 9973f19

Merge pull request #102 from uvarc/staging
Making sure it's up to date by Monday
2 parents 2b4f01a + 62fd714 commit 9973f19

83 files changed: +977, -336 lines


content/courses/cpp-introduction/setting_up.md

+5-4
@@ -61,13 +61,14 @@ Recently, Microsoft has released the Windows Subsystem for Linux ([WSL](https://
 A drawback to both Cygwin and the WSL is portability of executables. Cygwin executables must be able to find the Cygwin DLL and are therefore not standalone.
 WSL executables only run on the WSL. For standalone, native binaries a good choice is _MinGW_. MinGW is derived from Cygwin.
 
-MinGW provides a free distribution of gcc/g++/gfortran. The standard MinGW distribution is updated fairly rarely and generates only 32-bit executables. We will describe [MinGW-w64](http://mingw-w64.org/doku.php), a fork of the original project.
+MinGW provides a free distribution of gcc/g++/gfortran. The standard MinGW distribution is updated fairly rarely and generates only 32-bit executables. We will describe [MinGW-w64](https://www.mingw-w64.org/), a fork of the original project.
 {{< figure src="/courses/cpp-introduction/img/MinGW1.png" width=500px >}}
 
-MinGW-w64 can be installed beginning from the [MSYS2](https://www.msys2.org/) project. MSYS2 provides a significant subset of the Cygwin tools.
-Download and install it.
+MinGW-w64 can be installed beginning from the [MSYS2](https://www.msys2.org/) project. MSYS2 provides a significant subset of the Cygwin tools. Download and install it.
 {{< figure src="/courses/cpp-introduction/img/MSYS2.png" width=500px >}}
 Once it has been installed, follow the [instructions](https://www.msys2.org/) to open a command-line tool, update the distribution, then install the compilers and tools.
+
+A discussion of installing MinGW-w64 compilers for use with VSCode has been posted by Microsoft [here](https://code.visualstudio.com/docs/cpp/config-mingw).
 
 _Intel oneAPI_
 First install [Visual Studio](https://visualstudio.microsoft.com/vs/community/).

content/courses/fortran-introduction/setting_up.md

+5-4
@@ -54,13 +54,14 @@ Recently, Microsoft has released the Windows Subsystem for Linux ([WSL](https://
 A drawback to both Cygwin and the WSL is portability of executables. Cygwin executables must be able to find the Cygwin DLL and are therefore not standalone.
 WSL executables only run on the WSL. For standalone, native binaries a good choice is _MinGW_. MinGW is derived from Cygwin.
 
-MinGW provides a free distribution of gcc/g++/gfortran. The standard MinGW distribution is updated fairly rarely and generates only 32-bit executables. We will describe [MinGW-w64](http://mingw-w64.org/doku.php), a fork of the original project.
+MinGW provides a free distribution of gcc/g++/gfortran. The standard MinGW distribution is updated fairly rarely and generates only 32-bit executables. We will describe [MinGW-w64](https://www.mingw-w64.org/), a fork of the original project.
 {{< figure src="/courses/fortran-introduction/img/MinGW1.png" width=500px >}}
 
-MinGW-w64 can be installed beginning from the [MSYS2](https://www.msys2.org/) project. MSYS2 provides a significant subset of the Cygwin tools.
-Download and install it.
+MinGW-w64 can be installed beginning from the [MSYS2](https://www.msys2.org/) project. MSYS2 provides a significant subset of the Cygwin tools. Download and install it.
 {{< figure src="/courses/fortran-introduction/img/MSYS2.png" width=500px >}}
-Once it has been installed, follow the [instructions](https://www.msys2.org/) to open a command-line tool, update the distribution, then install the compilers and tools.
+Once it has been installed, follow the [instructions](https://www.msys2.org/) to open a command-line tool, update the distribution, then install the compilers and tools. For Fortran users, the `mingw64` repository may be preferable to the `ucrt64` repo. To find packages, visit their [repository](https://packages.msys2.org/package/).
+
+A discussion of installing MinGW-w64 compilers for use with VSCode has been posted by Microsoft [here](https://code.visualstudio.com/docs/cpp/config-mingw). To use mingw64 rather than ucrt64, simply substitute the text string. Fortran users should install both the C/C++ and Fortran extensions for VSCode.
 
 _Intel oneAPI_
 Download and install the basic toolkit and, for Fortran, the HPC toolkit.
@@ -0,0 +1,107 @@
import sys
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()

N = 400
M = 600

# This example exchanges halo data among rectangular domains arranged in a
# process grid. Most real codes use squares, but we want to illustrate how to
# use different dimensions.

# Divide up the processes. Either we require a perfect square, or we
# must specify how to distribute by row/column. In a realistic program,
# the process distribution (either the total, for a perfect square, or
# the rows/columns) would be read in and we would need to check that the number
# of processes requested is consistent with the decomposition.

nproc_rows = 2
nproc_cols = 3

if nproc_rows*nproc_cols != nprocs:
    print("Number of rows times columns does not equal nprocs")
    sys.exit()

# Strong scaling: the global N x M grid is divided evenly among the processes.
if N%nproc_rows == 0 and M%nproc_cols == 0:
    nrl = N//nproc_rows
    ncl = M//nproc_cols
else:
    print("The process grid must divide the number of rows and columns evenly.")
    sys.exit()

# Local array with one layer of halo (ghost) cells on each side.
w = np.zeros((nrl+2, ncl+2), dtype=np.double)

# Set up the topology assuming processes are numbered left to right by row.

print("Layout ", nproc_rows, nproc_cols)

my_row = rank//nproc_cols
my_col = rank%nproc_cols

print("Topology ", rank, my_row, my_col)

# Set up boundary conditions on the edges of the global domain.
if my_row == 0:
    w[0,:] = 0.        # top

if my_row == nproc_rows-1:
    w[nrl+1,:] = 100.  # bottom

if my_col == 0:
    w[:,0] = 100.      # left

if my_col == nproc_cols-1:
    w[:,ncl+1] = 100.  # right

# Arbitrary value for the interior that may speed up convergence somewhat.
# Be sure not to overwrite boundaries.
w[1:nrl+1,1:ncl+1] = 50.

# Set up the up, down, left, and right neighbor ranks for each process.
if my_row == 0:
    up = MPI.PROC_NULL
else:
    up = rank - nproc_cols

if my_row == nproc_rows-1:
    down = MPI.PROC_NULL
else:
    down = rank + nproc_cols

if my_col == 0:
    left = MPI.PROC_NULL
else:
    left = rank - 1

if my_col == nproc_cols-1:
    right = MPI.PROC_NULL
else:
    right = rank + 1

print("Neighbors ", rank, my_row, my_col, up, down, left, right)

# Set up an MPI vector type for a column of the local array. The stride is the
# full row length of w, including the two halo columns.
column = MPI.DOUBLE.Create_vector(nrl, 1, ncl+2)
column.Commit()

tag = 0

# Rows are contiguous in memory, so the row exchanges use ordinary buffers.
# Send up and receive down.
comm.Sendrecv([w[1,1:ncl+1], MPI.DOUBLE], up, tag, [w[nrl+1,1:ncl+1], MPI.DOUBLE], down, tag)
# Send down and receive up.
comm.Sendrecv([w[nrl,1:ncl+1], MPI.DOUBLE], down, tag, [w[0,1:ncl+1], MPI.DOUBLE], up, tag)

# Columns are strided, so the column exchanges use the vector type. A flat
# (contiguous) view of w lets the buffer start at an arbitrary element.
wf = w.reshape(-1)
# Send the last interior column right and receive the left halo column.
comm.Sendrecv([wf[(ncl+2)+ncl:], 1, column], right, tag, [wf[(ncl+2):], 1, column], left, tag)
# Send the first interior column left and receive the right halo column.
comm.Sendrecv([wf[(ncl+2)+1:], 1, column], left, tag, [wf[(ncl+2)+ncl+1:], 1, column], right, tag)

# Spot-check the result.
for n in range(nprocs):
    if n == rank:
        print(n, w[0,ncl//2], w[nrl+1,ncl//2])

content/courses/parallel-computing-introduction/distributed_mpi_global1.md

+1-1
@@ -1,5 +1,5 @@
 ---
-title: "Global Communication in MPI: One to Many"
+title: "Collective Communication in MPI: One to Many"
 toc: true
 type: docs
 weight: 50

content/courses/parallel-computing-introduction/distributed_mpi_global2.md

+1-1
@@ -1,5 +1,5 @@
 ---
-title: "Global Communication in MPI: Many To One"
+title: "Collective Communication in MPI: Many To One"
 toc: true
 type: docs
 weight: 52

content/courses/parallel-computing-introduction/distributed_mpi_global3.md

+25-5
@@ -12,7 +12,7 @@ In many-to-many collective communications, all processes in the communicator gro
 
 ## Barrier
 
-When `MPI_Barrier` is invoked, each process pauses until all processes in the communicator group have called this function. The `MPI_BARRIER` is used to synchronize processes. It should be used sparingly, since it "serializes" a parallel program. Most of the global communication routines contain an implicit barrier so an explicit `MPI_Barrier` is not required.
+When `MPI_Barrier` is invoked, each process pauses until all processes in the communicator group have called this function. The `MPI_BARRIER` is used to synchronize processes. It should be used sparingly, since it "serializes" a parallel program. Most of the collective communication routines contain an implicit barrier so an explicit `MPI_Barrier` is not required.
 
 ### C++
 ```c++
@@ -65,11 +65,11 @@ As the examples in the previous chapter demonstrated, when MPI_Reduce is called,
 The syntax for `MPI_Allreduce` is identical to that of `MPI_Reduce` but with the root number omitted.
 
 ```c
-int MPI_Allreduce(void *operand, void *result, int count, MPI_Datatype type, MPI_Op operator, MPI_Comm comm );
+int MPI_Allreduce(void *operand, void *result, int ncount, MPI_Datatype type, MPI_Op operator, MPI_Comm comm );
 ```
 
 ```fortran
-call MPI_ALLREDUCE(sendbuf, recvbuf, count, datatype, op, comm, ierr)
+call MPI_ALLREDUCE(sendbuf, recvbuf, ncount, datatype, op, comm, ierr)
 ```
 
 ```python
@@ -137,7 +137,7 @@ Modify the example gather code in your language of choice to perform an Allgathe
 
 In MPI_Alltoall, each process sends data to every other process. Let us consider the simplest case, when each process sends one item to every other process. Suppose there are three processes and rank 0 has an array containing the values \[0,1,2\], rank 1 has \[10,11,12\], and rank 2 has \[20,21,22\]. Rank 0 keeps (or sends to itself) the 0 value, sends 1 to rank 1, and 2 to rank 2. Rank 1 sends 10 to rank 0, keeps 11, and sends 12 to rank 2. Rank 2 sends 20 to rank 0, 21 to rank 1, and keeps 22.
 
-distributed_mpi_global2.md:{{< figure src="/courses/parallel-computing-introduction/img/alltoall.png" caption="Alltoall. Note that as depicted, the values in the columns are transposed to values as rows." >}}
+{{< figure src="/courses/parallel-computing-introduction/img/alltoall.png" caption="Alltoall. Note that as depicted, the values in the columns are transposed to values as rows." >}}
 
 ### C++
 {{< spoiler text="alltoall.cxx" >}}
@@ -158,4 +158,24 @@ Two more general forms of alltoall exist; `MPI_Alltoallv`, which is similar to `
 
 ## MPI_IN_PLACE
 
-We often do not need the send buffer once the message has been communicated, and allocating two buffers wastes memory and requires some amount of unneeded communication. Several MPI procedures allow the special receive buffer `MPI_IN_PLACE`. When used, the send buffer variable is overwritten with the transmitted data. The expected send and receive buffers must be the same size for this to be valid.
+We often do not need one buffer once the message has been communicated, and allocating two buffers wastes memory and requires some amount of unneeded communication. MPI collective procedures allow the special buffer `MPI_IN_PLACE`. This special value can be used instead of the receive buffer in `Scatter` and `Scatterv`; in the other collective functions it takes the place of the send buffer. The expected send and receive buffers must be the same size for this to be valid. As usual for mpi4py, the Python name of the variable is MPI.IN_PLACE.
+
+**Examples**
+
+```c++
+MPI_Scatter(sendbuf, ncount, MPI_Datatype, MPI_IN_PLACE, ncount, MPI_Datatype, root, MPI_COMM_WORLD);
+
+MPI_Reduce(MPI_IN_PLACE, recvbuf, ncount, MPI_Datatype, MPI_Op, root, MPI_COMM_WORLD);
+```
+
+```fortran
+call MPI_Scatter(vals, ncount, MPI_TYPE, MPI_IN_PLACE, ncount, MPI_TYPE, root, MPI_COMM_WORLD)
+
+call MPI_REDUCE(MPI_IN_PLACE, recvbuf, ncount, MPI_TYPE, MPI_Op, root, MPI_COMM_WORLD, ierr)
+```
+
+```python
+comm.Scatter([sendvals,MPI.DOUBLE],MPI.IN_PLACE,root=0)
+
+comm.Reduce(sendarr, MPI.IN_PLACE, operation, root=0)
+```
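
To see the collectives touched by this hunk of distributed_mpi_global3.md in action, here is a brief mpi4py sketch. It is not part of the commit; the values, the choice of `MPI.SUM`, and the suggested file name are illustrative assumptions.

```python
# Illustrative sketch of Allreduce, in-place Allreduce, and Alltoall in mpi4py.
# Not part of this commit. Run with three ranks, e.g.: mpiexec -n 3 python demo_collectives.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nprocs = comm.Get_size()

# Allreduce: same arguments as Reduce, but no root; every rank gets the sum.
myval = np.array([float(rank)])
total = np.zeros(1)
comm.Allreduce([myval, MPI.DOUBLE], [total, MPI.DOUBLE], op=MPI.SUM)

# In-place variant: MPI.IN_PLACE replaces the send buffer, and the result
# overwrites myval on every rank.
comm.Allreduce(MPI.IN_PLACE, [myval, MPI.DOUBLE], op=MPI.SUM)

# Alltoall with one item per process: with three ranks, rank 0 starts with
# [0,1,2], rank 1 with [10,11,12], rank 2 with [20,21,22]; afterward rank 0
# holds [0,10,20], rank 1 holds [1,11,21], and rank 2 holds [2,12,22].
sendvals = 10.0*rank + np.arange(nprocs, dtype=np.double)
recvvals = np.zeros(nprocs)
comm.Alltoall([sendvals, MPI.DOUBLE], [recvvals, MPI.DOUBLE])

print(rank, total[0], myval[0], recvvals)
```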

content/courses/parallel-computing-introduction/distributed_mpi_types.md

+57-1
@@ -1,5 +1,5 @@
 ---
-title: "MPI Types"
+title: "MPI Derived Types"
 toc: true
 type: docs
 weight: 220
@@ -9,3 +9,59 @@ menu:
 ---
 
 Modern programming languages provide data structures that may be called "structs," or "classes," or "types." These data structures permit grouping of different quantities under a single variable name.
+
+MPI also provides a general type that enables programmer-defined datatypes. Unlike arrays, which must be adjacent in memory, MPI derived datatypes may consist of elements in noncontiguous locations in memory.
+
+While more general derived MPI datatypes are available, one of the most commonly used is the `MPI_TYPE_VECTOR`. This creates a group of elements of size _blocklength_ separated by a constant interval, called the _stride_, in memory. Examples would be generating a type for columns in a row-major-oriented language, or rows in a column-major-oriented language.
+
+{{< figure src="/courses/parallel-computing-introduction/img/mpi_vector_type.png" caption="Layout in memory for vector type. In this example, the blocklength is 4, the stride is 6, and the count is 3." >}}
+
+C++
+```c++
+MPI_Datatype newtype;
+MPI_Type_vector(ncount, blocklength, stride, oldtype, &newtype);
+```
+Fortran
+```fortran
+integer newtype
+!code
+call MPI_TYPE_VECTOR(ncount, blocklength, stride, oldtype, newtype, ierr)
+```
+For both C++ and Fortran, `ncount`, `blocklength`, and `stride` must be integers. The `oldtype` is a pre-existing type, usually a built-in MPI type such as MPI_FLOAT or MPI_REAL. For C++ the new type is declared as an `MPI_Datatype` unless it corresponds to an existing built-in type. For Fortran `oldtype` is an integer if it is not a built-in type. The `newtype` is a name chosen by the programmer.
+
+Python
+```python
+newtype = oldtype.Create_vector(ncount, blocklength, stride)
+```
+
+A derived type must be _committed_ before it can be used.
+
+```c++
+MPI_Type_commit(&newtype);
+```
+Fortran
+```fortran
+call MPI_TYPE_COMMIT(newtype, ierr)
+```
+Python
+```python
+newtype.Commit()
+```
+
+To use our newly committed type in an MPI communication function, we must pass it the starting position of the data to be placed into the type.
+
+C++
+```c++
+MPI_Send(&a[0][i], 1, newtype, i, 0, MPI_COMM_WORLD);
+//We pass the address of the first element because an array element
+//is not itself a pointer.
+```
+
+Fortran
+```fortran
+call MPI_SEND(a(1,i), 1, newtype, i, 0, MPI_COMM_WORLD, ierr)
+```
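
Before the next file, here is a compact mpi4py sketch of the create/commit/communicate workflow that distributed_mpi_types.md now describes. It is not part of the commit; the array shape, the column index, and the flattened view used to address the starting element are illustrative choices.

```python
# Sketch: send one column of a C-ordered array from rank 0 to rank 1 using a
# committed vector type. Not part of this commit; run with two ranks.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

nrows, ncols, col = 4, 6, 2
a = np.arange(nrows*ncols, dtype=np.double).reshape(nrows, ncols)

# nrows blocks of one element each, separated by the row length: a column of a.
column = MPI.DOUBLE.Create_vector(nrows, 1, ncols)
column.Commit()

if rank == 0:
    flat = a.reshape(-1)  # contiguous view, so the buffer can start at a[0,col]
    comm.Send([flat[col:], 1, column], dest=1, tag=0)
elif rank == 1:
    received = np.empty(nrows, dtype=np.double)
    comm.Recv([received, MPI.DOUBLE], source=0, tag=0)
    print("rank 1 received column", col, ":", received)

column.Free()
```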
@@ -0,0 +1,11 @@
---
title: "MPI Vector Type Example"
toc: true
type: docs
weight: 230
menu:
  parallel_programming:
    parent: Distributed-Memory Programming
---

Our example will construct an $N \times M$ array of floating-point numbers. In C++ and Python we will exchange the "halo" columns using the MPI type, and the rows in the usual way. In Fortran we will exchange "halo" rows with the MPI type and columns with ordinary Sendrecv.

content/courses/python-high-performance/_index.md

+7-7
@@ -23,17 +23,17 @@ For this tutorial, it is assumed that you have experience with programming in Py
 
 ## Setup
 
-To follow along for the [Serial Optimization](#serial-optimization-strategies) and [Multiprocessing](#multiprocessing) examples, you can execute the code examples on your own computer or on UVA's high-performance computing cluster, Rivanna. Examples described in the last section, [Distributed Parallelization](#distributed-parallelization), are best executed on UVA's high-performance computing platform, Rivanna.
+To follow along for the [Serial Optimization](#serial-optimization-strategies) and [Multiprocessing](#multiprocessing) examples, you can execute the code examples on your own computer or on UVA's high-performance computing cluster. Examples described in the last section, [Distributed Parallelization](#distributed-parallelization), are best executed on UVA's high-performance computing platform.
 
 If you are using your local computer, we recommend the Anaconda distribution (<a href="https://www.anaconda.com/distribution/" target="balnk_">download</a>) to run the code examples. Anaconda provides multiple Python versions, an integrated development environment (IDE) with editor and profiler, Jupyter notebooks, and an easy to use package environment manager.
 
-**If you are using Rivanna, follow these steps to verify that your account is active:**
+**If you are using UVA HPC, follow these steps to verify that your account is active:**
 
-### Check your Access to Rivanna
+### Check your Access to UVA HPC
 
-1. In your web browser, got to <a href="https://rivanna-desktop.hpc.virginia.edu" target="_blank">rivanna-desktop.hpc.virginia.edu</a>. This takes you to our FastX web portal that lets you launch a remote desktop environment on Rivanna. If you are off Grounds, you must be connected through the UVA Anywhere VPN client.
+1. In your web browser, go to <a href="https://fastx.hpc.virginia.edu" target="_blank">fastx.hpc.virginia.edu</a>. This takes you to our FastX web portal that lets you launch a remote desktop environment on a frontend. If you are off Grounds, you must be connected through the UVA Anywhere VPN client.
 
-2. Log in with your UVA credentials and start a MATE session. You can find a more detailed description of the Rivanna login procedure <a href="https://www.rc.virginia.edu/userinfo/rivanna/logintools/fastx/" target="_blank">here</a>.
+2. Log in with your UVA credentials and start a MATE session. You can find a more detailed description of the FastX login procedure <a href="https://www.rc.virginia.edu/userinfo/rivanna/logintools/fastx/" target="_blank">here</a>.
    * **User name:** Your UVA computing id (e.g. mst3k; don't enter your entire email address)
   * **Password:** Your UVA Netbadge password
 
@@ -44,14 +44,14 @@ python -V
 ```
 You will obtain a response like
 ```
-Python 3.6.6
+Python 3.11.3
 ```
 Now type
 ```
 spyder &
 ```
 
-For Jupyterlab you can use [Open OnDemand](https://rivanna-portal.hpc.virginia.edu). Jupyterlab is one of the Interactive Apps. Note that these apps submit a job to the compute nodes. If you are working on quick development and testing and you wish to use the frontend, to run Jupyter or Jupyterlab on the FastX portal you can run
+For Jupyterlab you can use [Open OnDemand](https://ood.hpc.virginia.edu). Jupyterlab is one of the Interactive Apps. Note that these apps submit a job to the compute nodes. If you are working on quick development and testing and you wish to use the frontend, to run Jupyter or Jupyterlab on the FastX portal you can run
 ```
 module load anaconda
 anaconda-navigator &
@@ -0,0 +1,6 @@
{
 "cells": [],
 "metadata": {},
 "nbformat": 4,
 "nbformat_minor": 5
}
@@ -0,0 +1,31 @@
import dask
from dask.distributed import Client, progress
import time
import random

def inc(x):
    time.sleep(random.random())
    return x + 1

def dec(x):
    time.sleep(random.random())
    return x - 1

def add(x, y):
    time.sleep(random.random())
    return x + y

if __name__ == "__main__":
    client = Client(threads_per_worker=4, n_workers=1)

    # Build the task graph lazily: dask.delayed must wrap the function itself,
    # not the result of calling it, or the work runs eagerly and serially.
    zs = []
    for i in range(20):
        x = dask.delayed(inc)(i)
        y = dask.delayed(dec)(x)
        z = dask.delayed(add)(x, y)
        zs.append(z)

    # Evaluate all of the delayed results in a single parallel pass.
    zs = list(dask.compute(*zs))
    print(zs)

    client.close()
@@ -0,0 +1 @@
+