Choose one or more of the CUDA samples in the HPCTrainingExamples/HIPIFY/mini-nbody/cuda
directory and manually convert them to HIP. Tip: most CUDA runtime calls have a HIP counterpart with the prefix changed; for example, cudaMalloc becomes hipMalloc.
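To make the tip concrete, the shell sketch below applies a few of these renames with sed to a made-up CUDA fragment. The helper name rename_cuda_calls and the sample lines are illustrative assumptions, and the substitution list covers only a handful of identifiers. A real manual port also means changing the header include and reviewing every call.

```shell
#!/usr/bin/env bash
# Sketch only: rename a few CUDA runtime identifiers to their HIP equivalents.
# rename_cuda_calls is a hypothetical helper, not part of ROCm.
rename_cuda_calls() {
  # The cudaMemcpy rule also covers cudaMemcpyHostToDevice and
  # cudaMemcpyDeviceToHost, because only the prefix changes.
  sed -e 's/cudaMemcpy/hipMemcpy/g' \
      -e 's/cudaMalloc/hipMalloc/g' \
      -e 's/cudaFree/hipFree/g'
}

# Made-up CUDA fragment used only to exercise the helper.
sample='cudaMalloc(&d_buf, bytes);
cudaMemcpy(d_buf, buf, bytes, cudaMemcpyHostToDevice);
cudaFree(d_buf);'

converted=$(printf '%s\n' "$sample" | rename_cuda_calls)
printf '%s\n' "$converted"
```

Doing the renames by hand first makes it easier to judge what the automated tool changes later.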
You can choose from
You'll want to compile on the node you've been allocated so that hipcc will choose the correct GPU architecture.
Use the hipify-perl script to "hipify" the CUDA samples you converted manually in Exercise 1.
hipify-perl is in the $ROCM_PATH/hip/bin directory and should be in your path.
First, do a trial run to see what will be converted:
hipify-perl -examine
You'll see the statistics of HIP APIs that will be generated. The output might be different depending on the ROCm version.
[HIPIFY] info: file '' statistics:
CONVERTED refs count: 7
TOTAL lines of code: 91
[HIPIFY] info: CONVERTED refs by names:
cudaFree => hipFree: 1
cudaMalloc => hipMalloc: 1
cudaMemcpyDeviceToHost => hipMemcpyDeviceToHost: 1
cudaMemcpyHostToDevice => hipMemcpyHostToDevice: 1
Now let's actually do the conversion.
hipify-perl > nbody-orig.cpp
Compile the HIP programs.
hipcc -DSHMOO -I ../ nbody-orig.cpp -o nbody-orig
The -DSHMOO flag defines SHMOO, which fixes some timer printouts. Add --offload-arch=<gpu_type>
to specify the GPU type and avoid autodetection issues when running on a single GPU on a node.
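If you are unsure what to pass as <gpu_type>, one possible approach is to read the architecture name from rocminfo on the allocated node. The sketch below assumes rocminfo prints names of the form gfxNNN and simply takes the first one, which may not be what you want on a node with mixed GPUs. The helper name first_gfx_arch is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch only: pull the first gfx architecture name out of rocminfo-style text.
# first_gfx_arch is a hypothetical helper, not a ROCm command.
first_gfx_arch() {
  grep -oE 'gfx[0-9a-f]+' | head -n 1
}

# Only query the hardware when rocminfo is actually available.
if command -v rocminfo >/dev/null 2>&1; then
  arch=$(rocminfo | first_gfx_arch)
  # Print the compile line rather than running it, so it can be inspected first.
  echo "hipcc --offload-arch=${arch} -DSHMOO -I ../ nbody-orig.cpp -o nbody-orig"
fi
```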
- Fix any compiler issues, for example, if there was something that didn't hipify correctly.
- Be on the lookout for hard-coded Nvidia-specific things like warp sizes and PTX.
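A quick text search can help with the second point. The sketch below flags lines containing inline assembly, uses of warpSize, or a bare literal 32, which is suspect because AMD wavefronts can be 64 wide. The helper name scan_nvidia_specifics and the patterns are assumptions chosen for illustration. Expect both false positives and misses, so treat the output as a list of places to read, not a verdict.

```shell
#!/usr/bin/env bash
# Sketch only: flag lines that may carry Nvidia-specific assumptions.
# scan_nvidia_specifics is a hypothetical helper, not part of ROCm.
scan_nvidia_specifics() {
  # asm(...) or asm volatile(...) : inline PTX does not carry over to AMD GPUs
  # warpSize                      : check how the value is being used
  # standalone 32                 : possibly a hard-coded warp size
  grep -nE 'asm[[:space:]]*(volatile)?[[:space:]]*\(|warpSize|(^|[^0-9A-Za-z_])32([^0-9A-Za-z_]|$)' "$@"
}

# Example: scan the hipified source if it exists in the current directory.
if [ -f nbody-orig.cpp ]; then
  scan_nvidia_specifics nbody-orig.cpp
fi
```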
Run the program
A batch version of Exercise 2 is:
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --gpus=1
#SBATCH -p LocalQ
#SBATCH -t 00:10:00
module load rocm
cd HPCTrainingExamples/HIPIFY/mini-nbody/cuda
hipify-perl -print-stats > nbody-orig.cpp
hipcc -DSHMOO -I ../ nbody-orig.cpp -o nbody-orig
- The hipify tools do not check correctness, so verify the results of the converted program yourself.
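One way to check the results yourself is to save numerical output from a trusted run and compare it with output from the HIP build, allowing for small floating-point differences. The sketch below assumes each file holds one number per line. The helper name values_match and that file layout are illustrative assumptions, not something the samples produce on their own.

```shell
#!/usr/bin/env bash
# Sketch only: compare two files holding one number per line.
# values_match is a hypothetical helper; it returns 0 when every pair agrees
# within the given relative tolerance (absolute tolerance when |ref| < 1).
values_match() {
  local ref_file=$1 new_file=$2 tol=$3
  paste "$ref_file" "$new_file" | awk -v tol="$tol" '
    {
      diff = $1 - $2; if (diff < 0) diff = -diff
      scale = ($1 < 0) ? -$1 : $1
      if (scale < 1) scale = 1
      if (diff > tol * scale) bad++
    }
    END { if (bad > 0) exit 1 }'
}
```

For example, `values_match reference.txt hip.txt 1e-5 && echo "results agree"`, where both file names are placeholders.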
