demonstrates inference on a VGG16 deep neural network (implements convolution, max-pooling, fully-connected and softmax layers)
build with
inside the folder -
cd data/nets/vgg16 && ./get_vgg16.sh
to download the necessary VGG16 layers/net -
desktop/cli: run with
--image <path/to/image.png>
(for now, the image must be 224x224 px, 32-bit RGBA)
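As a rough illustration of two of the layer types listed above, here is a minimal CPU sketch of softmax and max-pooling. It is not the example's actual kernel code: the function names `softmax` and `max_pool_2x2`, the single-channel row-major layout, and the 2x2 window with stride 2 are assumptions made for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Numerically stable softmax: subtracting the maximum logit before
// exponentiating avoids overflow without changing the result.
std::vector<float> softmax(const std::vector<float>& logits) {
    assert(!logits.empty());
    const float max_logit = *std::max_element(logits.begin(), logits.end());
    std::vector<float> probs(logits.size());
    float sum = 0.0f;
    for (std::size_t i = 0; i < logits.size(); ++i) {
        probs[i] = std::exp(logits[i] - max_logit);
        sum += probs[i];
    }
    for (float& p : probs) {
        p /= sum;
    }
    return probs;
}

// 2x2 max-pooling with stride 2 over a single-channel, row-major image
// (hypothetical layout; width and height must be even).
std::vector<float> max_pool_2x2(const std::vector<float>& in,
                                std::size_t width, std::size_t height) {
    assert(in.size() == width * height && width % 2 == 0 && height % 2 == 0);
    const std::size_t out_w = width / 2, out_h = height / 2;
    std::vector<float> out(out_w * out_h);
    for (std::size_t y = 0; y < out_h; ++y) {
        for (std::size_t x = 0; x < out_w; ++x) {
            const float* row0 = &in[(2 * y) * width + 2 * x];
            const float* row1 = row0 + width;
            out[y * out_w + x] = std::max({row0[0], row0[1], row1[0], row1[1]});
        }
    }
    return out;
}
```

Pooling a 4x2 image this way yields a 2x1 image, and the softmax output always sums to 1 (up to rounding).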
image-based warping, implemented with both a scatter-based and a gather-based approach (the gather-based approach is based on "Image-Based Bidirectional Scene Reprojection")
also demonstrates use of: tessellation/displacement, argument buffers and indirect command pipelines
build with
inside the folder -
gather-based warping ref: Image-Based Bidirectional Scene Reprojection (original)
NOTE: in addition to floor + dependencies, this requires SDL3_image with libpng, as well as libwarp
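The difference between the two approaches can be sketched in 1D. This is a simplified illustration, not libwarp's implementation: the per-pixel `motion` field, the function names, the nearest-pixel rounding, and the fixed iteration count are all assumptions. Scatter pushes each source pixel forward to where it lands. Gather lets each destination pixel search backwards for its source, here with a fixed-point iteration.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scatter: each source pixel writes its value to the position it reaches at
// time t. Destination pixels that nothing lands on keep the `hole` marker.
std::vector<float> warp_scatter(const std::vector<float>& src,
                                const std::vector<float>& motion,
                                float t, float hole = -1.0f) {
    assert(src.size() == motion.size());
    std::vector<float> dst(src.size(), hole);
    for (std::size_t i = 0; i < src.size(); ++i) {
        const long target = std::lround(static_cast<float>(i) + t * motion[i]);
        if (target >= 0 && target < static_cast<long>(dst.size())) {
            dst[static_cast<std::size_t>(target)] = src[i];
        }
    }
    return dst;
}

// Gather: each destination pixel x looks for a source position p with
// p + t * motion(p) == x, iterating p <- x - t * motion(p).
std::vector<float> warp_gather(const std::vector<float>& src,
                               const std::vector<float>& motion,
                               float t, int iterations = 3) {
    assert(src.size() == motion.size());
    std::vector<float> dst(src.size());
    const long last = static_cast<long>(src.size()) - 1;
    for (std::size_t x = 0; x < src.size(); ++x) {
        float p = static_cast<float>(x);
        for (int k = 0; k < iterations; ++k) {
            const long idx = std::clamp(std::lround(p), 0L, last);
            p = static_cast<float>(x) - t * motion[static_cast<std::size_t>(idx)];
        }
        const long idx = std::clamp(std::lround(p), 0L, last);
        dst[x] = src[static_cast<std::size_t>(idx)];
    }
    return dst;
}
```

In this sketch, scatter can leave holes that need filling afterwards, while gather always produces a value for every destination pixel but may need several iterations to converge.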
N-body simulation that demonstrates local/shared memory buffers, local memory barriers, compute/render buffer sharing and loop unrolling, and shows that high-performance computing is indeed possible with this toolchain
build with
inside the folder -
ref: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch31.html
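The all-pairs computation described in the referenced chapter can be sketched on the CPU as follows. This is a minimal illustration rather than the example's kernel: the `Body` and `Vec3` types, the function name, and the default tile size are assumptions, and the gravitational constant is folded into the masses. The outer loop over tiles mirrors how a GPU work-group would stage one tile of bodies in local/shared memory (followed by a barrier) before every work-item accumulates its interactions.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Body { float x, y, z, mass; };
struct Vec3 { float x, y, z; };

// Softened all-pairs gravity: a_i = sum_j m_j * r_ij / (|r_ij|^2 + eps^2)^(3/2).
// A positive softening term makes the self-interaction (i == j) contribute zero.
std::vector<Vec3> compute_accelerations(const std::vector<Body>& bodies,
                                        float softening_sq,
                                        std::size_t tile_size = 256) {
    assert(tile_size > 0 && softening_sq > 0.0f);
    const std::size_t n = bodies.size();
    std::vector<Vec3> acc(n, Vec3{0.0f, 0.0f, 0.0f});
    for (std::size_t tile_start = 0; tile_start < n; tile_start += tile_size) {
        // On a GPU, this tile would be loaded into local memory here.
        const std::size_t tile_end = std::min(tile_start + tile_size, n);
        for (std::size_t i = 0; i < n; ++i) {
            for (std::size_t j = tile_start; j < tile_end; ++j) {
                const float dx = bodies[j].x - bodies[i].x;
                const float dy = bodies[j].y - bodies[i].y;
                const float dz = bodies[j].z - bodies[i].z;
                const float dist_sq = dx * dx + dy * dy + dz * dz + softening_sq;
                const float inv_dist = 1.0f / std::sqrt(dist_sq);
                const float scale = bodies[j].mass * inv_dist * inv_dist * inv_dist;
                acc[i].x += dx * scale;
                acc[i].y += dy * scale;
                acc[i].z += dz * scale;
            }
        }
    }
    return acc;
}
```

The tile size does not change the result, only the order in which interactions are accumulated, which is why it can be tuned freely to the available local memory.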
triangle/triangle collision detection of animated meshes using HLBVH (constructed per-step/frame)
build with
inside the folder -
ref: https://research.nvidia.com/sites/default/files/publications/karras2012hpg_paper.pdf
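LBVH-style builders such as the one in the referenced paper start by sorting primitives along a Morton (Z-order) curve. The sketch below shows the commonly used 30-bit Morton code for a point in the unit cube. Whether this example uses exactly this bit width or these function names is an assumption: `expand_bits` and `morton_3d` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Spreads the lower 10 bits of v so that two zero bits follow each input bit
// (input bit k moves to output bit 3 * k).
std::uint32_t expand_bits(std::uint32_t v) {
    assert(v < 1024u);
    v = (v * 0x00010001u) & 0xFF0000FFu;
    v = (v * 0x00000101u) & 0x0F00F00Fu;
    v = (v * 0x00000011u) & 0xC30C30C3u;
    v = (v * 0x00000005u) & 0x49249249u;
    return v;
}

// 30-bit Morton code for a point with coordinates in [0, 1]:
// each axis is quantized to 10 bits, then the bits are interleaved as xyzxyz...
std::uint32_t morton_3d(float x, float y, float z) {
    const auto quantize = [](float c) {
        return static_cast<std::uint32_t>(std::clamp(c * 1024.0f, 0.0f, 1023.0f));
    };
    return expand_bits(quantize(x)) * 4u +
           expand_bits(quantize(y)) * 2u +
           expand_bits(quantize(z));
}
```

Sorting triangle centroids by this code places spatially close primitives next to each other, which is what allows the hierarchy to be rebuilt quickly every step/frame.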
simple (WIP) reduction example that showcases three reduce implementations: local/shared memory reduce, shuffle reduce, and CUDA cooperative kernel + shuffle reduce
inclusive/exclusive scan test
build with
inside the folder
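For reference, the operations exercised here can be sketched sequentially on the CPU. This is a hedged illustration, not the example's kernels: the function names are hypothetical, and `shuffle_reduce_sum` only emulates the halving pattern of a shuffle-down reduction over one power-of-two group of lanes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Emulates a shuffle-down sum reduction: in each step, lane i adds the value
// held by lane i + offset, and the offset halves until lane 0 holds the total.
// Only lanes below the offset are updated, since only they affect lane 0.
int shuffle_reduce_sum(std::vector<int> lanes) {
    assert(!lanes.empty() && (lanes.size() & (lanes.size() - 1)) == 0);
    for (std::size_t offset = lanes.size() / 2; offset > 0; offset /= 2) {
        for (std::size_t lane = 0; lane < offset; ++lane) {
            lanes[lane] += lanes[lane + offset];
        }
    }
    return lanes[0];
}

// Inclusive scan: out[i] is the sum of in[0..i].
std::vector<int> inclusive_scan_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        running += in[i];
        out[i] = running;
    }
    return out;
}

// Exclusive scan: out[i] is the sum of in[0..i-1], so out[0] is 0.
std::vector<int> exclusive_scan_sum(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    int running = 0;
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[i] = running;
        running += in[i];
    }
    return out;
}
```

For the input `1, 2, 3, 4`, the inclusive scan is `1, 3, 6, 10` and the exclusive scan is `0, 1, 3, 6`. The last element of the inclusive scan equals the result of the reduction.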
offline-compute-compiler, compiles compute/graphics C++ code to CUDA/PTX, Metal/AIR, OpenCL/SPIR/SPIR-V, Vulkan/SPIR-V or Host-Compute/x86/ARM code/binaries (see --help for all options)
build with
inside the folder