Commit 5fb8eb4

ashao, MattToast, and juliaputko authored
Extend smart build to CUDA-11, CUDA-12, and ROCm (CrayLabs#669)
- The RedisAIBuilder class was completely overhauled to allow users to express a wider range of support for hardware/software stacks. This will be extended to support ROCm, CUDA-11, and CUDA-12.
- Versions for each of these packages are no longer specified in an internal class. Instead a default set of JSON files specifies the sources and versions. Users can specify their own custom specifications at smart build time

---------

[ committed by @ashao ]
[ reviewed by @MattToast @juliaputko ]

Co-authored-by: Matt Drozt <[email protected]>
Co-authored-by: Julia Putko <[email protected]>
1 parent 72be515 commit 5fb8eb4

51 files changed, +2534 -1970 lines changed

.github/workflows/run_tests.yml

+4 -12
@@ -49,7 +49,7 @@ env:
 
 jobs:
   run_tests:
-    name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}, RedisAI ${{ matrix.rai }}
+    name: Run tests ${{ matrix.subset }} with ${{ matrix.os }}, Python ${{ matrix.py_v}}
     runs-on: ${{ matrix.os }}
     strategy:
       fail-fast: false
@@ -63,9 +63,6 @@ jobs:
           - os: macos-14
             py_v: "3.9"
 
-    env:
-      SMARTSIM_REDISAI: ${{ matrix.rai }}
-
     steps:
       - uses: actions/checkout@v4
       - uses: actions/setup-python@v5
@@ -109,15 +106,10 @@ jobs:
       - name: Install SmartSim (with ML backends)
         run: |
           python -m pip install git+https://github.com/CrayLabs/SmartRedis.git@develop#egg=smartredis
-          python -m pip install .[dev,mypy,ml]
-
-      - name: Install ML Runtimes with Smart (with pt, tf, and onnx support)
-        if: contains( matrix.os, 'ubuntu' ) || contains( matrix.os, 'macos-12')
-        run: smart build --device cpu --onnx -v
+          python -m pip install .[dev,mypy]
 
-      - name: Install ML Runtimes with Smart (no ONNX,TF on Apple Silicon)
-        if: contains( matrix.os, 'macos-14' )
-        run: smart build --device cpu --no_tf -v
+      - name: Install ML Runtimes
+        run: smart build --device cpu -v
 
       - name: Run mypy
         run: |

.gitignore

+1
@@ -12,6 +12,7 @@ tests/test_output
 # Dependencies
 smartsim/_core/.third-party
 smartsim/_core/.dragon
+smartsim/_core/build
 
 # Docs
 _build

README.md

+2 -2
@@ -643,11 +643,11 @@ from C, C++, Fortran and Python with the SmartRedis Clients:
 <tr>
 <td rowspan="3">1.2.7</td>
 <td>PyTorch</td>
-<td>2.0.1</td>
+<td>2.1.0</td>
 </tr>
 <tr>
 <td>TensorFlow\Keras</td>
-<td>2.13.1</td>
+<td>2.15.0</td>
 </tr>
 <tr>
 <td>ONNX</td>

doc/changelog.md

+33
@@ -9,6 +9,39 @@ Jump to:
 
 ## SmartSim
 
+### Cuda 12 and ROCm support branch
+
+To be merged into `develop` at some future point in time
+
+Description
+
+- Refactor to the RedisAI build to allow more flexibility in versions
+  and sources of ML backends
+- Add Dockerfiles with GPU support
+- Fine grain build support for GPUs
+- Update Torch to 2.1.0, Tensorflow to 2.15.0
+- Better error messages in build process
+
+Detailed Notes
+
+- The RedisAIBuilder class was completely overhauled to allow users to
+  express a wider range of support for hardware/software stacks. This
+  will be extended to support ROCm, CUDA-11, and CUDA-12.
+- Versions for each of these packages are no longer specified in an
+  internal class. Instead a default set of JSON files specifies the
+  sources and versions. Users can specify their own custom specifications
+  at smart build time
+- Two new Dockerfiles are now provided (one each for 11.8 and 12.1) that
+  can be used to build a container to run the tutorials. No HPC support
+  should be expected at this time
+- SmartSim can now be built using Cuda version 11.8 or Cuda 12.1 by specifying
+  `smart build --device=cuda118` or `smart build --device=cuda121`. The
+  original `smart build --device=gpu` will default to using Cuda 11.8.
+- As a result of the previous change, SmartSim now requires C++17 and a
+  minimum Cuda version of 11.8 in order to build Torch 2.1.0.
+- Error messages were not being interpolated correctly. This has been
+  addressed to provide more context when exposing error messages to users.
+
 ### Development branch
 
 To be released at some future point in time