Slow Training with GPU #1

Open
jecummin opened this issue Mar 6, 2025 · 1 comment

jecummin commented Mar 6, 2025

I ran the usage example in the README after setting up the repo and environment, but the training speed seems to be much slower than the README suggests.

```
git clone https://github.com/iliao2345/CompressARC.git
cd CompressARC
mamba create -n compress
mamba activate compress
pip install -r requirements.txt
python analyze_example.py
Enter which split you want to find the task in (training, evaluation, test): training
Enter which task you want to analyze (eg. 272f95fa): 272f95fa
...
```

The progress bar estimates that the task will take ~50 min to train, which is roughly 3-4x slower than the README suggests. I've tried this with both an NVIDIA T4 and an NVIDIA A100 GPU, and I confirmed that the GPU was in use during training. By comparison, if I restrict training to the CPU only, the estimated training time is ~1 hr, so the GPU doesn't seem to be accelerating things much.

I have made no edits to the code.
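
In case it's useful, nothing CompressARC-specific was needed to verify GPU visibility; just the standard PyTorch checks alongside `nvidia-smi`:

```python
import torch

# Standard checks that PyTorch can see the CUDA device (run alongside `nvidia-smi`,
# which should show nonzero GPU utilization while training is running).
print(torch.cuda.is_available())       # True if a CUDA device is visible
print(torch.cuda.get_device_name(0))   # e.g. "Tesla T4" or "NVIDIA A100-SXM4-40GB"
print(torch.version.cuda)              # CUDA version PyTorch was built against
```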

@iliao2345 (Owner)

Hi there,

Thanks for testing out the code and providing your feedback. All our performance benchmarks were conducted using an NVIDIA RTX 4070, so the training speeds mentioned in the README are based solely on that hardware. I realize that GPUs like the T4 and A100 are typically considered more powerful, but their architectures and optimal usage can differ significantly.

For instance, the A100 can be much more efficient when using FP16 precision, Tensor Cores, and optimizations like torch.compile(), none of which are currently enabled in the code. Additionally, if the workload is compute-bound rather than memory-bound, it might actually perform better on the 4070 in our specific case, since the 4070's plain FP32 throughput is comparable to or higher than the A100's.

It might be worth experimenting with FP16 to see if that improves performance on your setup.
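
As a rough sketch (the model, optimizer, and loss below are placeholders, not CompressARC's actual training objects), enabling mixed precision and torch.compile() in PyTorch usually looks something like this:

```python
import torch

# Placeholder model/optimizer/data -- not CompressARC's actual training setup.
model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()      # scales the loss to keep FP16 gradients stable

compiled_model = torch.compile(model)     # optional: kernel fusion (PyTorch 2.x)

for step in range(100):
    x = torch.randn(64, 512, device="cuda")
    optimizer.zero_grad(set_to_none=True)
    # Run matmuls in FP16 on Tensor Cores where possible.
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = compiled_model(x).square().mean()
    scaler.scale(loss).backward()         # backprop on the scaled loss
    scaler.step(optimizer)                # unscales gradients, then steps
    scaler.update()
```

Whether this actually helps will depend on how much of the step time is spent in matmuls versus small elementwise ops and Python overhead, so treat it as an experiment rather than a guaranteed speedup.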

Thanks again for reporting this, and please let me know if you discover any other issues!
