I ran the usage example in the README after setting up the repo and environment, but the training speed seems to be much slower than the README suggests.
> git clone https://github.com/iliao2345/CompressARC.git
> cd CompressARC
> mamba create -n compress
> mamba activate compress
> pip install -r requirements.txt
> python analyze_example.py
Enter which split you want to find the task in (training, evaluation, test): training
Enter which task you want to analyze (eg. 272f95fa): 272f95fa
...
The progress bar estimates that the task will take ~50 min to train, roughly 3-4x slower than the README suggests. I've tried this with both an NVIDIA T4 and an NVIDIA A100 GPU, and I confirmed that the GPU was in use during training. By comparison, if I restrict training to the CPU only, the estimated training time is ~1 hr, so the GPU doesn't seem to be accelerating things much.
I made no edits to the code.
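(For reference, the GPU visibility check is just the generic PyTorch query, nothing specific to CompressARC:)

```python
# Generic PyTorch sanity check that a CUDA device is visible and in use.
# This is a standalone snippet, not part of the CompressARC codebase.
import torch

print(torch.cuda.is_available())       # should print True
print(torch.cuda.get_device_name(0))   # e.g. "Tesla T4" or "NVIDIA A100-SXM4-40GB"
print(torch.cuda.memory_allocated(0))  # nonzero during training if tensors live on the GPU
```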
Thanks for testing out the code and providing your feedback. All our performance benchmarks were conducted using an NVIDIA RTX 4070, so the training speeds mentioned in the README are based solely on that hardware. I realize that GPUs like the T4 and A100 are typically considered more powerful, but their architectures and optimal usage can differ significantly.
For instance, the A100 can be much more efficient when using FP16 precision, Tensor Cores, and optimizations like torch.compile(), none of which are currently enabled in the code. Additionally, if the workload is compute-bound rather than memory-bound, it might actually perform better on the 4070 in our specific case.
It might be worth experimenting with FP16 to see if that improves performance on your setup.
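As a rough sketch of what that experiment could look like (the model, batch, and loss below are placeholders, not CompressARC's actual training loop), FP16 autocast plus torch.compile() can be wired in along these lines:

```python
# Hedged sketch: FP16 autocast + gradient scaling + torch.compile() on a CUDA GPU.
# Model, data, and loss are stand-ins, not CompressARC code.
import torch

model = torch.nn.Linear(512, 512).cuda()      # placeholder model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()          # keeps FP16 gradients numerically stable
compiled_model = torch.compile(model)         # optional graph-level optimization (PyTorch 2.x)

for step in range(100):
    x = torch.randn(64, 512, device="cuda")   # placeholder batch
    optimizer.zero_grad(set_to_none=True)
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = compiled_model(x).square().mean()   # placeholder loss
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
```

On an A100, torch.bfloat16 may be a better autocast dtype than float16, since it avoids the need for gradient scaling while still engaging the Tensor Cores.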
Thanks again for reporting this, and please let me know if you discover any other issues!