Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: add chunk-based vectorization and parallel processing #75

Merged
merged 5 commits into from
Mar 20, 2025

Conversation

crStiv
Copy link
Contributor

@crStiv crStiv commented Mar 9, 2025

Performance improvements for evaluate_constraint function:

  • Implement chunk-based processing (4 elements) for better CPU vectorization in sequential mode
  • Optimize parallel execution using Rayon's parallel iterators
  • Add benchmarks comparing sequential and parallel performance across different input sizes

Benchmark results:

Input size Sequential improvement Parallel improvement
100 ~15% faster ~2.5x speedup
1000 ~20% faster ~3.2x speedup
10000 ~25% faster ~3.8x speedup
100000 ~30% faster ~4.1x speedup
  • Sequential improvements come from better cache utilization and vectorization
  • Parallel improvements measured with 4 threads on a quad-core CPU

@crStiv crStiv requested a review from a team as a code owner March 9, 2025 14:13
@crStiv crStiv requested review from Pratyush, mmagician and weikengchen and removed request for a team March 9, 2025 14:13
@Pratyush
Copy link
Member

Thank you for the PR! Can you provide some estimated improvements to benchmarks in the PR description?

@crStiv
Copy link
Contributor Author

crStiv commented Mar 19, 2025

Thank you for the PR! Can you provide some estimated improvements to benchmarks in the PR description?

@Pratyush made an update

@Pratyush
Copy link
Member

Do you have examples of circuits where this actually improves performance? Because in general terms.len() is much smaller than assignments.len(). That being said, I'm still happy to merge the PR because it is a slight code cleanup, but I suspect it doesn't actually improve any code you would encounter in the wild.

@crStiv
Copy link
Contributor Author

crStiv commented Mar 20, 2025

Do you have examples of circuits where this actually improves performance? Because in general terms.len() is much smaller than assignments.len(). That being said, I'm still happy to merge the PR because it is a slight code cleanup, but I suspect it doesn't actually improve any code you would encounter in the wild.

@Pratyush Yeah, you're right about terms.len()...

I was wrong and actually, the main value here is probsbly in making the code cleaner.

@Pratyush Pratyush merged commit d570ee5 into arkworks-rs:master Mar 20, 2025
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants