# [Tracking] 1M TPS benchmarks #13130
### Results for native token transfers: intra-shard traffic, single node

**Hardware:** GCP n2d-standard-16
This is the best I could do; any higher TPS breaks the chain because of "repeated chunk misses", which lead to "extremely high block times". PR to reproduce the benchmark, including all needed setup and configuration: #12918
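For reference, the submission side of such a run essentially amounts to posting Borsh-serialized, base64-encoded signed transactions to the node's JSON-RPC endpoint. Below is a minimal sketch of that loop, not the actual synth-bm code: `build_signed_transfer` is a hypothetical stand-in for transaction construction and signing, the endpoint is an assumed local node, and the dependencies (`reqwest`, `tokio`, `serde_json`, `base64`) are just what this sketch needs.

```rust
use base64::{engine::general_purpose::STANDARD as BASE64, Engine as _};
use serde_json::json;

/// Hypothetical helper: would build and sign a native token transfer and
/// return the Borsh-serialized `SignedTransaction` bytes. Stand-in only.
fn build_signed_transfer(nonce: u64) -> Vec<u8> {
    let _ = nonce;
    unimplemented!("replace with real transaction construction and signing")
}

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::Client::new();
    // Assumed local node; 3030 is the default neard RPC port.
    let rpc_url = "http://127.0.0.1:3030";

    for nonce in 1..=1_000u64 {
        let tx_bytes = build_signed_transfer(nonce);
        // `broadcast_tx_async` returns the transaction hash immediately,
        // without waiting for execution, so the sender can keep up the rate.
        let request = json!({
            "jsonrpc": "2.0",
            "id": nonce.to_string(),
            "method": "broadcast_tx_async",
            "params": [BASE64.encode(&tx_bytes)],
        });
        client.post(rpc_url).json(&request).send().await?.error_for_status()?;
    }
    Ok(())
}
```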
### Results for native token transfers: intra-shard traffic, multi-node localnet

**Hardware:** GCP n2d-standard-16
Performance is worse than on a single node because of the IO bottleneck caused by multiple RocksDB instances on the same disk. This setup is too different from how a real node operates, and it doesn't help reproduce any real production performance issue. For these reasons, I won't investigate localnet TPS further.
### Results for native token transfers: intra-shard traffic, forknet (5 CP / 5 shards)

**Hardware:** GCP n2d-standard-16

Draft PR with the scripts I used to run the benchmark.

**Outcome**

I was able to sustain 4k TPS for a long time, with no issues in the chain. The peak was about 4.4k TPS. In this scenario, the first bottleneck I found is the RPC node used to send transactions.
**Observations**

The RPC load follows the pattern highlighted in this extract from the logs: at the start, there is no load. Once I start sending transactions, CPU and network traffic increase sharply. The node is not able to keep up and goes into block catchup. While transactions are being sent, the maximum is about 3 Pgas/s. When user transactions stop, the node has more room to process blocks and the maximum reaches 5 Pgas/s.
**Conclusions**

It might be possible to squeeze out some more TPS by applying client optimizations from CTR (estimated 20%-40%) and by having multiple RPC nodes to spread the transaction load (estimated 30%-50%). Even with this, we'll hit a new bottleneck at 8-10k TPS, which is almost entirely independent of the number of shards. I think we must take steps to scale transaction submission horizontally, and later also the reading of transaction results. Ideas:
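One of the ideas mentioned above, spreading the submission load across several RPC nodes, could look roughly like the sketch below. The endpoint names and the round-robin dispatcher are illustrative assumptions, not the actual forknet setup.

```rust
use serde_json::json;

// Hypothetical RPC endpoints; in a real run these would be the addresses of
// several dedicated RPC nodes instead of a single one.
const RPC_URLS: &[&str] = &[
    "http://rpc-0.internal:3030",
    "http://rpc-1.internal:3030",
    "http://rpc-2.internal:3030",
];

/// Round-robin dispatch: transaction `i` goes to endpoint `i % N`, so no
/// single RPC node has to absorb the whole submission load.
async fn submit_round_robin(
    client: &reqwest::Client,
    signed_txs_b64: Vec<String>, // base64-encoded, Borsh-serialized transactions
) -> Result<(), reqwest::Error> {
    for (i, tx) in signed_txs_b64.into_iter().enumerate() {
        let url = RPC_URLS[i % RPC_URLS.len()];
        let request = json!({
            "jsonrpc": "2.0",
            "id": i.to_string(),
            "method": "broadcast_tx_async",
            "params": [tx],
        });
        client.post(url).json(&request).send().await?.error_for_status()?;
    }
    Ok(())
}
```

A real load generator would also issue these requests concurrently instead of awaiting each response, but the dispatch pattern stays the same.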
### Results for native token transfers: intra-shard traffic, forknet, no RPC (5 CP / 5 shards)

**Hardware:** GCP n2d-standard-16 and n2d-standard-8

**Outcome**

TPS with n2d-standard-16 machines: 12k. Grafana link to one benchmark run.

**Observations**
### Results for native token transfers: intra-shard traffic, forknet, no RPC (10 CP / 10 shards)

**Hardware:** GCP n2d-standard-8

**Config**

**Outcome**

Grafana link to one benchmark run.

**Observations**
Tracking issue for all tasks related to benchmarks, for the 1 million TPS initiative.
## Benchmarks

### State generation

Create genesis, adjust node configuration, and build a suitable initial database state.

- synth-bm from CRT

### Traffic generation

Generate transactions to stress the network (a rough sketch of this step follows).

- TODO
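Until a tool is chosen, here is a minimal sketch of what the generation step involves. The account list, the `sign_transfer` helper, and the neighbour-to-neighbour pattern are illustrative assumptions, not a proposed design.

```rust
/// Hypothetical helper: would sign a native token transfer from `sender` to
/// `receiver` with the given nonce and return the serialized transaction.
fn sign_transfer(sender: &str, receiver: &str, nonce: u64) -> Vec<u8> {
    let _ = (sender, receiver, nonce);
    unimplemented!("stand-in for real signing with the sender's access key")
}

/// Produces `txs_per_account` transfers for every benchmark account.
/// Nonces must strictly increase per access key, so each account gets its own
/// sequence; interleaving accounts avoids funnelling all load through one key.
/// Real code would offset `round` by the key's current nonce, fetched via a
/// `view_access_key` RPC query.
fn generate_traffic(accounts: &[String], txs_per_account: u64) -> Vec<Vec<u8>> {
    let mut txs = Vec::with_capacity(accounts.len() * txs_per_account as usize);
    for round in 1..=txs_per_account {
        for (i, sender) in accounts.iter().enumerate() {
            // Each account sends to its neighbour; whether this stays
            // intra-shard depends on how accounts are laid out across shards.
            let receiver = &accounts[(i + 1) % accounts.len()];
            txs.push(sign_transfer(sender, receiver, round));
        }
    }
    txs
}
```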
## Benchmark setup

## Benchmark runs

## Issues found

### High priority

### Medium priority

- 1k accounts -> 4k TPS
- 100k accounts -> 2.7k TPS
- [ ] The chain breaks at relatively low TPS due to exponentially growing block times, caused by chunk misses due to a lack of endorsements. This is a side effect of the client actor bottleneck: endorsements are not processed in time. It does not happen if proper gas limits are in place.

### Low priority