
IPC state provider #489

Open · wants to merge 12 commits into base: develop

Conversation

@mralj mralj commented Mar 11, 2025

📝 Summary

rbuilder can now work with any node via IPC.
This means that any node can provide the state for rbuilder, while revm is still used as the EVM.
The node is required to expose the following RPC calls:

Calls not available in the ETH JSON RPC spec:

  1. rbuilder_calculateStateRoot for state root calculation
  2. rbuilder_getCodeByHash given a bytecode hash, returns the bytecode

Calls optimised for rbuilder, but have a counterpart in the ETH JSON RPC spec:

  1. rbuilder_getBlockHash gets the block hash for a given block number (similar to eth_getBlockByNumber, but returns just the block hash)
  2. rbuilder_getAccount gets account info (similar to eth_getProof, but without the proof data)
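
For illustration, here is roughly how requests for these custom calls could be framed as JSON-RPC 2.0, built with serde_json. The parameter encodings are placeholders (the actual wire format is node-specific and not described in this PR); only the method names come from the list above.

```Rust
use serde_json::json;

fn main() {
    // Hypothetical JSON-RPC 2.0 framing; params are placeholders.
    let get_block_hash = json!({
        "jsonrpc": "2.0",
        "id": 1,
        "method": "rbuilder_getBlockHash",
        "params": ["0x152dd31"] // block number, hex-encoded (assumed)
    });

    let get_code_by_hash = json!({
        "jsonrpc": "2.0",
        "id": 2,
        "method": "rbuilder_getCodeByHash",
        "params": ["0x…bytecode hash…"] // placeholder
    });

    println!("{get_block_hash}\n{get_code_by_hash}");
}
```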

To use rbuilder with a node via IPC, config.toml must contain the following section (example):

```toml
[ipc_provider]
ipc_path = "/root/execution-data/nethermind.ipc"
mempool_server_url = "ws://localhost:8546"
request_timeout_ms = 75
```
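
A minimal sketch of how such a section could be deserialized with serde; the struct name and field types here are assumptions for illustration, not necessarily what rbuilder uses.

```Rust
use std::path::PathBuf;

use serde::Deserialize;

// Hypothetical config struct for the [ipc_provider] section above.
#[derive(Debug, Deserialize)]
struct IpcProviderConfig {
    /// Path to the node's IPC socket.
    ipc_path: PathBuf,
    /// WebSocket endpoint used to stream mempool transactions.
    mempool_server_url: String,
    /// Per-request timeout, in milliseconds.
    request_timeout_ms: u64,
}

fn main() {
    let toml_str = r#"
        ipc_path = "/root/execution-data/nethermind.ipc"
        mempool_server_url = "ws://localhost:8546"
        request_timeout_ms = 75
    "#;
    let cfg: IpcProviderConfig = toml::from_str(toml_str).expect("valid config");
    println!("{cfg:?}");
}
```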

Implementation details

IPC

This implementation was initially intended to introduce a remote state provider. By remote, I mean that the idea was that state could be provided via HTTP/WS/IPC. Unfortunately, due to implementation issues/constraints, I decided to implement state provisioning via IPC only.
I don't think this has any practical downside, especially since the state provider must be fast. There is a non-trivial number of state-read calls (~300/s), so fetching state over the network while keeping near-disk-read latency would be unrealistic.

Code-wise, the issues and constraints above stem mainly from the fact that the traits for reading state are sync. Initially, I relied on tokio and alloy.rs to fetch the remote state, but that implementation had many issues.
Firstly, each call to fetch any data (e.g. fetching an account or bytecode) had to be wrapped in a function call like this:

    /// Runs a future in a sync context.
    // StateProvider(Factory) traits require a sync context, but calls to the remote provider are async.
    // What's more, rbuilder is executed in an async context, so we have the situation
    // async -> sync -> async.
    // This helper function allows execution in such an environment.
    fn run<F, R>(&self, f: F) -> R
    where
        F: Future<Output = R>,
    {
        tokio::task::block_in_place(|| self.runtime_handle.block_on(f))
    }

This adds additional overhead on the Tokio runtime and doesn't play well with some parts of the codebase, specifically mutex locking. Without going too deep into the details in the PR description: we would end up in scenarios where the whole Tokio runtime's I/O was blocked, or (in some scenarios) parking_lot mutexes would deadlock.

The possible solutions (a monitoring thread + async mutexes) seemed hacky and suboptimal.
This is why, in the end, I reached for a sync (but concurrent) IPC solution, which I implemented from scratch here. It's called REIPC (from request/response IPC).
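
To give an idea of the shape of a synchronous request/response IPC client, here is a minimal sketch. This is not the actual REIPC API: the real implementation multiplexes concurrent in-flight requests (matching responses by id), whereas this sketch simply serializes requests behind a mutex for brevity.

```Rust
use std::io::{BufRead, BufReader, Write};
use std::os::unix::net::UnixStream;
use std::sync::Mutex;

/// Minimal sketch of a sync request/response IPC client (not the REIPC API).
struct SyncIpcClient {
    // Reader and writer halves of the same socket, guarded together.
    io: Mutex<(BufReader<UnixStream>, UnixStream)>,
}

impl SyncIpcClient {
    fn connect(path: &str) -> std::io::Result<Self> {
        let write_half = UnixStream::connect(path)?;
        let read_half = write_half.try_clone()?;
        Ok(Self {
            io: Mutex::new((BufReader::new(read_half), write_half)),
        })
    }

    /// Sends a newline-delimited JSON-RPC request and blocks until the
    /// newline-delimited response arrives.
    fn request(&self, body: &str) -> std::io::Result<String> {
        let mut guard = self.io.lock().unwrap();
        let (reader, writer) = &mut *guard;
        writer.write_all(body.as_bytes())?;
        writer.write_all(b"\n")?;
        let mut response = String::new();
        reader.read_line(&mut response)?;
        Ok(response)
    }
}
```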

This solution was tested using a Nethermind node. While Nethermind will make some IPC improvements to reduce latency and increase throughput, here are the initial request & response latencies:

[Screenshot: initial REIPC request/response latency measurements against Nethermind]

DashMap & QuickCache in the IPC provider

We need caches because otherwise the number of IPC calls would be very high (thousands per second).
I also reached for concurrent caches because both StateProviderFactory and StateProvider need to be Send + Sync.

QuickCache is used so that I don't have to implement concurrent-cache invalidation by hand :)
Here is some info on QuickCache vs Moka (another popular caching crate); TL;DR: for our simple case, QuickCache seems a better fit.
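
For context, a rough sketch of how the two caches could be laid out. The struct, field names, key types, and the quick_cache API usage are assumptions for illustration, not the PR's actual code.

```Rust
use dashmap::DashMap;
use quick_cache::sync::Cache;

/// Stand-in for the real account info type.
#[derive(Clone)]
struct AccountInfo {
    nonce: u64,
    balance: u128,
}

/// Hypothetical cache layer in front of the IPC calls.
struct IpcStateCaches {
    /// Bytecode never changes for a given code hash, so an unbounded
    /// concurrent map (DashMap) is fine here.
    code_by_hash: DashMap<[u8; 32], Vec<u8>>,
    /// Account info can change every block, so a bounded cache with
    /// built-in eviction (QuickCache) avoids hand-rolled invalidation.
    accounts: Cache<[u8; 32], AccountInfo>,
}

impl IpcStateCaches {
    fn new() -> Self {
        Self {
            code_by_hash: DashMap::new(),
            accounts: Cache::new(100_000),
        }
    }

    /// Returns the cached account info, falling back to `fetch` (e.g. an
    /// IPC call) and caching the result on a miss.
    fn get_or_fetch_account(
        &self,
        key: [u8; 32],
        fetch: impl FnOnce() -> AccountInfo,
    ) -> AccountInfo {
        if let Some(info) = self.accounts.get(&key) {
            return info;
        }
        let info = fetch();
        self.accounts.insert(key, info.clone());
        info
    }
}
```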

On enum StateProviderFactories

The reason cli.rs passes the StateProviderFactories enum to config.new_builder is that I wanted the state provider for rbuilder to be chosen via config at runtime.
This is why, AFAIK, static dispatch is not an option. So I was left with the choice of refactoring to dynamic dispatch (akin to StateProviderBox) OR the enum solution.
I chose the enum solution for the following reasons (see the sketch after this list):
1. It seemed to me that the code diff would be smaller (a smaller change to implement).
2. AFAIK it's faster. Enum matching compiles to a direct jump rather than an indirect call through a vtable (as with dynamic dispatch), which the compiler can optimize better (especially in this scenario), and given branch prediction I'd guess it'll be almost free (plus no vtable/pointer loading).
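
For illustration, the enum-dispatch approach looks roughly like this; the variant payloads and the method are placeholders, not the PR's concrete types (and, as noted in the update below, the final implementation was later reworked).

```Rust
/// Placeholder factory types, standing in for the real implementations.
struct RethProviderFactory;
impl RethProviderFactory {
    fn latest_block_number(&self) -> u64 { 0 }
}

struct IpcProviderFactory;
impl IpcProviderFactory {
    fn latest_block_number(&self) -> u64 { 0 }
}

/// Runtime-selected state provider factory, dispatched via `match`.
enum StateProviderFactories {
    Reth(RethProviderFactory),
    Ipc(IpcProviderFactory),
}

impl StateProviderFactories {
    fn latest_block_number(&self) -> u64 {
        // A match compiles to a direct branch; no vtable lookup is needed.
        match self {
            StateProviderFactories::Reth(f) => f.latest_block_number(),
            StateProviderFactories::Ipc(f) => f.latest_block_number(),
        }
    }
}
```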

UPDATE: this was properly handled in this commit thanks to suggestions from @ZanCorDX

On MempoolSource enum

The reason I chose WebSockets for streaming transactions when rbuilder uses the IPC state provider is that, currently, REIPC doesn't support ETH subscriptions/streams.
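
A hypothetical sketch of the resulting source selection; the variant names and payload types are assumptions, and the actual enum in the PR may be shaped differently.

```Rust
use std::path::PathBuf;

/// Where rbuilder streams pending transactions from.
enum MempoolSource {
    /// Default path: subscribe to new transactions via the node's IPC endpoint.
    Ipc(PathBuf),
    /// Used with the IPC state provider, since REIPC does not yet support
    /// ETH subscriptions: stream pending transactions over WebSocket instead.
    WebSocket(String),
}
```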

✅ I have completed the following steps:

  • Run make lint
  • Run make test
  • Added tests (if applicable)

@mralj mralj force-pushed the mralj/ipc-state-provider-reipc branch from 2624095 to a9c6535 Compare March 11, 2025 18:06
/// Gets the block header given a block number
fn header_by_number(&self, num: u64) -> ProviderResult<Option<Header>> {
let block = rpc_call::<
_,
Collaborator:

I'm in favor of using a concrete type here. This will help in case we do any refactoring and accidentally change parameters; this way we would be able to find the problem instantly.


dvush commented Mar 12, 2025

Great! Thanks, I'll review it this week. (cc @ZanCorDX don't merge before pls)


@mralj mralj force-pushed the mralj/ipc-state-provider-reipc branch from 1b2b61b to 2dc4f41 Compare March 14, 2025 11:01
@mralj mralj mentioned this pull request Mar 15, 2025
ZanCorDX pushed a commit that referenced this pull request Mar 17, 2025
## 📝 Summary
When testing #489, I noticed that when processing new orders, if one of
them fails, we stop further processing.
This PR addresses this issue.

## 💡 Motivation and Context

The effect of the behaviour explained above, in the worst-case scenario, is that the whole slot can _fail_: if processing fails for the very first new order, the whole slot ends up being empty. In less severe scenarios we lose a percentage of transactions, depending on when the processing was stopped.

### When does processing a new order fail?
Processing of a new order fails only if the _StateProvider_ was unable to _provide_ the nonce, i.e. the _StateProvider_ errors.
This is an edge case, but it can happen.

I don't see any downsides to this approach, which is why I opened a PR instead of an issue. (I'm not saying that there aren't any, just that _I_ don't see them 🙂)

### Processing new simulations
A similar argument can be made for processing new simulations, specifically for the following:

```Rust
    pub fn submit_simulation_tasks_results(
        &mut self,
        results: Vec<SimulatedResult>,
    ) -> Result<(), ProviderError> {
        for result in results {
            // NOTE: we can refactor the following line to
            // let _ = self.process_simulation_task_result(result);
            self.process_simulation_task_result(result)?;
        }
        Ok(())
    }
```
The code above only errors in case of _nonce issues_, and in that scenario we stop processing all remaining simulation tasks.
I haven't pushed this change because I wanted to see how you feel about it, especially since simulations _seem more serious_ than orders.
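
For reference, a standalone sketch of the "log and continue" behaviour discussed above; the `SimulatedResult` type and the processing closure are stand-ins, not the real rbuilder types.

```Rust
/// Stand-in for the real `SimulatedResult` type.
struct SimulatedResult;

/// Processes every result, logging failures instead of aborting the batch.
fn process_all(
    results: Vec<SimulatedResult>,
    mut process_one: impl FnMut(SimulatedResult) -> Result<(), String>,
) {
    for result in results {
        if let Err(err) = process_one(result) {
            // A nonce error on one result no longer stops the remaining ones.
            eprintln!("failed to process simulation result, skipping: {err}");
        }
    }
}
```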

---

## ✅ I have completed the following steps:

* [x] Run `make lint`
* [x] Run `make test`
* [ ] Added tests (if applicable)
@mralj mralj force-pushed the mralj/ipc-state-provider-reipc branch from 2dc4f41 to 8fece37 Compare March 18, 2025 09:29