This repository was archived by the owner on Oct 25, 2024. It is now read-only.

Possible Memory Leak #13

Closed
cjios opened this issue Dec 1, 2022 · 8 comments · Fixed by #35

Comments

cjios commented Dec 1, 2022

System information

Geth version: 1.10.23-stable
Go version: go1.19.3
OS & Version: Linux
Commit hash: 9ca4d16

env.state = e.state

This line is causing a memory leak for me. My system has 128 GB of memory, but after running the builder for around 10-12 hours the node consumes all available memory and crashes. Commenting out this line brings the builder down to only 2-3 GB of memory. The following image is the pprof heap output after running the node for around 30 minutes; as you can see, the trie being copied is consuming most of the heap space. I'm not sure why the garbage collector isn't clearing that space properly.

Screen Shot 2022-12-01 at 2 24 48 PM
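For anyone trying to reproduce this: one way to capture a similar heap profile (the exact invocation is an assumption on my part, not something stated above) is to start the node with geth's built-in pprof server and point `go tool pprof` at it:

```
./build/bin/geth --pprof --pprof.addr 127.0.0.1 --pprof.port 6060 <builder flags>
go tool pprof -png http://127.0.0.1:6060/debug/pprof/heap > heap.png
```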

Ruteri (Collaborator) commented Dec 2, 2022

This is intriguing, as we have been running the same code for weeks without OOMs and the profiles look clean.
Can you paste the command-line arguments?
This looks like it could be an issue with the state's prefetcher not being stopped.
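For reference, here is a minimal sketch of the lifecycle concern being described, using go-ethereum's core/state API; the environment wrapper below is illustrative and not the builder's actual code:

```go
package miner

import "github.com/ethereum/go-ethereum/core/state"

// environment is an illustrative stand-in for the miner's work environment.
type environment struct {
	state *state.StateDB
}

// copyWorkEnv sketches the kind of state hand-off discussed above: the new
// environment takes over a StateDB whose trie prefetcher may already exist,
// and if nothing ever calls StopPrefetcher on abandoned environments, the
// prefetcher's retained trie data stays reachable.
func copyWorkEnv(e *environment) *environment {
	env := &environment{}
	env.state = e.state.Copy() // Copy() carries the prefetcher along as well
	return env
}

// discardWorkEnv is the hygiene being suggested: stop the prefetcher whenever
// an environment is thrown away so its caches can be garbage collected.
func discardWorkEnv(env *environment) {
	if env.state != nil {
		env.state.StopPrefetcher()
	}
}
```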

cjios (Author) commented Dec 2, 2022

This is how I launch the instance:
./build/bin/geth --http --http.api eth,net,engine,admin --builder --builder.remote_relay_endpoint https://boost-relay.flashbots.net --builder.secret_key {secret_key}

Ruteri (Collaborator) commented Dec 6, 2022

From the screenshot, it seems those are only using 5 GB or so, with a total of 7 GB consumed. Am I misreading it?

cjios (Author) commented Dec 8, 2022

Yes, but this is only after around 30 minutes. Memory consumption continues to grow linearly, so after 2 or 3 hours around 30 GB of memory gets consumed.

Ruteri (Collaborator) commented Dec 8, 2022

Alright. I think we should take a look at how the db prefetcher is handled and whether it's properly closed on all branches then.
Can you compare with running --miner.algotype greedy?
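Presumably (the exact invocation is an assumption, based on the command posted above) that comparison run would be:

```
./build/bin/geth --http --http.api eth,net,engine,admin --builder \
  --builder.remote_relay_endpoint https://boost-relay.flashbots.net \
  --builder.secret_key {secret_key} --miner.algotype greedy
```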

cjios (Author) commented Dec 15, 2022

Screen Shot 2022-12-14 at 4 16 54 PM

This is after running the greedy builder for ~2 days.

cjios (Author) commented Dec 19, 2022

I have replicated the same issue on another machine. The greedy algotype won't trigger it because it doesn't contain this line: builder/miner/algo_common.go. Likewise, if the BUILDER_TX_SIGNING_KEY environment variable isn't set, the line isn't reached and the leak doesn't occur. Perhaps that's why other people running this software are not seeing this issue?

dvush (Contributor) commented Dec 23, 2022

@cjios thanks for raising this issue

I've looked into it. The code handling the prefetcher is problematic, but we don't see any errors because it's never actually used if:

  1. you have BUILDER_TX_SIGNING_KEY set, or
  2. you happen to commit a bundle first in the block.

It works like that because the prefetcher is only started after the first tx is committed. A copied prefetcher never starts, and in both of the situations above we copy the state before committing the first tx, so the prefetcher is effectively useless.

One solution would be to simply remove the prefetcher startup and forget about it, since we don't use it.

But I think that instead we should fix it so it's never leaked and is actually used properly. We don't need it for tx/bundle commits because that part of the code uses snapshots anyway, but we do need it for block finalization, where we calculate the state root of the block. That step takes a considerable amount of time and hits the disk more than necessary. Block finalization is around 30% of block creation time and can benefit from a properly working prefetcher.
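A rough sketch of that direction, using go-ethereum's StartPrefetcher/StopPrefetcher; the surrounding structure is an assumption for illustration, not the actual change that landed in #35:

```go
package miner

import "github.com/ethereum/go-ethereum/core/state"

// finalizeWithPrefetcher sketches the idea above: leave the prefetcher off
// during tx/bundle simulation (which relies on snapshots anyway), start it
// only for the state that will actually be finalized, and always stop it
// afterwards so it cannot leak on any return path.
func finalizeWithPrefetcher(sdb *state.StateDB, finalize func(*state.StateDB) error) error {
	sdb.StartPrefetcher("miner")
	defer sdb.StopPrefetcher() // guarantees cleanup even on error

	return finalize(sdb)
}
```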
