Node halting with ERR failed to process message err="error part set unexpected index" #431
Comments
Hmm, this is an issue at the p2p layer. Let me ask the TM team as well.
Looks like the block had more parts than it was expecting? Or an index number on a part got messed up and set outside the actual range?
Any chance there is more to the log line that logged the part, which may have gotten cut off by GitHub line length limits or something?
Is there anything exotic about the setup? I think that @helder-moreira may have mentioned something similar, and I think he is running it in Kube.
Interesting. Definitely seems like the log is cut off for some reason. Could it be that it's actually too long for the logger? Looks like that line is almost 2**16 characters, which strikes me as a possible limit for the logger. So I wonder if the halt could just be an artifact of trying to log something too big? Not sure if we've seen something like that before. That aside, it does seem like the node had already received a complete proposal block and then received a bad block part for the same height, which I don't think is supposed to happen unless there's a misbehaving peer (though perhaps there's some rare concurrency bug that could cause it).
@mdyring were you using cosmovisor? Apparently there's a bug where it halts on log lines of 2^16 bytes or more, which might be exactly what happened here ...
@faddat Nope, nothing exotic, running on AWS r5b instance. @ebuchman Yep, using Cosmovisor so that definitely sounds like it could be what is happening here. 👍 Related issue: cosmos/cosmos-sdk#9875
On release v3.1.0. We've experienced a halted validator node a couple of times on osmosis-1.
Process just hangs and does not terminate:
Not sure if this is a race condition somewhere under high load, but it looks like the indexing is lagging behind: