
Cosmovisor hangs node when piping API response. #9875


Closed

mrlp4 opened this issue Aug 7, 2021 · 13 comments
Labels: C:Cosmovisor (Issues and PRs related to Cosmovisor)

Comments

@mrlp4

mrlp4 commented Aug 7, 2021

Summary of Bug

When gaiad is launched through cosmovisor with the API server up (on port 1317), a tx query on path=/cosmos.tx.v1beta1.Service/GetTx for a transaction with more than 64 KB of total size hangs the node, without any error codes whatsoever.
It looks like the scanning stops unrecoverably at EOF with the error bufio.ErrTooLong (the first I/O error, or a token too large to fit in the buffer).
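
For context, Go's bufio.Scanner caps a single token at 64 KiB by default (bufio.MaxScanTokenSize), and once it hits bufio.ErrTooLong it stops scanning for good, which matches the hang described above. A minimal standalone sketch of that behaviour (illustrative only, not cosmovisor's code):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

func main() {
	// A single "log line" just over bufio's default limit of
	// bufio.MaxScanTokenSize (64 * 1024 bytes).
	line := strings.Repeat("a", bufio.MaxScanTokenSize+1)

	scanner := bufio.NewScanner(strings.NewReader(line + "\n"))
	for scanner.Scan() {
		// Never reached: the very first token already exceeds the buffer.
	}

	// Prints "bufio.Scanner: token too long" (bufio.ErrTooLong); further
	// calls to Scan keep returning false, so reading never resumes.
	fmt.Println(scanner.Err())
}
```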

Version

commit 40bb2f4

Steps to Reproduce

Run a gaiad node through cosmovisor, make a tx with more than ~100 messages, and try to query it using the API server path /cosmos.tx.v1beta1.Service/GetTx/<tx_hash>.
If the API server response is more than 64 KB in size, the node will hang.


For Admin Use

  • Not duplicate issue
  • Appropriate labels applied
  • Appropriate contributors tagged
  • Contributor assigned/self-assigned
@jhernandezb

jhernandezb commented Aug 8, 2021

We got hit by this issue with cosmwasm because the full contract bytecode is printed by info-level logs.

There is an undocumented setting for this, DAEMON_LOG_BUFFER_SIZE=512, with which you can increase the buffer.
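
For illustration, here is one way a log-scanning supervisor could honour a DAEMON_LOG_BUFFER_SIZE-style setting. This is a hypothetical sketch (the kilobyte unit and the parsing are assumptions), not cosmovisor's actual source:

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
	"strconv"
)

// newLogScanner builds a scanner whose maximum line size is taken from the
// DAEMON_LOG_BUFFER_SIZE environment variable (assumed here to be in KB),
// falling back to bufio's 64 KiB default when it is unset or invalid.
func newLogScanner(r io.Reader) *bufio.Scanner {
	bufSizeKB := 64
	if v := os.Getenv("DAEMON_LOG_BUFFER_SIZE"); v != "" {
		if n, err := strconv.Atoi(v); err == nil && n > 0 {
			bufSizeKB = n
		}
	}

	scanner := bufio.NewScanner(r)
	// Raise the maximum token size so oversized log lines no longer trip
	// bufio.ErrTooLong and freeze the reader.
	scanner.Buffer(make([]byte, 1024), bufSizeKB*1024)
	return scanner
}

func main() {
	// Echo stdin line by line, e.g. `some-daemon 2>&1 | this-program`.
	sc := newLogScanner(os.Stdin)
	for sc.Scan() {
		fmt.Println(sc.Text())
	}
}
```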

@mrlp4
Author

mrlp4 commented Aug 9, 2021

> We got hit by this issue with cosmwasm because the full contract bytecode is printed by info-level logs.
>
> There is an undocumented setting for this, DAEMON_LOG_BUFFER_SIZE=512, with which you can increase the buffer.

Hmm, that really worked.
Thanks @jhernandezb for sharing that info.

mrlp4 closed this as completed Aug 9, 2021
@timlind
Contributor

timlind commented Aug 10, 2021

I think this should be re-opened, as it shouldn't be possible to misconfigure in a way that causes a crash. Also, this environment variable only exists in v0.43 at the moment.

We started seeing this issue when upgrading to v0.42.4, only when the log level is set to info (which logs tx / query payloads).

@alexanderbez
Contributor

@jhernandezb should we re-open and perhaps do the following:

  1. Make this ENV var available on 0.42.x
  2. Increase the default value

@jhernandezb

I agree on both. Especially for cosmwasm chains, this can halt validator nodes just by someone deploying contracts, and it is not an easy bug to detect; until cosmovisor switches away from reading logs, there should be better default values.

@mdyring

mdyring commented Aug 24, 2021

Just bumped into this, triggered by p2p messages from Tendermint: osmosis-labs/osmosis#431.

Assuming the majority of validators are using Cosmovisor these days, it seems like chain halts might be induced this way if a TX can be crafted to produce > 64k of log output.

@alexanderbez
Contributor

@mdyring do you know the offending log line? Where is the log coming from? The mempool?

@mdyring

mdyring commented Aug 25, 2021

> @mdyring do you know the offending log line? Where is the log coming from? The mempool?

Yeah more details in osmosis-labs/osmosis#431. 👍

@mrlp4
Author

mrlp4 commented Aug 25, 2021

> Just bumped into this, triggered by p2p messages from Tendermint: osmosis-labs/osmosis#431.
>
> Assuming the majority of validators are using Cosmovisor these days, it seems like chain halts might be induced this way if a TX can be crafted to produce > 64k of log output.

I believe cosmovisor is not the primary piece of software here; the daemon we run through it is. So the helper tool must be adjusted to fit that purpose, not vice versa.

@alexanderbez
Contributor

@robert-zaremba
Contributor

robert-zaremba commented Sep 16, 2021

I believe we solved this issue in master with the cosmovisor file approach.

  • v0.1 scans the logs
  • master observes file changes

Could you install the latest cosmovisor (from current master) and check whether this issue is still the case?
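
For reference, the file-based approach amounts to watching for an upgrade-info.json file written when an upgrade height is reached, instead of parsing the daemon's log stream, so log volume can no longer block upgrade detection. A rough polling sketch under those assumptions (the path, JSON fields, and interval are illustrative, not cosmovisor's actual implementation):

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
	"time"
)

// upgradeInfo mirrors the JSON written when an upgrade height is reached;
// the fields shown here are illustrative.
type upgradeInfo struct {
	Name   string `json:"name"`
	Height int64  `json:"height"`
}

// watchUpgradeInfo polls for upgrade-info.json under the daemon home instead
// of scanning the daemon's logs. Rough sketch of the idea only.
func watchUpgradeInfo(daemonHome string) (upgradeInfo, error) {
	path := filepath.Join(daemonHome, "data", "upgrade-info.json")
	for {
		data, err := os.ReadFile(path)
		if err == nil {
			var info upgradeInfo
			if err := json.Unmarshal(data, &info); err != nil {
				return upgradeInfo{}, err
			}
			return info, nil
		}
		if !os.IsNotExist(err) {
			return upgradeInfo{}, err
		}
		time.Sleep(300 * time.Millisecond)
	}
}

func main() {
	info, err := watchUpgradeInfo(os.Getenv("DAEMON_HOME"))
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("upgrade %q needed at height %d\n", info.Name, info.Height)
}
```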

@yaruwangway
Contributor

Hi it should be working now with the new cosmovisor in the cosmos-sdk master branch. If the new cosmovisor also works for you, I will close this issue. @mrlp4

@mrlp4
Author

mrlp4 commented Sep 17, 2021

> Hi it should be working now with the new cosmovisor in the cosmos-sdk master branch. If the new cosmovisor also works for you, I will close this issue. @mrlp4

Hi! Just checked, and everything seems to be fine now with the latest master branch.
Good to close!
