bug: networkReachability=NotReachable log? #1499

Closed
dao opened this issue Jan 17, 2023 · 8 comments

@dao
dao commented Jan 17, 2023

Problem

Running status-im/nwaku:0.14.0 docker image, I observe previously unseen INFO log messages:
Peer reachability status topics="wakunode" tid=1 file=waku_node.nim:225 networkReachability=NotReachable confidence=1.0

Impact

I am uncertain of the impact; the node still appears to have ~2 dozen connected peers but the log suggests that the node is not reachable

To reproduce

I'm unaware of any special steps or configurations required to reproduce. The node is running relay protocol only (no store, filter, etc) with rln disabled.

Expected behavior

If the message does not actually mean that my node is unreachable, it should not cause alarm by being logged as INFO

Screenshots/logs

see attached

nwaku version/commit hash

version / git commit hash: v0.14.0
log.txt

@dao added the bug and track:maintenance labels Jan 17, 2023
@alrevuelta
Contributor

> I am uncertain of the impact; the node still appears to have ~2 dozen connected peers but the log suggests that the node is not reachable

Are any of these connections inbound? A node can be NotReachable and still have connected peers (outbound peers).

Note, though, that this is an experimental feature that was just released both on our side and in nim-libp2p; it works by asking other peers whether they can connect to you.

Does your node/computer have the waku port open? Or, even if it's closed, do you have a router supporting UPnP?

If you do expect your node to be reachable and want to give it a shot, feel free to experiment with the autonatservice. I would suggest enabling askNewConnectedPeers and changing numPeersToAsk or maxQueueSize.

@alrevuelta
Contributor

@dao After some discussion with the nim-libp2p team, this turns out to be a known issue in nim-libp2p. The feature works by requesting other nodes to dial you: if they succeed, they flag you as Reachable, otherwise as NotReachable.

The problem is that we currently limit the number of connections a node can have to a given peer to 1. So if you are already connected to the peer that you are asking to dial you, the test will fail because you will reject that second connection. This should explain why you see NotReachable.
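
The failure mode described above can be sketched with a toy model. This is not nim-libp2p code: `Node`, `max_conns_per_peer`, `accept_dial_from`, and `probe` are hypothetical names, and the model assumes the limit applies per peer and that a probe is simply a second inbound dial from an already-connected peer.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    max_conns_per_peer: int = 1  # assumed per-peer limit (hypothetical parameter)
    conns: dict = field(default_factory=dict)  # peer name -> open connection count

    def connect(self, peer: "Node") -> None:
        """Record an established connection on both sides."""
        self.conns[peer.name] = self.conns.get(peer.name, 0) + 1
        peer.conns[self.name] = peer.conns.get(self.name, 0) + 1

    def accept_dial_from(self, peer: "Node") -> bool:
        """Accept an inbound dial unless the per-peer limit is already reached."""
        if self.conns.get(peer.name, 0) >= self.max_conns_per_peer:
            return False
        self.connect(peer)
        return True


def probe(limit: int) -> str:
    """Connect to a helper peer, then ask that same peer to dial us back."""
    me = Node("me", max_conns_per_peer=limit)
    helper = Node("helper")
    me.connect(helper)  # existing connection used to send the request
    return "Reachable" if me.accept_dial_from(helper) else "NotReachable"


print(probe(1))  # dial-back rejected because one connection already exists
print(probe(2))  # dial-back accepted once a second connection is allowed
```

In this model the node is perfectly reachable in both runs; only the connection limit changes the reported status, which is the false negative discussed here.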

We can also see that this metric is really unstable in our fleets, jumping quite often from NotReachable to Reachable.

For now you can ignore these logs. Thanks for reporting!

@jm-clius moved this to Todo in Waku Jan 23, 2023
@alrevuelta self-assigned this Jan 23, 2023
@alrevuelta
Contributor

Work is being done in vacp2p/nim-libp2p#845 and vacp2p/nim-libp2p#846. Once it is fixed in nim-libp2p, we will bump the version and this should be fixed.

@jm-clius moved this from Todo to In Progress in Waku Jan 30, 2023
@alrevuelta
Contributor

Just merged the nim-libp2p version bump containing the fixes. Will monitor our fleets (where the issue is also present) over the next 24 hours and check if the issue is resolved.

@alrevuelta
Contributor

I can verify that the nim-libp2p version bump (67939b), containing nim-libp2p version 4ace70d53b0b0e3b58cd3bead70b967d34bd03f3, was deployed to our wakuv2.test fleets. However, I still see the issue: Reachable and NotReachable statuses are i) wrongly reported and ii) unstable.

This needs further investigation.

@alrevuelta
Contributor

alrevuelta commented Feb 20, 2023

An update on this: the latest nim-libp2p version included a fix, but the status is still a bit unstable. From time to time, the NotReachable state is still reported on nodes that are Reachable. The issue seems to happen more often when multiple nodes try to dial each other at the same time to test reachability.

This new issue was detected by the nim-libp2p team and a fix should be ready soon. We should be able to close this once the fix lands and we bump to the latest version.

See: vacp2p/nim-libp2p#865

@alrevuelta
Contributor

alrevuelta commented Apr 5, 2023

v0.16.0 should fix this.

I can verify it works in a private network with v0.16.0. Will wait until we deploy this release to our fleets before closing.

Note, though, that in a network containing a mix of old (v0.15.0) and new (>=v0.16.0) nodes, new nodes will be biased by old nodes (the old nodes will report that dialMe failed when that may not be the case).

@alrevuelta
Contributor

v0.16.0 did indeed fix this, but if other nodes have not updated, they may flag us as NotReachable even though we are reachable. Since multiple nodes are queried for our state, it's just a matter of consensus: if more nodes report us as reachable, we will flag ourselves as reachable.

Closing, since this is fixed as of v0.16.0.
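
As a rough illustration of that consensus, here is a minimal sketch assuming a plain majority vote over dial-back answers, with confidence taken as the winning share of answers. The function `aggregate_reachability` and this definition of confidence are assumptions for illustration and may differ from how nim-libp2p actually computes them.

```python
def aggregate_reachability(answers):
    """Majority vote over dial-back results (True = the peer managed to dial us).

    Returns (status, confidence), where confidence is the share of answers
    agreeing with the winning status. Hypothetical helper, not a library API.
    """
    if not answers:
        return "Unknown", 0.0
    reachable = sum(1 for ok in answers if ok)
    unreachable = len(answers) - reachable
    if reachable == unreachable:
        return "Unknown", 0.5
    if reachable > unreachable:
        return "Reachable", reachable / len(answers)
    return "NotReachable", unreachable / len(answers)


# A reachable node asks five peers. Updated peers answer correctly (True);
# peers that have not updated are assumed to report a failed dial-back (False).
mostly_updated = [True] * 4 + [False] * 1
mostly_outdated = [True] * 1 + [False] * 4

print(aggregate_reachability(mostly_updated))
print(aggregate_reachability(mostly_outdated))
```

Under these assumptions the same reachable node is flagged Reachable when most queried peers are updated and NotReachable when most are not, so the reported status depends on the mix of versions among the peers that were asked.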

@github-project-automation bot moved this from In Progress to Done in Waku May 30, 2023