[DP-110] NC blocks when remote nodes are unavailable #204
Comments
In |
A program showing the issue: |
Aren't we still focused on fixing this at the network transport level? I don't think unbounded buffers are sensible, but wait handles and a local thread pool or dedicated thread for dealing with completed operations could make sense. Didn't @mboes argue strongly that we shouldn't be papering over this in the node controller though, and instead focus on making network transport operations asynchronous? Might introducing handles go some way towards that?
[Imported from JIRA. Reported by Facundo Dominguez (@facundominguez) as DP-110 on 2015-04-16 17:39:09]
The node controller sends messages to remote nodes sometimes. When the remote node is unreachable, the NC may block for a while.
To fix ideas, let's assume we are using network-transport-tcp.
The NC uses `sendBinary` to send messages to other nodes. When a node is unreachable and there is no connection, the attempt to establish a new connection has to time out before `sendBinary` returns control to the NC. If there is a connection, sending a message through it may still block if the send buffer is full.

One tentative fix could be to have the NC spawn an auxiliary thread to call `sendBinary`. However, when `sendBinary` blocks, multiple auxiliary threads can accumulate trying to communicate with the unreachable node, and this can hurt performance depending on how many threads accumulate.

Another solution is to have a message queue with a dedicated thread per remote `NodeId`. When the NC needs to send a message to a node, the message is placed in the corresponding queue. A bit of cleverness can make the collection of queues dynamic, so queues and threads are created on demand and disposed of when empty.

At the transport level we could ask `send` and `connect` to be asynchronous, at least for unreliable connections, and have the NC use unreliable connections to send messages.