[DP-110] NC blocks when remote nodes are unavailable #204

Open
qnikst opened this issue Jun 17, 2015 · 4 comments

Comments

@qnikst
Contributor

qnikst commented Jun 17, 2015

[Imported from JIRA. Reported by Facundo Dominguez (@facundominguez) as DP-110 on 2015-04-16 17:39:09]
The node controller (NC) sometimes sends messages to remote nodes. When a remote node is unreachable, the NC may block for a while.

To fix ideas, let's assume we are using network-transport-tcp.

The NC uses sendBinary to send messages to other nodes. When a node is unreachable and there is no connection, the attempt to establish a new connection has to time out before sendBinary returns control to the NC. If there is a connection, sending a message through it may still block if the send buffer is full.

One tentative fix could be to have the NC spawn an auxiliary thread to call sendBinary. However, when sendBinary blocks, multiple auxiliary threads can accumulate trying to communicate with the unreachable node, and this can have some impact on performance depending on the number of accumulated threads.

Another solution is to have a message queue with a dedicated thread per remote NodeId. When the NC needs to send a message to a node, the message is placed in the corresponding queue. A bit of cleverness can make the collection of queues dynamic, so queues and threads are created on demand and disposed of when empty.
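
To make the queue-per-node idea concrete, here is a minimal sketch, not the code from any pull request. `NodeKey`, `Message`, `Queues`, `newQueues` and `enqueue` are hypothetical names, and the `blockingSend` argument stands in for whatever blocking call the NC would make (for example sendBinary). Queues and worker threads are created on first use and removed once drained.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import Control.Monad (void)
import qualified Data.Map.Strict as Map

-- Hypothetical stand-ins for NodeId and the message payload.
type NodeKey = String
type Message = String

-- One outgoing queue per remote node. An entry exists only while
-- a worker thread is alive for that node.
type Queues = TVar (Map.Map NodeKey (TQueue Message))

newQueues :: IO Queues
newQueues = newTVarIO Map.empty

-- Enqueue a message without touching the network in the caller.
-- The first message for a node creates its queue and forks its worker.
enqueue :: (NodeKey -> Message -> IO ()) -> Queues -> NodeKey -> Message -> IO ()
enqueue blockingSend queues node msg = do
  created <- atomically $ do
    m <- readTVar queues
    case Map.lookup node m of
      Just q  -> writeTQueue q msg >> return Nothing
      Nothing -> do
        q <- newTQueue
        writeTQueue q msg
        writeTVar queues (Map.insert node q m)
        return (Just q)
  case created of
    Nothing -> return ()
    Just q  -> void (forkIO (worker q))
  where
    -- Drain the queue in order. When it is empty, remove the map entry
    -- in the same transaction, so no message can be stranded.
    worker q = do
      next <- atomically $ do
        mmsg <- tryReadTQueue q
        case mmsg of
          Just x  -> return (Just x)
          Nothing -> modifyTVar' queues (Map.delete node) >> return Nothing
      case next of
        Nothing -> return ()
        Just x  -> blockingSend node x >> worker q
```

Only the worker for the unreachable node blocks, so there is at most one stuck thread per remote node. The sketch assumes `blockingSend` does not throw; an exception would kill the worker and leave its queue in the map. The queue here is also unbounded, so messages for an unreachable node accumulate in memory.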

At the transport level we could ask send and connect to be asynchronous at least for unreliable connections, and have the NC use unreliable connections to send messages.

qnikst mentioned this issue Jun 18, 2015
qnikst changed the title NC blocks when remote nodes are unavailable [DP-110] NC blocks when remote nodes are unavailable Jun 18, 2015
qnikst added this to the distributed-process-0.6 milestone Jun 19, 2015
@facundominguez
Contributor

Pull request #199 implements message queues per remote NodeId in d-p.

Pull request #249 addresses blocking when closing connections.

We have to consider redesigning the network transport interface to make operations asynchronous. That is, making it much more like CCI.

@qnikst
Contributor Author

qnikst commented Jul 16, 2015

CCI uses process buffers for messages in order to avoid uncontrolled memory usage by the network library. Also, its asynchronous operations return a handle that the client can use to wait for the end of the operation. Should we consider a similar approach?
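
For illustration, the client-facing side of a handle-returning operation could look roughly like the sketch below. This is not CCI's API or anything in network-transport; `OpHandle`, `startAsync`, `waitHandle` and `pollHandle` are hypothetical names, and the sketch forks a thread per operation purely to keep the example short.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar
import Control.Exception (SomeException, try)

-- Hypothetical handle for an in-flight operation. The MVar is filled
-- exactly once, with either the result or the exception that ended it.
newtype OpHandle a = OpHandle (MVar (Either SomeException a))

-- Start the operation and return immediately with a handle.
startAsync :: IO a -> IO (OpHandle a)
startAsync op = do
  done <- newEmptyMVar
  _ <- forkIO (try op >>= putMVar done)
  return (OpHandle done)

-- Block until the operation has finished.
waitHandle :: OpHandle a -> IO (Either SomeException a)
waitHandle (OpHandle done) = readMVar done

-- Check for completion without blocking.
pollHandle :: OpHandle a -> IO (Maybe (Either SomeException a))
pollHandle (OpHandle done) = tryReadMVar done
```

With this shape, the caller decides when, or whether, to wait for a send or connect to complete. The async package offers a more complete version of the same pattern (`async`, `wait`, `poll`).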

@facundominguez
Contributor

A program showing the issue:
#199 (comment)

@hyperthunk
Member

Aren't we still focused on fixing this at the network transport level? I don't think unbounded buffers are sensible, but wait handles and a local thread pool or dedicated thread for dealing with completed operations could make sense.
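
As a small illustration of the bounded-buffer point, a non-blocking enqueue on a bounded STM queue can report "full" to the caller instead of growing without limit. `tryOffer` and `demo` are hypothetical names, and the capacity of 2 is arbitrary.

```haskell
import Control.Concurrent.STM

-- Try to enqueue without blocking. Returns False when the queue is full,
-- leaving the caller to decide whether to drop, retry or report an error.
tryOffer :: TBQueue a -> a -> STM Bool
tryOffer q x = do
  full <- isFullTBQueue q
  if full
    then return False
    else writeTBQueue q x >> return True

-- Offer three messages to a queue with room for two.
demo :: IO [Bool]
demo = do
  q <- newTBQueueIO 2
  mapM (atomically . tryOffer q) ["m1", "m2", "m3"]
```

Here `demo` returns `[True, True, False]`: the third message is rejected rather than buffered.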

Didn't @mboes argue strongly that we shouldn't be papering over this in the node controller though, and instead focus on making network transport operations asynchronous? Might introducing handles go some way towards that?

Projects
Status: Backlog
Development

No branches or pull requests

3 participants