[DP-110] NC blocks when remote nodes are unavailable #204

qnikst · 2015-06-17T18:39:14Z

[Imported from JIRA. Reported by Facundo Dominguez (@facundominguez) as DP-110 on 2015-04-16 17:39:09]
The node controller (NC) sometimes sends messages to remote nodes. When a remote node is unreachable, the NC may block for a while.

To fix ideas, let's assume we are using network-transport-tcp.

The NC uses sendBinary to send messages to other nodes. When a node is unreachable and there is no connection, sendBinary returns control to the NC only once the attempt to establish a new connection times out. If there is a connection, sending a message through it may still block if the send buffer is full.

One tentative fix would be to have the NC spawn an auxiliary thread for each call to sendBinary. However, when sendBinary blocks, auxiliary threads trying to communicate with the unreachable node accumulate, and this can affect performance depending on how many threads pile up.
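
To illustrate the accumulation, here is a minimal sketch that uses only `base`. `blockedSend` is a hypothetical stand-in for a sendBinary call to an unreachable node (it is not the real distributed-process function), and the counter exists only to make the stuck threads observable.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar)
import Data.IORef (IORef, atomicModifyIORef', newIORef, readIORef)

-- Hypothetical stand-in for sendBinary to an unreachable node:
-- it blocks until the MVar is filled (the node "comes back").
blockedSend :: MVar () -> String -> IO ()
blockedSend reachable _msg = readMVar reachable

-- Fork one auxiliary thread per message and count threads in flight.
sendInAuxThread :: IORef Int -> MVar () -> String -> IO ()
sendInAuxThread inFlight reachable msg = do
  atomicModifyIORef' inFlight (\n -> (n + 1, ()))
  _ <- forkIO $ do
    blockedSend reachable msg
    atomicModifyIORef' inFlight (\n -> (n - 1, ()))
  pure ()

-- Send k messages while the node is unreachable and report how many
-- auxiliary threads are still stuck before the node comes back.
stuckThreads :: Int -> IO Int
stuckThreads k = do
  inFlight <- newIORef 0
  reachable <- newEmptyMVar
  mapM_ (\i -> sendInAuxThread inFlight reachable ("msg " ++ show i)) [1 .. k]
  stuck <- readIORef inFlight
  putMVar reachable () -- release the blocked threads
  pure stuck
```

In this sketch the caller never blocks, but every message sent while the node is unreachable leaves one more thread parked until the send can complete.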

Another solution is to have a message queue with a dedicated thread per remote NodeId. When the NC needs to send a message to a node, the message is placed in the corresponding queue. A bit of cleverness can make the collection of queues dynamic, so queues and threads are created on demand and disposed of when empty.
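
A rough sketch of this idea, assuming the `stm` and `containers` packages. This is not the code from any pull request: `NodeName` is a hypothetical stand-in for NodeId, and `blockingSend` stands in for whatever potentially blocking send the NC would use. Queues and workers are created on first use, and a worker unregisters its queue in the same transaction in which it finds the queue empty, so an enqueue can never land in a queue nobody is draining.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.STM
import qualified Data.Map.Strict as Map

type NodeName = String -- hypothetical stand-in for NodeId

type Queues msg = TVar (Map.Map NodeName (TQueue msg))

-- Enqueue a message for a node without blocking on the network.
-- A queue and its worker thread are created on first use.
enqueue :: Queues msg -> (NodeName -> msg -> IO ()) -> NodeName -> msg -> IO ()
enqueue queues blockingSend node msg = do
  created <- atomically $ do
    m <- readTVar queues
    case Map.lookup node m of
      Just q -> writeTQueue q msg >> pure Nothing
      Nothing -> do
        q <- newTQueue
        writeTQueue q msg
        writeTVar queues (Map.insert node q m)
        pure (Just q)
  case created of
    Nothing -> pure ()
    Just q -> do
      _ <- forkIO (worker queues blockingSend node q)
      pure ()

-- Drain the queue in order. The emptiness check and the removal from
-- the map happen atomically, so no message is stranded.
worker :: Queues msg -> (NodeName -> msg -> IO ()) -> NodeName -> TQueue msg -> IO ()
worker queues blockingSend node q = loop
  where
    loop = do
      next <- atomically $ do
        r <- tryReadTQueue q
        case r of
          Just _ -> pure r
          Nothing -> modifyTVar' queues (Map.delete node) >> pure Nothing
      case next of
        Just msg -> blockingSend node msg >> loop
        Nothing -> pure ()

-- Usage: deliver three messages to one node and collect them in order.
demo :: IO [(NodeName, Int)]
demo = do
  queues <- newTVarIO Map.empty
  delivered <- newTVarIO []
  let send node msg = atomically (modifyTVar' delivered ((node, msg) :))
  mapM_ (enqueue queues send "nodeA") [1, 2, 3]
  -- Wait until every worker has drained and unregistered its queue.
  atomically (readTVar queues >>= check . Map.null)
  reverse <$> readTVarIO delivered
```

Only the worker for the unreachable node blocks; the caller and the workers for other nodes keep going. The queues here are unbounded, which is a simplification of this sketch.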

At the transport level, we could require send and connect to be asynchronous, at least for unreliable connections, and have the NC use unreliable connections to send messages.

facundominguez · 2015-07-16T11:51:33Z

Pull request #199 implements message queues per remote NodeId in d-p.

Pull request #249 addresses blocking when closing connections.

We have to consider redesigning the network transport interface to make operations asynchronous. That is, making it much more like CCI.

qnikst · 2015-07-16T11:57:46Z

In CCI, they use process-provided buffers for messages in order to avoid uncontrolled memory usage by the network library. Also, asynchronous operations return a handle that the client can use to wait for the end of the operation. Should we consider a similar approach?
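
To make the handle idea concrete, here is a minimal sketch using only `base`. It is neither CCI's nor network-transport's API: `SendHandle`, `sendAsync`, `waitSend` and `pollSend` are hypothetical names, and forking a thread per operation is just the simplest way to show the shape. A real transport would more likely complete the handle from its own event loop.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (MVar, newEmptyMVar, putMVar, readMVar, tryReadMVar)
import Control.Exception (SomeException, try)

-- Hypothetical handle for a pending operation: filled exactly once
-- with the outcome of the operation.
newtype SendHandle = SendHandle (MVar (Either SomeException ()))

-- Start a (possibly blocking) send in the background and return
-- immediately with a handle.
sendAsync :: IO () -> IO SendHandle
sendAsync blockingSend = do
  done <- newEmptyMVar
  _ <- forkIO (try blockingSend >>= putMVar done)
  pure (SendHandle done)

-- Block until the operation has finished.
waitSend :: SendHandle -> IO (Either SomeException ())
waitSend (SendHandle done) = readMVar done

-- Check for completion without blocking.
pollSend :: SendHandle -> IO (Maybe (Either SomeException ()))
pollSend (SendHandle done) = tryReadMVar done
```

The caller decides when, or whether, to wait. The `async` package offers a more complete version of this pattern.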

facundominguez · 2016-03-10T14:07:45Z

A program showing the issue:
#199 (comment)

hyperthunk · 2016-03-10T15:53:47Z

Aren't we still focused on fixing this at the network transport level? I don't think unbounded buffers are sensible, but wait handles and a local thread pool or dedicated thread for dealing with completed operations could make sense.
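
As a small illustration of the bounded alternative, the sketch below uses `TBQueue` from the `stm` package. `tryEnqueue` is a hypothetical helper, and what the caller does on a full buffer (drop, retry, report the node as unreachable) is left open.

```haskell
import Control.Concurrent.STM

-- Try to enqueue without blocking. False means the buffer is full
-- and the caller has to decide what to do with the message.
tryEnqueue :: TBQueue msg -> msg -> IO Bool
tryEnqueue q msg = atomically $ do
  full <- isFullTBQueue q
  if full then pure False else writeTBQueue q msg >> pure True

-- Usage: a buffer of capacity 2 accepts two messages and rejects the third.
demoBounded :: IO [Bool]
demoBounded = do
  q <- newTBQueueIO 2
  mapM (tryEnqueue q) "abc"
```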

Didn't @mboes argue strongly that we shouldn't be papering over this in the node controller though, and instead focus on making network transport operations asynchronous? Might introducing handles go some way towards that?

qnikst added type:bug Bug and removed type:bug labels Jun 17, 2015

qnikst mentioned this issue Jun 18, 2015

Dp 110 fd #199

Closed

qnikst changed the title ~~NC blocks when remote nodes are unavailable~~ [DP-110] NC blocks when remote nodes are unavailable Jun 18, 2015

qnikst added the In Progress label Jun 19, 2015

qnikst added this to the distributed-process-0.6 milestone Jun 19, 2015

qnikst modified the milestones: distributed-process-0.6, distributed-process-0.7 Feb 24, 2016

hyperthunk mentioned this issue Sep 3, 2024

Make n-t-tcp fully asynchronous... Maybe.. #428

Open

LaurentRDC added this to Release 0.8 Sep 2, 2024

LaurentRDC moved this to Backlog in Release 0.8 Sep 2, 2024

facundominguez mentioned this issue Jan 29, 2016

Carefully document semantics. #407

Open
