[DP-108] Investigate why spawn is slow #206
Comments
@qnikst - did you ever get anywhere with this? I see similar 50-100ms delays for simple calls like
Could this be related to the issue with network when is compiled with -threaded? Do you both see this for spawnLocal too?
@teh unfortunately we didn't get far in this direction. We had a discussion with @nh2 about the problem and a solution. If I recall correctly, the large delay was due to TCP's Nagle algorithm. One solution is to set the TCP_NODELAY option in network-transport-tcp (this is possible today); another, which was discussed, was to extend the network-transport API by the If you have a minimal example, we could revisit this issue and try to investigate it further.
Created https://github.com/teh/dh-minimal-slow-testcase - see README for details
I think I narrowed this down to 1-byte writes and reads. Looking at strace:
And the actual trace looks like this:
This strace looks like it is dumping logs to
@qnikst do you mean
Also, tracing might still be a bit broken for some cases, as per #265. That is high on my stack to fix, and I'm hoping to get to it once I've got CI fixed for the repos I maintain.
Ah that was dumb, apologies. Looks like unbuffered 1-char-at-a-time logging which is not ideal but almost certainly a separate issue. Without logging to CONSOLE (
Another thing I noticed is that I see an unusually high number of
BTW, can you reproduce the slowness in my repo above? If you can't reproduce it, I will focus on my system (build flags etc.); if you can, I'd start looking into h-d library code.
Here's an illustration of the issue as a Specifically the client sends data at But this data only arrives at the server at
Summary for the next person hitting this issue: The problem is The cause is a bad interaction between a small initial window size and delayed TCP acks: the sending side waits for an ack before sending more data, but the receiving side is doing a delayed TCP ack. This is a good explanation of the issue.
Yeah, as I said in #206 (comment), we solved a similar issue for ourselves by using TCP_NODELAY (and exposing that option in the n-t-tcp API). But TCP_NODELAY buys latency at the cost of throughput, quite possibly network throughput if your nodes are not close to each other. So if your communication has enough streaming data to keep TCP flowing, this option is undesirable and can even hurt performance (for this reason it's not the default). This is why the solution with flush was discussed.
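For readers who want to see what the TCP_NODELAY option amounts to at the socket level, here is a minimal sketch using the `network` package directly. It does not go through network-transport-tcp, whose own option for this is mentioned above but not named in this thread; the helper name `openNoDelaySocket` is made up for illustration.

```haskell
import Network.Socket

-- | Hypothetical helper: open a TCP socket with Nagle's algorithm
-- disabled (TCP_NODELAY set). The socket is not connected; that is
-- left to the caller.
openNoDelaySocket :: IO Socket
openNoDelaySocket = do
  sock <- socket AF_INET Stream defaultProtocol
  -- A non-zero value turns the option on.
  setSocketOption sock NoDelay 1
  pure sock

main :: IO ()
main = do
  sock <- openNoDelaySocket
  -- Read the option back to confirm the kernel accepted it.
  flag <- getSocketOption sock NoDelay
  putStrLn ("TCP_NODELAY enabled: " ++ show (flag /= 0))
  close sock
```

As noted in the comment above, this trades throughput for latency, so it is probably best treated as an opt-in setting rather than a default.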
@qnikst can you point me to the discussion about
Here is a summary: #206 (comment). The discussion itself happened in person a few years ago during ZuriHac.
@qnikst I saw that, yeah; the question is what are the semantics for deciding when to call
Also, does this issue persist with the example repository/project you wrote, @teh, if you use a different network-transport library? For example the network-transport-cci library, or network-transport-zeromq? Those libraries may be somewhat better behaved since the stacks they're running on top of are doing a lot more work to manage communication between endpoints...
@hyperthunk for the flush, the idea was to introduce a new method on the connection and allow a user to I'm not sure that I see good places for an implicit flush in the project.
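To make the explicit-flush idea a bit more concrete, here is a rough sketch of what "buffer small sends, then flush once" could look like. Everything in it (`BufferedConn`, `sendBuffered`, `flush`) is hypothetical and not part of the network-transport API; the raw send action is a stand-in that only counts how many writes would reach the network.

```haskell
import Data.IORef

-- | Hypothetical wrapper: a raw send action plus a buffer of pending chunks.
-- Not thread-safe; this is only a sketch of the idea.
data BufferedConn = BufferedConn
  { rawSend :: [String] -> IO ()  -- sends all given chunks in one write
  , pending :: IORef [String]     -- chunks queued since the last flush, newest first
  }

newBufferedConn :: ([String] -> IO ()) -> IO BufferedConn
newBufferedConn send = BufferedConn send <$> newIORef []

-- | Queue a chunk without touching the network.
sendBuffered :: BufferedConn -> String -> IO ()
sendBuffered conn chunk = modifyIORef' (pending conn) (chunk :)

-- | Hand everything queued so far to the raw send action, in send order.
flush :: BufferedConn -> IO ()
flush conn = do
  chunks <- atomicModifyIORef' (pending conn) (\cs -> ([], reverse cs))
  if null chunks then pure () else rawSend conn chunks

main :: IO ()
main = do
  writes <- newIORef (0 :: Int)
  -- Stand-in for a real socket write: just count the calls.
  conn <- newBufferedConn (\_ -> modifyIORef' writes (+ 1))
  mapM_ (sendBuffered conn) ["header", "length", "payload"]
  flush conn
  n <- readIORef writes
  putStrLn ("raw writes issued: " ++ show n)
```

With an explicit flush the caller decides where message boundaries fall, which is the open question raised above about when a flush should happen.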
[Imported from JIRA. Reported by Niklas Hambuechen as DP-108 on 2015-03-15 02:48:31]
Copying my posts from the #haskell-distributed IRC channel:
spawn seems to be very slow for me, even though I'm on localhost. Doing it in a loop gets me to almost 50 ms per spawn, why would it be so high? I can't use spawnAsync in my case, but why would a spawn on localhost take this long in the first place? My ethernet latency is 0.5ms and localhost latency is 0.1ms, so that can't be it. CPU is low too.
I have a suspicion: using strace -f -c -w on the node onto which I spawn the processes (a slave using simplelocalnet), I see 179596 calls to the select syscall. That doesn't seem right given that I only do 100 spawns and nothing else. Might this be that the master is sending a lot of small numbers, which it recvs one after the other? I think this is the only way to trigger so many selects, and I've seen that recvInt32 does exactly such a thing (recv'ing 4 bytes at a time), and it does appear in my profiling output.
Further, the 50ms that each spawn takes are suspiciously close to the 40ms TCP ACK delay on Linux (I'm on Linux), as mentioned here: http://stackoverflow.com/a/2253620/263061.
I have found something different though that fixes the problem: setting +RTS -V0 on the slave reduces the time for each spawn to 3ms. How can it be that this has such a huge effect?
I can get the same good results with +RTS -C0.001. But why? This sets the context switch interval; if that has such a positive effect, doesn't that mean that there are other Haskell threads around that actually run and thus stop my recv/recv from immediately being scheduled again? Assume there's only one recv that I'm running; when it gets a context switch interrupt, interrupting the recv, it should see that there are no other Haskell threads to be run, and immediately go back into my recv again, I can't see a reason why it should do anything else that's not my recv ...
Also, setting +RTS -C to something very high does not make it slower than 50ms per spawn, e.g. setting +RTS -C1 does not make it take 1 second per spawn, it's still 50ms.
Setting +RTS -N2/-N3/-N4 helps, too: I get down to 6 ms, compared to the 50 ms for -N1.
nh2: may it be that there are actually 2 recvs going on, but only one can be active at the same time if I'm running on -N1, so the system toggles between them at the interval of the context switch interval -C, which defaults to 20ms, and two of these switches make the ~50ms that I'm seeing?
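As a back-of-the-envelope check on the select count reported above, the sketch below divides the 179596 calls by the 100 spawns and compares how many recv calls a message needs at 4 bytes per call versus one large read. The 4096-byte payload is a made-up figure for illustration only, and the assumption that every read is 4 bytes is a simplification of what the report suspects.

```haskell
-- | Number of recv calls needed to read 'total' bytes when each call
-- asks for at most 'chunk' bytes (ceiling division).
recvCalls :: Int -> Int -> Int
recvCalls chunk total = (total + chunk - 1) `div` chunk

main :: IO ()
main = do
  let selectCalls = 179596 :: Int  -- from the strace summary in the report
      spawns      = 100 :: Int     -- number of spawns in the report
  putStrLn ("select calls per spawn: " ++ show (selectCalls `div` spawns))
  -- Hypothetical payload size, not taken from the report.
  let payload = 4096 :: Int
  putStrLn ("recv calls at 4 bytes each: " ++ show (recvCalls 4 payload))
  putStrLn ("recv calls at 4096 bytes each: " ++ show (recvCalls 4096 payload))
```

Roughly 1800 select calls per spawn is the kind of number one would expect if each message were consumed in very small pieces, which is consistent with the suspicion in the report but does not prove it.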