Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multi callbacks not being called - no data sent over the network #1029

Closed
dhendo opened this issue Apr 4, 2016 · 5 comments
Closed

Multi callbacks not being called - no data sent over the network #1029

dhendo opened this issue Apr 4, 2016 · 5 comments

Comments

@dhendo
Copy link

dhendo commented Apr 4, 2016

  • 2.6.0-1 (both with and without hiredis)
  • Node.js 4.2.3, 4.4, 5.10.0 (Ubuntu 14.04)
  • Offline queue set to false

I'm seeing an issue whereby the callback from a multi() never gets called - it looks similar to some of the other issues raised (that got resolved by upgrading or by separating out the multi).

I'm using https://github.com/dhendo/sifaka to cache some rendered images - among other things it does a get + hget wrapped in a multi (https://github.com/dhendo/sifaka/blob/master/backends/redis.js#L102).

This works OK for a while, but occasionally, while under heavy concurrent load (e.g. repeatedly refreshing a page which contains multiple image requests), callbacks stop being returned (mainly from the get() as described above, as it is the first call that is hit as a request comes in, so it gets no further).

Diagnosis steps:

  • Isolated the image cached images to a separate redis instance
  • Ran MONITOR - observed that no new requests were being displayed
  • Inspected using wireshark - no data was going over the network
  • Stepped through node_redis - correct calls were being made to this.stream.write(str + data) etc
  • The socket appears to be healthy - readable, writable = true
  • The socket has a growing amount of pending callbacks against it, and _bytesDispatched < bytesWritten. Calling getBuffer returns the buffer, containing all the missing requests.
  • Socket.corked = 1
  • Stepped into node's writeOrBuffer confirms that it is buffering due to corked being set.

So it looks to me like an issue where the socket is never correctly uncorked at some point (or recorked if another request comes in before the previous one has returned)?

@BridgeAR
Copy link
Contributor

BridgeAR commented Apr 4, 2016

Thanks a lot for your detailed error report!

I guess it was difficult enough to get so far and you do not have a reproducible case for me?
Cork and uncork are called sync while multi.exec() is called and therefor they should not have the chance of overlapping.
The execution order is as follows:

1. Multi is called including data
1.1. Any other code may execute here
2. Exec is called.
2.1. Exec calls cork
2.2. Exec calls all buffered multi commands and fires them
2.3. Exec calls uncork
2.4. Exec is done and new code may be run

So even if the connection dropped, it could not interfere with the synchronous execution of cork and uncork. I also checked the Node.js source and can't see any problem there.
Also running multiple Multis at the same time is not going to interfere as it all depends on the exec call.

@dhendo
Copy link
Author

dhendo commented Apr 4, 2016

Sorry, I don't think I'm going to be able to create a test-case.

It doesn't feel like it is anything to do with the connection dropping (I believe it happens the same if offline queue is on) - as I have hooked up event handlers that do fire if the connection goes away (even when it is in this locked up state).

I have a gut feeling it is probably down to a call happening before the multi, and the multi is generally the next call to be made - possibly a SET with the binary image data.

I'll keep digging and see if I can spot anything

@BridgeAR
Copy link
Contributor

BridgeAR commented Apr 4, 2016

Maybe. It's still weird that this is possible as there's no other cork / uncork command anywhere.

While being on it please also check if this happens with v.2.5.3 too. I did not change anything that I see connected to this issue, but who knows.

@BridgeAR
Copy link
Contributor

@dhendo ping

@dhendo
Copy link
Author

dhendo commented Apr 12, 2016

@BridgeAR Sorry, been busy investigating:

  • Yes, it looked to be occurring against 2.5.3 as well
  • I added logging, every instance of cork had a matching uncork, and there didn't appear to be any overlapping

I've been refactoring dhendo/sifaka@6ffc05a - specifically around checking that the node_redis instance is connected and ready.

It does look better (I'm doing a phased rollout now), so I'll report back.

I haven't spotted any issues in node_redis, and 2.6.0-1 looks good

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants