http2: fix graceful session close #57808

Open
wants to merge 7 commits into
base: main

Conversation

@pandeykushagra51 pandeykushagra51 commented Apr 9, 2025

Fix issue where session.close() prematurely destroys the session when response.end() was called with an empty payload while active http2 streams still existed. This change ensures that sessions are closed gracefully only after all http2 streams complete and clients properly receive the GOAWAY frame as per the HTTP/2 spec.

Refs: https://nodejs.org/api/http2.html#http2sessionclosecallback
Fixes: #57809

Update:

Please refer to the detailed explanation below for the reasoning behind the update to the test/parallel/test-http2-client-rststream-before-connect.js test.
#57808 (comment)

@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/http2
  • @nodejs/net

@nodejs-github-bot nodejs-github-bot added c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Apr 9, 2025
@pandeykushagra51
Author

pandeykushagra51 commented Apr 9, 2025

Problem:

When response.end() is called with an empty payload, Node.js submits header-only frames to nghttp2 as expected, and the JavaScript layer assumes these frames will then be transmitted successfully. If session.close() is called immediately afterward, the Http2Session class doesn't see any active streams (it only tracks data frames and push promises as active streams) and begins terminating the underlying socket. When the C++ layer subsequently attempts to transmit the queued header frames, the write fails because the socket has already been closed. The result is transmission errors, and the client never receives the response headers or the GOAWAY frame.
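The sequence described above can be sketched as a small standalone script. This is only an illustration under assumptions: the function name runRepro, the `seen` fields, and the timeout value are hypothetical, and what the client actually observes depends on the Node.js version and on timing, so no particular output is claimed.

```javascript
const http2 = require('node:http2');

// Runs one request against a throwaway server and reports what the client saw.
function runRepro(timeoutMs = 2000) {
  return new Promise((resolve) => {
    const seen = { responseHeaders: false, goaway: false, errors: [] };
    let client;
    let done = false;

    const server = http2.createServer((req, res) => {
      const session = req.stream.session;
      res.writeHead(200);
      res.end();         // empty payload: a header-only response
      session.close();   // graceful close requested immediately afterward
    });

    const finish = () => {
      if (done) return;
      done = true;
      clearTimeout(timer);
      if (client && !client.destroyed) client.destroy();
      if (server.listening) server.close();
      resolve(seen);
    };
    // Safety net so the sketch cannot hang if the session never closes.
    const timer = setTimeout(finish, timeoutMs);

    server.listen(0, () => {
      client = http2.connect(`http://localhost:${server.address().port}`);
      client.on('goaway', () => { seen.goaway = true; });
      client.on('error', (err) => { seen.errors.push(err.code || err.message); });
      client.on('close', finish);

      const req = client.request({ ':path': '/' });
      req.on('response', () => { seen.responseHeaders = true; });
      req.on('error', (err) => { seen.errors.push(err.code || err.message); });
      req.resume();
      req.end();
    });
  });
}

runRepro().then((seen) => console.log(seen));
```

If the problem described above is present, one would expect `responseHeaders` or `goaway` to stay false on at least some runs; with a graceful close both should become true.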

Solution:

The solution ensures graceful session closure by delaying socket termination until all data in the nghttp2 queue/buffer has been processed. This is implemented by:

  • Adding checks for nghttp2_session_want_write and nghttp2_session_want_read before closing the underlying socket
  • Postponing the socket closure if nghttp2 has pending data to transmit
  • Implementing a callback mechanism to notify the JavaScript layer when all writes have completed
  • Only proceeding with socket closure when all transmission conditions are satisfied

This approach ensures that all frames, including header-only responses and GOAWAY frames, are delivered to the client before the connection is terminated, in line with the HTTP/2 spec.
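The idea of deferring socket closure until the queue is drained can be illustrated with a toy model. Nothing here is Node.js internals: ToySession and all of its methods are hypothetical, wantWrite() merely stands in for nghttp2_session_want_write, and the read side (nghttp2_session_want_read) is left out for brevity.

```javascript
// Toy model only: none of these names exist in Node.js or nghttp2.
class ToySession {
  constructor() {
    this.outbound = [];          // frames queued but not yet written
    this.written = [];           // frames that reached the "socket"
    this.closeRequested = false;
    this.socketOpen = true;
  }

  // Stand-in for nghttp2_session_want_write(): true while frames are queued.
  wantWrite() {
    return this.outbound.length > 0;
  }

  queue(frame) {
    this.outbound.push(frame);
  }

  // Graceful close: queue GOAWAY, then close only if nothing is pending.
  close() {
    this.closeRequested = true;
    this.queue('GOAWAY');
    this.maybeCloseSocket();
  }

  // Write one queued frame, then re-check whether the socket can be closed.
  flushOne() {
    if (!this.socketOpen) throw new Error('write after socket close');
    if (this.outbound.length === 0) return;
    this.written.push(this.outbound.shift());
    this.maybeCloseSocket();
  }

  maybeCloseSocket() {
    if (this.closeRequested && !this.wantWrite()) {
      this.socketOpen = false;
    }
  }
}

const session = new ToySession();
session.queue('HEADERS');   // header-only response
session.close();            // socket stays open: frames are still queued
session.flushOne();
session.flushOne();
console.log(session.written, session.socketOpen);
```

In this model the socket is closed only after both the HEADERS and GOAWAY frames have been written, which is the ordering the fix aims for.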

@pandeykushagra51
Author

I am happy to update the current solution if there is any scope for improvement, and would welcome input on optimising it further.

@pandeykushagra51
Author

pandeykushagra51 commented Apr 9, 2025

Optimization Proposal for the current fix

Current Issue:

  • The JavaScript layer is called after each write operation to the underlying socket.
  • This creates overhead through repeated calls to onstreamafterwrite_string().

Proposed Solution

Flag-Based Approach:

  • Set a flag in the C++ layer when JavaScript initiates a graceful closure.
  • The C++ layer checks this flag during write operations.

Responsibility Shift:

  • When the flag is set, the C++ layer becomes responsible for notifying the JavaScript layer.
  • Notification happens only after nghttp2 completes all read and write operations (i.e. hasPendingData returns false).

Benefits:

  • Fewer repeated calls to onstreamafterwrite_string().
  • Better performance, since calls between the JS and C++ layers are minimized.

Trade-off:

  • The implementation will be more complex.
  • The solution will be longer but more efficient.
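The difference between notifying after every write and notifying only once the flag is set and nothing is pending can be shown with a toy counter. This is a sketch under assumptions, not the C++ implementation: ToyWriter, countNotifications, and the `gated` option are hypothetical names, and onAfterWrite stands in for the callback from the C++ layer into JavaScript.

```javascript
// Toy model only; the class and function names are hypothetical.
class ToyWriter {
  constructor({ gated, onAfterWrite }) {
    this.gated = gated;                  // true = flag-based approach
    this.onAfterWrite = onAfterWrite;    // stand-in for the C++ -> JS callback
    this.pending = 0;                    // frames waiting to be written
    this.gracefulCloseInitiated = false;
  }

  enqueue(frames) {
    this.pending += frames;
  }

  initiateGracefulClose() {
    this.gracefulCloseInitiated = true;
  }

  hasPendingData() {
    return this.pending > 0;
  }

  writeOne() {
    if (this.pending === 0) return;
    this.pending -= 1;
    if (!this.gated) {
      this.onAfterWrite();               // notify after every write
      return;
    }
    // Gated: notify only when close was requested and nothing is pending.
    if (this.gracefulCloseInitiated && !this.hasPendingData()) {
      this.onAfterWrite();
    }
  }
}

function countNotifications(gated, frames) {
  let calls = 0;
  const writer = new ToyWriter({ gated, onAfterWrite: () => { calls += 1; } });
  writer.enqueue(frames);
  writer.initiateGracefulClose();
  while (writer.hasPendingData()) writer.writeOne();
  return calls;
}

console.log(countNotifications(false, 5), countNotifications(true, 5));
```

For five queued frames, the ungated variant calls back five times and the gated variant once, which is the overhead reduction the proposal is after.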

Comment on lines -3303 to +3306
- session[kMaybeDestroy](err);
+ closeSession(session, NGHTTP2_NO_ERROR, err);
Author

This change is an improvement: once the underlying socket is closed, there is no need to attempt a graceful session closure by calling [kMaybeDestroy]. Instead, we can call closeSession immediately, which handles all cleanup operations.

@pandeykushagra51
Author

One of the simplest approaches is to delay socket destruction to the next iteration of the event loop, i.e. to call it inside setImmediate, but I don't think this would address the underlying problem.

codecov bot commented Apr 10, 2025

Codecov Report

Attention: Patch coverage is 94.11765% with 3 lines in your changes missing coverage. Please review.

Project coverage is 90.23%. Comparing base (67786c1) to head (d54f12c).
Report is 49 commits behind head on main.

Files with missing lines Patch % Lines
src/node_http2.cc 91.89% 0 Missing and 3 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #57808      +/-   ##
==========================================
+ Coverage   90.21%   90.23%   +0.02%     
==========================================
  Files         630      630              
  Lines      185524   185737     +213     
  Branches    36387    36414      +27     
==========================================
+ Hits       167378   167609     +231     
+ Misses      11037    11009      -28     
- Partials     7109     7119      +10     
Files with missing lines Coverage Δ
lib/internal/http2/core.js 95.56% <100.00%> (+<0.01%) ⬆️
src/node_http2.h 91.66% <100.00%> (+0.25%) ⬆️
src/node_http2.cc 83.76% <91.89%> (+0.33%) ⬆️

... and 52 files with indirect coverage changes


@mcollina
Member

Some tests/linting are failing.

I think the proposed optimization would be useful.

@pandeykushagra51
Author

pandeykushagra51 commented Apr 10, 2025

Some tests/linting are failing.

I think the proposed optimization would be useful.

Fine, then I will implement the more optimised solution described above.
Regarding the failing test case, I am running the tests on macOS on the system below:

Darwin Mac.lan 24.3.0 Darwin Kernel Version 24.3.0: Thu Jan  2 20:24:24 PST 2025; root:xnu-11215.81.4~3/RELEASE_ARM64_T6030 arm64

All tests pass every time, so I am a bit confused about why they fail on CI.
I will check the other operating systems and report back here.

…allbacks between C++ and JavaScript layers. Added graceful_close_initiated_ flag which will ensure that JS layer will be notified only when session close is initiated and centralized notification logic in CheckAndNotifyJSAfterWrite() to only notify JavaScript when there's no pending data (nghttp2_session_want_write/read return 0). Previously, excessive callbac
@pandeykushagra51
Author

Some tests/linting are failing.

I think the proposed optimization would be useful.

Hey @mcollina, I have implemented the following optimisation strategy:

  1. Introduced a graceful-close flag in the C++ layer, which ensures that the callback is only invoked if session close was initiated by the JS layer.
  2. Added logic to call into the JS layer only when both nghttp2_session_want_read and nghttp2_session_want_write return 0.

Combined, these two changes ensure that the new callback onstreamafterwrite_string is called only once during the whole lifecycle of a session.

Regarding the failing tests, I can reproduce the failure occasionally (about once in 10 runs) but have not found the root cause yet. I will investigate further.

Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina requested review from jasnell and pimterry April 11, 2025 07:34
@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 11, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 11, 2025
Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 11, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 11, 2025
@mcollina
Member

CI doesn't seem happy here, can you take a look?

@pandeykushagra51
Author

pandeykushagra51 commented Apr 12, 2025

CI doesn't seem happy here, can you take a look?

Sorry for the delay; I plan to do this today or tomorrow.

@pandeykushagra51
Author

pandeykushagra51 commented Apr 12, 2025

Important Update

Original issue with test:

test/parallel/test-http2-client-rststream-before-connect.js
When a client closes an HTTP/2 stream by sending an RST_STREAM frame, the server receives this frame and should trigger the onStreamClose callback. In the test scenario, the following sequence occurred:

  • The client closed a stream with an error code
  • Immediately after, the client closed the entire connection
  • This sent a GOAWAY frame to the server
  • The server began closing its session

Under the original implementation, the server session closed prematurely, before the onStreamClose callback could execute for it. This caused the test to "pass" incorrectly because the expected error was never thrown: the session was terminated and the server closed before the error handling logic could run.

Simple Way to Verify Test Incorrectness:

Update the test to close the server and client after a delay (say 5s), i.e. replace lines 55 and 56 of the original test/parallel/test-http2-client-rststream-before-connect.js with the code below.

setTimeout(() => {
  client.close();
  server.close();
}, 5000); 

I tested the above change on Node.js v23.7, and the test failed.

This change exposes the hidden bug because:

  • The client sends RST_STREAM to terminate the stream with an error code
  • With the delay, the server has plenty of time to process this frame
  • The onStreamClose callback fully executes and throws the stream error
  • Since there's no error handler, the test fails with an uncaught exception

This confirms that the original test was passing incorrectly by relying on a race condition:

  • Original behavior: Client closes connection so quickly that server session terminates before fully processing the RST_STREAM frame
  • With delay: Server fully processes all frames and correctly throws the expected error
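The race described above can also be observed outside the test suite with a rough sketch like the following. The function name runRstScenario, the `seen` fields, and the delay values are hypothetical, and the close code is only an example. Whether the server-side stream reports an error depends on the Node.js version and on timing, so no specific output is claimed.

```javascript
const http2 = require('node:http2');

// The client resets a stream with an error code, then closes after a delay.
function runRstScenario(closeDelayMs = 0, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const seen = { serverStreamErrors: [], serverStreamClosed: false };
    let client;
    let done = false;

    const server = http2.createServer();
    server.on('stream', (stream) => {
      // Without an 'error' listener, a stream error becomes an uncaught exception.
      stream.on('error', (err) => seen.serverStreamErrors.push(err.code));
      stream.on('close', () => { seen.serverStreamClosed = true; });
      if (stream.destroyed || stream.closed) return;
      stream.respond();
      stream.end('ok');
    });

    const finish = () => {
      if (done) return;
      done = true;
      clearTimeout(timer);
      if (client && !client.destroyed) client.destroy();
      if (server.listening) server.close();
      resolve(seen);
    };
    // Safety net so the sketch cannot hang.
    const timer = setTimeout(finish, timeoutMs);

    server.listen(0, () => {
      client = http2.connect(`http://localhost:${server.address().port}`);
      client.on('error', () => {});
      const req = client.request();
      req.on('error', () => {});   // the client stream also reports the reset
      req.on('close', () => {
        setTimeout(() => {
          client.close();
          server.close(finish);
        }, closeDelayMs);
      });
      // Sends RST_STREAM with a non-zero error code.
      req.close(http2.constants.NGHTTP2_PROTOCOL_ERROR);
    });
  });
}

runRstScenario(0)
  .then((fast) => runRstScenario(100).then((slow) => console.log({ fast, slow })));
```

Comparing the `fast` and `slow` results across several runs gives a feel for how much the outcome depends on how quickly the client closes the connection.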

How the graceful closure fix works and exposes the test's incorrectness:

The new implementation improves HTTP/2 session closure by ensuring graceful termination:

  • When the server receives a GOAWAY frame from the client, it now checks if there's any pending data in the nghttp2 layer
  • If pending operations exist (like an unprocessed RST_STREAM frame), the session remains open until all data is processed
  • The session only closes after all callbacks (including onStreamClose) have completed execution

With this fix, the onStreamClose callback now runs properly, correctly throwing the expected stream error. This reveals that the test was passing incorrectly before because of premature session termination.

So this also fixes the flakiness of the current test case.

Member

@mcollina mcollina left a comment

lgtm

@mcollina mcollina added the request-ci Add this label to start a Jenkins CI on a PR. label Apr 14, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 14, 2025
Member

@pimterry pimterry left a comment

This looks pretty good. I'd like to confirm we don't get stuck in the read scenario I've mentioned here, and there are some small style tweaks I think would be helpful too, but overall the approach looks solid.

Good catch spotting & explaining this issue and nice work on fixing it @pandeykushagra51! This is a tricky bit of behaviour and a tidy solution.

@pimterry pimterry added request-ci Add this label to start a Jenkins CI on a PR. author ready PRs that have at least one approval, no pending requests for changes, and a CI started. labels Apr 16, 2025
@github-actions github-actions bot removed the request-ci Add this label to start a Jenkins CI on a PR. label Apr 16, 2025
Labels
author ready PRs that have at least one approval, no pending requests for changes, and a CI started. c++ Issues and PRs that require attention from people who are familiar with C++. lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run.
Development

Successfully merging this pull request may close these issues.

graceful http2 session closure with empty body
4 participants