
Expose HTTP header boundary #89

Closed

Conversation

mmalecki
Contributor

@ry
Contributor

ry commented Feb 13, 2012

lgtm. this changes binary compatibility - but for Node I don't consider http_parser.h to be public - so it could be landed in v0.6 if desired.

@bnoordhuis
Member

Can you sign the CLA? It's different from the Node CLA.

@pgriess
Contributor

pgriess commented Feb 13, 2012

What is this feature for? I read nodejs/node-v0.x-archive#2612 but I'm not entirely clear on how this would enable faster proxying. Is the idea that you want to know where the headers end so that you can slice off a Buffer representing the request/status line + headers and send it un-modified to the other end of the connection? To use that, I think you'd also have to know where messages begin. Would you just walk back boundary bytes and consider this the beginning of the message?

Incidentally, this change adds 4 bytes to the http_parser size, which I suspect @clifffrey would like to avoid.

@pgriess
Contributor

pgriess commented Feb 13, 2012

Another way to go, which would enable Buffer re-use when handling the message body, would be to stop de-chunking bodies when parsing them. This would also have the side-effect of preserving chunking decisions made on either side of the connection, which might be useful in some situations (though not required by the RFC).
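
For context, "de-chunking" here refers to the fact that the parser's on_body callback only ever sees the decoded payload bytes: the chunk-size lines and their CRLFs have already been consumed, and the callback may fire once per chunk segment. A minimal, self-contained sketch against the C API (the sample request and the callback body are illustrative only):

  #include <stdio.h>
  #include <string.h>
  #include "http_parser.h"

  /* on_body receives only the de-chunked payload: for the request below it
   * fires twice, with "hello" and " world", never with "5\r\n" or "0\r\n". */
  static int on_body(http_parser *p, const char *at, size_t length) {
    printf("body segment: %.*s\n", (int) length, at);
    return 0;
  }

  int main(void) {
    static const char req[] =
      "POST /echo HTTP/1.1\r\n"
      "Transfer-Encoding: chunked\r\n"
      "\r\n"
      "5\r\nhello\r\n"
      "6\r\n world\r\n"
      "0\r\n\r\n";

    http_parser_settings settings;
    memset(&settings, 0, sizeof(settings));
    settings.on_body = on_body;

    http_parser parser;
    http_parser_init(&parser, HTTP_REQUEST);
    http_parser_execute(&parser, &settings, req, sizeof(req) - 1);
    return 0;
  }

Passing the chunk framing through untouched, as suggested above, would let a proxy forward body buffers verbatim instead of re-chunking them on the way out.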

@indexzero

@pgriess Nice of you to join us.

It's not about faster proxying, it's about better proxying. The websocket proxying code in node-http-proxy is terrible because it has to be written on top of an http.Server. This change allows us to greatly simplify the proxy logic as well as remove a lot of unnecessary logic that currently goes through http.IncomingMessage, where the parser integration lives.

The logic is simple:

Adapted from here. We'll be updating this once this pull-request lands.

  var parsed = false;
  var ready = false;
  var buffer = [];

  socket.ondata = function (chunk) {
    var ret = parser.execute(chunk, 0, chunk.length);

    if (parser.boundary && !parsed) {
      parsed = true;
      buffer.push(chunk.slice(0, parser.boundary));

      heyUserModifyTheHeadersIfYouWant(parser._headers, function (_, newHeaders) {
        //
        // Serialize the headers back to the outgoing socket, e.g.
        // https://github.com/joyent/node/blob/master/lib/http.js#L471-562
        //
        ready = true;

        //
        // Flush any buffered chunks
        //
        for (var i = 0; i < buffer.length; i++) {
          // Write buffer[i] to the outgoing socket.
        }

        buffer.length = 0;
      });
    }
    else if (parser.boundary && parsed && !ready) {
      buffer.push(chunk);
    }
    else {
      //
      // Just write chunk to the outgoing socket.
      //
    }
  };

Oh, and (unrelated): users frequently bother me about why we still use a vendored version of node-websocket-client. Can you find some cycles to merge this? pgriess/node-websocket-client#9

@mmalecki
Contributor Author

@bnoordhuis CLA signed.

@indexzero

After speaking with @pgriess I'm actually inclined to agree with him. I did not know that parser.onBody would actually be called for each chunk of the body; I had assumed it would be called with the entirety of the body.

With this in mind, here's the new logic:

  var buffer = [];
  var parsed = false;
  var ready = false;
  var HTTPParser = process.binding('http_parser').HTTPParser;

  var parser = new HTTPParser(HTTPParser.REQUEST);
  parser._headers = [];
  parser._url = '';
  parser.socket = socket;

  parser.onBody = function (b, start, len) {
    var body = b.slice(start, start + len);

    if (ready) {
      //
      // If there are any buffered chunks, write them out first
      //
      if (buffer.length) {
        for (var i = 0; i < buffer.length; i++) {
          // Write buffer[i] to the outgoing socket.
        }

        buffer.length = 0;
      }

      //
      // Write the current body chunk, body, out to the socket
      //
      return;
    }

    buffer.push(body);
  };

  socket.ondata = function (chunk) {
    var ret = parser.execute(chunk, 0, chunk.length);

    //
    // parser._headers is assumed to be populated by the parser's header
    // callbacks (elided here).
    //
    if (parser._headers.length && !parsed) {
      parsed = true;
      heyUserModifyTheHeadersIfYouWant(parser._headers, function (_, newHeaders) {
        //
        // Serialize the headers back to the outgoing socket, e.g.
        // https://github.com/joyent/node/blob/master/lib/http.js#L471-562
        //
        ready = true;
      });
    }
  };

I'm still +1 on this, though. An additional 4 bytes per request doesn't seem like killer overhead to me.

@indexzero

The reason I'm still +1 on this is that the approach using parser.boundary appears to be more performant because it only invokes a single callback on every chunk: ondata, instead of both onBody and ondata.

@pgriess
Contributor

pgriess commented Feb 13, 2012

A slight clarification: The callbacks that @indexzero is referring to are those that cross the JavaScript<->C++ boundary in Node (which are particularly slow relative to those that do not cross the boundary).

I'm still -1 on this because fundamentally I think this is a Node issue (callbacks into JavaScript are expensive), not an http_parser issue. Providing an API that allows the caller to inspect the raw byte stream is kind of counter to the ethos of this parser library, which as it stands owns all framing and data-manipulation decisions and delegates action via callbacks: it decides when to eat characters (spurious whitespace), joins header values when they span lines, de-chunks body data, and so on.

Instead, I think we can solve this in the Node/http_parser integration layer by providing an http_parser_execute() workalike API that populates a structure describing the event stream (event X fired at byte N, event Y fired at byte M, etc.) and returns it to the caller in one shot. This would be similar to the behavior in the unfinished event_stream branch.
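
A rough sketch of what such a one-shot API could look like, built here on top of the existing callback-based http_parser_execute() purely for illustration; the record layout, the event names, and http_parser_execute_events() are assumptions, not the actual event_stream branch:

  #include <stddef.h>
  #include <string.h>
  #include "http_parser.h"

  /* Hypothetical event record: which callback fired and at which byte
   * offset of the input buffer. Names and layout are illustrative only. */
  typedef enum { EV_HEADERS_COMPLETE, EV_BODY, EV_MESSAGE_COMPLETE } ev_type;

  typedef struct {
    ev_type type;
    size_t  offset;  /* byte offset into the buffer passed to execute */
    size_t  length;  /* payload length for EV_BODY, 0 otherwise */
  } ev_record;

  typedef struct {
    const char *base;    /* start of the buffer being parsed */
    ev_record  *events;
    size_t      count, capacity;
  } ev_stream;

  static void push_event(http_parser *p, ev_type type, const char *at, size_t len) {
    ev_stream *s = (ev_stream *) p->data;
    if (s->count < s->capacity) {
      s->events[s->count].type   = type;
      s->events[s->count].offset = at ? (size_t) (at - s->base) : 0;
      s->events[s->count].length = len;
      s->count++;
    }
  }

  static int record_headers_complete(http_parser *p) {
    push_event(p, EV_HEADERS_COMPLETE, NULL, 0);
    return 0;
  }

  static int record_body(http_parser *p, const char *at, size_t len) {
    push_event(p, EV_BODY, at, len);
    return 0;
  }

  static int record_message_complete(http_parser *p) {
    push_event(p, EV_MESSAGE_COMPLETE, NULL, 0);
    return 0;
  }

  /* http_parser_execute() workalike: parses buf and fills out[] with the
   * event stream instead of dispatching user callbacks one by one. */
  size_t http_parser_execute_events(http_parser *parser,
                                    const char *buf, size_t len,
                                    ev_record *out, size_t capacity,
                                    size_t *nevents) {
    http_parser_settings settings;
    ev_stream stream = { buf, out, 0, capacity };
    size_t nparsed;

    memset(&settings, 0, sizeof(settings));
    settings.on_headers_complete = record_headers_complete;
    settings.on_body             = record_body;
    settings.on_message_complete = record_message_complete;

    parser->data = &stream;
    nparsed = http_parser_execute(parser, &settings, buf, len);
    *nevents = stream.count;
    return nparsed;
  }

The Node binding could then make a single call per chunk, hand the whole event array across the JavaScript<->C++ boundary at once, and walk it in JavaScript, which is the cost model described above.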

@indexzero

So after digging around in http.js in the node.js internals, I realized that this behavior (calling both socket.ondata and parser.onBody for every chunk) is already in use.

In most cases (except for Upgrade) both of these functions will be invoked:

https://github.com/joyent/node/blob/master/lib/http.js#L1229-1271
https://github.com/joyent/node/blob/master/lib/http.js#L121-129

This won't let parsing scenarios do any better than node.js core already does, so we might as well scrap it. :(

@emberian
Contributor

Can this issue be closed if it's being scrapped?

@bnoordhuis
Member

Yes, it turned out it's not necessary.
