
Expose HTTP header boundary #89

Closed

Conversation

mmalecki
Contributor

@ry
Contributor

ry commented Feb 13, 2012

lgtm. this changes binary compatibility - but for Node I don't consider http_parser.h to be public - so it could be landed in v0.6 if desired.

@bnoordhuis
Member

Can you sign the CLA? It's different from the Node CLA.

@pgriess
Contributor

pgriess commented Feb 13, 2012

What is this feature for? I read nodejs/node-v0.x-archive#2612 but I'm not entirely clear on how this would enable faster proxying. Is the idea that you want to know where the headers end so that you can slice off a Buffer representing the request/status line + headers and send it un-modified to the other end of the connection? To use that, I think you'd also have to know where messages begin. Would you just walk back boundary bytes and consider this the beginning of the message?

Incidentally, this change adds 4 bytes to the http_parser size, which I suspect @clifffrey would like to avoid.

@pgriess
Contributor

pgriess commented Feb 13, 2012

Another way to go, which would enable Buffer re-use when handling the message body, would be to stop de-chunking bodies when parsing them. This would also have the side-effect of preserving chunking decisions made on either side of the connection, which might be useful in some situations (though not required by the RFC).
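
For context, "de-chunking" here refers to the fact that the parser's on_body callback only ever sees the decoded payload bytes: the chunk-size lines and their CRLFs have already been consumed, and the callback may fire once per chunk segment. A minimal, self-contained sketch against the C API (the sample request and the callback body are illustrative only):

  #include <stdio.h>
  #include <string.h>
  #include "http_parser.h"

  /* on_body receives only the de-chunked payload: for the request below it
   * fires twice, with "hello" and " world", never with "5\r\n" or "0\r\n". */
  static int on_body(http_parser *p, const char *at, size_t length) {
    printf("body segment: %.*s\n", (int) length, at);
    return 0;
  }

  int main(void) {
    static const char req[] =
      "POST /echo HTTP/1.1\r\n"
      "Transfer-Encoding: chunked\r\n"
      "\r\n"
      "5\r\nhello\r\n"
      "6\r\n world\r\n"
      "0\r\n\r\n";

    http_parser_settings settings;
    memset(&settings, 0, sizeof(settings));
    settings.on_body = on_body;

    http_parser parser;
    http_parser_init(&parser, HTTP_REQUEST);
    http_parser_execute(&parser, &settings, req, sizeof(req) - 1);
    return 0;
  }

Passing the chunk framing through untouched, as suggested above, would let a proxy forward body buffers verbatim instead of re-chunking them on the way out.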

@indexzero

@pgriess Nice of you to join us.

It's not about faster proxying, it's about better proxying. The websocket proxying code in node-http-proxy is terrible because it has to be written on top of an http.Server. This change allows us to greatly simplify the proxy logic as well as remove a lot of unnecessary logic that currently goes through http.IncomingMessage, where the parser integration lives.

The logic is simple:

Adapted from here. We'll be updating this once this pull-request lands.

  var parsed = false;
  var ready = false;
  var buffer = [];

  socket.ondata = function (chunk) {
    var ret = parser.execute(chunk, 0, chunk.length);

    if (parser.boundary && !parsed) {
      parsed = true;
      buffer.push(chunk.slice(0, parser.boundary));

      heyUserModifyTheHeadersIfYouWant(parser._headers, function (_, newHeaders) {
        //
        // Serialize the headers back to the outgoing socket, e.g.
        // https://github.com/joyent/node/blob/master/lib/http.js#L471-562
        //
        ready = true;

        //
        // Flush any buffered chunks
        //
        for (var i = 0; i < buffer.length; i++) {
          // Write buffer[i] to the outgoing socket.
        }

        buffer.length = 0;
      });
    }
    else if (parser.boundary && parsed && !ready) {
      buffer.push(chunk);
    }
    else {
      //
      // Just write chunk to the outgoing socket.
      //
    }
  };

Oh, and (unrelated): users frequently bother me about why we still use a vendored version of node-websocket-client. Can you find some cycles to merge this? pgriess/node-websocket-client#9

@mmalecki
Contributor Author

@bnoordhuis CLA signed.

@indexzero

After speaking with @pgriess I'm actually inclined to agree with him. I did not know that parser.onBody would actually be called for each chunk of the body; I had assumed it would be called with the entirety of the body.

With this in mind, here's the new logic:

  var buffer = [];
  var parsed = false;
  var ready = false;
  var HTTPParser = process.binding('http_parser').HTTPParser;

  var parser = new HTTPParser(HTTPParser.REQUEST);
  parser._headers = [];
  parser._url = '';
  parser.socket = socket;

  parser.onBody = function (b, start, len) {
    var body = b.slice(start, start + len);

    if (ready) {
      //
      // If there are any buffered chunks, write them out first
      //
      if (buffer.length) {
        for (var i = 0; i < buffer.length; i++) {
          // Write buffer[i] to the outgoing socket.
        }

        buffer.length = 0;
      }

      //
      // Write the current body chunk, body, out to the socket
      //
      return;
    }

    buffer.push(body);
  };

  socket.ondata = function (chunk) {
    var ret = parser.execute(chunk, 0, chunk.length);

    //
    // parser._headers is assumed to be populated by the parser's header
    // callbacks (elided here).
    //
    if (parser._headers.length && !parsed) {
      parsed = true;
      heyUserModifyTheHeadersIfYouWant(parser._headers, function (_, newHeaders) {
        //
        // Serialize the headers back to the outgoing socket, e.g.
        // https://github.com/joyent/node/blob/master/lib/http.js#L471-562
        //
        ready = true;
      });
    }
  };

I'm still +1 on this, though. An additional 4 bytes per request doesn't seem like killer overhead to me.

@indexzero

The reason I'm still +1 on this is that the approach using parser.boundary appears to be more performant because it only invokes a single callback on every chunk: ondata, instead of both onBody and ondata.

@pgriess
Contributor

pgriess commented Feb 13, 2012

A slight clarification: The callbacks that @indexzero is referring to are those that cross the JavaScript<->C++ boundary in Node (which are particularly slow relative to those that do not cross the boundary).

I'm still -1 on this because fundamentally I think this is a Node issue (callbacks into JavaScript are expensive), not an http_parser issue. Providing an API that allows the caller to inspect the raw byte stream is kind of counter to the ethos of this parser library, which as it stands owns all framing and data-manipulation decisions and delegates action via callbacks: it decides when to eat characters (spurious whitespace), joins header values when they span lines, de-chunks body data, and so on.

Instead, I think we can solve this in the Node/http_parser integration layer by providing an http_parser_execute() workalike API that populates a structure describing the event stream (event X fired at byte N, event Y fired at byte M, etc.) and returns it to the caller in one shot. This would be similar to the behavior in the unfinished event_stream branch.
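
A rough sketch of what such a one-shot API could look like, built here on top of the existing callback-based http_parser_execute() purely for illustration; the record layout, the event names, and http_parser_execute_events() are assumptions, not the actual event_stream branch:

  #include <stddef.h>
  #include <string.h>
  #include "http_parser.h"

  /* Hypothetical event record: which callback fired and at which byte
   * offset of the input buffer. Names and layout are illustrative only. */
  typedef enum { EV_HEADERS_COMPLETE, EV_BODY, EV_MESSAGE_COMPLETE } ev_type;

  typedef struct {
    ev_type type;
    size_t  offset;  /* byte offset into the buffer passed to execute */
    size_t  length;  /* payload length for EV_BODY, 0 otherwise */
  } ev_record;

  typedef struct {
    const char *base;    /* start of the buffer being parsed */
    ev_record  *events;
    size_t      count, capacity;
  } ev_stream;

  static void push_event(http_parser *p, ev_type type, const char *at, size_t len) {
    ev_stream *s = (ev_stream *) p->data;
    if (s->count < s->capacity) {
      s->events[s->count].type   = type;
      s->events[s->count].offset = at ? (size_t) (at - s->base) : 0;
      s->events[s->count].length = len;
      s->count++;
    }
  }

  static int record_headers_complete(http_parser *p) {
    push_event(p, EV_HEADERS_COMPLETE, NULL, 0);
    return 0;
  }

  static int record_body(http_parser *p, const char *at, size_t len) {
    push_event(p, EV_BODY, at, len);
    return 0;
  }

  static int record_message_complete(http_parser *p) {
    push_event(p, EV_MESSAGE_COMPLETE, NULL, 0);
    return 0;
  }

  /* http_parser_execute() workalike: parses buf and fills out[] with the
   * event stream instead of dispatching user callbacks one by one. */
  size_t http_parser_execute_events(http_parser *parser,
                                    const char *buf, size_t len,
                                    ev_record *out, size_t capacity,
                                    size_t *nevents) {
    http_parser_settings settings;
    ev_stream stream = { buf, out, 0, capacity };
    size_t nparsed;

    memset(&settings, 0, sizeof(settings));
    settings.on_headers_complete = record_headers_complete;
    settings.on_body             = record_body;
    settings.on_message_complete = record_message_complete;

    parser->data = &stream;
    nparsed = http_parser_execute(parser, &settings, buf, len);
    *nevents = stream.count;
    return nparsed;
  }

The Node binding could then make a single call per chunk, hand the whole event array across the JavaScript<->C++ boundary at once, and walk it in JavaScript, which is the cost model described above.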

@indexzero

So after digging around in http.js in the node.js internals, I realized that this behavior (calling both socket.ondata and parser.onBody for every chunk) is already in use.

In most cases (except for Upgrade) both of these functions will be invoked:

https://github.com/joyent/node/blob/master/lib/http.js#L1229-1271
https://github.com/joyent/node/blob/master/lib/http.js#L121-129

This won't let parsing scenarios do any better than node.js core already does, so we might as well scrap it. :(

@emberian
Contributor

Can this issue be closed if it's being scrapped?

@bnoordhuis
Member

Yes, it turned out it's not necessary.
