-
Notifications
You must be signed in to change notification settings - Fork 101
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add cgi_mode to parse_header to support LF in CGI headers #166
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks pretty good. I have a few suggestions, please let me know what you think.
lib/webrick/httputils.rb
Outdated
header_line = Regexp.new(/^([A-Za-z0-9!\#$%&'*+\-.^_`|~]+):([^\r\n\0]*?)#{line_break}\z/m) | ||
continued_header_lines = Regexp.new(/^[ \t]+([^\r\n\0]*?)#{line_break}/m) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This allocates 2 regexps per call. Can we instead add 4 Regexp constants (2 for CGI mode and 2 for non-CGI mode), and use those constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. That makes a lot of sense, especially for a frequently called function like this. I'll update it to use constants.
sig/httputils.rbs
Outdated
@@ -26,7 +26,7 @@ module WEBrick | |||
|
|||
HEADER_CLASSES: Hash[String, untyped] | |||
|
|||
def self?.parse_header: (String raw) -> Hash[String, Array[String]] | |||
def self?.parse_header: (String raw, ?bool cgi_mode) -> Hash[String, Array[String]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would also need updating for the keyword argument, but I've never written RBS before, so I'm not sure how.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have never written RBS either, but when I run rbs, it outputs the following:
def self?.parse_header: (untyped raw, ?cgi_mode: bool) -> untyped
So I think it would be written like this:
def self?.parse_header: (String raw, ?cgi_mode: bool) -> Hash[String, Array[String]]
use keyword argument Co-authored-by: Jeremy Evans <[email protected]>
use keyword argument Co-authored-by: Jeremy Evans <[email protected]>
I've applied the suggested changes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good, thank you!
lib/webrick/httputils.rb
Outdated
REGEXP_CONTINUED_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r\n/m | ||
REGEXP_CONTINUED_CGI_HEADER_LINE = /^[ \t]+([^\r\n\0]*?)\r?\n/m | ||
|
||
def parse_header(raw, cgi_mode: false) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this a public interface change? or is it internal to CGIHandler
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a public interface change. However, adding an optional keyword argument should be a backwards compatible change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If that's the case, I'd like to set the bar a little higher on the naming of cgi_mode
& related documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Considering the general state of WEBrick's documentation, lack of documentation hardly seems like a blocker (though documentation improvements are obviously welcomed). If you don't like the argument name, please pick a new one (allow_bare_lf
?) and I'm sure we can switch to it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the feedback. I've updated the comment for parse_header.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not strongly against
cgi_mode: true/false
but.. bear with me and I'll explain my thinking.The only difference between the two patterns is the line terminator right?
Correct.
I feel like it's a bit convoluted to have two completely different regex, and then use a boolean to select which ones to use. It's not very "DRY" although that isn't a strict requirement, just a general feeling.
The separate regexes are a performance optimization, so we don't need to allocate 2 regex per call.
There is almost certainly never going to be code that dynamically selects between one or the other, in other words,
cgi_mode: true
orcgi_mode: false
is likely going to be hard coded by the caller.
Yes, but that's also true of many methods in Ruby, so I don't see why it should be a blocker.
Maybe it's not possible to change the interpretation of
raw
but IMHO, if we are expected to provide a list of lines, I'd change that to be an Array of lines without line termination, in other words, the CGI parser should be responsible for reading lines, not the header parser. That way, it would be the responsibility of the CGI parser to determine "what is a line" rather than trying to shoe-horn that intoparse_header
.
I don't want to make structural changes when they aren't necessary to fix a bug.
In addition, are there any other places (e.g. chunked encoding) where we need to handle this distinction? Maybe if we know that, we can better decide the appropriate design.
AFAIK, CGI mode is the only place. Chunked encoding requires CRLF I believe.
If this is too pedantic we can just accept
cgi_mode
but I hope this gives better context around my thinking and feedback.
I think it's too pendantic. IMO, we should accept this fix, and if you want to make structural changes before the next release, then do so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The separate regexes are a performance optimization, so we don't need to allocate 2 regex per call.
If these are an implementation detail, can we make them private?
Yes, but that's also true of many methods in Ruby, so I don't see why it should be a blocker.
It's not, it's an observation to explain my position.
I don't want to make structural changes when they aren't necessary to fix a bug.
Sometimes the shortest path from A to B is not the best one.
As this is a CGI specific code path, my preference is for this code not to leak outside CGIHandler
. I'd like to hear back from @paulownia but Jeremy I don't mind if you merge this after that. I am not planning on fixing WEBRick's design issues.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the detailed explanation. I understand the point about separating line reading from header parsing, and keeping CGI-specific code within CGIHandler. I agree that a cleaner design would be ideal if possible.
Since no major design changes are required, I will move the CGI-related code. But would simply moving the two constants to CGIHandler be sufficient? I'm not sure this is the best approach—any suggestions?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes and marking them as private would also be a good idea.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I fixed the code, but it seems better to pass the Regexp itself instead of cgi_mode. This way, we can use private_constant to make the constants completely private.
fixes #165
Adds
cgi_mode
option toparse_header
method to allow bare LF line breaks in CGI headers.