Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

URL path shortening for ../ creates problem with other URL parsers that do not follow the whatwg standard #810

Closed
stefanbeigel opened this issue Dec 22, 2023 · 6 comments

Comments

@stefanbeigel
Copy link

What is the issue with the URL Standard?

Hi,

I would like to share with you a common scenario:

  1. A request is recevied via NodeJs Express or Fastify server
  2. Request is forwarded to another service using an http client that uses the URL class to build the target URL using the service hostname + the incoming request.pathname

This scenario can lead to path traversal vulnerabilities as Express and Fastify do not evaluate ../ and ./ but the whatwg URL does. So the route checks of express / fastify match another path. This situation is not good at all, because the developer need to know about the different parsing / evaluation logic.

Example
I have prepared a sample application with fastify.
https://github.com/stefanbeigel/whatwg-fastify-path-traversal/blob/main/index.mjs
Call the app with curl --path-as-is localhost:3000/abc/../foobar

Possible solutions

  1. Http server libraries parses the URL with the whatwg URL standard
  2. Whatwg URL drops the path shortening or gives an option to disable it

As this behavior was introduced by the URL class I created this issue, even you can argue that this is a problem of fastify / express / nodejs.

@annevk
Copy link
Member

annevk commented Dec 22, 2023

Using different parsers is a known security vulnerability and it's one of the reasons we have standardized how URLs are parsed, so more tooling can interoperate with it.

I hope you're not expecting that we change the standard over this as that would undoubtedly break the web.

@stefanbeigel
Copy link
Author

stefanbeigel commented Dec 27, 2023

Using different parsers is a known security vulnerability and it's one of the reasons we have standardized how URLs are parsed, so more tooling can interoperate with it.

Are there any plans to also adapt server side parsing of the url in nodejs / the http server module to use the whatwg URL class?
I know it might be better to ask this the nodejs team but I guess you are also in contact with them.

I hope you're not expecting that we change the standard over this as that would undoubtedly break the web.

Well actually I would question why it's defined in the standard like this, why not leave the path as it is?
With this the whatwg url standard automatically creates issues with other url parsers that do not evaluate ../

BR

@annevk
Copy link
Member

annevk commented Dec 28, 2023

Best to ask Node.js.

It's defined in the standard like this because this is how browsers have behaved for a long time. They parse and mostly-canonicalize at the same time.

@annevk annevk closed this as not planned Won't fix, can't repro, duplicate, stale Dec 28, 2023
@stefanbeigel
Copy link
Author

Hi

It's defined in the standard like this because this is how browsers have behaved for a long time. They parse and mostly-canonicalize at the same time.

Well then why even follow the whatwg standard if it's mainly focused on browsers?
Really wonder why this URL class has been made standard in NodeJs, will open a issue at NodeJs and link it here.

Btw I also opened an issue at fastify.
fastify/fastify#5204

@annevk
Copy link
Member

annevk commented Dec 29, 2023

Because having different parsers causes issues, I thought we already established that...

@karwa
Copy link
Contributor

karwa commented Dec 29, 2023

By the way, the treatment of . and .. components in this standard is consistent with previous RFCs, such as RFC-3986.

When describing URL normalization, RFC-3986 is explicit about what . and .. components are for:

6.2.2.3. Path Segment Normalization

The complete path segments "." and ".." are intended only for use
within relative references (Section 4.1) and are removed as part of
the reference resolution process (Section 5.2). However, some
deployed implementations incorrectly assume that reference resolution
is not necessary when the reference is already a URI and thus fail to
remove dot-segments when they occur in non-relative paths. URI
normalizers should remove dot-segments by applying the
remove_dot_segments algorithm to the path, as described in
Section 5.2.4.

It's been a while since I read the HTTP spec, but if I recall correctly, I believe the idea is that the server takes the path and query (together known as the "request target"), and the host (supplied via the Host: header), and reconstructs the URL using the processing in RFC-3986. That means it would be expected to remove dot segments, as this standard also does.

Furthermore, I think RFC-3986 is clear that . and .. components are intended only for hierarchical traversal within the path, and any servers which give them a different interpretation could reasonably be described as "incorrect". It would be very difficult for clients to robustly interact with such a server because the RFC gives tools and libraries explicit license and even encouragement to remove them ("URI normalizers should remove dot-segments").

So IMO, that part of the standard, and its embodiment in JavaScript's URL class, is consistent with RFC-3986. Which is nearly 20 years old and widely used even outside of browser contexts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants