Commit ac52923

TimothyGu authored and evanlucas committed
url: update WHATWG URL API to latest spec

- Update to spec
  - Add opaque hosts
  - File state did not correctly deal with lack of base URL
  - Cleanup API for file and non-special URLs
  - Allow % and IPv6 addresses in non-special URL hosts
  - Use specific names for percent-encode sets
  - Add empty host concept for file and non-special URLs
  - Clarify IPv6 serializer
- Fix existing mistakes
  - Add missing ':' to forbidden host code point list.
  - Correct IPv4 parser empty label behavior
- Maintain type equivalence in URLContext with spec
  - scheme, username, and password should always be strings
  - host, port, query, and fragment may be strings or null
- Align scheme state more closely with the spec
  - Make sure the `special` variable is always synced with
    URL_FLAG_SPECIAL.

PR-URL: #12523
Fixes: #10608
Fixes: #10634
Refs: whatwg/url#185
Refs: whatwg/url#225
Refs: whatwg/url#224
Refs: whatwg/url#218
Refs: whatwg/url#243
Refs: whatwg/url#260
Refs: whatwg/url#268
Reviewed-By: James M Snell <[email protected]>
Reviewed-By: Daijiro Wachi <[email protected]>
Reviewed-By: Joyee Cheung <[email protected]>

1 parent 313b205 commit ac52923

5 files changed: +830 -728 lines changed

doc/api/url.md

+15-13
@@ -1049,23 +1049,25 @@ located within the structure of the URL. The WHATWG URL Standard uses a more
 selective and fine grained approach to selecting encoded characters than that
 used by the older [`url.parse()`][] and [`url.format()`][] methods.

-The WHATWG algorithm defines three "encoding sets" that describe ranges of
-characters that must be percent-encoded:
+The WHATWG algorithm defines three "percent-encode sets" that describe ranges
+of characters that must be percent-encoded:

-* The *simple encode set* includes code points in range U+0000 to U+001F
-  (inclusive) and all code points greater than U+007E.
+* The *C0 control percent-encode set* includes code points in range U+0000 to
+  U+001F (inclusive) and all code points greater than U+007E.

-* The *default encode set* includes the *simple encode set* and code points
-  U+0020, U+0022, U+0023, U+003C, U+003E, U+003F, U+0060, U+007B, and U+007D.
+* The *path percent-encode set* includes the *C0 control percent-encode set*
+  and code points U+0020, U+0022, U+0023, U+003C, U+003E, U+003F, U+0060,
+  U+007B, and U+007D.

-* The *userinfo encode set* includes the *default encode set* and code points
-  U+002F, U+003A, U+003B, U+003D, U+0040, U+005B, U+005C, U+005D, U+005E, and
-  U+007C.
+* The *userinfo encode set* includes the *path percent-encode set* and code
+  points U+002F, U+003A, U+003B, U+003D, U+0040, U+005B, U+005C, U+005D,
+  U+005E, and U+007C.

-The *simple encode set* is used primary for URL fragments and certain specific
-conditions for the path. The *userinfo encode set* is used specifically for
-username and passwords encoded within the URL. The *default encode set* is used
-for all other cases.
+The *userinfo percent-encode set* is used exclusively for username and
+passwords encoded within the URL. The *path percent-encode set* is used for the
+path of most URLs. The *C0 control percent-encode set* is used for all
+other cases, including URL fragments in particular, but also host and path
+under certain specific conditions.

 When non-ASCII characters appear within a hostname, the hostname is encoded
 using the [Punycode][] algorithm. Note, however, that a hostname *may* contain
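
The three percent-encode sets named in the added documentation nest inside one another, which a short sketch can make concrete. The following is only an illustration built from the code points listed in the text above, assuming each encoded code point is written as its UTF-8 bytes. The names `inC0ControlSet`, `inPathSet`, `inUserinfoSet`, and `percentEncode` are hypothetical helpers, not part of the Node.js `url` API or of this commit.

```javascript
'use strict';

// Extra code points each set adds on top of the previous one, copied from
// the documentation text in the diff above.
const PATH_EXTRA = new Set([
  0x20, 0x22, 0x23, 0x3C, 0x3E, 0x3F, 0x60, 0x7B, 0x7D
]);
const USERINFO_EXTRA = new Set([
  0x2F, 0x3A, 0x3B, 0x3D, 0x40, 0x5B, 0x5C, 0x5D, 0x5E, 0x7C
]);

// Hypothetical membership tests; each set includes the one before it.
const inC0ControlSet = (cp) => cp <= 0x1F || cp > 0x7E;
const inPathSet = (cp) => inC0ControlSet(cp) || PATH_EXTRA.has(cp);
const inUserinfoSet = (cp) => inPathSet(cp) || USERINFO_EXTRA.has(cp);

// Percent-encode every code point of `input` that belongs to the given set,
// emitting one %XX triplet per UTF-8 byte (an assumption of this sketch).
function percentEncode(input, inSet) {
  let out = '';
  for (const ch of input) { // for...of iterates by code point
    if (!inSet(ch.codePointAt(0))) {
      out += ch;
      continue;
    }
    for (const byte of Buffer.from(ch, 'utf8')) {
      out += '%' + byte.toString(16).toUpperCase().padStart(2, '0');
    }
  }
  return out;
}

console.log(percentEncode('a b/c:d', inC0ControlSet)); // a b/c:d
console.log(percentEncode('a b/c:d', inPathSet));      // a%20b/c:d
console.log(percentEncode('a b/c:d', inUserinfoSet));  // a%20b%2Fc%3Ad
console.log(percentEncode('é', inC0ControlSet));       // %C3%A9
```

The same input gains more escapes as the set widens: the space is escaped only from the path set onward, and `/` and `:` only in the userinfo set, which matches the nesting described in the bullet list.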
