
Commit ed7f59a

Kimeiga authored and MylesBorins committed
url: added url fragment lookup table
Percent-encoded additional characters in fragment state with new FRAGMENT_ENCODE_SET lookup table. The fragment percent-encode set includes the C0 control percent-encode set and code points U+0020, U+0022, U+003C, U+003E, and U+0060.

PR-URL: #17627
Fixes: #17540
Reviewed-By: Timothy Gu <[email protected]>
Reviewed-By: Daijiro Wachi <[email protected]>
Reviewed-By: Ruben Bridgewater <[email protected]>
Reviewed-By: James M Snell <[email protected]>
1 parent eaa2d91 commit ed7f59a


4 files changed, +192 -17 lines changed


doc/api/url.md

+7 -4
@@ -1107,12 +1107,15 @@ forward slash (`/`) character is encoded as `%3C`.
 The [WHATWG URL Standard][] uses a more selective and fine grained approach to
 selecting encoded characters than that used by the Legacy API.
 
-The WHATWG algorithm defines three "percent-encode sets" that describe ranges
+The WHATWG algorithm defines four "percent-encode sets" that describe ranges
 of characters that must be percent-encoded:
 
 * The *C0 control percent-encode set* includes code points in range U+0000 to
 U+001F (inclusive) and all code points greater than U+007E.
 
+* The *fragment percent-encode set* includes the *C0 control percent-encode set*
+and code points U+0020, U+0022, U+003C, U+003E, and U+0060.
+
 * The *path percent-encode set* includes the *C0 control percent-encode set*
 and code points U+0020, U+0022, U+0023, U+003C, U+003E, U+003F, U+0060,
 U+007B, and U+007D.
@@ -1123,9 +1126,9 @@ of characters that must be percent-encoded:
 
 The *userinfo percent-encode set* is used exclusively for username and
 passwords encoded within the URL. The *path percent-encode set* is used for the
-path of most URLs. The *C0 control percent-encode set* is used for all
-other cases, including URL fragments in particular, but also host and path
-under certain specific conditions.
+path of most URLs. The *fragment percent-encode set* is used for URL fragments.
+The *C0 control percent-encode set* is used for host and path under certain
+specific conditions, in addition to all other cases.
 
 When non-ASCII characters appear within a hostname, the hostname is encoded
 using the [Punycode][] algorithm. Note, however, that a hostname *may* contain

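As a reading aid, the three sets whose definitions appear in the hunk above can be written as byte-level predicates. This is a minimal C++ sketch, not Node.js code: the function names are hypothetical, and it assumes the input has already been converted to UTF-8, so "greater than U+007E" becomes "byte greater than 0x7E", which covers every byte of a multi-byte sequence. It also makes the relationship between the lists checkable: the fragment set adds five code points to the C0 control set, and the path set contains the whole fragment set plus `#`, `?`, `{` and `}`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers mirroring the sets documented in doc/api/url.md.
// Each one classifies a single byte of UTF-8 input.

// C0 control percent-encode set: U+0000..U+001F and everything above U+007E.
bool InC0ControlSet(uint8_t c) {
  return c <= 0x1F || c > 0x7E;
}

// Fragment percent-encode set: C0 control set plus space, ", <, > and `.
bool InFragmentSet(uint8_t c) {
  return InC0ControlSet(c) ||
         c == 0x20 || c == 0x22 || c == 0x3C || c == 0x3E || c == 0x60;
}

// Path percent-encode set: C0 control set plus space, ", #, <, >, ?, `, { and }.
bool InPathSet(uint8_t c) {
  return InC0ControlSet(c) ||
         c == 0x20 || c == 0x22 || c == 0x23 || c == 0x3C || c == 0x3E ||
         c == 0x3F || c == 0x60 || c == 0x7B || c == 0x7D;
}

// Returns true if every byte value accepted by `inner` is also accepted by `outer`.
bool IsSubset(bool (*inner)(uint8_t), bool (*outer)(uint8_t)) {
  for (int i = 0; i <= 0xFF; ++i) {
    uint8_t c = static_cast<uint8_t>(i);
    if (inner(c) && !outer(c)) return false;
  }
  return true;
}
```

With these definitions `IsSubset(InFragmentSet, InPathSet)` holds while the reverse does not, because `#` is in the path set but not in the fragment set.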
src/node_url.cc

+69 -1
@@ -325,6 +325,74 @@ const uint8_t C0_CONTROL_ENCODE_SET[32] = {
 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80
 };
 
+const uint8_t FRAGMENT_ENCODE_SET[32] = {
+// 00 01 02 03 04 05 06 07
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 08 09 0A 0B 0C 0D 0E 0F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 10 11 12 13 14 15 16 17
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 18 19 1A 1B 1C 1D 1E 1F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 20 21 22 23 24 25 26 27
+0x01 | 0x00 | 0x04 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 28 29 2A 2B 2C 2D 2E 2F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 30 31 32 33 34 35 36 37
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 38 39 3A 3B 3C 3D 3E 3F
+0x00 | 0x00 | 0x00 | 0x00 | 0x10 | 0x00 | 0x40 | 0x00,
+// 40 41 42 43 44 45 46 47
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 48 49 4A 4B 4C 4D 4E 4F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 50 51 52 53 54 55 56 57
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 58 59 5A 5B 5C 5D 5E 5F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 60 61 62 63 64 65 66 67
+0x01 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 68 69 6A 6B 6C 6D 6E 6F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 70 71 72 73 74 75 76 77
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 78 79 7A 7B 7C 7D 7E 7F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x80,
+// 80 81 82 83 84 85 86 87
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 88 89 8A 8B 8C 8D 8E 8F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 90 91 92 93 94 95 96 97
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 98 99 9A 9B 9C 9D 9E 9F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// A0 A1 A2 A3 A4 A5 A6 A7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// A8 A9 AA AB AC AD AE AF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// B0 B1 B2 B3 B4 B5 B6 B7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// B8 B9 BA BB BC BD BE BF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// C0 C1 C2 C3 C4 C5 C6 C7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// C8 C9 CA CB CC CD CE CF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// D0 D1 D2 D3 D4 D5 D6 D7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// D8 D9 DA DB DC DD DE DF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// E0 E1 E2 E3 E4 E5 E6 E7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// E8 E9 EA EB EC ED EE EF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// F0 F1 F2 F3 F4 F5 F6 F7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// F8 F9 FA FB FC FD FE FF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80
+};
+
+
 const uint8_t PATH_ENCODE_SET[32] = {
 // 00 01 02 03 04 05 06 07
 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
@@ -1889,7 +1957,7 @@ void URL::Parse(const char* input,
 case 0:
 break;
 default:
-AppendOrEscape(&buffer, ch, C0_CONTROL_ENCODE_SET);
+AppendOrEscape(&buffer, ch, FRAGMENT_ENCODE_SET);
 }
 break;
 default:

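Each 32-byte table above stores one bit per byte value: byte `c` is tested at bit `c & 7` of entry `c >> 3`, so the `0x01 | 0x00 | 0x04 | ...` row for 0x20-0x27 marks space (bit 0) and `"` (bit 2). The sketch below illustrates that lookup and an append-or-escape step, and shows how swapping `C0_CONTROL_ENCODE_SET` for `FRAGMENT_ENCODE_SET` in the fragment state changes the output. `BitAt`, `BuildTable`, `EncodeFragment` and this `AppendOrEscape` are written for illustration only; they mimic the call shape of the `AppendOrEscape(&buffer, ch, ...)` line in the diff but are not copied from Node.js, and the tables are generated from the documented sets rather than typed out.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

using EncodeSet = std::array<uint8_t, 32>;  // 256 bits, one per byte value

// Builds a 32-byte bitmap: bit (i & 7) of entry (i >> 3) is set for each
// byte i that the predicate accepts.
template <typename Pred>
EncodeSet BuildTable(Pred in_set) {
  EncodeSet table{};  // zero-initialized
  for (int i = 0; i <= 0xFF; ++i) {
    if (in_set(static_cast<uint8_t>(i)))
      table[i >> 3] |= static_cast<uint8_t>(1 << (i & 7));
  }
  return table;
}

// Tests the bit for byte `c` in a table laid out as above.
bool BitAt(const EncodeSet& set, uint8_t c) {
  return (set[c >> 3] & (1 << (c & 7))) != 0;
}

// Appends `c` unchanged, or as "%XX" (uppercase hex) if it is in `set`.
void AppendOrEscape(std::string* out, uint8_t c, const EncodeSet& set) {
  if (BitAt(set, c)) {
    char buf[4];  // "%XX" plus terminator
    std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
    *out += buf;
  } else {
    *out += static_cast<char>(c);
  }
}

// C0 control percent-encode set: 0x00-0x1F and bytes above 0x7E.
const EncodeSet kC0Control = BuildTable([](uint8_t c) {
  return c <= 0x1F || c > 0x7E;
});

// Fragment percent-encode set: C0 control set plus space, ", <, > and `.
const EncodeSet kFragment = BuildTable([](uint8_t c) {
  return c <= 0x1F || c > 0x7E || c == 0x20 || c == 0x22 || c == 0x3C ||
         c == 0x3E || c == 0x60;
});

// Runs every byte of a UTF-8 fragment through AppendOrEscape.
std::string EncodeFragment(const std::string& fragment, const EncodeSet& set) {
  std::string out;
  for (char ch : fragment) AppendOrEscape(&out, static_cast<uint8_t>(ch), set);
  return out;
}
```

With the C0-only table `EncodeFragment("foo bar", kC0Control)` leaves the space untouched, matching the old `# e`-style fixture values removed in this commit; with the fragment table it becomes `foo%20bar`. The generated `kFragment` bytes (for example `0x05` for 0x20-0x27 and `0x50` for 0x38-0x3F) line up with the hand-written `FRAGMENT_ENCODE_SET` rows.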
test/fixtures/url-setter-tests.js

+43 -3
@@ -2,7 +2,7 @@
 
 /* The following tests are copied from WPT. Modifications to them should be
 upstreamed first. Refs:
-https://github.com/w3c/web-platform-tests/blob/b30abaecf4/url/setters_tests.json
+https://github.com/w3c/web-platform-tests/blob/ed4bb727ed/url/setters_tests.json
 License: http://www.w3.org/Consortium/Legal/2008/04-testsuite-copyright.html
 */
 module.exports =
@@ -1793,13 +1793,53 @@ module.exports =
 "hash": ""
 }
 },
+{
+"href": "http://example.net",
+"new_value": "#foo bar",
+"expected": {
+"href": "http://example.net/#foo%20bar",
+"hash": "#foo%20bar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo\"bar",
+"expected": {
+"href": "http://example.net/#foo%22bar",
+"hash": "#foo%22bar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo<bar",
+"expected": {
+"href": "http://example.net/#foo%3Cbar",
+"hash": "#foo%3Cbar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo>bar",
+"expected": {
+"href": "http://example.net/#foo%3Ebar",
+"hash": "#foo%3Ebar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo`bar",
+"expected": {
+"href": "http://example.net/#foo%60bar",
+"hash": "#foo%60bar"
+}
+},
 {
 "comment": "Simple percent-encoding; nuls, tabs, and newlines are removed",
 "href": "a:/",
 "new_value": "\u0000\u0001\t\n\r\u001f !\"#$%&'()*+,-./09:;<=>?@AZ[\\]^_`az{|}~\u007f\u0080\u0081Éé",
 "expected": {
-"href": "a:/#%01%1F !\"#$%&'()*+,-./09:;<=>?@AZ[\\]^_`az{|}~%7F%C2%80%C2%81%C3%89%C3%A9",
-"hash": "#%01%1F !\"#$%&'()*+,-./09:;<=>?@AZ[\\]^_`az{|}~%7F%C2%80%C2%81%C3%89%C3%A9"
+"href": "a:/#%01%1F%20!%22#$%&'()*+,-./09:;%3C=%3E?@AZ[\\]^_%60az{|}~%7F%C2%80%C2%81%C3%89%C3%A9",
+"hash": "#%01%1F%20!%22#$%&'()*+,-./09:;%3C=%3E?@AZ[\\]^_%60az{|}~%7F%C2%80%C2%81%C3%89%C3%A9"
 }
 },
 {

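The new setter fixtures can be reproduced with a rough sketch of the `hash` setter. It assumes, based on the fixture comment ("nuls, tabs, and newlines are removed") and the `case 0: break;` branch shown in `URL::Parse`, that a single leading `#` is dropped, tabs and newlines are removed, NUL bytes are skipped, and every other byte is tested against the fragment set. `SetHash` is a hypothetical name; the sketch only returns the resulting `hash` string and does not handle the empty-value case or any other part of the URL.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Fragment percent-encode set as documented in this commit: C0 controls,
// bytes above 0x7E, and space, ", <, > and `.
bool InFragmentSet(uint8_t c) {
  return c <= 0x1F || c > 0x7E || c == 0x20 || c == 0x22 || c == 0x3C ||
         c == 0x3E || c == 0x60;
}

// Hypothetical sketch of the `hash` setter for a UTF-8 `value`. Returns the
// new `hash` string including its leading '#'. An empty value (which would
// clear the fragment) is not handled here.
std::string SetHash(std::string value) {
  // A single leading '#' is not part of the fragment itself.
  if (!value.empty() && value[0] == '#') value.erase(0, 1);

  std::string out = "#";
  for (char ch : value) {
    uint8_t c = static_cast<uint8_t>(ch);
    if (ch == '\t' || ch == '\n' || ch == '\r') continue;  // tab/newline removed
    if (c == 0x00) continue;  // NUL skipped, like `case 0: break;`
    if (InFragmentSet(c)) {
      char buf[4];  // "%XX" plus terminator
      std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
      out += buf;
    } else {
      out += ch;
    }
  }
  return out;
}
```

Note that `#` and `%` pass through unchanged: neither is in the fragment set, which is why the long "Simple percent-encoding" fixture still contains a bare `#$%&`.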
test/fixtures/url-tests.js

+73 -9
@@ -2,7 +2,7 @@
 
 /* The following tests are copied from WPT. Modifications to them should be
 upstreamed first. Refs:
-https://github.com/w3c/web-platform-tests/blob/11757f1/url/urltestdata.json
+https://github.com/w3c/web-platform-tests/blob/ed4bb727ed/url/urltestdata.json
 License: http://www.w3.org/Consortium/Legal/2008/04-testsuite-copyright.html
 */
 module.exports =
@@ -161,7 +161,7 @@ module.exports =
 {
 "input": "http://f:21/ b ? d # e ",
 "base": "http://example.org/foo/bar",
-"href": "http://f:21/%20b%20?%20d%20# e",
+"href": "http://f:21/%20b%20?%20d%20#%20e",
 "origin": "http://f:21",
 "protocol": "http:",
 "username": "",
@@ -171,12 +171,12 @@ module.exports =
 "port": "21",
 "pathname": "/%20b%20",
 "search": "?%20d%20",
-"hash": "# e"
+"hash": "#%20e"
 },
 {
 "input": "lolscheme:x x#x x",
 "base": "about:blank",
-"href": "lolscheme:x x#x x",
+"href": "lolscheme:x x#x%20x",
 "protocol": "lolscheme:",
 "username": "",
 "password": "",
@@ -185,7 +185,7 @@ module.exports =
 "port": "",
 "pathname": "x x",
 "search": "",
-"hash": "#x x"
+"hash": "#x%20x"
 },
 {
 "input": "http://f:/c",
@@ -2268,7 +2268,7 @@ module.exports =
 {
 "input": "http://www.google.com/foo?bar=baz# »",
 "base": "about:blank",
-"href": "http://www.google.com/foo?bar=baz# %C2%BB",
+"href": "http://www.google.com/foo?bar=baz#%20%C2%BB",
 "origin": "http://www.google.com",
 "protocol": "http:",
 "username": "",
@@ -2278,12 +2278,12 @@ module.exports =
 "port": "",
 "pathname": "/foo",
 "search": "?bar=baz",
-"hash": "# %C2%BB"
+"hash": "#%20%C2%BB"
 },
 {
 "input": "data:test# »",
 "base": "about:blank",
-"href": "data:test# %C2%BB",
+"href": "data:test#%20%C2%BB",
 "origin": "null",
 "protocol": "data:",
 "username": "",
@@ -2293,7 +2293,7 @@ module.exports =
 "port": "",
 "pathname": "test",
 "search": "",
-"hash": "# %C2%BB"
+"hash": "#%20%C2%BB"
 },
 {
 "input": "http://www.google.com",
@@ -4795,6 +4795,70 @@ module.exports =
 "searchParams": "qux=",
 "hash": "#foo%08bar"
 },
+{
+"input": "http://foo.bar/baz?qux#foo\"bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%22bar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%22bar"
+},
+{
+"input": "http://foo.bar/baz?qux#foo<bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%3Cbar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%3Cbar"
+},
+{
+"input": "http://foo.bar/baz?qux#foo>bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%3Ebar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%3Ebar"
+},
+{
+"input": "http://foo.bar/baz?qux#foo`bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%60bar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%60bar"
+},
 "# IPv4 parsing (via https://github.com/nodejs/node/pull/10317)",
 {
 "input": "http://192.168.257",

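The changed parser fixtures in `url-tests.js` also depend on input trimming: in `"http://f:21/ b ? d # e "` the trailing space is stripped before parsing, so only the space at the start of the fragment is encoded, giving `#%20e`. The sketch below shows that under a simplifying assumption: the fragment is everything after the first `#` once leading and trailing C0 control or space characters have been trimmed and tabs and newlines removed, rather than the result of the full parser state machine. `HashOf` is a hypothetical helper, not a Node.js API, and the input is assumed to be UTF-8.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Fragment percent-encode set: C0 controls, bytes above 0x7E, and
// space, ", <, > and ` (per doc/api/url.md in this commit).
bool InFragmentSet(uint8_t c) {
  return c <= 0x1F || c > 0x7E || c == 0x20 || c == 0x22 || c == 0x3C ||
         c == 0x3E || c == 0x60;
}

// Hypothetical helper: the `hash` a full parse of `input` would report,
// considering only the fragment. Returns "" for a missing or empty fragment.
std::string HashOf(const std::string& input) {
  // Trim leading and trailing C0 control or space (bytes 0x00-0x20).
  std::size_t begin = 0, end = input.size();
  while (begin < end && static_cast<uint8_t>(input[begin]) <= 0x20) ++begin;
  while (end > begin && static_cast<uint8_t>(input[end - 1]) <= 0x20) --end;

  // Remove ASCII tab and newline characters from what remains.
  std::string s;
  for (std::size_t i = begin; i < end; ++i) {
    char ch = input[i];
    if (ch != '\t' && ch != '\n' && ch != '\r') s += ch;
  }

  // Simplifying assumption: the fragment starts after the first '#'.
  std::size_t pos = s.find('#');
  if (pos == std::string::npos) return "";

  std::string out = "#";
  for (std::size_t i = pos + 1; i < s.size(); ++i) {
    uint8_t c = static_cast<uint8_t>(s[i]);
    if (c == 0x00) continue;  // NUL skipped, like `case 0: break;`
    if (InFragmentSet(c)) {
      char buf[4];  // "%XX" plus terminator
      std::snprintf(buf, sizeof(buf), "%%%02X", static_cast<unsigned>(c));
      out += buf;
    } else {
      out += s[i];
    }
  }
  return out == "#" ? "" : out;
}
```

The `static_cast<uint8_t>` in the trimming loops matters: on platforms where `char` is signed, bytes such as `0xBB` in `»` would otherwise compare as negative and be trimmed away.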