
Commit 40a36b3

Kimeigarvagg authored and committed
url: added url fragment lookup table

Percent-encoded additional characters in fragment state with new
FRAGMENT_ENCODE_SET lookup table. The fragment percent-encode set
includes the C0 control percent-encode set and code points U+0020,
U+0022, U+003C, U+003E, and U+0060.

PR-URL: #17627
Fixes: #17540
Reviewed-By: Timothy Gu
Reviewed-By: Daijiro Wachi
Reviewed-By: Ruben Bridgewater
Reviewed-By: James M Snell
1 parent 654ce4b commit 40a36b3
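The fragment percent-encode set described in the commit message can be written as a simple byte predicate. The sketch below is derived only from that definition; `InC0ControlSet`, `InFragmentSet`, and `CheckFragmentSetExamples` are hypothetical names, not Node.js functions, and the input is assumed to be UTF-8 encoded already, so every code point above U+007E arrives as bytes in the range 0x7F to 0xFF.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: the C0 control percent-encode set, applied to bytes of
// already UTF-8-encoded input (0x00..0x1F, plus everything above 0x7E).
inline bool InC0ControlSet(uint8_t c) { return c <= 0x1F || c > 0x7E; }

// Hypothetical helper: the fragment percent-encode set is the C0 control set
// plus five printable ASCII code points.
inline bool InFragmentSet(uint8_t c) {
  if (InC0ControlSet(c)) return true;
  switch (c) {
    case 0x20:  // space
    case 0x22:  // "
    case 0x3C:  // <
    case 0x3E:  // >
    case 0x60:  // `
      return true;
    default:
      return false;
  }
}

// Confirms that only those five printable ASCII characters are encoded,
// so '#' and '%' pass through a fragment unchanged.
inline void CheckFragmentSetExamples() {
  int printable_encoded = 0;
  for (int c = 0x20; c <= 0x7E; ++c) {
    if (InFragmentSet(static_cast<uint8_t>(c))) ++printable_encoded;
  }
  assert(printable_encoded == 5);
  assert(!InFragmentSet('#') && !InFragmentSet('%'));
}
```

Only five printable ASCII characters change behavior; `#`, `%`, `?`, and the rest of printable ASCII still pass through a fragment literally.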


4 files changed (+192, -17 lines)

doc/api/url.md

+7 -4
@@ -1112,12 +1112,15 @@ forward slash (`/`) character is encoded as `%3C`.
 The [WHATWG URL Standard][] uses a more selective and fine grained approach to
 selecting encoded characters than that used by the Legacy API.

-The WHATWG algorithm defines three "percent-encode sets" that describe ranges
+The WHATWG algorithm defines four "percent-encode sets" that describe ranges
 of characters that must be percent-encoded:

 * The *C0 control percent-encode set* includes code points in range U+0000 to
 U+001F (inclusive) and all code points greater than U+007E.

+* The *fragment percent-encode set* includes the *C0 control percent-encode set*
+and code points U+0020, U+0022, U+003C, U+003E, and U+0060.
+
 * The *path percent-encode set* includes the *C0 control percent-encode set*
 and code points U+0020, U+0022, U+0023, U+003C, U+003E, U+003F, U+0060,
 U+007B, and U+007D.
@@ -1128,9 +1131,9 @@ of characters that must be percent-encoded:

 The *userinfo percent-encode set* is used exclusively for username and
 passwords encoded within the URL. The *path percent-encode set* is used for the
-path of most URLs. The *C0 control percent-encode set* is used for all
-other cases, including URL fragments in particular, but also host and path
-under certain specific conditions.
+path of most URLs. The *fragment percent-encode set* is used for URL fragments.
+The *C0 control percent-encode set* is used for host and path under certain
+specific conditions, in addition to all other cases.

 When non-ASCII characters appear within a hostname, the hostname is encoded
 using the [Punycode][] algorithm. Note, however, that a hostname *may* contain
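Each set in this list is the C0 control percent-encode set plus extra code points, and the fragment set's extras (U+0020, U+0022, U+003C, U+003E, U+0060) all appear in the path set's list, so the sets nest. A minimal C++ sketch that checks this mechanically; the predicate names and `PathOnlyCharacters` are hypothetical, and only the three sets whose code points are listed in this diff are modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical byte predicates for the three sets spelled out above.
bool InC0ControlSet(uint8_t c) { return c <= 0x1F || c > 0x7E; }

bool InFragmentSet(uint8_t c) {
  return InC0ControlSet(c) || c == 0x20 || c == 0x22 || c == 0x3C ||
         c == 0x3E || c == 0x60;
}

bool InPathSet(uint8_t c) {
  return InC0ControlSet(c) || c == 0x20 || c == 0x22 || c == 0x23 ||
         c == 0x3C || c == 0x3E || c == 0x3F || c == 0x60 || c == 0x7B ||
         c == 0x7D;
}

// Returns the printable ASCII characters that the path set encodes but the
// fragment set leaves alone, in code point order.
std::string PathOnlyCharacters() {
  std::string out;
  for (int c = 0x20; c <= 0x7E; ++c) {
    const auto b = static_cast<uint8_t>(c);
    // Every fragment-set member is expected to be a path-set member too.
    assert(!InFragmentSet(b) || InPathSet(b));
    if (InPathSet(b) && !InFragmentSet(b)) out += static_cast<char>(c);
  }
  return out;
}
```

With these definitions `PathOnlyCharacters()` yields `#?{}`: the four characters a path must encode but a fragment may carry literally.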

src/node_url.cc

+69 -1
@@ -332,6 +332,74 @@ const uint8_t C0_CONTROL_ENCODE_SET[32] = {
 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80
 };

+const uint8_t FRAGMENT_ENCODE_SET[32] = {
+// 00 01 02 03 04 05 06 07
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 08 09 0A 0B 0C 0D 0E 0F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 10 11 12 13 14 15 16 17
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 18 19 1A 1B 1C 1D 1E 1F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 20 21 22 23 24 25 26 27
+0x01 | 0x00 | 0x04 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 28 29 2A 2B 2C 2D 2E 2F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 30 31 32 33 34 35 36 37
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 38 39 3A 3B 3C 3D 3E 3F
+0x00 | 0x00 | 0x00 | 0x00 | 0x10 | 0x00 | 0x40 | 0x00,
+// 40 41 42 43 44 45 46 47
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 48 49 4A 4B 4C 4D 4E 4F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 50 51 52 53 54 55 56 57
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 58 59 5A 5B 5C 5D 5E 5F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 60 61 62 63 64 65 66 67
+0x01 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 68 69 6A 6B 6C 6D 6E 6F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 70 71 72 73 74 75 76 77
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00,
+// 78 79 7A 7B 7C 7D 7E 7F
+0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x00 | 0x80,
+// 80 81 82 83 84 85 86 87
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 88 89 8A 8B 8C 8D 8E 8F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 90 91 92 93 94 95 96 97
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// 98 99 9A 9B 9C 9D 9E 9F
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// A0 A1 A2 A3 A4 A5 A6 A7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// A8 A9 AA AB AC AD AE AF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// B0 B1 B2 B3 B4 B5 B6 B7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// B8 B9 BA BB BC BD BE BF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// C0 C1 C2 C3 C4 C5 C6 C7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// C8 C9 CA CB CC CD CE CF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// D0 D1 D2 D3 D4 D5 D6 D7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// D8 D9 DA DB DC DD DE DF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// E0 E1 E2 E3 E4 E5 E6 E7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// E8 E9 EA EB EC ED EE EF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// F0 F1 F2 F3 F4 F5 F6 F7
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
+// F8 F9 FA FB FC FD FE FF
+0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80
+};
+
+
 const uint8_t PATH_ENCODE_SET[32] = {
 // 00 01 02 03 04 05 06 07
 0x01 | 0x02 | 0x04 | 0x08 | 0x10 | 0x20 | 0x40 | 0x80,
@@ -1896,7 +1964,7 @@ void URL::Parse(const char* input,
 case 0:
 break;
 default:
-AppendOrEscape(&buffer, ch, C0_CONTROL_ENCODE_SET);
+AppendOrEscape(&buffer, ch, FRAGMENT_ENCODE_SET);
 }
 break;
 default:
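Judging from the row comments, `FRAGMENT_ENCODE_SET` packs the set into 256 bits: byte `c >> 3` covers eight consecutive code points, and bit `c & 7` of that byte is set when code point `c` must be encoded (row 20..27 is `0x01 | 0x04`, i.e. U+0020 and U+0022). The sketch below assumes exactly that layout and rebuilds the table from the set's definition so rows can be compared with the literal. `TableHas`, `BuildFragmentTable`, and `CheckAgainstLiteralRows` are hypothetical; the lookup helper Node.js actually uses is not part of this diff.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Hypothetical lookup helper, assuming the layout implied by the row comments:
// byte (c >> 3) covers code points (c & ~7) .. (c | 7), and bit (c & 7) of that
// byte is set when code point c must be percent-encoded.
inline bool TableHas(const uint8_t* table, uint8_t c) {
  return (table[c >> 3] & (1u << (c & 7))) != 0;
}

// Rebuilds a 32-byte fragment table from the set's definition.
inline std::array<uint8_t, 32> BuildFragmentTable() {
  std::array<uint8_t, 32> table{};  // all bits start cleared
  for (int c = 0; c < 256; ++c) {
    const bool encode = c <= 0x1F || c > 0x7E || c == 0x20 || c == 0x22 ||
                        c == 0x3C || c == 0x3E || c == 0x60;
    if (encode) table[c >> 3] |= static_cast<uint8_t>(1u << (c & 7));
  }
  return table;
}

// Spot-checks the rebuilt table against rows of the FRAGMENT_ENCODE_SET
// literal shown in the diff.
inline void CheckAgainstLiteralRows() {
  const std::array<uint8_t, 32> t = BuildFragmentTable();
  assert(t[4] == (0x01 | 0x04));   // row 20..27: U+0020, U+0022
  assert(t[7] == (0x10 | 0x40));   // row 38..3F: U+003C, U+003E
  assert(t[12] == 0x01);           // row 60..67: U+0060
  assert(t[15] == 0x80);           // row 78..7F: U+007F
}
```

Built this way, the rows for 00..1F and 80..FF come out as 0xFF and every other row not checked above is 0x00, which matches the literal; generating a table like this is one way to cross-check a hand-written one.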

test/fixtures/url-setter-tests.js

+43 -3
@@ -2,7 +2,7 @@

 /* The following tests are copied from WPT. Modifications to them should be
 upstreamed first. Refs:
-https://github.com/w3c/web-platform-tests/blob/b30abaecf4/url/setters_tests.json
+https://github.com/w3c/web-platform-tests/blob/ed4bb727ed/url/setters_tests.json
 License: http://www.w3.org/Consortium/Legal/2008/04-testsuite-copyright.html
 */
 module.exports =
@@ -1793,13 +1793,53 @@ module.exports =
 "hash": ""
 }
 },
+{
+"href": "http://example.net",
+"new_value": "#foo bar",
+"expected": {
+"href": "http://example.net/#foo%20bar",
+"hash": "#foo%20bar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo\"bar",
+"expected": {
+"href": "http://example.net/#foo%22bar",
+"hash": "#foo%22bar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo<bar",
+"expected": {
+"href": "http://example.net/#foo%3Cbar",
+"hash": "#foo%3Cbar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo>bar",
+"expected": {
+"href": "http://example.net/#foo%3Ebar",
+"hash": "#foo%3Ebar"
+}
+},
+{
+"href": "http://example.net",
+"new_value": "#foo`bar",
+"expected": {
+"href": "http://example.net/#foo%60bar",
+"hash": "#foo%60bar"
+}
+},
 {
 "comment": "Simple percent-encoding; nuls, tabs, and newlines are removed",
 "href": "a:/",
 "new_value": "\u0000\u0001\t\n\r\u001f !\"#$%&'()*+,-./09:;<=>?@AZ[\\]^_`az{|}~\u007f\u0080\u0081Éé",
 "expected": {
-"href": "a:/#%01%1F !\"#$%&'()*+,-./09:;<=>?@AZ[\\]^_`az{|}~%7F%C2%80%C2%81%C3%89%C3%A9",
-"hash": "#%01%1F !\"#$%&'()*+,-./09:;<=>?@AZ[\\]^_`az{|}~%7F%C2%80%C2%81%C3%89%C3%A9"
+"href": "a:/#%01%1F%20!%22#$%&'()*+,-./09:;%3C=%3E?@AZ[\\]^_%60az{|}~%7F%C2%80%C2%81%C3%89%C3%A9",
+"hash": "#%01%1F%20!%22#$%&'()*+,-./09:;%3C=%3E?@AZ[\\]^_%60az{|}~%7F%C2%80%C2%81%C3%89%C3%A9"
 }
 },
 {
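The new setter expectations follow from swapping `C0_CONTROL_ENCODE_SET` for `FRAGMENT_ENCODE_SET` in the fragment state. The hedged sketch below models that: NUL is skipped (mirroring the `case 0: break;` branch in the parser hunk), ASCII tabs and newlines are dropped (as the fixture's comment says), and every other byte in the fragment set becomes `%` plus two uppercase hex digits. `EncodeFragment` and `HashFromSetterValue` are hypothetical; the setter model assumes a single leading `#` is stripped and that an empty fragment reads back as an empty `hash`.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical encoder modeled on the fragment state after this commit:
// ASCII tab/newline and NUL are dropped, bytes in the fragment
// percent-encode set become "%XX" (uppercase hex), everything else is copied.
std::string EncodeFragment(const std::string& utf8) {
  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (char ch : utf8) {
    const auto c = static_cast<uint8_t>(ch);
    if (c == '\t' || c == '\n' || c == '\r' || c == 0x00) continue;
    const bool encode = c <= 0x1F || c > 0x7E || c == ' ' || c == '"' ||
                        c == '<' || c == '>' || c == '`';
    if (encode) {
      out += '%';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    } else {
      out += ch;
    }
  }
  return out;
}

// Hypothetical model of the `hash` setter: one leading '#' is removed, the
// rest is encoded, and an empty fragment reads back as an empty string.
std::string HashFromSetterValue(const std::string& value) {
  const std::string rest =
      (!value.empty() && value[0] == '#') ? value.substr(1) : value;
  const std::string encoded = EncodeFragment(rest);
  return encoded.empty() ? std::string() : "#" + encoded;
}
```

For example, `HashFromSetterValue("#foo bar")` returns `"#foo%20bar"`, and feeding it the UTF-8 bytes of the `a:/` fixture's `new_value` reproduces that fixture's expected `hash`.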

test/fixtures/url-tests.js

+73 -9
@@ -2,7 +2,7 @@

 /* The following tests are copied from WPT. Modifications to them should be
 upstreamed first. Refs:
-https://github.com/w3c/web-platform-tests/blob/11757f1/url/urltestdata.json
+https://github.com/w3c/web-platform-tests/blob/ed4bb727ed/url/urltestdata.json
 License: http://www.w3.org/Consortium/Legal/2008/04-testsuite-copyright.html
 */
 module.exports =
@@ -161,7 +161,7 @@ module.exports =
 {
 "input": "http://f:21/ b ? d # e ",
 "base": "http://example.org/foo/bar",
-"href": "http://f:21/%20b%20?%20d%20# e",
+"href": "http://f:21/%20b%20?%20d%20#%20e",
 "origin": "http://f:21",
 "protocol": "http:",
 "username": "",
@@ -171,12 +171,12 @@ module.exports =
 "port": "21",
 "pathname": "/%20b%20",
 "search": "?%20d%20",
-"hash": "# e"
+"hash": "#%20e"
 },
 {
 "input": "lolscheme:x x#x x",
 "base": "about:blank",
-"href": "lolscheme:x x#x x",
+"href": "lolscheme:x x#x%20x",
 "protocol": "lolscheme:",
 "username": "",
 "password": "",
@@ -185,7 +185,7 @@ module.exports =
 "port": "",
 "pathname": "x x",
 "search": "",
-"hash": "#x x"
+"hash": "#x%20x"
 },
 {
 "input": "http://f:/c",
@@ -2268,7 +2268,7 @@ module.exports =
 {
 "input": "http://www.google.com/foo?bar=baz# »",
 "base": "about:blank",
-"href": "http://www.google.com/foo?bar=baz# %C2%BB",
+"href": "http://www.google.com/foo?bar=baz#%20%C2%BB",
 "origin": "http://www.google.com",
 "protocol": "http:",
 "username": "",
@@ -2278,12 +2278,12 @@ module.exports =
 "port": "",
 "pathname": "/foo",
 "search": "?bar=baz",
-"hash": "# %C2%BB"
+"hash": "#%20%C2%BB"
 },
 {
 "input": "data:test# »",
 "base": "about:blank",
-"href": "data:test# %C2%BB",
+"href": "data:test#%20%C2%BB",
 "origin": "null",
 "protocol": "data:",
 "username": "",
@@ -2293,7 +2293,7 @@ module.exports =
 "port": "",
 "pathname": "test",
 "search": "",
-"hash": "# %C2%BB"
+"hash": "#%20%C2%BB"
 },
 {
 "input": "http://www.google.com",
@@ -4795,6 +4795,70 @@ module.exports =
 "searchParams": "qux=",
 "hash": "#foo%08bar"
 },
+{
+"input": "http://foo.bar/baz?qux#foo\"bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%22bar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%22bar"
+},
+{
+"input": "http://foo.bar/baz?qux#foo<bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%3Cbar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%3Cbar"
+},
+{
+"input": "http://foo.bar/baz?qux#foo>bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%3Ebar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%3Ebar"
+},
+{
+"input": "http://foo.bar/baz?qux#foo`bar",
+"base": "about:blank",
+"href": "http://foo.bar/baz?qux#foo%60bar",
+"origin": "http://foo.bar",
+"protocol": "http:",
+"username": "",
+"password": "",
+"host": "foo.bar",
+"hostname": "foo.bar",
+"port": "",
+"pathname": "/baz",
+"search": "?qux",
+"searchParams": "qux=",
+"hash": "#foo%60bar"
+},
 "# IPv4 parsing (via https://github.com/nodejs/node/pull/10317)",
 {
 "input": "http://192.168.257",

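The parser-level fixtures add one wrinkle: the whole input is first trimmed of leading and trailing C0 control or space characters, which is why `http://f:21/ b ? d # e ` ends in `#%20e` rather than `#%20e%20`. The sketch below is not a URL parser; it assumes that trimming step, treats everything after the first `#` as the fragment, and encodes it with the fragment set. `HashOf` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical, fragment-only model of the url-tests expectations. It trims
// leading/trailing C0 control or space characters from the whole input (as
// the WHATWG parser does before anything else), takes what follows the first
// '#', and encodes it with the fragment percent-encode set.
std::string HashOf(const std::string& input) {
  auto c0_or_space = [](char ch) { return static_cast<uint8_t>(ch) <= 0x20; };
  std::size_t begin = 0, end = input.size();
  while (begin < end && c0_or_space(input[begin])) ++begin;
  while (end > begin && c0_or_space(input[end - 1])) --end;

  const std::size_t hash = input.find('#', begin);
  if (hash == std::string::npos || hash >= end) return "";

  static const char kHex[] = "0123456789ABCDEF";
  std::string out;
  for (std::size_t i = hash + 1; i < end; ++i) {
    const auto c = static_cast<uint8_t>(input[i]);
    if (c == '\t' || c == '\n' || c == '\r' || c == 0x00) continue;
    if (c <= 0x1F || c > 0x7E || c == ' ' || c == '"' || c == '<' ||
        c == '>' || c == '`') {
      out += '%';
      out += kHex[c >> 4];
      out += kHex[c & 0x0F];
    } else {
      out += input[i];
    }
  }
  return out.empty() ? std::string() : "#" + out;
}
```

Under those assumptions it reproduces the `hash` values of the fixtures in this hunk, including `#%20%C2%BB` for inputs ending in `# »`, since » (U+00BB) is encoded as the two UTF-8 bytes C2 BB.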