|
2 | 2 |
|
3 | 3 | Stability: 2 - Stable
|
4 | 4 |
|
5 |
| -This module has utilities for URL resolution and parsing. |
6 |
| -Call `require('url')` to use it. |
| 5 | +The `url` module provides utilities for URL resolution and parsing. It can be |
| 6 | +accessed using: |
7 | 7 |
|
8 |
| -## URL Parsing |
| 8 | +```js |
| 9 | +const url = require('url'); |
| 10 | +``` |
9 | 11 |
|
10 |
| -Parsed URL objects have some or all of the following fields, depending on |
11 |
| -whether or not they exist in the URL string. Any parts that are not in the URL |
12 |
| -string will not be in the parsed object. Examples are shown for the URL |
| 12 | +## URL Strings and URL Objects |
13 | 13 |
|
14 |
| -`'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'` |
| 14 | +A URL string is a structured string containing multiple meaningful components. |
| 15 | +When parsed, a URL object is returned containing properties for each of these |
| 16 | +components. |
15 | 17 |
|
16 |
| -* `href`: The full URL that was originally parsed. Both the protocol and host are lowercased. |
| 18 | +The following details each of the components of a parsed URL. The example |
| 19 | +`'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'` is used to |
| 20 | +illustrate each. |
17 | 21 |
|
18 |
| - Example: `'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'` |
| 22 | +``` |
| 23 | ++---------------------------------------------------------------------------+ |
| 24 | +| href | |
| 25 | ++----------++-----------+-----------------+-------------------------+-------+ |
| 26 | +| protocol || auth | host | path | hash | |
| 27 | +| || +----------+------+----------+--------------+ | |
| 28 | +| || | hostname | port | pathname | search | | |
| 29 | +| || | | | +-+------------+ | |
| 30 | +| || | | | | | query | | |
| 31 | +" http: // user:pass @ host.com : 8080 /p/a/t/h ? query=string #hash " |
| 32 | +| || | | | | | | | |
| 33 | ++----------++-----------+-----------+------+----------+-+-----------+-------+ |
| 34 | +(all spaces in the "" line should be ignored -- they're purely for formatting) |
| 35 | +``` |
19 | 36 |
|
20 |
| -* `protocol`: The request protocol, lowercased. |
| 37 | +### urlObject.href |
21 | 38 |
|
22 |
| - Example: `'http:'` |
| 39 | +The `href` property is the full URL string that was parsed with both the |
| 40 | +`protocol` and `host` components converted to lower-case. |
23 | 41 |
|
24 |
| -* `slashes`: The protocol requires slashes after the colon. |
| 42 | +For example: `'http://user:pass@host.com:8080/p/a/t/h?query=string#hash'` |
25 | 43 |
|
26 |
| - Example: true or false |
| 44 | +### urlObject.protocol |
27 | 45 |
|
28 |
| -* `host`: The full lowercased host portion of the URL, including port |
29 |
| - information. |
| 46 | +The `protocol` property identifies the URL's lower-cased protocol scheme. |
30 | 47 |
|
31 |
| - Example: `'host.com:8080'` |
| 48 | +For example: `'http:'` |
32 | 49 |
|
33 |
| -* `auth`: The authentication information portion of a URL. |
| 50 | +### urlObject.slashes |
34 | 51 |
|
35 |
| - Example: `'user:pass'` |
| 52 | +The `slashes` property is a `boolean` with a value of `true` if two ASCII |
| 53 | +forward-slash characters (`/`) are required following the colon in the |
| 54 | +`protocol`. |
36 | 55 |
|
37 |
| -* `hostname`: Just the lowercased hostname portion of the host. |
| 56 | +### urlObject.host |
38 | 57 |
|
39 |
| - Example: `'host.com'` |
| 58 | +The `host` property is the full lower-cased host portion of the URL, including |
| 59 | +the `port` if specified. |
40 | 60 |
|
41 |
| -* `port`: The port number portion of the host. |
| 61 | +For example: `'host.com:8080'` |
42 | 62 |
|
43 |
| - Example: `'8080'` |
| 63 | +### urlObject.auth |
44 | 64 |
|
45 |
| -* `pathname`: The path section of the URL, that comes after the host and |
46 |
| - before the query, including the initial slash if present. No decoding is |
47 |
| - performed. |
| 65 | +The `auth` property is the username and password portion of the URL, also |
| 66 | +referred to as "userinfo". This string subset follows the `protocol` and |
| 67 | +double slashes (if present) and precedes the `host` component, delimited by an |
| 68 | +ASCII "at sign" (`@`). The format of the string is `{username}[:{password}]`, |
| 69 | +with the `[:{password}]` portion being optional. |
48 | 70 |
|
49 |
| - Example: `'/p/a/t/h'` |
| 71 | +For example: `'user:pass'` |
50 | 72 |
|
51 |
| -* `search`: The 'query string' portion of the URL, including the leading |
52 |
| - question mark. |
| 73 | +### urlObject.hostname |
53 | 74 |
|
54 |
| - Example: `'?query=string'` |
| 75 | +The `hostname` property is the lower-cased host name portion of the `host` |
| 76 | +component *without* the `port` included. |
55 | 77 |
|
56 |
| -* `path`: Concatenation of `pathname` and `search`. No decoding is performed. |
| 78 | +For example: `'host.com'` |
57 | 79 |
|
58 |
| - Example: `'/p/a/t/h?query=string'` |
| 80 | +### urlObject.port |
59 | 81 |
|
60 |
| -* `query`: Either the 'params' portion of the query string, or a |
61 |
| - querystring-parsed object. |
| 82 | +The `port` property is the numeric port portion of the `host` component. |
62 | 83 |
|
63 |
| - Example: `'query=string'` or `{'query':'string'}` |
| 84 | +For example: `'8080'` |
64 | 85 |
|
65 |
| -* `hash`: The 'fragment' portion of the URL including the pound-sign. |
| 86 | +### urlObject.pathname |
66 | 87 |
|
67 |
| - Example: `'#hash'` |
| 88 | +The `pathname` property consists of the entire path section of the URL. This |
| 89 | +is everything following the `host` (including the `port`) and before the start |
| 90 | +of the `query` or `hash` components, delimited by either the ASCII question |
| 91 | +mark (`?`) or hash (`#`) characters. |
68 | 92 |
|
69 |
| -### Escaped Characters |
| 93 | +For example: `'/p/a/t/h'` |
70 | 94 |
|
71 |
| -Spaces (`' '`) and the following characters will be automatically escaped in the |
72 |
| -properties of URL objects: |
| 95 | +No decoding of the path string is performed. |
73 | 96 |
|
74 |
| -``` |
75 |
| -< > " ` \r \n \t { } | \ ^ ' |
76 |
| -``` |
| 97 | +### urlObject.search |
| 98 | + |
| 99 | +The `search` property consists of the entire "query string" portion of the |
| 100 | +URL, including the leading ASCII question mark (`?`) character. |
| 101 | + |
| 102 | +For example: `'?query=string'` |
| 103 | + |
| 104 | +No decoding of the query string is performed. |
| 105 | + |
| 106 | +### urlObject.path |
| 107 | + |
| 108 | +The `path` property is a concatenation of the `pathname` and `search` |
| 109 | +components. |
| 110 | + |
| 111 | +For example: `'/p/a/t/h?query=string'` |
| 112 | + |
| 113 | +No decoding of the `path` is performed. |
| 114 | + |
| 115 | +### urlObject.query |
| 116 | + |
| 117 | +The `query` property is either the "params" portion of the query string |
| 118 | +(everything *except* the leading ASCII question mark (`?`)), or an object |
| 119 | +returned by the [`querystring`][] module's `parse()` method. |
77 | 120 |
|
78 |
| ---- |
| 121 | +For example: `'query=string'` or `{'query': 'string'}` |
79 | 122 |
|
80 |
| -The following methods are provided by the URL module: |
| 123 | +If returned as a string, no decoding of the query string is performed. If |
| 124 | +returned as an object, both keys and values are decoded. |
81 | 125 |
|
82 |
| -## url.format(urlObj) |
| 126 | +### urlObject.hash |
| 127 | + |
| 128 | +The `hash` property consists of the "fragment" portion of the URL including |
| 129 | +the leading ASCII hash (`#`) character. |
| 130 | + |
| 131 | +For example: `'#hash'` |
| 132 | + |
| 133 | +## url.format(urlObject) |
83 | 134 | <!-- YAML
|
84 | 135 | added: v0.1.25
|
85 | 136 | -->
|
86 | 137 |
|
87 |
| -Take a parsed URL object, and return a formatted URL string. |
88 |
| - |
89 |
| -Here's how the formatting process works: |
90 |
| - |
91 |
| -* `href` will be ignored. |
92 |
| -* `path` will be ignored. |
93 |
| -* `protocol` is treated the same with or without the trailing `:` (colon). |
94 |
| - * The protocols `http`, `https`, `ftp`, `gopher`, `file` will be |
95 |
| - postfixed with `://` (colon-slash-slash) as long as `host`/`hostname` are present. |
96 |
| - * All other protocols `mailto`, `xmpp`, `aim`, `sftp`, `foo`, etc will |
97 |
| - be postfixed with `:` (colon). |
98 |
| -* `slashes` set to `true` if the protocol requires `://` (colon-slash-slash) |
99 |
| - * Only needs to be set for protocols not previously listed as requiring |
100 |
| - slashes, such as `mongodb://localhost:8000/`, or if `host`/`hostname` are absent. |
101 |
| -* `auth` will be used if present. |
102 |
| -* `hostname` will only be used if `host` is absent. |
103 |
| -* `port` will only be used if `host` is absent. |
104 |
| -* `host` will be used in place of `hostname` and `port`. |
105 |
| -* `pathname` is treated the same with or without the leading `/` (slash). |
106 |
| -* `query` (object; see `querystring`) will only be used if `search` is absent. |
107 |
| -* `search` will be used in place of `query`. |
108 |
| - * It is treated the same with or without the leading `?` (question mark). |
109 |
| -* `hash` is treated the same with or without the leading `#` (pound sign, anchor). |
110 |
| - |
111 |
| -## url.parse(urlStr[, parseQueryString][, slashesDenoteHost]) |
| 138 | +* `urlObject` {Object} A URL object (either as returned by `url.parse()` or |
| 139 | + constructed otherwise). |
| 140 | + |
| 141 | +The `url.format()` method processes the given URL object and returns a formatted |
| 142 | +URL string. |
| 143 | + |
| 144 | +The formatting process essentially operates as follows: |
| 145 | + |
| 146 | +* A new empty string `result` is created. |
| 147 | +* If `urlObject.protocol` is a string, it is appended as-is to `result`. |
| 148 | +* Otherwise, if `urlObject.protocol` is not `undefined` and is not a string, an |
| 149 | + [`Error`][] is thrown. |
| 150 | +* For all string values of `urlObject.protocol` that *do not end* with an ASCII |
| 151 | + colon (`:`) character, the literal string `:` will be appended to `result`. |
| 152 | +* If the `urlObject.slashes` property is true, `urlObject.protocol` |
| 153 | + begins with one of `http`, `https`, `ftp`, `gopher`, or `file`, or |
| 154 | + `urlObject.protocol` is `undefined`, the literal string `//` will be appended |
| 155 | + to `result`. |
| 156 | +* If the value of the `urlObject.auth` property is truthy, and either |
| 157 | + `urlObject.host` or `urlObject.hostname` is not `undefined`, the value of |
| 158 | + `urlObject.auth` will be coerced into a string and appended to `result` |
| 159 | + followed by the literal string `@`. |
| 160 | +* If the `urlObject.host` property is `undefined` then: |
| 161 | + * If the `urlObject.hostname` is a string, it is appended to `result`. |
| 162 | + * Otherwise, if `urlObject.hostname` is not `undefined` and is not a string, |
| 163 | + an [`Error`][] is thrown. |
| 164 | + * If the `urlObject.port` property value is truthy, and `urlObject.hostname` |
| 165 | + is not `undefined`: |
| 166 | + * The literal string `:` is appended to `result`, and |
| 167 | + * The value of `urlObject.port` is coerced to a string and appended to |
| 168 | + `result`. |
| 169 | +* Otherwise, if the `urlObject.host` property value is truthy, the value of |
| 170 | + `urlObject.host` is coerced to a string and appended to `result`. |
| 171 | +* If the `urlObject.pathname` property is a string that is not an empty string: |
| 172 | + * If the `urlObject.pathname` *does not start* with an ASCII forward slash |
| 173 | + (`/`), then the literal string `/` is appended to `result`. |
| 174 | + * The value of `urlObject.pathname` is appended to `result`. |
| 175 | +* Otherwise, if `urlObject.pathname` is not `undefined` and is not a string, an |
| 176 | + [`Error`][] is thrown. |
| 177 | +* If the `urlObject.search` property is `undefined` and if the `urlObject.query` |
| 178 | + property is an `Object`, the literal string `?` is appended to `result` |
| 179 | + followed by the output of calling the [`querystring`][] module's `stringify()` |
| 180 | + method passing the value of `urlObject.query`. |
| 181 | +* Otherwise, if `urlObject.search` is a string: |
| 182 | + * If the value of `urlObject.search` *does not start* with the ASCII question |
| 183 | + mark (`?`) character, the literal string `?` is appended to `result`. |
| 184 | + * The value of `urlObject.search` is appended to `result`. |
| 185 | +* Otherwise, if `urlObject.search` is not `undefined` and is not a string, an |
| 186 | + [`Error`][] is thrown. |
| 187 | +* If the `urlObject.hash` property is a string: |
| 188 | + * If the value of `urlObject.hash` *does not start* with the ASCII hash (`#`) |
| 189 | + character, the literal string `#` is appended to `result`. |
| 190 | + * The value of `urlObject.hash` is appended to `result`. |
| 191 | +* Otherwise, if the `urlObject.hash` property is not `undefined` and is not a |
| 192 | + string, an [`Error`][] is thrown. |
| 193 | +* `result` is returned. |
| 194 | + |
| 195 | + |
| 196 | +## url.parse(urlString[, parseQueryString[, slashesDenoteHost]]) |
112 | 197 | <!-- YAML
|
113 | 198 | added: v0.1.25
|
114 | 199 | -->
|
115 | 200 |
|
116 |
| -Take a URL string, and return an object. |
117 |
| - |
118 |
| -Pass `true` as the second argument to also parse the query string using the |
119 |
| -`querystring` module. If `true` then the `query` property will always be |
120 |
| -assigned an object, and the `search` property will always be a (possibly |
121 |
| -empty) string. If `false` then the `query` property will not be parsed or |
122 |
| -decoded. Defaults to `false`. |
| 201 | +* `urlString` {string} The URL string to parse. |
| 202 | +* `parseQueryString` {boolean} If `true`, the `query` property will always |
| 203 | + be set to an object returned by the [`querystring`][] module's `parse()` |
| 204 | + method. If `false`, the `query` property on the returned URL object will be an |
| 205 | + unparsed, undecoded string. Defaults to `false`. |
| 206 | +* `slashesDenoteHost` {boolean} If `true`, the first token after the literal |
| 207 | + string `//` and preceding the next `/` will be interpreted as the `host`. |
| 208 | + For instance, given `//foo/bar`, the result would be |
| 209 | + `{host: 'foo', pathname: '/bar'}` rather than `{pathname: '//foo/bar'}`. |
| 210 | + Defaults to `false`. |
123 | 211 |
|
124 |
| -Pass `true` as the third argument to treat `//foo/bar` as |
125 |
| -`{ host: 'foo', pathname: '/bar' }` rather than |
126 |
| -`{ pathname: '//foo/bar' }`. Defaults to `false`. |
| 212 | +The `url.parse()` method takes a URL string, parses it, and returns a URL |
| 213 | +object. |
127 | 214 |
|
128 | 215 | ## url.resolve(from, to)
|
129 | 216 | <!-- YAML
|
130 | 217 | added: v0.1.25
|
131 | 218 | -->
|
132 | 219 |
|
133 |
| -Take a base URL, and a href URL, and resolve them as a browser would for |
134 |
| -an anchor tag. Examples: |
| 220 | +* `from` {string} The Base URL being resolved against. |
| 221 | +* `to` {string} The HREF URL being resolved. |
| 222 | + |
| 223 | +The `url.resolve()` method resolves a target URL relative to a base URL in a |
| 224 | +manner similar to that of a Web browser resolving an anchor tag HREF. |
| 225 | + |
| 226 | +For example: |
135 | 227 |
|
136 | 228 | ```js
|
137 | 229 | url.resolve('/one/two/three', 'four') // '/one/two/four'
|
138 | 230 | url.resolve('http://example.com/', '/one') // 'http://example.com/one'
|
139 | 231 | url.resolve('http://example.com/one', '/two') // 'http://example.com/two'
|
140 | 232 | ```
|
| 233 | + |
| 234 | +## Escaped Characters |
| 235 | + |
| 236 | +URLs are only permitted to contain a certain range of characters. Spaces (`' '`) |
| 237 | +and the following characters will be automatically escaped in the |
| 238 | +properties of URL objects: |
| 239 | + |
| 240 | +``` |
| 241 | +< > " ` \r \n \t { } | \ ^ ' |
| 242 | +``` |
| 243 | + |
| 244 | +For example, the ASCII space character (`' '`) is encoded as `%20`. The ASCII |
| 245 | +less-than sign (`<`) character is encoded as `%3C`. |
| 246 | + |
| 247 | + |
| 248 | +[`Error`]: errors.html#errors_class_error |
| 249 | +[`querystring`]: querystring.html |
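
The behaviour described in the added documentation can be checked with a short script. The following is a minimal sketch, assuming a Node.js runtime that still ships the legacy `url.parse()`, `url.format()`, and `url.resolve()` functions. The `example.com` host, port `8443`, and the other values passed to `url.format()` are hypothetical and chosen only for illustration.

```js
const url = require('url');

// The example URL used throughout the documentation.
const input = 'http://user:pass@host.com:8080/p/a/t/h?query=string#hash';

// Passing `true` as the second argument runs the query string through
// querystring.parse(), so `query` is an object instead of a string.
const parsed = url.parse(input, true);

console.log(parsed.protocol);    // 'http:'
console.log(parsed.auth);        // 'user:pass'
console.log(parsed.host);        // 'host.com:8080'
console.log(parsed.hostname);    // 'host.com'
console.log(parsed.port);        // '8080'
console.log(parsed.pathname);    // '/p/a/t/h'
console.log(parsed.search);      // '?query=string'
console.log(parsed.path);        // '/p/a/t/h?query=string'
console.log(parsed.query.query); // 'string'
console.log(parsed.hash);        // '#hash'

// url.format() also accepts a hand-built object (hypothetical values).
// The missing ':' after the protocol, the leading '/' of the pathname,
// and the leading '#' of the hash are filled in during formatting.
const built = url.format({
  protocol: 'https',
  hostname: 'example.com',
  port: 8443,
  pathname: 'a/b',
  query: { x: '1' },
  hash: 'top'
});
console.log(built); // 'https://example.com:8443/a/b?x=1#top'

// Characters from the "Escaped Characters" list are percent-encoded
// while parsing: the space becomes %20 and '<' becomes %3C.
const escaped = url.parse('http://example.com/a b<c').pathname;
console.log(escaped); // '/a%20b%3Cc'
```

Formatting the parsed object again with `url.format(parsed)` should give back the original string for this input, which is a quick way to confirm that no component was lost during parsing.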