|
| 1 | +# CJS Module Lexer |
| 2 | + |
| 3 | +[![Build Status][travis-image]][travis-url] |
| 4 | + |
| 5 | +A [very fast](#benchmarks) JS CommonJS module syntax lexer used to detect the most likely list of named exports of a CommonJS module. |
| 6 | + |
| 7 | +Outputs the list of named exports (`exports.name = ...`) and possible module reexports (`module.exports = require('...')`), including the common transpiler variations of these cases. |
| 8 | + |
| 9 | +Forked from https://github.com/guybedford/es-module-lexer. |
| 10 | + |
| 11 | +_Comprehensively handles the JS language grammar while remaining small and fast: ~90ms per MB of JS cold and ~15ms per MB of JS warm. [See benchmarks](#benchmarks) for more info._ |
| 12 | + |
| 13 | +### Usage |
| 14 | + |
| 15 | +``` |
| 16 | +npm install cjs-module-lexer |
| 17 | +``` |
| 18 | + |
| 19 | +For use in CommonJS: |
| 20 | + |
| 21 | +```js |
| 22 | +const parse = require('cjs-module-lexer'); |
| 23 | + |
| 24 | +const { exports, reexports } = parse(` |
| 25 | + // named exports detection |
| 26 | + module.exports.a = 'a'; |
| 27 | + (function () { |
| 28 | + exports.b = 'b'; |
| 29 | + })(); |
| 30 | + Object.defineProperty(exports, 'c', { value: 'c' }); |
| 31 | + /* exports.d = 'not detected'; */ |
| 32 | +
|
| 33 | + // reexports detection |
| 34 | + if (maybe) module.exports = require('./dep1.js'); |
| 35 | + if (another) module.exports = require('./dep2.js'); |
| 36 | +
|
| 37 | + // literal exports assignments |
| 38 | + module.exports = { a, b: c, d, 'e': f } |
| 39 | +
|
| 40 | + // __esModule detection |
| 41 | + Object.defineProperty(module.exports, '__esModule', { value: true }) |
| 42 | +`); |
| 43 | + |
| 44 | +// exports === ['a', 'b', 'c', 'd', 'e', '__esModule'] |
| 45 | +// reexports === ['./dep1.js', './dep2.js'] |
| 46 | +``` |
| 47 | + |
| 48 | +When using the ESM version, Wasm is supported instead: |
| 49 | + |
| 50 | +```js |
| 51 | +import { parse, init } from 'cjs-module-lexer'; |
| 52 | +// init needs to be called and waited upon |
| 53 | +await init(); |
| 54 | +const { exports, reexports } = parse(source); |
| 55 | +``` |
| 56 | + |
| 57 | +The Wasm build is around 1.5x faster and has no cold start. |
| 58 | + |
| 59 | +### Grammar |
| 60 | + |
| 61 | +CommonJS exports matches are run against the source token stream. |
| 62 | + |
| 63 | +The token grammar is: |
| 64 | + |
| 65 | +``` |
| 66 | +IDENTIFIER: As defined by ECMA-262, without support for identifier `\` escapes, filtered to remove strict reserved words: |
| 67 | + "implements", "interface", "let", "package", "private", "protected", "public", "static", "yield", "enum" |
| 68 | +
|
| 69 | +STRING_LITERAL: A `"` or `'` bounded ECMA-262 string literal. |
| 70 | +
|
| 71 | +IDENTIFIER_STRING: ( `"` IDENTIFIER `"` | `'` IDENTIFIER `'` ) |
| 72 | +
|
| 73 | +COMMENT_SPACE: Any ECMA-262 whitespace, ECMA-262 block comment or ECMA-262 line comment |
| 74 | +
|
| 75 | +MODULE_EXPORTS: `module` COMMENT_SPACE `.` COMMENT_SPACE `exports` |
| 76 | +
|
| 77 | +EXPORTS_IDENTIFIER: MODULE_EXPORTS | `exports` |
| 78 | +
|
| 79 | +EXPORTS_DOT_ASSIGN: EXPORTS_IDENTIFIER COMMENT_SPACE `.` COMMENT_SPACE IDENTIFIER COMMENT_SPACE `=` |
| 80 | +
|
| 81 | +EXPORTS_LITERAL_COMPUTED_ASSIGN: EXPORTS_IDENTIFIER COMMENT_SPACE `[` COMMENT_SPACE IDENTIFIER_STRING COMMENT_SPACE `]` COMMENT_SPACE `=` |
| 82 | +
|
| 83 | +EXPORTS_LITERAL_PROP: (IDENTIFIER (COMMENT_SPACE `:` COMMENT_SPACE IDENTIFIER)?) | (IDENTIFIER_STRING COMMENT_SPACE `:` COMMENT_SPACE IDENTIFIER) |
| 84 | +
|
| 85 | +EXPORTS_MEMBER: EXPORTS_DOT_ASSIGN | EXPORTS_LITERAL_COMPUTED_ASSIGN |
| 86 | +
|
| 87 | +EXPORTS_DEFINE: `Object` COMMENT_SPACE `.` COMMENT_SPACE `defineProperty` COMMENT_SPACE `(` EXPORTS_IDENTIFIER COMMENT_SPACE `,` COMMENT_SPACE IDENTIFIER_STRING |
| 88 | +
|
| 89 | +EXPORTS_LITERAL: MODULE_EXPORTS COMMENT_SPACE `=` COMMENT_SPACE `{` COMMENT_SPACE (EXPORTS_LITERAL_PROP COMMENT_SPACE `,` COMMENT_SPACE)+ `}` |
| 90 | +
|
| 91 | +REQUIRE: `require` COMMENT_SPACE `(` COMMENT_SPACE STRING_LITERAL COMMENT_SPACE `)` |
| 92 | +
|
| 93 | +EXPORTS_ASSIGN: (`var` | `const` | `let`) IDENTIFIER `=` REQUIRE |
| 94 | +
|
| 95 | +MODULE_EXPORTS_ASSIGN: MODULE_EXPORTS COMMENT_SPACE `=` COMMENT_SPACE REQUIRE |
| 96 | +
|
| 97 | +EXPORT_STAR: (`__export` | `__exportStar`) `(` REQUIRE |
| 98 | +
|
| 99 | +EXPORT_STAR_LIB: `Object.keys(` IDENTIFIER$1 `).forEach(function (` IDENTIFIER$2 `) {` |
| 100 | + ( |
| 101 | + `if (` IDENTIFIER$2 `===` ( `'default'` | `"default"` ) `||` IDENTIFIER$2 `===` ( `'__esModule'` | `"__esModule"` ) `) return` `;`? | |
| 102 | + `if (` IDENTIFIER$2 `!==` ( `'default'` | `"default"` ) `)` |
| 103 | + ) |
| 104 | + ( |
| 105 | + EXPORTS_IDENTIFIER `[` IDENTIFIER$2 `] =` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? | |
| 106 | + `Object.defineProperty(` EXPORTS_IDENTIFIER `, ` IDENTIFIER$2 `, { enumerable: true, get: function () { return ` IDENTIFIER$1 `[` IDENTIFIER$2 `]` `;`? `} })` `;`? |
| 107 | + ) |
| 108 | + `})` |
| 109 | +``` |
| 110 | + |
| 111 | +* The returned export names are the matched `IDENTIFIER` and `IDENTIFIER_STRING` slots for all `EXPORTS_MEMBER`, `EXPORTS_DEFINE` and `EXPORTS_LITERAL` matches. |
| 112 | +* The reexport specifiers are taken to be the `STRING_LITERAL` slots of all `MODULE_EXPORTS_ASSIGN` as well as all _top-level_ `EXPORT_STAR` `REQUIRE` matches and `EXPORTS_ASSIGN` matches whose `IDENTIFIER` also matches the first `IDENTIFIER` in `EXPORT_STAR_LIB`. |
| 113 | + |
| 114 | +### Parsing Examples |
| 115 | + |
| 116 | +#### Named Exports Parsing |
| 117 | + |
| 118 | +The basic matching rules for named exports are `exports.name`, `exports['name']` or `Object.defineProperty(exports, 'name', ...)`. This matching is done without scope analysis and regardless of the expression position: |
| 119 | + |
| 120 | +```js |
| 121 | +// DETECTS EXPORTS: a, b, c |
| 122 | +(function (exports) { |
| 123 | + exports.a = 'a'; |
| 124 | + exports['b'] = 'b'; |
| 125 | + Object.defineProperty(exports, 'c', { value: 'c' }); |
| 126 | +})(exports); |
| 127 | +``` |
| 128 | + |
| 129 | +Because there is no scope analysis, the above detection may overclassify: |
| 130 | + |
| 131 | +```js |
| 132 | +// DETECTS EXPORTS: a, b, c |
| 133 | +(function (exports, Object) { |
| 134 | + exports.a = 'a'; |
| 135 | + exports['b'] = 'b'; |
| 136 | + if (false) |
| 137 | + Object.defineProperty(exports, 'c', { value: 'c' }); |
| 138 | +})(NOT_EXPORTS, NOT_OBJECT); |
| 139 | +``` |
| 140 | + |
| 141 | +It will in turn underclassify in cases where the identifiers are renamed: |
| 142 | + |
| 143 | +```js |
| 144 | +// DETECTS: NO EXPORTS |
| 145 | +(function (e, defineProperty) { |
| 146 | + e.a = 'a'; |
| 147 | + e['b'] = 'b'; |
| 148 | + defineProperty(e, 'c', { value: 'c' }); |
| 149 | +})(exports, defineProperty); |
| 150 | +``` |
| 151 | + |
| 152 | +#### Exports Object Assignment |
| 153 | + |
| 154 | +A best-effort attempt is made to detect `module.exports` object assignments, but because this is not a full parser, arbitrary expressions are not handled in the |
| 155 | +object parsing process. |
| 156 | + |
| 157 | +Simple object definitions are supported: |
| 158 | + |
| 159 | +```js |
| 160 | +// DETECTS EXPORTS: a, b, c |
| 161 | +module.exports = { |
| 162 | + a, |
| 163 | + b: 'c', |
| 164 | + c: c |
| 165 | +}; |
| 166 | +``` |
| 167 | + |
| 168 | +Object properties that are not identifiers or string expressions will bail out of the object detection: |
| 169 | + |
| 170 | +```js |
| 171 | +// DETECTS EXPORTS: a, b |
| 172 | +module.exports = { |
| 173 | + a, |
| 174 | + b: require('c'), |
| 175 | + c: "not detected since require('c') above bails the object detection" |
| 176 | +} |
| 177 | +``` |
| 178 | + |
| 179 | +`Object.defineProperties` is not currently supported either. |
| 180 | + |
| 181 | +#### module.exports Reexport Assignment |
| 182 | + |
| 183 | +Any `module.exports = require('mod')` assignment is detected as a reexport: |
| 184 | + |
| 185 | +```js |
| 186 | +// DETECTS REEXPORTS: a, b, c |
| 187 | +module.exports = require('a'); |
| 188 | +(module => module.exports = require('b'))(NOT_MODULE); |
| 189 | +if (false) module.exports = require('c'); |
| 190 | +``` |
| 191 | + |
| 192 | +As a result, the total list of exports would be inferred as the union of all of these reexported modules, which can lead to possible over-classification. |
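As an illustration of how a consumer might compute that union, the sketch below follows `reexports` recursively over a set of already-analyzed modules. The `analyses` map, the module ids and the `resolveExports` helper are hypothetical, and specifier resolution (mapping `'./dep1.js'` to a file) is assumed to have happened elsewhere.

```js
// Hypothetical analysis results keyed by module id, shaped like the
// { exports, reexports } objects returned by parse().
const analyses = new Map([
  ['main', { exports: ['x'], reexports: ['a', 'b'] }],
  ['a', { exports: ['foo'], reexports: [] }],
  ['b', { exports: ['bar', 'foo'], reexports: ['a'] }],
]);

// Union of a module's own exports and those of every module it transitively
// reexports. `seen` guards against reexport cycles; because it is shared
// across the walk, only the result for the root id is complete.
function resolveExports(id, analyses, seen = new Set()) {
  const names = new Set();
  if (seen.has(id) || !analyses.has(id)) return names;
  seen.add(id);
  const { exports, reexports } = analyses.get(id);
  for (const name of exports) names.add(name);
  for (const dep of reexports) {
    for (const name of resolveExports(dep, analyses, seen)) names.add(name);
  }
  return names;
}

const mainExports = [...resolveExports('main', analyses)];
console.log(mainExports); // [ 'x', 'foo', 'bar' ]
```

Every detected reexport contributes to the result whether or not it would run, which is where the over-classification comes from.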
| 193 | + |
| 194 | +#### Transpiler Re-exports |
| 195 | + |
| 196 | +For named exports, transpiler output works well with the rules described above. |
| 197 | + |
| 198 | +But for star re-exports, special care is taken to support common patterns of transpiler outputs from Babel and TypeScript as well as bundlers like RollupJS. |
| 199 | +These reexport and star reexport patterns are restricted to only be detected at the top-level as provided by the direct output of these tools. |
| 200 | + |
| 201 | +For example, `export * from 'external'` is output by Babel as: |
| 202 | + |
| 203 | +```js |
| 204 | +"use strict"; |
| 205 | + |
| 206 | +exports.__esModule = true; |
| 207 | + |
| 208 | +var _external = require("external"); |
| 209 | + |
| 210 | +Object.keys(_external).forEach(function (key) { |
| 211 | + if (key === "default" || key === "__esModule") return; |
| 212 | + exports[key] = _external[key]; |
| 213 | +}); |
| 214 | +``` |
| 215 | + |
| 216 | +Where the `var _external = require("external")` is specifically detected as well as the `Object.keys(_external)` statement, down to the exact |
| 217 | +form of that entire expression, including minor variations of the output. The `_external` and `key` identifiers are carefully matched in this |
| 218 | +detection. |
| 219 | + |
| 220 | +Similarly for TypeScript, `export * from 'external'` is output as: |
| 221 | + |
| 222 | +```js |
| 223 | +"use strict"; |
| 224 | +function __export(m) { |
| 225 | + for (var p in m) if (!exports.hasOwnProperty(p)) exports[p] = m[p]; |
| 226 | +} |
| 227 | +Object.defineProperty(exports, "__esModule", { value: true }); |
| 228 | +__export(require("external")); |
| 229 | +``` |
| 230 | + |
| 231 | +Where the `__export(require("external"))` statement is explicitly detected as a reexport, including variations `tslib.__export` and `__exportStar`. |
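The reexport shapes described in this section can be approximated with two regular expressions, loosely following `MODULE_EXPORTS_ASSIGN` and `EXPORT_STAR` from the grammar. This is a sketch under simplifying assumptions rather than the lexer's actual logic: it ignores comments and strings, and it does not enforce the top-level restriction. The names `sketchReexports`, `REQUIRE` and `REEXPORT_RULES` are hypothetical.

```js
// REQUIRE: require('specifier') or require("specifier"), without escapes.
const REQUIRE = String.raw`require\s*\(\s*(?:'([^'\\\n]*)'|"([^"\\\n]*)")\s*\)`;

const REEXPORT_RULES = [
  // MODULE_EXPORTS_ASSIGN: module.exports = require('...')
  new RegExp(String.raw`module\s*\.\s*exports\s*=\s*` + REQUIRE, 'g'),
  // EXPORT_STAR: __export(require('...')) or __exportStar(require('...')),
  // optionally called as a member of `tslib`.
  new RegExp(String.raw`(?:tslib\s*\.\s*)?(?:__exportStar|__export)\s*\(\s*` + REQUIRE, 'g'),
];

function sketchReexports(source) {
  const specifiers = [];
  for (const rule of REEXPORT_RULES) {
    for (const match of source.matchAll(rule)) {
      // Group 1 holds a single-quoted specifier, group 2 a double-quoted one.
      specifiers.push(match[1] !== undefined ? match[1] : match[2]);
    }
  }
  return specifiers;
}

const found = sketchReexports(`
  function __export(m) {
    for (var p in m) if (!exports.hasOwnProperty(p)) exports[p] = m[p];
  }
  __export(require("external"));
  tslib.__exportStar(require('other'), exports);
`);
console.log(found); // [ 'external', 'other' ]
```

The `function __export(m)` declaration is not reported because its argument is not a `require(...)` call.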
| 232 | + |
| 233 | +### Environment Support |
| 234 | + |
| 235 | +Node.js 10+, and [all browsers with WebAssembly support](https://caniuse.com/#feat=wasm). |
| 236 | + |
| 237 | +### JS Grammar Support |
| 238 | + |
| 239 | +* Token state parses all line comments, block comments, strings, template strings, blocks, parens and punctuators. |
| 240 | +* Division operator / regex token ambiguity is handled via backtracking checks against punctuator prefixes, including closing brace or paren backtracking. |
| 241 | +* Always correctly parses valid JS source, but may parse invalid JS source without errors. |
| 242 | + |
| 243 | +### Benchmarks |
| 244 | + |
| 245 | +Benchmarks can be run with `npm run bench`. |
| 246 | + |
| 247 | +Current results: |
| 248 | + |
| 249 | +JS Build: |
| 250 | + |
| 251 | +``` |
| 252 | +Module load time |
| 253 | +> 2ms |
| 254 | +Cold Run, All Samples |
| 255 | +test/samples/*.js (3635 KiB) |
| 256 | +> 333ms |
| 257 | +
|
| 258 | +Warm Runs (average of 25 runs) |
| 259 | +test/samples/angular.js (1410 KiB) |
| 260 | +> 16.48ms |
| 261 | +test/samples/angular.min.js (303 KiB) |
| 262 | +> 5.36ms |
| 263 | +test/samples/d3.js (553 KiB) |
| 264 | +> 8.32ms |
| 265 | +test/samples/d3.min.js (250 KiB) |
| 266 | +> 4.28ms |
| 267 | +test/samples/magic-string.js (34 KiB) |
| 268 | +> 1ms |
| 269 | +test/samples/magic-string.min.js (20 KiB) |
| 270 | +> 0.36ms |
| 271 | +test/samples/rollup.js (698 KiB) |
| 272 | +> 10.48ms |
| 273 | +test/samples/rollup.min.js (367 KiB) |
| 274 | +> 6.64ms |
| 275 | +
|
| 276 | +Warm Runs, All Samples (average of 25 runs) |
| 277 | +test/samples/*.js (3635 KiB) |
| 278 | +> 49.28ms |
| 279 | +``` |
| 280 | + |
| 281 | +Wasm Build: |
| 282 | +``` |
| 283 | +Module load time |
| 284 | +> 11ms |
| 285 | +Cold Run, All Samples |
| 286 | +test/samples/*.js (3635 KiB) |
| 287 | +> 48ms |
| 288 | +
|
| 289 | +Warm Runs (average of 25 runs) |
| 290 | +test/samples/angular.js (1410 KiB) |
| 291 | +> 12.32ms |
| 292 | +test/samples/angular.min.js (303 KiB) |
| 293 | +> 3.76ms |
| 294 | +test/samples/d3.js (553 KiB) |
| 295 | +> 6.08ms |
| 296 | +test/samples/d3.min.js (250 KiB) |
| 297 | +> 3ms |
| 298 | +test/samples/magic-string.js (34 KiB) |
| 299 | +> 0.24ms |
| 300 | +test/samples/magic-string.min.js (20 KiB) |
| 301 | +> 0ms |
| 302 | +test/samples/rollup.js (698 KiB) |
| 303 | +> 7.2ms |
| 304 | +test/samples/rollup.min.js (367 KiB) |
| 305 | +> 4.2ms |
| 306 | +
|
| 307 | +Warm Runs, All Samples (average of 25 runs) |
| 308 | +test/samples/*.js (3635 KiB) |
| 309 | +> 33.6ms |
| 310 | +``` |
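For reference, the per-megabyte figures quoted in the introduction can be derived from the "All Samples" rows above. The short calculation below only uses numbers from this benchmark output; the variable names are illustrative.

```js
// Figures copied from the "All Samples" rows above (3635 KiB of source).
const totalMiB = 3635 / 1024;
const runs = {
  js: { coldMs: 333, warmMs: 49.28 },
  wasm: { coldMs: 48, warmMs: 33.6 },
};

// Milliseconds per MiB of source, rounded to one decimal place.
const msPerMiB = ms => Math.round((ms / totalMiB) * 10) / 10;

const summary = {
  jsCold: msPerMiB(runs.js.coldMs),
  jsWarm: msPerMiB(runs.js.warmMs),
  wasmCold: msPerMiB(runs.wasm.coldMs),
  wasmWarm: msPerMiB(runs.wasm.warmMs),
  // Ratio of warm JS time to warm Wasm time, rounded to two decimals.
  warmSpeedup: Math.round((runs.js.warmMs / runs.wasm.warmMs) * 100) / 100,
};
console.log(summary);
```

This works out to roughly 94 ms per MiB cold and 14 ms per MiB warm for the JS build, and a warm speedup of about 1.47x for Wasm, in line with the rounded figures quoted earlier.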
| 311 | + |
| 312 | +### Wasm Build Steps |
| 313 | + |
| 314 | +To build, download the WASI SDK from https://github.com/CraneStation/wasi-sdk/releases. |
| 315 | + |
| 316 | +The Makefile assumes the existence of "wasi-sdk-10.0", "binaryen" and "wabt" (both optional) as sibling folders to this project. |
| 317 | + |
| 318 | +The build through the Makefile is then run via `make lib/lexer.wasm`, which can also be triggered via `npm run build-wasm` to create `dist/lexer.js`. |
| 319 | + |
| 320 | +On Windows it may be preferable to use the Linux subsystem. |
| 321 | + |
| 322 | +After the WebAssembly build, the CJS build can be triggered via `npm run build`. |
| 323 | + |
| 324 | +Optimization passes are run with [Binaryen](https://github.com/WebAssembly/binaryen) prior to publish to reduce the WebAssembly footprint. |
| 325 | + |
| 326 | +### License |
| 327 | + |
| 328 | +MIT |
| 329 | + |
| 330 | +[travis-url]: https://travis-ci.org/guybedford/es-module-lexer |
| 331 | +[travis-image]: https://travis-ci.org/guybedford/es-module-lexer.svg?branch=master |