Skip to content
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 1796e49

Browse files
trevnorrisMylesBorins
authored andcommittedApr 21, 2016
doc: add domain postmortem
Do to various reasons, outlined in the committed document, domains were only in core for 2 years before being deprecated. This outline explains why they received criticism from the community and never gained traction with module authors. Also included is an example script that accompanies the postmortem analysis. PR-URL: #6159 Reviewed-By: Robert Lindstädt <[email protected]> Reviewed-By: Kelvin Knighton <[email protected]> Reviewed-By: James M Snell <[email protected]> Reviewed-By: Jeremiah Senkpiel <[email protected]> Reviewed-By: Benjamin Gruenbaum <[email protected]> Reviewed-By: Brian White <[email protected]>
1 parent 02bf459 commit 1796e49

File tree

2 files changed

+435
-0
lines changed

2 files changed

+435
-0
lines changed
 

‎doc/topics/domain-postmortem.md

+299
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,299 @@
1+
# Domain Module Postmortem
2+
3+
## Usability Issues
4+
5+
### Implicit Behavior
6+
7+
It's possible for a developer to create a new domain and then simply run
8+
`domain.enter()`. Which then acts as a catch-all for any exception in the
9+
future that couldn't be observed by the thrower. Allowing a module author to
10+
intercept the exceptions of unrelated code in a different module. Preventing
11+
the originator of the code from knowing about its own exceptions.
12+
13+
Here's an example of how one indirectly linked modules can affect another:
14+
15+
```js
16+
// module a.js
17+
const b = require('./b');
18+
const c = require('./c');
19+
20+
21+
// module b.js
22+
const d = require('domain').create();
23+
d.on('error', () => { /* silence everything */ });
24+
d.enter();
25+
26+
27+
// module c.js
28+
const dep = require('some-dep');
29+
dep.method(); // Uh-oh! This method doesn't actually exist.
30+
```
31+
32+
Since module `b` enters the domain but never exits any uncaught exception will
33+
be swallowed. Leaving module `c` in the dark as to why it didn't run the entire
34+
script. Leaving a potentially partially populated `module.exports`. Doing this
35+
is not the same as listening for `'uncaughtException'`. As the latter is
36+
explicitly meant to globally catch errors. The other issue is that domains are
37+
processed prior to any `'uncaughtException'` handlers, and prevent them from
38+
running.
39+
40+
Another issue is that domains route errors automatically if no `'error'`
41+
handler was set on the event emitter. There is no opt-in mechanism for this,
42+
and automatically propagates across the entire asynchronous chain. This may
43+
seem useful at first, but once asynchronous calls are two or more modules deep
44+
and one of them doesn't include an error handler the creator of the domain will
45+
suddenly be catching unexpected exceptions, and the thrower's exception will go
46+
unnoticed by the author.
47+
48+
The following is a simple example of how a missing `'error'` handler allows
49+
the active domain to hijack the error:
50+
51+
```js
52+
const domain = require('domain');
53+
const net = require('net');
54+
const d = domain.create();
55+
d.on('error', (err) => console.error(err.message));
56+
57+
d.run(() => net.createServer((c) => {
58+
c.end();
59+
c.write('bye');
60+
}).listen(8000));
61+
```
62+
63+
Even manually removing the connection via `d.remove(c)` does not prevent the
64+
connection's error from being automatically intercepted.
65+
66+
Failures that plagues both error routing and exception handling are the
67+
inconsistencies in how errors are bubbled. The following is an example of how
68+
nested domains will and won't bubble the exception based on when they happen:
69+
70+
```js
71+
const domain = require('domain');
72+
const net = require('net');
73+
const d = domain.create();
74+
d.on('error', () => console.error('d intercepted an error'));
75+
76+
d.run(() => {
77+
const server = net.createServer((c) => {
78+
const e = domain.create(); // No 'error' handler being set.
79+
e.run(() => {
80+
// This will not be caught by d's error handler.
81+
setImmediate(() => {
82+
throw new Error('thrown from setImmediate');
83+
});
84+
// Though this one will bubble to d's error handler.
85+
throw new Error('immediately thrown');
86+
});
87+
}).listen(8080);
88+
});
89+
```
90+
91+
It may be expected that nested domains always remain nested, and will always
92+
propagate the exception up the domain stack. Or that exceptions will never
93+
automatically bubble. Unfortunately both these situations occur, leading to
94+
potentially confusing behavior that may even be prone to difficult to debug
95+
timing conflicts.
96+
97+
98+
### API Gaps
99+
100+
While APIs based on using `EventEmitter` can use `bind()` and errback style
101+
callbacks can use `intercept()`, alternative APIs that implicitly bind to the
102+
active domain must be executed inside of `run()`. Meaning if module authors
103+
wanted to support domains using a mechanism alternative to those mentioned they
104+
must manually implement domain support themselves. Instead of being able to
105+
leverage the implicit mechanisms already in place.
106+
107+
108+
### Error Propagation
109+
110+
Propagating errors across nested domains is not straight forward, if even
111+
possible. Existing documentation shows a simple example of how to `close()` an
112+
`http` server if there is an error in the request handler. What it does not
113+
explain is how to close the server if the request handler creates another
114+
domain instance for another async request. Using the following as a simple
115+
example of the failing of error propagation:
116+
117+
```js
118+
const d1 = domain.create();
119+
d1.foo = true; // custom member to make more visible in console
120+
d1.on('error', (er) => { /* handle error */ });
121+
122+
d1.run(() => setTimeout(() => {
123+
const d2 = domain.create();
124+
d2.bar = 43;
125+
d2.on('error', (er) => console.error(er.message, domain._stack));
126+
d2.run(() => {
127+
setTimeout(() => {
128+
setTimeout(() => {
129+
throw new Error('outer');
130+
});
131+
throw new Error('inner')
132+
});
133+
});
134+
}));
135+
```
136+
137+
Even in the case that the domain instances are being used for local storage so
138+
access to resources are made available there is still no way to allow the error
139+
to continue propagating from `d2` back to `d1`. Quick inspection may tell us
140+
that simply throwing from `d2`'s domain `'error'` handler would allow `d1` to
141+
then catch the exception and execute its own error handler. Though that is not
142+
the case. Upon inspection of `domain._stack` you'll see that the stack only
143+
contains `d2`.
144+
145+
This may be considered a failing of the API, but even if it did operate in this
146+
way there is still the issue of transmitting the fact that a branch in the
147+
asynchronous execution has failed, and that all further operations in that
148+
branch must cease. In the example of the http request handler, if we fire off
149+
several asynchronous requests and each one then `write()`'s data back to the
150+
client many more errors will arise from attempting to `write()` to a closed
151+
handle. More on this in _Resource Cleanup on Exception_.
152+
153+
154+
### Resource Cleanup on Exception
155+
156+
The script [`domain-resource-cleanup.js`](domain-resource-cleanup.js)
157+
contains a more complex example of properly cleaning up in a small resource
158+
dependency tree in the case that an exception occurs in a given connection or
159+
any of its dependencies. Breaking down the script into its basic operations:
160+
161+
- When a new connection happens, concurrently:
162+
- Open a file on the file system
163+
- Open Pipe to unique socket
164+
- Read a chunk of the file asynchronously
165+
- Write chunk to both the TCP connection and any listening sockets
166+
- If any of these resources error, notify all other attached resources that
167+
they need to clean up and shutdown
168+
169+
As we can see from this example a lot more must be done to properly clean up
170+
resources when something fails than what can be done strictly through the
171+
domain API. All that domains offer is an exception aggregation mechanism. Even
172+
the potentially useful ability to propagate data with the domain is easily
173+
countered, in this example, by passing the needed resources as a function
174+
argument.
175+
176+
One problem domains perpetuated was the supposed simplicity of being able to
177+
continue execution, contrary to what the documentation stated, of the
178+
application despite an unexpected exception. This example demonstrates the
179+
fallacy behind that idea.
180+
181+
Attempting proper resource cleanup on unexpected exception becomes more complex
182+
as the application itself grows in complexity. This example only has 3 basic
183+
resources in play, and all of them with a clear dependency path. If an
184+
application uses something like shared resources or resource reuse the ability
185+
to cleanup, and properly test that cleanup has been done, grows greatly.
186+
187+
In the end, in terms of handling errors, domains aren't much more than a
188+
glorified `'uncaughtException'` handler. Except with more implicit and
189+
unobservable behavior by third-parties.
190+
191+
192+
### Resource Propagation
193+
194+
Another use case for domains was to use it to propagate data along asynchronous
195+
data paths. One problematic point is the ambiguity of when to expect the
196+
correct domain when there are multiple in the stack (which must be assumed if
197+
the async stack works with other modules). Also the conflict between being
198+
able to depend on a domain for error handling while also having it available to
199+
retrieve the necessary data.
200+
201+
The following is a involved example demonstrating the failing using domains to
202+
propagate data along asynchronous stacks:
203+
204+
```js
205+
const domain = require('domain');
206+
const net = require('net');
207+
208+
const server = net.createServer((c) => {
209+
// Use a domain to propagate data across events within the
210+
// connection so that we don't have to pass arguments
211+
// everywhere.
212+
const d = domain.create();
213+
d.data = { connection: c };
214+
d.add(c);
215+
// Mock class that does some useless async data transformation
216+
// for demonstration purposes.
217+
const ds = new DataStream(dataTransformed);
218+
c.on('data', (chunk) => ds.data(chunk));
219+
}).listen(8080, () => console.log(`listening on 8080`));
220+
221+
function dataTransformed(chunk) {
222+
// FAIL! Because the DataStream instance also created a
223+
// domain we have now lost the active domain we had
224+
// hoped to use.
225+
domain.active.data.connection.write(chunk);
226+
}
227+
228+
function DataStream(cb) {
229+
this.cb = cb;
230+
// DataStream wants to use domains for data propagation too!
231+
// Unfortunately this will conflict with any domain that
232+
// already exists.
233+
this.domain = domain.create();
234+
this.domain.data = { inst: this };
235+
}
236+
237+
DataStream.prototype.data = function data(chunk) {
238+
// This code is self contained, but pretend it's a complex
239+
// operation that crosses at least one other module. So
240+
// passing along "this", etc., is not easy.
241+
this.domain.run(function() {
242+
// Simulate an async operation that does the data transform.
243+
setImmediate(() => {
244+
for (var i = 0; i < chunk.length; i++)
245+
chunk[i] = ((chunk[i] + Math.random() * 100) % 96) + 33;
246+
// Grab the instance from the active domain and use that
247+
// to call the user's callback.
248+
const self = domain.active.data.inst;
249+
self.cb.call(self, chunk);
250+
});
251+
});
252+
};
253+
```
254+
255+
The above shows that it is difficult to have more than one asynchronous API
256+
attempt to use domains to propagate data. This example could possibly be fixed
257+
by assigning `parent: domain.active` in the `DataStream` constructor. Then
258+
restoring it via `domain.active = domain.active.data.parent` just before the
259+
user's callback is called. Also the instantiation of `DataStream` in the
260+
`'connection'` callback must be run inside `d.run()`, instead of simply using
261+
`d.add(c)`, otherwise there will be no active domain.
262+
263+
In short, for this to have a prayer of a chance usage would need to strictly
264+
adhere to a set of guidelines that would be difficult to enforce or test.
265+
266+
267+
## Performance Issues
268+
269+
A significant deterrent from using domains is the overhead. Using node's
270+
built-in http benchmark, `http_simple.js`, without domains it can handle over
271+
22,000 requests/second. Whereas if it's run with `NODE_USE_DOMAINS=1` that
272+
number drops down to under 17,000 requests/second. In this case there is only
273+
a single global domain. If we edit the benchmark so the http request callback
274+
creates a new domain instance performance drops further to 15,000
275+
requests/second.
276+
277+
While this probably wouldn't affect a server only serving a few hundred or even
278+
a thousand requests per second, the amount of overhead is directly proportional
279+
to the number of asynchronous requests made. So if a single connection needs to
280+
connect to several other services all of those will contribute to the overall
281+
latency of delivering the final product to the client.
282+
283+
Using `AsyncWrap` and tracking the number of times
284+
`init`/`pre`/`post`/`destroy` are called in the mentioned benchmark we find
285+
that the sum of all events called is over 170,000 times per second. This means
286+
even adding 1 microsecond overhead per call for any type of setup or tear down
287+
will result in a 17% performance loss. Granted, this is for the optimized
288+
scenario of the benchmark, but I believe this demonstrates the necessity for a
289+
mechanism such as domain to be as cheap to run as possible.
290+
291+
292+
## Looking Ahead
293+
294+
The domain module has been soft deprecated since Dec 2014, but has not yet been
295+
removed because node offers no alternative functionality at the moment. As of
296+
this writing there is ongoing work building out the `AsyncWrap` API and a
297+
proposal for Zones being prepared for the TC39. At such time there is suitable
298+
functionality to replace domains it will undergo the full deprecation cycle and
299+
eventually be removed from core.
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,136 @@
1+
'use strict';
2+
3+
const domain = require('domain');
4+
const EE = require('events');
5+
const fs = require('fs');
6+
const net = require('net');
7+
const util = require('util');
8+
const print = process._rawDebug;
9+
10+
const pipeList = [];
11+
const FILENAME = '/tmp/tmp.tmp';
12+
const PIPENAME = '/tmp/node-domain-example-';
13+
const FILESIZE = 1024;
14+
var uid = 0;
15+
16+
// Setting up temporary resources
17+
const buf = Buffer(FILESIZE);
18+
for (var i = 0; i < buf.length; i++)
19+
buf[i] = ((Math.random() * 1e3) % 78) + 48; // Basic ASCII
20+
fs.writeFileSync(FILENAME, buf);
21+
22+
function ConnectionResource(c) {
23+
EE.call(this);
24+
this._connection = c;
25+
this._alive = true;
26+
this._domain = domain.create();
27+
this._id = Math.random().toString(32).substr(2).substr(0, 8) + (++uid);
28+
29+
this._domain.add(c);
30+
this._domain.on('error', () => {
31+
this._alive = false;
32+
});
33+
}
34+
util.inherits(ConnectionResource, EE);
35+
36+
ConnectionResource.prototype.end = function end(chunk) {
37+
this._alive = false;
38+
this._connection.end(chunk);
39+
this.emit('end');
40+
};
41+
42+
ConnectionResource.prototype.isAlive = function isAlive() {
43+
return this._alive;
44+
};
45+
46+
ConnectionResource.prototype.id = function id() {
47+
return this._id;
48+
};
49+
50+
ConnectionResource.prototype.write = function write(chunk) {
51+
this.emit('data', chunk);
52+
return this._connection.write(chunk);
53+
};
54+
55+
// Example begin
56+
net.createServer((c) => {
57+
const cr = new ConnectionResource(c);
58+
59+
const d1 = domain.create();
60+
fs.open(FILENAME, 'r', d1.intercept((fd) => {
61+
streamInParts(fd, cr, 0);
62+
}));
63+
64+
pipeData(cr);
65+
66+
c.on('close', () => cr.end());
67+
}).listen(8080);
68+
69+
function streamInParts(fd, cr, pos) {
70+
const d2 = domain.create();
71+
var alive = true;
72+
d2.on('error', (er) => {
73+
print('d2 error:', er.message)
74+
cr.end();
75+
});
76+
fs.read(fd, new Buffer(10), 0, 10, pos, d2.intercept((bRead, buf) => {
77+
if (!cr.isAlive()) {
78+
return fs.close(fd);
79+
}
80+
if (cr._connection.bytesWritten < FILESIZE) {
81+
// Documentation says callback is optional, but doesn't mention that if
82+
// the write fails an exception will be thrown.
83+
const goodtogo = cr.write(buf);
84+
if (goodtogo) {
85+
setTimeout(() => streamInParts(fd, cr, pos + bRead), 1000);
86+
} else {
87+
cr._connection.once('drain', () => streamInParts(fd, cr, pos + bRead));
88+
}
89+
return;
90+
}
91+
cr.end(buf);
92+
fs.close(fd);
93+
}));
94+
}
95+
96+
function pipeData(cr) {
97+
const pname = PIPENAME + cr.id();
98+
const ps = net.createServer();
99+
const d3 = domain.create();
100+
const connectionList = [];
101+
d3.on('error', (er) => {
102+
print('d3 error:', er.message);
103+
cr.end();
104+
});
105+
d3.add(ps);
106+
ps.on('connection', (conn) => {
107+
connectionList.push(conn);
108+
conn.on('data', () => {}); // don't care about incoming data.
109+
conn.on('close', () => {
110+
connectionList.splice(connectionList.indexOf(conn), 1);
111+
});
112+
});
113+
cr.on('data', (chunk) => {
114+
for (var i = 0; i < connectionList.length; i++) {
115+
connectionList[i].write(chunk);
116+
}
117+
});
118+
cr.on('end', () => {
119+
for (var i = 0; i < connectionList.length; i++) {
120+
connectionList[i].end();
121+
}
122+
ps.close();
123+
});
124+
pipeList.push(pname);
125+
ps.listen(pname);
126+
}
127+
128+
process.on('SIGINT', () => process.exit());
129+
process.on('exit', () => {
130+
try {
131+
for (var i = 0; i < pipeList.length; i++) {
132+
fs.unlinkSync(pipeList[i]);
133+
}
134+
fs.unlinkSync(FILENAME);
135+
} catch (e) { }
136+
});

0 commit comments

Comments
 (0)
Please sign in to comment.