
Problems with cooperation between async generator and stream #38487

Closed
misos1 opened this issue Apr 30, 2021 · 12 comments
Labels
fs Issues and PRs related to the fs subsystem / file system. stream Issues and PRs related to the stream subsystem.

Comments

@misos1

misos1 commented Apr 30, 2021

Is your feature request related to a problem? Please describe.

Consider this simple async generator, which is supposed to merge the output of two input generators:

let merge = async function *(a, b)
{
	yield *a;
	yield *b;
};

When it is used with file streams, the second stream can be leaked by breaking out of, or throwing from, a for await loop:

require("posix").setrlimit("nofile", { soft: 1000 });
require("fs").writeFileSync("file", "file");

(async function()
{
	for(let i = 0; ; i++)
	{
		let a = require("fs").createReadStream("file");
		let b = require("fs").createReadStream("file");
		if(i == 0)
		{
			require("stream").finished(a, e => console.log("a finished", e));
			require("stream").finished(b, e => console.log("b finished", e));
		}
		for await (let chunk of merge(a, b)) break;
		console.log(i, a.destroyed, b.destroyed);
	}
}());

Possible output:

a finished Error [ERR_STREAM_PREMATURE_CLOSE]: Premature close
    at new NodeError (node:internal/errors:329:5)
    at ReadStream.onclose (node:internal/streams/end-of-stream:138:38)
    at ReadStream.emit (node:events:381:22)
    at emitCloseNT (node:internal/streams/destroy:169:10)
    at processTicksAndRejections (node:internal/process/task_queues:80:21) {
  code: 'ERR_STREAM_PREMATURE_CLOSE'
}
...
973 true false
974 true false
975 true false
node:events:346
      throw er; // Unhandled 'error' event
      ^

Error: EMFILE: too many open files, open 'file'
Emitted 'error' event on ReadStream instance at:
    at emitErrorNT (node:internal/streams/destroy:188:8)
    at emitErrorCloseNT (node:internal/streams/destroy:153:3)
    at processTicksAndRejections (node:internal/process/task_queues:81:21) {
  errno: -24,
  code: 'EMFILE',
  syscall: 'open',
  path: 'file'
}

At least the first stream is finished with a premature-close error, but the second one is not.

It seems it is not possible to resolve this inside the async generator itself. One possible solution would be to wrap it in a stream, destroy the input streams in a stream.finished() callback, and return that stream for consumption instead of the async generator:

let wrapped_merge = function(a, b)
{
	let stm = require("stream").Readable.from(merge(a, b));
	require("stream").finished(stm, function()
	{
	// TODO: first check that these are really streams and not, for example, async generators
		a.destroy();
		b.destroy();
	});
	return stm;
};

But this solution really does not feel right.

Describe the solution you'd like
It is probably not viable to track all streams used in an async generator and destroy them the way the one that is currently yielding gets destroyed. Maybe some kind of "finally" handler could be added for async generators, or a function like stream.finished() but for async generators (finished currently accepts only streams).
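
A minimal sketch of that second idea follows, assuming the cleanup only needs to run once the generator settles; the onFinished name and helper are hypothetical, not an existing Node API:

let onFinished = function(gen, cleanup)
{
	// Hypothetical helper: wrap an async generator and run a cleanup callback
	// once it settles, whether it completes, throws, or is returned early
	// because the consumer broke out of a for await loop.
	return (async function *()
	{
		try
		{
			yield *gen;
		}
		finally
		{
			cleanup();
		}
	}());
};

// Usage sketch:
// for await (let chunk of onFinished(merge(a, b), () => { a.destroy(); b.destroy(); })) break;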

Describe alternatives you've considered
Maybe some aspects of Node streams should be reconsidered. Maybe file streams should not open their files until something tries to read them? Or streams could close the file handle automatically when they are garbage collected, like in Python. (What about new fs APIs that return async generators instead of streams?)
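
For example, a sketch of the "do not open until read" idea, assuming the caller passes paths (or factories) instead of already-opened streams; lazyFileChunks is an invented name for illustration:

let lazyFileChunks = async function *(path)
{
	// Illustration only: the file descriptor is not acquired until the consumer
	// actually starts iterating, so an input that merge() never reaches
	// never opens a file.
	let stream = require("fs").createReadStream(path); // opened on first next()
	try
	{
		yield *stream;
	}
	finally
	{
		stream.destroy(); // release the fd even if the consumer breaks early
	}
};

// merge(lazyFileChunks("file"), lazyFileChunks("file")) only opens the
// second file once the first one has been drained.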

@misos1
Author

misos1 commented Apr 30, 2021

The async generator is obviously in some "finished" state after a break or throw from for await, as this example demonstrates by printing only "a" (should break do this at all?):

	let a = async function *() { yield "a"; };
	let b = async function *() { yield "b"; };
	let m = merge(a(), b());
	for await (let chunk of m)
	{
		console.log(chunk);
		break;
	}
	for await (let chunk of m)
	{
		console.log(chunk);
	}

@Ayase-252 added the fs and stream labels May 1, 2021
@benjamingr
Member

See #38491

@benjamingr
Member

@nodejs/streams

@mcollina
Member

mcollina commented May 9, 2021

I would like to unpack this issue:

  1. Relying on the garbage collector to clean up native resources like file descriptors is unsafe. The garbage collector does not know about them, so it cannot take them into consideration while cleaning up. We could potentially do this on a best-effort basis, as fs/promises does, but that would require a new implementation around the FileHandle we have there. Even then it would not be a complete fix, because it is best effort and EMFILE errors could still happen (see the sketch after this list).
     This is a good project for somebody to pick up if they want to learn more about Node.js internals. @benjamingr @jasnell wdyt? Feel free to open an issue if you think it's something we should do. I'm happy to mentor somebody in the reimplementation.

  2. The implementation of merge(a, b) at the top is unsafe, as it does not propagate the break/destroy to the second stream. Check out ... to see how we use finally to implement this.
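
A rough sketch of the best-effort idea from point 1, using FinalizationRegistry; the openTracked helper is invented for illustration and is not how fs/promises is actually implemented:

const fs = require("fs");

// Best-effort cleanup: if the handle object is garbage collected without
// being closed, close its descriptor in the finalizer. GC timing is
// unpredictable, so EMFILE can still happen before collection runs.
const registry = new FinalizationRegistry((fd) => {
  fs.close(fd, () => {});
});

function openTracked(path) {
  const fd = fs.openSync(path, "r");
  const handle = {
    fd,
    close() {
      registry.unregister(handle);
      fs.closeSync(fd);
    }
  };
  registry.register(handle, fd, handle);
  return handle;
}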

@misos1
Author

misos1 commented May 9, 2021

2. to see how we use finally to implement this.

If I wrapped the body of merge in try/finally it would not help, because nothing throws inside the async generator in my example. I already tried it. That is why I wrote what I wrote in the "Describe the solution you'd like" section.

@mcollina
Member

Here is how you can use finally to implement this behavior:

https://gist.github.com/mcollina/ad918bb6eb115ca86cf3e0f3c48d063f

There are three ways in which you can implement this behavior with slightly different semantics.

I think we should document those somewhere. cc @jasnell @benjamingr take a look.
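
For reference, a minimal sketch of one such approach, assuming the inputs are Node readable streams (it is not necessarily what the gist contains):

async function* merge(a, b) {
  try {
    yield* a;
    yield* b;
  } finally {
    // Breaking or throwing out of `for await` calls .return() on the
    // generator, which runs this block; destroy whatever was not consumed.
    if (typeof a.destroy === "function") a.destroy();
    if (typeof b.destroy === "function") b.destroy();
  }
}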

@misos1
Author

misos1 commented May 10, 2021

Actually, I did not try try/finally but try/catch instead. I was a little surprised, because finally is more or less just syntactic sugar, and one can "simulate" it with catch like this:

try
{
	do_something();
}
catch(e)
{
	cleanup();
	throw e;
}
cleanup();
try
{
	do_something();
}
finally
{
	cleanup();
}

But the similarity ends when there is a return statement in the try block. So actually it is not so surprising, because an async generator can be externally made to return from the try block.
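
A small illustration of that difference: an external .return() resumes the generator as if a return statement had replaced the suspended yield, so finally blocks run but catch blocks do not (names are made up for the example):

let with_handlers = async function *()
{
	try
	{
		yield 1;
	}
	catch(e)
	{
		console.log("catch"); // not reached: .return() does not throw into the generator
	}
	finally
	{
		console.log("finally"); // reached: .return() still unwinds through finally
	}
};

(async function()
{
	let g = with_handlers();
	await g.next(); // suspend at "yield 1"
	await g.return(); // prints only "finally"
}());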

@mcollina
Member

I don't feel I'm qualified to answer about the spec in this regard. Maybe others can step in and add references to the spec for this behavior.

@mcollina
Member

I think we can close this.

@benjamingr
Member

Note this can still fail with badly behaved generators (the version that doesn't consume both generators to the end). One can write something like:

async function* naughty() {
  while (true) {
    try { 
      yield 'foo';
    } finally {
      yield* naughty(); // .return runs finally blocks and the generator can just not finish
    }
  }
}
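
A short demonstration of why this defeats .return(), assuming the naughty generator above; the finally block keeps yielding, so the generator never reports done:

(async () => {
  const g = naughty();
  console.log(await g.next());   // { value: 'foo', done: false }
  console.log(await g.return()); // { value: 'foo', done: false }, still not finished
  console.log(await g.return()); // and so on: finally can keep yielding forever
})();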

Of course, for regular "streams", just destroying the stream in the finally block in case it wasn't consumed is sufficient. I don't think Node.js can "fix" this on our side. I think the best path forward is the iterator-helpers proposal, which should come with a merge function that guards against bad behaviours (maybe?).

@misos1
Author

misos1 commented May 11, 2021

Sorry but which version do you mean?

@benjamingr
Member

There hasn't been any activity here for a while and I believe this can be closed. Thanks for the discussion 🙏
