Document the complete list of requirements for the VFS #28

Merged 18 commits on Sep 12, 2022
2 changes: 2 additions & 0 deletions README.md
@@ -10,6 +10,8 @@ Table of Contents

- [Existing SEA Solutions](./docs/existing-solutions.md)
- [Production Node.js CLIs](./docs/production-nodejs-clis.md)
- Requirements
- [Virtual File System](./docs/virtual-file-system-requirements.md)

Blog
----
154 changes: 154 additions & 0 deletions docs/virtual-file-system-requirements.md
@@ -0,0 +1,154 @@
Virtual File System Requirements
================================

This document lists the requirements for the Virtual File System (VFS).

# Supported

## Random access reads

The VFS must support random access reads just like any other real file system,
so that the read operations can be at least as fast as reading files from the
real file system.
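
As a rough illustration of what random access means for code using the `fs`
APIs, the sketch below reads a byte range from the middle of a file without
reading what comes before it. It runs against a temporary file on the real file
system; the `readRange` helper, the file name and the contents are made up for
the example and are not tied to any VFS implementation.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical fixture: a temporary file standing in for a bundled asset.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vfs-random-access-'));
const file = path.join(dir, 'asset.txt');
fs.writeFileSync(file, '0123456789abcdef');

// Read up to `length` bytes starting at byte offset `position`.
function readRange(filePath, position, length) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const buffer = Buffer.alloc(length);
    const bytesRead = fs.readSync(fd, buffer, 0, length, position);
    return buffer.subarray(0, bytesRead).toString('utf8');
  } finally {
    fs.closeSync(fd);
  }
}

const middle = readRange(file, 10, 4);
console.log(middle);
```

A VFS that satisfies this requirement would let the same positional read work
on a bundled file.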

## Symbolic links

This is critical for applications that use packages like [dugite][], which
download [Git executables][] that contain symlinks. Since Electron's [ASAR][]
does not support symlinks, including [dugite][] as a dependency in an Electron
app expands every symlink into an individual file, which significantly
increases the package size.
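
The sketch below shows, on the real file system, the `fs` calls whose behavior
a VFS with symlink support would need to preserve. The file names `tool` and
`tool-alias` are hypothetical stand-ins, and creating symlinks on Windows may
require extra privileges.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical fixture: one real file and one symlink that points at it.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vfs-symlink-'));
const target = path.join(dir, 'tool');
const link = path.join(dir, 'tool-alias');
fs.writeFileSync(target, 'pretend this is a large binary');
fs.symlinkSync('tool', link); // relative link target

// lstat reports the link itself, readlink returns where it points, and a
// normal read follows the link to the target's contents.
const isLink = fs.lstatSync(link).isSymbolicLink();
const linkTarget = fs.readlinkSync(link);
const sameContents =
  fs.readFileSync(link, 'utf8') === fs.readFileSync(target, 'utf8');

console.log({ isLink, linkTarget, sameContents });
```

An archive format that stores the link as a link needs to keep only one copy of
the target's contents.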

## Preserve the executable bit of the file permissions

The VFS must preserve the executable bit of the file permissions, so that the
single-executable can execute only those bundled files that are marked as
executable. Beyond that, all the bundled files are readable and none are
writable.
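
A minimal sketch of how application code might check the executable bit
through `fs.statSync()`, assuming POSIX-style permission bits (the check is not
meaningful on Windows). The `isExecutable` helper and the fixture files are
hypothetical.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical fixtures standing in for bundled files.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vfs-exec-bit-'));
const script = path.join(dir, 'run.sh');
const data = path.join(dir, 'data.json');
fs.writeFileSync(script, '#!/bin/sh\necho hello\n');
fs.writeFileSync(data, '{}\n');
// chmod is used instead of the `mode` write option so that the process umask
// cannot change the result.
fs.chmodSync(script, 0o755);
fs.chmodSync(data, 0o644);

// True if any of the owner, group or other execute bits is set.
function isExecutable(filePath) {
  return (fs.statSync(filePath).mode & 0o111) !== 0;
}

const scriptIsExecutable = isExecutable(script);
const dataIsExecutable = isExecutable(data);
console.log({ scriptIsExecutable, dataIsExecutable });
```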

## Preserve file-hierarchy information

Without this, the single-executable would have no way to access nested file
paths, so the VFS would be incomplete as a file system.

## No interference with valid paths in the file system

If the bundled files in the VFS map to paths that already exist in the real
file system, certain use-cases will break. The VFS should therefore use paths
that cannot be taken by existing files.

Pkg uses [`/snapshot`](https://github.com/vercel/pkg#snapshot-filesystem) as the
prefix for all the embedded files. This is confusing if `/snapshot` is an
existing directory on the file system. Docker workflows routinely copy files to,
and run things at, the root of the filesystem, so following that approach too
would run into the same problem.

Boxednode lets users specify a [namespace](https://github.com/mongodb-js/boxednode/blob/6326e3277469e8cfe593616a0ed152600a5f9045/README.md?plain=1#L69-L72)
and uses it like so:
```js
// Specify the entrypoint target name. If this is 'foo', then the resulting
// binary will be able to load the source file as 'require("foo/foo")'.
// This defaults to the basename of sourceFile, e.g. 'bar' for '/path/bar.js'.
namespace?: string;
```

A possible solution is to use the single executable path as the base path for
the files in the VFS, i.e., if the executable has `/a/b/sea` as the path and the
VFS contains a file named `file.txt`, it would be accessible by the application
using `/a/b/sea/file.txt`. This approach is similar to how Electron's [ASAR][]
works, i.e., if the application asar is placed in `/a/b/app.asar`, the
embedded `file.txt` file would use `/a/b/app.asar/file.txt` as the path.
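
To make the base path idea concrete, here is a minimal sketch of how a path
could be classified as inside or outside the VFS. The `toVfsRelative` helper is
hypothetical and not part of any existing API; it uses `path.posix` and
absolute POSIX-style paths so that the result does not depend on the host
platform.

```js
const path = require('path');

// Returns the path of `requestedPath` relative to the executable, or null if
// the path does not live under the executable's path. The executable path
// itself maps to the empty string, i.e. the root of the VFS.
function toVfsRelative(execPath, requestedPath) {
  const relative = path.posix.relative(
    execPath,
    path.posix.normalize(requestedPath)
  );
  if (relative === '..' || relative.startsWith('../')) {
    return null;
  }
  return relative;
}

console.log(toVfsRelative('/a/b/sea', '/a/b/sea/file.txt'));
console.log(toVfsRelative('/a/b/sea', '/a/b/other.txt'));
```

Because no regular file can have children, a path such as `/a/b/sea/file.txt`
cannot collide with a file that already exists on the real file system.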

## Globbing

`fs.statSync(process.execPath).isDirectory()` will return `true` and
`fs.statSync(process.execPath).isFile()` will return `false`. That way, if code
within the single-executable does naive globbing using an off-the-shelf glob
library, paths inside the VFS would also get picked up.
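
The following sketch shows why the `isDirectory()` result matters. A naive
recursive walk, similar in spirit to what a glob library does, descends into
anything that `fs.statSync()` reports as a directory. Here a real directory
named `sea` stands in for the single-executable; the `walk` helper and the file
names are hypothetical.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Naive recursive walk: descend into every entry that stats as a directory
// and collect everything else.
function walk(root) {
  const found = [];
  for (const name of fs.readdirSync(root).sort()) {
    const full = path.join(root, name);
    if (fs.statSync(full).isDirectory()) {
      found.push(...walk(full));
    } else {
      found.push(full);
    }
  }
  return found;
}

// Hypothetical layout: `sea` plays the role of process.execPath.
const root = fs.mkdtempSync(path.join(os.tmpdir(), 'vfs-glob-'));
fs.mkdirSync(path.join(root, 'sea'));
fs.writeFileSync(path.join(root, 'sea', 'file.txt'), '');
fs.writeFileSync(path.join(root, 'notes.txt'), '');

const matches = walk(root)
  .filter((p) => p.endsWith('.txt'))
  .map((p) => path.relative(root, p));
console.log(matches);
```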

## Accept file paths in the VFS as arguments

If a single-executable formatter is run with an argument that is a path to a
file inside the VFS, it should be able to use the `fs` APIs to read the file,
format it, and print the formatted contents to `stdout`.

I think this section should be rewritten. The goal is not to allow this. The goal is to document this as a known gotcha, so authors of SEAs understand that this might happen unexpectedly.

Member

I guess the decision we need to make here is: can files within the VFS be referenced from outside the VFS, or only internally?

If we have an approach that disambiguates files inside and outside of the VFS (like the base path approach), then I don't see why we should not allow this.

Member

For those fundamental questions, let's try to document them in the README. I think it will be useful to have a concrete trail of fundamental questions & answers somewhere.

Member

I'll start GitHub Discussions on a separate category to track these down, and start a PR documenting the results.

Member Author

I guess most people in #18 were in favor of keeping the paths inside the VFS transparent? I have also changed the globbing part to better reflect that.


## Cross-platform tooling

The tooling required for archiving / extracting files into / from the VFS must
be available on all the [platforms supported by Node.js][].

## File path contents

The VFS should not limit the length or the character contents of file paths, so
that it stays as close as possible to what a real file system provides.

## Case Sensitive

From Yarn's experience with zip, forcing case sensitivity within the archives
didn't break anything and improved consistency. By contrast, making the code
case insensitive would have increased the complexity, worsened the runtime
performance, and increased the attack surface, all for a use case that
virtually no one cares about. Hence, the paths in the VFS will be case
sensitive.
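
As a small illustration, a case-sensitive lookup can be an exact-match key
lookup with no normalization step. The `entries` index and the `lookup` helper
below are hypothetical and do not describe how any particular archive format
stores its entries.

```js
// Hypothetical in-memory index of VFS entries, keyed by exact path.
const entries = new Map([
  ['/app/README.md', 'contents of README.md'],
  ['/app/readme.md', 'contents of readme.md'],
]);

// Case-sensitive lookup: keys are compared exactly as given.
function lookup(vfsPath) {
  return entries.has(vfsPath) ? entries.get(vfsPath) : null;
}

console.log(lookup('/app/README.md'));
console.log(lookup('/app/Readme.md'));
```

A case-insensitive variant would need to normalize every key and every lookup,
and would have to decide what to do when two entries differ only by case.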

## Dynamic imports and requires

`require(require.resolve('./file.js'))` should work for files both on the real
file system and in the VFS.
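
For reference, this is the pattern on the real file system, sketched with a
temporary module so that it runs anywhere. The module contents are made up, and
an absolute path is used instead of `./file.js` only so that the example does
not depend on where it is saved.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical module written to a temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vfs-require-'));
const file = path.join(dir, 'file.js');
fs.writeFileSync(file, 'module.exports = { answer: 42 };\n');

// Resolve first, then load the resolved path.
const resolved = require.resolve(file);
const loaded = require(resolved);

console.log(loaded.answer);
```

The requirement is that the same two-step pattern keeps working when `file.js`
lives inside the VFS.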

## VFS path manipulation as strings and URL objects

If the VFS were exposed under a URL prefix such as `vfs-file://`, path handling
could become an issue. The `fs` APIs accept `URL` objects, but code in
(transitive) dependencies that assumes all native paths are strings may fail
when passed `URL` objects. One such case might be a (transitive) dependency
that uses `require.resolve()`.
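
The sketch below illustrates the mismatch with current Node.js behavior: `fs`
accepts a `file:` `URL` object, string-only helpers such as `path.basename()`
reject it, and `fs` rejects `URL` objects with any other scheme. The
`vfs-file:` scheme is the hypothetical one from the text above, and
`errorCodeOf` is a helper written for this example.

```js
const fs = require('fs');
const os = require('os');
const path = require('path');
const { pathToFileURL } = require('url');

const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'vfs-url-'));
const file = path.join(dir, 'file.txt');
fs.writeFileSync(file, 'hello');

// `fs` accepts a `file:` URL object in place of a string path.
const fileUrl = pathToFileURL(file);
const viaUrl = fs.readFileSync(fileUrl, 'utf8');

// Run `fn` and return the code of the error it throws, or null on success.
function errorCodeOf(fn) {
  try {
    fn();
    return null;
  } catch (error) {
    return error.code;
  }
}

// Code that assumes string paths fails when handed the same URL object.
const stringOnlyFailure = errorCodeOf(() => path.basename(fileUrl));

// `fs` only understands `file:` URLs today, so a custom scheme is rejected.
const customSchemeFailure = errorCodeOf(() =>
  fs.readFileSync(new URL('vfs-file:///a/b/file.txt'), 'utf8')
);

console.log({ viaUrl, stringOnlyFailure, customSchemeFailure });
```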

Using something like `vfs-file://` might be a potential solution for placing the
VFS contents somewhere that has no interference with valid paths in the file
system.

## Interaction with Native Addons

TODO: Still under discussion in https://github.com/nodejs/single-executable/discussions/29.

# Not supported

## No need for supporting write operations

Since the VFS is embedded into the single-executable and is also protected by
code signing, changing the contents of the VFS should invalidate the signature
and crash the application the next time it is run. Hence, write operations do
not need to be supported.

# Optionally support

## Increase locality of related files

Related files should be stored close to each other, for performance reasons.

## Format implementation in multiple languages

We want this format to already have implementations in *multiple* languages
(not just JS, since not all tools used in the JS ecosystem are written in JS),
all ideally production-grade and well-maintained.

## Consensus with third-party tools on building native integrations

We want this format to be consensual enough that third-party tools (VSCode,
emacs, ...) won't object to building native integrations with it (for instance,
Esbuild recently added zip support to integrate with Yarn's zip installs; it
would have been a much harder sell if Yarn had used a custom-made format).

## Optional data compression

As an application grows, bundling all the source code, dependencies and static
assets into a single file without compression would quickly reach the maximum
segment or file size limit (depending on the embedding approach) imposed by the
single executable file format or the OS. Minifying the JS source files would
help, but it might not be enough for other kinds of files, so supporting data
compression seems to be a better solution.
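
As a rough sketch of the idea, the example below compresses a buffer at build
time and restores it on read using Node's built-in `zlib` module. The choice of
gzip is an assumption made for illustration, since this document does not
prescribe an algorithm, and the source text is made up.

```js
const zlib = require('zlib');

// Hypothetical bundled source: repetitive text, as source code often is.
const source = Buffer.from(
  'module.exports = function add(a, b) { return a + b; };\n'.repeat(200)
);

// Compress when building the VFS...
const compressed = zlib.gzipSync(source);
// ...and decompress when the file is read.
const restored = zlib.gunzipSync(compressed);

console.log(`${source.length} bytes -> ${compressed.length} bytes`);
```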

[ASAR]: https://github.com/electron/asar
[Git executables]: https://github.com/desktop/dugite-native/releases/
[dugite]: https://www.npmjs.com/package/dugite
[platforms supported by Node.js]: https://github.com/nodejs/node/blob/main/BUILDING.md#supported-platforms