This repository was archived by the owner on Sep 2, 2023. It is now read-only.

Cache system mismatch with web #62

Closed
bmeck opened this issue Apr 11, 2018 · 29 comments
Labels
web-platform wontfix Things that are unable to change for some reason

Comments

@bmeck
Member

bmeck commented Apr 11, 2018

Did a bunch of digging after #60 showed up. Until I reviewed things, I didn't realize that my mental model of the cache mismatches how the web works and how existing tooling and algorithms work.

As currently specced for web given:

/a1.mjs : 302 redirect -> /b/c/d.mjs
/a2.mjs : 302 redirect -> /b/c/d.mjs
/b/c/d.mjs -> imports './e.mjs'
/b/c/e.mjs
<script type="module">
import '/a1.mjs';
import '/a2.mjs';
</script>

For Node we can treat the redirects as symlinks.

On the web this loads 3 module records, which means it matches neither the current ESM loader nor --preserve-symlinks. Almost all tooling is written for CJS-style realpathing but has a --preserve-symlinks compatible flag. These 3 (web, realpath [node default], symlinks [--preserve-symlinks]) modes have the following loads according to WHATWG upon review and consultation:

|                                      | web        | realpath   | symlinks    |
|--------------------------------------|------------|------------|-------------|
| module map key of /a1.mjs            | /a1.mjs    | /b/c/d.mjs | /a1.mjs     |
| module map key of /a2.mjs            | /a2.mjs    | /b/c/d.mjs | /a2.mjs     |
| module map key of ./e.mjs from /a1.mjs | /b/c/e.mjs | /b/c/e.mjs | /a1/c/e.mjs |
| module map key of ./e.mjs from /a2.mjs | /b/c/e.mjs | /b/c/e.mjs | /a2/c/e.mjs |

Notably, in the current web spec imports are keyed in the module map before realpathing/redirecting (like --preserve-symlinks), but import specifiers are resolved after realpathing/redirecting (like default Node behavior).
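The difference between the web and realpath columns can be simulated in a few lines. This is only a sketch of the behavior as described in this thread, not spec or Node code: `redirects`, `importsOf`, `load`, and the `example.test` origin are hypothetical names made up for illustration. The two knobs are which URL keys the module map and which URL relative specifiers resolve against.

```javascript
// Hypothetical redirect table standing in for the 302 responses above.
const redirects = new Map([
  ['/a1.mjs', '/b/c/d.mjs'],
  ['/a2.mjs', '/b/c/d.mjs'],
]);

// Hypothetical import graph, indexed by the final (response) path.
const importsOf = new Map([
  ['/b/c/d.mjs', ['./e.mjs']],
  ['/b/c/e.mjs', []],
]);

const ORIGIN = 'https://example.test'; // placeholder origin, only used for URL resolution

// keyBy / resolveFrom are each 'request' (before redirect) or 'response' (after redirect).
function load(entries, { keyBy, resolveFrom }) {
  const moduleMap = new Map(); // module map key -> response path
  const visit = (requestPath) => {
    const responsePath = redirects.get(requestPath) ?? requestPath;
    const key = keyBy === 'request' ? requestPath : responsePath;
    if (moduleMap.has(key)) return; // already loaded under this key
    moduleMap.set(key, responsePath);
    const base = resolveFrom === 'request' ? requestPath : responsePath;
    for (const specifier of importsOf.get(responsePath) ?? []) {
      visit(new URL(specifier, ORIGIN + base).pathname);
    }
  };
  entries.forEach(visit);
  return [...moduleMap.keys()];
}

const entries = ['/a1.mjs', '/a2.mjs'];
const web = load(entries, { keyBy: 'request', resolveFrom: 'response' });
const realpath = load(entries, { keyBy: 'response', resolveFrom: 'response' });

console.log(web.join(', '));      // /a1.mjs, /b/c/e.mjs, /a2.mjs
console.log(realpath.join(', ')); // /b/c/d.mjs, /b/c/e.mjs
```

The web-style run ends up with 3 module map keys and the realpath-style run with 2, which is the gap the table shows.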

This mismatch exists in all build tools that I know of. Does anyone know of any build tooling that does this type of resolution, even if it requires configuration to do so?

We should look into this. After some talking, it looks like the web spec won't change. They seem to see it as a 2.5-3 year old issue, and more time to discuss likely won't change decisions :-/

@jdalton
Member

jdalton commented Apr 11, 2018

modes have the following loads according to WHATWG upon review and consultation:

With who? Links? References?

@bmeck
Member Author

bmeck commented Apr 11, 2018

Private chats while I tried to reread the spec, plus a blurb on https://freenode.logbot.info/whatwg/20180411#c1498394 that had some other private chats around it.

@benjamingr
Member

I realize this is a difference, but is there an actual impact on users caused by this?

@jdalton
Member

jdalton commented Apr 11, 2018

I'm with @benjamingr here. Since redirects aren't really symlinks and the whatwg doesn't cover file system symlink behavior (does it?) isn't this an unspecified space?

@bmeck
Member Author

bmeck commented Apr 11, 2018

@benjamingr there are a few things going on here that kind of led to why #60 exists, and to a difference in how import.meta.url is used. Whenever you import something through a redirect, you effectively opt out of the destination being a singleton in terms of being loaded once, but the imports within that destination all resolve as if it were a singleton in terms of loading relative paths. I think anyone using redirects and expecting them to behave like symlinks would be surprised. Depends on if your server ever does redirects ^_^;

@jdalton this is relevant if there is ever to be https: loading in Node. https://fetch.spec.whatwg.org/#scheme-fetch leaves it somewhat unspecified, but as far as I can tell it errors everywhere due to the origin.

This is mostly relevant for ecosystem mismatch and lack of existing behavior like this in the wild. Figured an issue would be good to gather data and see if any existing ESM plans/tools/etc. outside of the web are doing this.

@benjamingr
Member

@bmeck thanks, that example completely explains the issue - maybe present it at the meeting today?

I don't think this is a huge deal given symlinks aren't redirects (and a server serving a symlinked file won't respond with a redirect anyway).

What if we just make the module loader cache alias symlink destinations to the URL which would reinstate the "singleton" guarantee? That should be allowed by the spec right?

@bmeck
Member Author

bmeck commented Apr 11, 2018

@benjamingr HTTP redirects are part of https://github.com/WICG/webpackage 's archive format and also used for some things where you don't have CORS so you make a pass through endpoint. Being unable to use WebPackage to bundle up the file system would be a bit of a snag in plans.

I've got too bad of a cough from being sick to really present anything today, and I'm on vacation starting Friday until the end of the month. IDK when / who can take a look at all of this.

What if we just make the module loader cache alias symlink destinations to the URL which would reinstate the "singleton" guarantee? That should be allowed by the spec right?

Not sure I understand? To which URL?

@jdalton
Member

jdalton commented Apr 11, 2018

I don't think this is a huge deal given symlinks aren't redirects (and a server serving a symlinked file won't respond with a redirect anyway).

That 👆. It's good to be aware of browsers' behaviors here if Node ever adopts http(s) requests for imports (a tough hill), but I don't think it affects the current file system behaviors of Node, as symlinks and redirects are not the same thing.

@bmeck
Member Author

bmeck commented Apr 11, 2018

@jdalton

  • it affects if file: is ever specified in WHATWG and it doesn't match. Since it is done on request URL and not response URL I don't see how it could match.
  • it affects using a shared archive format for Node and the web like WebPackage
    • since it can't represent the behavior of symlinks if the mental models don't match
    • since that archive format needs to model HTTP batches to be web compatible it needs to be using HTTP Headers
  • it affects tooling since none I can find acts like this with symlinks, realpathing, or any other configuration when resolving and setting up caches based on disk
  • it affects programmers because http[s]: and file: now have different mental models, with no clear reason why they were decided to differ nor documentation on how to approach things.

We should look at this more seriously than saying the mental models aren't related. The underlying technology is less important to me than the mental mismatch of if the cache key is based upon the request or the destination, and which the resolution is based on.

@jdalton
Member

jdalton commented Apr 11, 2018

it affects if file: is ever specified in WHATWG and it doesn't match.

So if one day WHATWG specs symlinks AND it doesn't match. That's a lot of WHAT-IFs. It still doesn't change the fact that symlinks are not the same thing as redirects. If anything this just means those representing Node need to pound the pavement raising awareness of how Node handles such things so when and if folks in the WHATWG start to look around for references they have clear info to draw from.

We should look at this more seriously than saying the mental models aren't related.

As I said, it's good to know, but symlinks and redirects are for sure different things. There is no need to mush them into the same box.

@bmeck
Member Author

bmeck commented Apr 11, 2018

@jdalton even if we remove symlinks they are caching the modules using a different specifier->key model than we are. We don't need to focus on the symlinks but more on the model differences and the problems they cause for compatibility.

If anything this just means those representing Node need to pound the pavement raising awareness of how Node handles such things so when and if folks in the WHATWG start to look around for references they have clear info to draw from.

This has not historically been successful when I've talked to people about things, often have seen "well, I don't know why Node would do that" or "that is Node's problem" sort of reply as a dismissal which ends the discussion. I don't have your optimism given my experiences.

@bmeck
Member Author

bmeck commented Apr 11, 2018

To note, this was somewhat brought up in the past as well by @guybedford in whatwg/html#613, but from a different perspective and while still discussing the now unused https://github.com/whatwg/loader . A similar way to prevent the mismatch from existing as described in that issue could be done if Node were to error on traversing symlinks during resolution which is probably a non-starter.

@jdalton
Member

jdalton commented Apr 11, 2018

Working through what the kernel of the concern is, is a good thing. That said I expect caching approaches to be different for different things (symlinks, redirects, etc.). If there is overlap then those areas should be examined.

This has not historically been successful when I've talked to people about things,

We are pretty different.

A similar way to prevent the mismatch from existing as described in that issue could be done if Node were to error on traversing symlinks during resolution which is probably a non-starter.

If we are to move away from symlinks as the example, can you rephrase/reframe the concern without them to better highlight the mismatch?

@bmeck
Member Author

bmeck commented Apr 11, 2018

@jdalton the mental model only shows up with a concrete mismatch when a redirect and symlink are compared. I can't remove them because I need a layer of indirection to occur. You need the indirection in the examples (both on web using HTTP redirect, and Node using symlinks) to differentiate if the cache is using the request URL or the response URL. I might be able to think of a mismatch using package.json#main but it won't map to an HTTP redirect that people can run on their local machines since static servers don't do that redirect.

@giltayar

giltayar commented Apr 11, 2018 via email

@bmeck
Member Author

bmeck commented Apr 11, 2018

@giltayar It is implemented in Chrome and Safari (they differ slightly though) but does not match the expectations of people who expected it to be related to the cache key used (I thought it was supposed to be tied to the cache key when I discussed it last). Without it being tied to the cache key, it serves a very different purpose than what it is starting to be used for, and it is not unique.

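|                                       | web        | realpath (assuming cache key equiv like babel/etc.) | symlinks (assuming cache key equiv like babel/etc.) |
|---------------------------------------|------------|------------------------------------------------------|------------------------------------------------------|
| import.meta.url of /a1.mjs            | /b/c/d.mjs | /b/c/d.mjs                                           | /a1.mjs                                              |
| import.meta.url of /a2.mjs            | /b/c/d.mjs | /b/c/d.mjs                                           | /a2.mjs                                              |
| import.meta.url of ./e.mjs from /a1.mjs | /b/c/e.mjs | /b/c/e.mjs                                           | /a1/c/e.mjs                                          |
| import.meta.url of ./e.mjs from /a2.mjs | /b/c/e.mjs | /b/c/e.mjs                                           | /a2/c/e.mjs                                          |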

Note that on the web /a1.mjs and /a2.mjs have different module records but the same import.meta.url. This value is roughly also what import resolves against in the WHATWG spec for a given module record. Hence why there is one copy of ./e.mjs in the web specified workflow.
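The web column can be mimicked with plain objects. A minimal sketch, assuming a hypothetical `fetchRecord` helper and `redirectTable` (neither is a real browser or Node API), with `metaUrl` standing in for import.meta.url and `example.test` as a placeholder origin:

```javascript
// Hypothetical redirect table standing in for the 302 responses.
const redirectTable = new Map([
  ['https://example.test/a1.mjs', 'https://example.test/b/c/d.mjs'],
  ['https://example.test/a2.mjs', 'https://example.test/b/c/d.mjs'],
]);

// Hypothetical helper: builds a record the way this thread describes the web
// behavior, keyed by the request URL while carrying the response URL.
function fetchRecord(requestUrl) {
  const responseUrl = redirectTable.get(requestUrl) ?? requestUrl;
  return {
    key: requestUrl,      // module map key
    metaUrl: responseUrl, // stands in for import.meta.url
    resolve: (specifier) => new URL(specifier, responseUrl).href,
  };
}

const viaA1 = fetchRecord('https://example.test/a1.mjs');
const viaA2 = fetchRecord('https://example.test/a2.mjs');

console.log(viaA1.key !== viaA2.key);         // true: two module records
console.log(viaA1.metaUrl === viaA2.metaUrl); // true: same import.meta.url
console.log(viaA1.resolve('./e.mjs') === viaA2.resolve('./e.mjs')); // true: one copy of e.mjs
```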

Note that there is an open issue about explicitly censoring URL fragments from being available on import.meta.url as specified by WHATWG whatwg/html#3622

@jdalton
Member

jdalton commented Apr 11, 2018

If given that

even if we remove symlinks they are caching the modules using a different specifier->key model than we are.

and then

the mental model only shows up with a concrete mismatch when a redirect and symlink are compared.

we are back to symlinks and redirects are entirely different things so there is no mismatch since there is no match to begin with (apples and oranges).

What you're highlighting is, yes there is a difference because, in fact, they are different things.

@bmeck
Member Author

bmeck commented Apr 11, 2018

@jdalton fine here is one using a package.json since you are so insistent on claiming this is about symlinks. Maybe we can try to discuss the differences in how they work and workflows using them rather than repeatedly stating that different technologies are different from each other. This has slightly different results due to package.json not being like a symlink for --preserve-symlinks but still shows some oddities with similar concerns:


filesystem
  /a1/package.json {"main": "/b/c/d.mjs"}
  /a2/package.json {"main": "/b/c/d.mjs"}

http
  /a1 302 redirects to /b/c/d.mjs
  /a2 302 redirects to /b/c/d.mjs

/b/c/d.mjs imports ./e.mjs
/b/c/e.mjs
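
|                                   | web        | realpath   | symlinks   |
|-----------------------------------|------------|------------|------------|
| module map key of /a1             | /a1        | /b/c/d.mjs | /b/c/d.mjs |
| module map key of /a2             | /a2        | /b/c/d.mjs | /b/c/d.mjs |
| module map key of ./e.mjs from /a1 | /b/c/e.mjs | /b/c/e.mjs | /b/c/e.mjs |
| module map key of ./e.mjs from /a2 | /b/c/e.mjs | /b/c/e.mjs | /b/c/e.mjs |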

@jdalton
Member

jdalton commented Apr 11, 2018

fine here is one using a package.json since you are so insistent on claiming this is about symlinks.

I was asking if you could produce an example without symlinks since you had stated the problem could be shown without them.

This has slightly different results due to package.json not being like a symlink

What you're showing is that redirects behave differently than symlinks again. Yes, they totally do and that's fine since they are totally different.

@bmeck
Member Author

bmeck commented Apr 11, 2018

@jdalton the package.json example above does not contain symlinks.

@jdalton
Member

jdalton commented Apr 11, 2018

the package.json example above does not contain symlinks.

The table in the summary below it does though, so it's unclear what you're trying to illustrate.
Is it that web+redirects behaves differently? That falls to yes redirects are different.

@bmeck
Member Author

bmeck commented Apr 11, 2018

@jdalton the symlinks column represents resolving and preserving symlinks, unlike the realpath column. The realpath column is like running node, the symlinks column is like running node --preserve-symlinks

@jdalton
Member

jdalton commented Apr 11, 2018

I'm still not seeing the issue here. You've highlighted that the web does a Thing A differently than Node does a Thing B. If the issue was the web does Thing A differently than Node does Thing A I could see the problem for sure.

@bmeck
Member Author

bmeck commented Apr 11, 2018

@jdalton the concern is that Thing A and Thing B differ in usage patterns that they are not suitable to converge over time.

@jdalton
Member

jdalton commented Apr 11, 2018

the concern is that Thing A and Thing B differ in usage patterns that they are not suitable to converge over time.

Correct, they are totally different and not likely to converge. I think identifying it is good, and classifying it as such is fine.

@bmeck
Member Author

bmeck commented Apr 11, 2018

Noted, I think we need to have a more in depth discussion though.

@giltayar

Looking at this makes me think a bit more about Node's choice of realpath vs. symlink. I believe the reason that Node has the two modes is that neither of them is satisfactory.

Symlinks usually come from npm link or other dev variations thereof (at least from what I know...). In this case, if we use --preserve-symlinks then we don't get the imported module as a singleton, whereas if we don't use --preserve-symlinks the module is a singleton, but does not participate in the module resolution "tree" of the importing module.

The web is adopting something that is different than both of them, and it got me thinking if maybe ESM should adopt a model that combines the advantages of both.

Let's call this mode "mixed". In this mode, the module map key of the imported module will be the realpath, while the module resolution path of this module will be the symlink path.

This enables us to have the advantages of both: we both have the module as a singleton, and yet it participates in the "natural" module resolution of the importing module.
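To make the "mixed" idea concrete, here is a rough sketch under stated assumptions. The layout (/app/node_modules/dep as an npm link style symlink to /work/dep), the in-memory `symlinks` table, and the helpers `realpath`, `lookupDirs`, and `mixed` are all hypothetical, and `lookupDirs` is a simplified version of walking up the tree for node_modules folders, not Node's actual resolver.

```javascript
// Hypothetical layout: /app/node_modules/dep is a symlink to /work/dep.
const symlinks = new Map([['/app/node_modules/dep', '/work/dep']]);

// Simplified realpath over the in-memory symlink table (no disk access).
function realpath(p) {
  for (const [link, target] of symlinks) {
    if (p === link || p.startsWith(link + '/')) return target + p.slice(link.length);
  }
  return p;
}

// Directories searched for a bare specifier, walking up from the importing file.
function lookupDirs(fromFile) {
  const parts = fromFile.split('/').slice(1, -1); // directory segments only
  const dirs = [];
  for (let i = parts.length; i >= 0; i--) {
    if (parts[i - 1] === 'node_modules') continue; // skip node_modules/node_modules
    dirs.push('/' + [...parts.slice(0, i), 'node_modules'].join('/'));
  }
  return dirs;
}

// "mixed": cache by realpath, resolve from the symlink (requested) path.
function mixed(requestedFile) {
  return { key: realpath(requestedFile), lookup: lookupDirs(requestedFile) };
}

const viaLink = mixed('/app/node_modules/dep/index.mjs');
console.log(viaLink.key);                                 // /work/dep/index.mjs
console.log(viaLink.lookup.includes('/app/node_modules')); // true
console.log(lookupDirs(viaLink.key).includes('/app/node_modules')); // false
```

In this sketch the linked module is cached once under its realpath, yet bare specifiers it imports are still looked up in the importing app's node_modules, which a pure realpath lookup would miss.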

Doing this change in CJS is obviously not going to happen, but maybe ESM is a time for a change.

Hope I was clear enough. This is a hairy issue.

@bmeck
Member Author

bmeck commented Apr 11, 2018

@giltayar I have some concerns with the web approach. In particular, you get into some situations where things seem unintuitive, to me at least:

/a/b.mjs redirects to /c/d.mjs
/c/d.mjs imports './d.mjs'

Creates 2 module records even though /c/d.mjs is trying to import itself.

cache key /a/b.mjs imports cache key /c/d.mjs
cache key /c/d.mjs imports itself

[Image: drawing of cycle oddity]
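The self-import case can be reproduced with a small simulation. This is a sketch of the web-style behavior as described in this thread (key by request URL, resolve against response URL); `redirects`, `importsOf`, `webModuleMap`, and the `example.test` origin are hypothetical names used only for illustration.

```javascript
// Hypothetical redirect and import tables for the example above.
const redirects = new Map([['/a/b.mjs', '/c/d.mjs']]);
const importsOf = new Map([['/c/d.mjs', ['./d.mjs']]]);

function webModuleMap(entry) {
  const records = new Map(); // request path (cache key) -> response path (source)
  const visit = (requestPath) => {
    if (records.has(requestPath)) return;
    const responsePath = redirects.get(requestPath) ?? requestPath;
    records.set(requestPath, responsePath);
    for (const specifier of importsOf.get(responsePath) ?? []) {
      // Relative specifiers resolve against the response URL.
      visit(new URL(specifier, 'https://example.test' + responsePath).pathname);
    }
  };
  visit(entry);
  return records;
}

const records = webModuleMap('/a/b.mjs');
console.log(records.size);            // 2
console.log(records.get('/a/b.mjs')); // /c/d.mjs
console.log(records.get('/c/d.mjs')); // /c/d.mjs
```

Both records point at the same source file, but only the second one is its own import target.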

@giltayar

@bmeck, my suggestion uses the realpath as a cache key, but changes the module resolution algorithm to use the symlink. It's an idea I had, which is admittedly only tangentially related to this issue.

I believe the idea nicely solves the problems people have with npm link, and which --preserve-symlinks is a half-hearted attempt at solving. As somebody who uses monorepos and npm link extensively, this problem is close to my heart.

But maybe I should just try out the idea, and just create a PR for it if it turns out to be practical and useful...


5 participants