Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify if Cache put()/add()/addAll() should mark Request body used #550

Closed
wanderview opened this issue Oct 31, 2014 · 34 comments
Closed

Comments

@wanderview
Copy link
Member

Please clarify how the Cache should treat the Request body for the put()/add()/addAll() methods.

My interpretation is that the Cache must store the Request body in order to reproduce the Request in the keys() method. The Fetch spec does not provide a way to retrieve the body from the Request without marking the body used. So my position is that the current spec requires put()/add()/addAll() to mark the Request body used.

Also, I believe the discussion in #372 came down on the side of avoiding magical implicit clones in favor of explicit clones.

cc / @jakearchibald @inexorabletash

@inexorabletash
Copy link
Member

In F2F discussion, @slightlyoff says these operations should indeed consume the body.

@jungkees
Copy link
Collaborator

jungkees commented Nov 4, 2014

Will add "If request's body's length is not zero, set request's body's used flag." to those operations.

@wanderview
Copy link
Member Author

Will add "If request's body's length is not zero, set request's body's used flag." to those operations.

If the Request has only gotten the headers, it may not reliably know the length of the body.

@annevk can we determine at header resolution time if there is a body or not?

If not, I would say just always mark the body used.

@annevk
Copy link
Member

annevk commented Nov 4, 2014

  1. Note that even if you set the body to used, you might still have to clone it if you want to follow redirects. There's currently no override in place for that in Fetch, although we're considering it.

  2. I'm not sure if you can reliably determine that. In theory there's is some indication through headers, but that might not always be there I suppose. Always marking the body seems logical.

  3. If the request stored in the cache cannot be reused, isn't that problematic if you want to get an update?

@wanderview
Copy link
Member Author

Fetch can auto-clone since the redirect process is not exposed to content. It does not create any "magical" clones as far as content is concerned because the multiple requests are not directly exposed.

If someone wants to use Cache.put() and then reuse the same Request for a new fetch, they need to manually clone() it. Note, you can use the Request in Cache.match() without draining the body because Cache doesn't need to look at it.

@inexorabletash
Copy link
Member

Should put() reject if passed a Request that has an already used body?

@wanderview
Copy link
Member Author

Should put() reject if passed a Request that has an already used body?

Based on the discussion so far, I believe so.

@inexorabletash
Copy link
Member

Blink issue: http://crbug.com/430962

@wanderview
Copy link
Member Author

@inexorabletash @jakearchibald @annevk What do you think about Request objects that conclusively do not have a body? For example:

var r = new Request('./stringurl.html');
cache.put(r, someresponse);

Should this set the body flag used or not?

On the one hand, it would be nice to be consistent. On the other hand, I feel like this is just making this unnecessarily hard for the developer.

@domenic
Copy link
Contributor

domenic commented Nov 7, 2014

IMO the bodyUsed flag should be set when the body is actually read from (you know, used). So if there is no body to read from it should never be set.

Some earlier comment of mine explained how I would implement bodyUsed via the streams API and is consistent with this idea.

@wanderview
Copy link
Member Author

Thanks @domenic. Sounds good to me.

Also, just to clarify. This all goes for the Response object as well. I believe Cache.put() should check for Response.bodyUsed and set it true if it reads a Response body.

@wanderview
Copy link
Member Author

I think this might need a fetch spec tweak. The default "no stream" value in the Body mixin is:

"Let stream be an empty byte stream. "

And there is nothing in the spec about not marking it consumed, as far as I can tell.

Actually, the fetch spec even has:

  1. Otherwise, set used flag and then run these substeps in parallel:
    1. ...
    2. If stream is null, set stream to an empty byte sequence.

@annevk, what do you think? Can we have a way to detect an empty/null stream without set the body used flag?

@annevk
Copy link
Member

annevk commented Nov 8, 2014

So what kind of semantics do we want? Should the convenience methods reject if stream is null?

@wanderview
Copy link
Member Author

So what kind of semantics do we want? Should the convenience methods reject if stream is null?

It seems there are a few possibilities:

  1. Leave current semantics and add a bodyEmpty attribute that does not set bodyUsed.
  2. Change convenience methods to reject as you propose.
  3. Change convenience methods to still resolve an empty byte stream result, but don't set body used.

I think I like (1) or (3). It seems (3) would have the least impact on developers, but seem a bit magical. What do other people think?

@annevk
Copy link
Member

annevk commented Nov 8, 2014

I think we should retain the "feature" that only one of these methods works and that it only works once. 3) doesn't do that.

It kind of makes sense to me that reading from null results in an exception, but it's also rather cumbersome API-wise. Another internal flag could solve both, but is also not really elegant. I have no good ideas.

@domenic
Copy link
Contributor

domenic commented Nov 8, 2014

Do we want developers to be aware of the semantic difference between "no body" and "empty body"? I think that such a distinction doesn't exist on the HTTP level, so maybe it should not exist on the conceptual level?

@annevk
Copy link
Member

annevk commented Nov 9, 2014

I think it does exist on the HTTP level. It's the difference between not having Content-Length and Content-Length: 0. Also, if we pretended there always was a body, how is the current specification wrong? (Which pretends exactly that...)

@wanderview
Copy link
Member Author

Or

  1. Drop the mixin and go back to a Request.body attribute that can be checked for null.

Not sure if I like that.

@annevk
Copy link
Member

annevk commented Nov 9, 2014

No. We'll have a body member, but it'll return a clean stream object. Not exposing json() and friends.

@wanderview
Copy link
Member Author

Well, I would be fine with just fixing this in the SW spec for Cache. Something like "determine if Request/Response body exists and mark used if true".

Of course, this would leave content with the same problem of always marking the body used in order to determine if the body is empty, though.

annevk added a commit to whatwg/fetch that referenced this issue Nov 12, 2014
See w3c/ServiceWorker#550

The reason we do not do this for `text()` et al is because they are
convenience methods and we do not want to end up with a situation where
you can invoke them multiple times, but only if body is null.
@jakearchibald
Copy link
Contributor

Taking the basic cache-as-you-go-along code:

self.onfetch = function(event) {
  event.respondWith(
    caches.match(event.request).then(function(response) {
      return response || fetch(event.request.clone()).then(function(response) {
        caches.open('whatever').then(function(cache) {
          cache.put(event.request, response);
        });
        return response.clone();
      });
    })
  );
};

If you're telling me (as a developer) that fetch sometimes consumes the request/response, I'm always going to .clone() anyway for safety. The worst that happens is I clone a no-body request/response when I didn't need to, but because there's no body it has almost no memory impact.

@wanderview
Copy link
Member Author

To be honest, when we were discussing explicit clone semantics before I was mainly thinking of the Responses (value objects in the cache). Of course now I see it effects the Requests (keys in the cache).

A map-like API with one-time-use keys seems pretty clunky to me from a developer standpoint. :-(

@annevk
Copy link
Member

annevk commented Nov 13, 2014

That code fails if request has a body, unless cache.match doesn't touch it... I agree with the clunkyness comment.

@jakearchibald
Copy link
Contributor

.match doesn't touch body, so it should be ok.

@wanderview
Copy link
Member Author

cache.match() does not touch the body. cache.put() only touches the body because it needs to reproduce it in cache.keys().

Maybe we can avoid having to reproduce request bodies, so we don't have to look at the request body at all?

@jakearchibald
Copy link
Contributor

A map-like API with one-time-use keys seems pretty clunky to me from a developer standpoint.

When you do cache.keys().then(requests => console.log(requests)) none of those requests will have been exhausted. If you exhaust them and do cache.keys().then( again, you'll have fresh unread request objects.

Maybe we can avoid having to reproduce request bodies, so we don't have to look at the request body at all?

I came to a similar conclusion, "ignore request body when storing in the cache". That would be great now, but I think it'd come back to haunt us if we started allowing POST etc requests into the cache.

@jakearchibald
Copy link
Contributor

Eg, if we allowed POST in the cache…

self.onfetch = function(event) {
  if (sendingAnEmail) {
    event.respondWith(
      fetch(event.request.clone()).catch(e => undefined).then(function(response) {
        if (response && response.status == 200) {
          return response;
        }
        // ok, email send looks like it failed
        return caches.open('outbox')
        .then(c => c.put(event.request, new Response('')))
        .then(_ => new Repsonse("Safely in outbox"))
      })
    );
  }
};

It's a bit clunky using the cache for this, probably should be idb, but don't think we should screw the possibility of the above.

@wanderview
Copy link
Member Author

When you do cache.keys().then(requests => console.log(requests)) none of those requests will have been exhausted. If you exhaust them and do cache.keys().then( again, you'll have fresh unread request objects.

Yea, this wasn't really my concern.

I think what I'm trying to say is that this:

If you're telling me (as a developer) that fetch sometimes consumes the request/response, I'm always going to .clone() anyway for safety.

Makes me think developers would want implicit clone() for fetch()/cache.put(). If the API essentially requires them to do something every time for safety, the API should just do it.

@wanderview
Copy link
Member Author

And yes, I'm sorry for arguing for explicit clone here before. I see now, though, that it doesn't really buy us much in terms of memory improvement if developers are going to call clone() all the time anyway. :-(

@jakearchibald
Copy link
Contributor

Summary of an IRC chat with @wanderview:

The problem case is still piping a large response (HD video) to two places, most likely the browser and the cache. If a clone is sent to the browser, and the main response to the cache, we only get buffering if the browser/cache consumes slower than the other, and that should be small. In an auto-clone world, we'll still have the ability (while the credits are rolling) to read the response from the start, which means keeping the whole movie around.

Although (as @annevk points out) the massive-cache may need a new feature that allows the SW to go away during the download and wake up to fire an event once succeeded or failed.

An uncomfortable truth is that fetch(request) already clones internally (although it consumes the request passed in) in case the response after sending the body is a redirect or auth challenge, at which point it needs to replay the request including the body. In future, we may want to allow streaming a web cam video as a request body, in which case we'll need an option to fetch to say "I can't replay this, fail if I need to". This means streaming is sorta opt-out when it comes to fetch, but opt-in elsewhere.

However, it's pretty easy for us to go from a manual-clone model to an auto-clone model in terms of API. .clone would remain so you can manually read the stream twice, or parse twice via .blob, .json etc. Going from auto-clone to manual-clone isn't possible without breaking things.

@wanderview
Copy link
Member Author

Yes, @jakearchibald talked me off the auto-clone ledge. I think we should stick with explicit clone(). Sorry for my confusion/waffling here.

Going back to the original issue, though. I still think we should avoid checking/marking bodyUsed if there is no body. We need something in the SW spec that says that.

@KenjiBaheux
Copy link
Collaborator

http://crbug.com/430962 Merged to blink's MVP release (M40)

@jungkees
Copy link
Collaborator

Addressed those points in put() and addAll(): c800aa4.

@jungkees
Copy link
Collaborator

Closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

7 participants