Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Allow the request URL to be used for subsequent requests #11

Closed
wants to merge 1 commit into from

Conversation

zanieb
Copy link
Contributor

@zanieb zanieb commented Apr 8, 2024

Per discussion in astral-sh/uv#2843

It is not always correct to use the response URL for subsequent requests, so we allow the URL to be provided.

@zanieb zanieb changed the title Allow the request URL to be used for subsequent responses feat: Allow the request URL to be used for subsequent responses Apr 8, 2024
@zanieb zanieb changed the title feat: Allow the request URL to be used for subsequent responses feat: Allow the request URL to be used for subsequent requests Apr 8, 2024
@@ -131,6 +131,15 @@ pub enum CheckSupportMethod {
Head,
}

/// Which URL should be used for subsequent range requests?
pub enum RangeRequestUrlSource {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't love this name

#[case(RangeRequestUrlSource::Request)]
#[case(RangeRequestUrlSource::Response)]
#[tokio::test]
async fn async_range_reader_url_source(#[case] url_source: RangeRequestUrlSource) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to set up a test case that used redirects but it was kind of a pain? I'm not sure what kind of testing you think is valuable here.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After I made #15, I also added a commit that has a test for this in my own fork. It adds an endpoint to the unit-test http server that redirects based on the HTTP method used, and that target URL returns 405 Method Not Allowed if called with another method: torarvid#1

(Didn't want to add it to #15 to keep the PR size manageable, but I can if it's wanted)

Comment on lines +75 to +76
/// let url = response.url().clone();
/// let reader = AsyncHttpRangeReader::from_head_response(client, response, url, HeaderMap::default()).await?;
Copy link
Contributor Author

@zanieb zanieb Apr 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This, in particular, is awkward. Perhaps we should have a different API instead? Like AsyncHttpRangeReader::from_head_response_custom_url?

I feel like similar to my commentary in #9 it'd be nice to have some sort of Builder type that can be constructed and mutated before the AsyncHttpRangeReader itself.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I agree that a builder would be a lot nicer now that the options are increasing. Would you be able to wip something up?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can look into it! It might not be quick though :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry if I'm overstepping, but I felt like trying to make a Builder, so I opened #15 😄

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks! I had just asked @charliermarsh to look into this because I was too busy. We'll review it :)

Request,

/// Use the initial response URL
Response,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ever correct to use the response URL?

Copy link
Contributor Author

@zanieb zanieb Apr 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question, is there a RFC about HEAD/GET range requests?

(this implementation is based on @baszalmstra's comment that either may be preferred)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think only when signed redirects are used this doesnt work. In other cases the extra hops might be avoidable.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, it's definitely less efficient to do an extra hop on every subsequent request. I guess it's possible something would return a 302 for HEAD but not for GET but that seems really weird?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe in the comments we could explain the tradeoff?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I can definitely explain the cases.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good question, is there a RFC about HEAD/GET range requests?

@zanieb The RFC that covers Range requests seems to be section 14 of the HTTP Semantics one: https://datatracker.ietf.org/doc/html/rfc9110#name-range-requests

(I first found https://datatracker.ietf.org/doc/html/rfc7233, but it says that it's obsoleted by RFC9110)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it ever correct to use the response URL?

To me it feels like an optimization that I can't find being discussed in the HTTP rfc, and if I'm right about that it sounds like something that maybe shouldn't be the default at least, but something you opt into if you know you'll never hit cases like the Amazon links in your application...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah after reading up on this some more I think I agree. @zanieb If you need a quick fix I would be happy to change the default for now without it being configurable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we shipped a change to the default here, I'd probably upgrade us immediately since we still see reports around this.

@baszalmstra
Copy link
Collaborator

@zanieb Do you think you will get back to this PR? Im fine with keeping it simple and using the request url instead of the response url.

@zanieb
Copy link
Contributor Author

zanieb commented May 3, 2024

@baszalmstra sure I can open a new pull request that does that

@charliermarsh
Copy link
Contributor

I put one up here: #16

@zanieb
Copy link
Contributor Author

zanieb commented May 6, 2024

I think #16 and #15 both make more sense than this

@zanieb zanieb closed this May 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants