This repository was archived by the owner on Oct 30, 2019. It is now read-only.

Crate Proposal: Adaptive Compression for Hyper #52

Closed
yoshuawuyts opened this issue Aug 20, 2018 · 16 comments
Labels
Help Wanted Extra attention is needed WG web Issues relevant to the web subgroup

Comments

@yoshuawuyts
Collaborator

yoshuawuyts commented Aug 20, 2018

Crate Proposal: Adaptive Compression for Hyper

Summary

I propose we write a crate that can apply different compression schemes based on a client's Accept-Encoding header.

Motivation

Servers have to compress their content in order to provide good throughput, particularly when serving assets. Clients can express which compression methods they accept using the Accept-Encoding header. This is most commonly used with gzip, but deflate and brotli are often included too, and brotli typically compresses better.

The goal is to have a single crate that can detect which compression schemes are accepted, and can dynamically choose which compression scheme to apply. This should provide improved reliability especially in situations with non-ideal connectivity (e.g. on subways, rural Australia, etc.)

Expected Behavior

The crate should be initialized with a configuration and provide an encoding method. The encoding method should take a Request, Response pair and a byte slice. Ideally it would be thread-friendly, so that one instance can be created per thread and reused.

use hyper_compress::Compress; // proposed crate, does not exist yet

let compression_ratio = 6; // Ideal for API calls.
let compressor = Compress::new(compression_ratio);

let data = b"some data";
// Reads headers from `req`, sets headers on `res`.
let data = compressor.compress(&req, &mut res, data)?;

It should support both client-side quality value preferences and a configurable default. This is important because every encoding algorithm trades off speed against compression ratio, depending on the number of bytes sent.
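To make the quality-value part concrete, here is a rough sketch of how the negotiation could look using only the standard library. The names `SUPPORTED`, `parse_accept_encoding` and `choose_encoding` are made up for illustration and are not part of any existing crate; the server-side preference order (brotli, then gzip, then deflate) is likewise just an assumption.

```rust
/// Encodings this sketch knows about, in (assumed) server preference order.
const SUPPORTED: &[&str] = &["br", "gzip", "deflate"];

/// Splits an `Accept-Encoding` value into `(name, q)` pairs.
/// A missing `q=` parameter means 1.0; an unparseable one is treated as 0.0.
fn parse_accept_encoding(header: &str) -> Vec<(String, f32)> {
    header
        .split(',')
        .filter_map(|item| {
            let mut parts = item.split(';');
            let name = parts.next()?.trim().to_ascii_lowercase();
            if name.is_empty() {
                return None;
            }
            let mut q = 1.0_f32;
            for param in parts {
                if let Some(value) = param.trim().strip_prefix("q=") {
                    q = value.trim().parse().unwrap_or(0.0);
                }
            }
            Some((name, q))
        })
        .collect()
}

/// Picks the supported encoding with the highest client q-value.
/// Ties go to whichever comes first in `SUPPORTED`; `None` means
/// the response should be sent uncompressed.
fn choose_encoding(header: &str) -> Option<&'static str> {
    let prefs = parse_accept_encoding(header);
    let mut best: Option<(&'static str, f32)> = None;
    for candidate in SUPPORTED.iter().copied() {
        // An explicit entry wins over the `*` wildcard.
        let q = prefs
            .iter()
            .find(|(name, _)| name.as_str() == candidate)
            .or_else(|| prefs.iter().find(|(name, _)| name.as_str() == "*"))
            .map(|(_, q)| *q);
        if let Some(q) = q {
            let better = match best {
                Some((_, best_q)) => q > best_q,
                None => true,
            };
            // q=0 means "not acceptable", so it can never be chosen.
            if q > 0.0 && better {
                best = Some((candidate, q));
            }
        }
    }
    best.map(|(name, _)| name)
}
```

A configured default could then be applied on top, for example by reordering `SUPPORTED` or by falling back to it when the header is absent.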

Possible crates to use would include:

API methods

Ideally there would be multiple interfaces exposed: one for streams (e.g. accepting the io::Read trait and/or tokio's AsyncRead), and one that can just be passed bytes. It's probably best to start with the regular method first (as outlined above), but leave space open to implement the other two methods at a later point.
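One way to leave that space open is to define the byte-slice method in terms of a streaming one, so the streaming interface can be exposed later without changing the simple API. The sketch below only illustrates the shape: `StreamEncoder` and `Identity` are hypothetical names, and `Identity` copies bytes unchanged instead of compressing, since real compression would need one of the existing encoding crates.

```rust
use std::io::{self, Read, Write};

/// Hypothetical trait: encodes everything read from `input` into `output`
/// and returns the number of bytes consumed from `input`.
trait StreamEncoder {
    fn encode_stream<R: Read, W: Write>(&self, input: R, output: W) -> io::Result<u64>;

    /// Byte-slice convenience method built on top of the streaming one.
    fn encode_bytes(&self, data: &[u8]) -> io::Result<Vec<u8>> {
        let mut out = Vec::new();
        // `&[u8]` implements `Read`, and `&mut Vec<u8>` implements `Write`.
        self.encode_stream(data, &mut out)?;
        Ok(out)
    }
}

/// Placeholder codec standing in for gzip, deflate or brotli.
struct Identity;

impl StreamEncoder for Identity {
    fn encode_stream<R: Read, W: Write>(&self, mut input: R, mut output: W) -> io::Result<u64> {
        io::copy(&mut input, &mut output)
    }
}
```

An async variant would need a separate trait, because `io::Read` is blocking.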

Drawbacks

The biggest drawback is that this would be tied to hyper, which makes it incompatible with actix-web. But given that this crate should mostly be glue around existing encoding crates + hyper's headers, I think it's okay to tie it to one framework.

Rationale and alternatives

Instead of taking a Request, Response pair the crate could operate on strings instead. This removes many of the benefits Rust's type system has to offer, so apart from being able to interface with more projects it doesn't have much going for it.

Setting up encoding is often delegated to CDNs, or proxy servers (e.g. apache, nginx), but with HTTP/2 becoming more prominent it's crucial to be able to run it at the application layer too. This crate serves to make compression something you can drop into your application server and it just works.

Prior Art

There exists prior art in Node's http-gzip-maybe which was written for the Bankai web compiler. http-gzip-maybe does not support Brotli.

Middleware

At the time of writing there exist several different middleware solutions, but there is no shared standard yet. Therefore not tying into any specific middleware solution provides the most flexibility. After all, it should be able to work with any framework that uses Hyper as its base.

A way to integrate it with middleware providers would be to create wrapper packages at first. And if a shared middleware standard emerges, it should be straightforward to add a new struct to the project. But it's probably best to start as generic as possible.

Unresolved Questions

Perhaps a future version of this crate could auto-set the compression parameter based on the number of bytes it's about to send. This would reduce the configuration needed even further, and help improve performance.
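As a very rough illustration of what such auto-tuning could look like: the function name `level_for_len` and every threshold and level below are placeholders invented for this sketch, not measured values. Real numbers would have to come from benchmarks.

```rust
/// Picks a compression level from the payload size.
/// Returns `None` when the payload is assumed too small to be worth compressing.
/// All thresholds and levels are illustrative placeholders.
fn level_for_len(len: usize) -> Option<u32> {
    match len {
        // Tiny bodies: header and framing overhead may outweigh any savings.
        0..=1023 => None,
        // Medium bodies: spend more CPU for a better ratio.
        1024..=65_535 => Some(6),
        // Large bodies: use a lower level to keep latency bounded.
        _ => Some(4),
    }
}
```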

hyper and actix-web use the http crate under the hood. Both frameworks seem to expose the http structs as newtypes only. Ideally there would be a way to operate on all of hyper's, actix-web's, and http's structs directly, but I don't know how this can be done.

edit: apparently the http structs exported by hyper are not newtypes.

Conclusion

We propose a crate to handle user-agent dependent variable compression for Hyper. The implementation is left up to volunteers. Comment below if you'd like to work on this; I'd be happy to help mentor the development of this crate. Thanks!

References

Edits

  • Included a note about the http crate.
  • Included note about streaming compression.
  • Included note about middleware.
  • Changed title.
  • Adjusted the statement about newtypes.
@yoshuawuyts yoshuawuyts added the WG web Issues relevant to the web subgroup label Aug 20, 2018
@Nemo157
Member

Nemo157 commented Aug 20, 2018

What about for streaming responses? (I haven't used hyper so I don't know if it supports them, but I have written a streaming response for an iron webapp where I would have loved to be able to just plug in some compression middleware).

Also, it seems like transparently supporting Content-Encoding with incoming bodies would need a lot of the same code, might be reasonable to have a single crate cover both directions?

@yoshuawuyts
Collaborator Author

@Nemo157 yeah great point; also being able to stream sounds very reasonable!

@rolftimmermans

rolftimmermans commented Aug 20, 2018

Ideally this would be implemented as some sort of middleware, where the application can just use the request/response types and encoding of both request and response can be transparently handled (including streaming to/from req/res body).

Is it not really a question of settling on a community-accepted way of implementing and using web middleware? This seems a classic use case for middleware.

@yoshuawuyts
Collaborator Author

yoshuawuyts commented Aug 20, 2018

Ideally this would be implemented as some sort of middleware, where the application can just use the request/response types and encoding of both request and response can be transparently handled (including streaming to/from req/res body).

@rolftimmermans Haha, I knew I forgot to address something! Yeah, you're entirely right that this would be perfect to expose as middleware.

You're also right that at the time of writing we don't have any common middleware approach (see #18). But people are deploying real web servers serving real people today, so I don't think we should wait for that before we start building.

So I think the best approach here would be to build this as a standalone module first, and once we have more/common middleware solutions either expose them as separate crates or as new structs within the same crate. I.e. do things step by step.

Does that make sense?

@yoshuawuyts yoshuawuyts changed the title Crate Proposal: Hyper Variable Compression Crate Proposal: Adaptive Compression for Hyper Aug 20, 2018
@yoshuawuyts yoshuawuyts added the Help Wanted Extra attention is needed label Aug 20, 2018
@rolftimmermans

Does that make sense?

Absolutely, I agree 100%.

FWIW it could be an opportunity to consider what an ideal API would look like and then slowly try to move into that direction.

I am working on a program that can benefit from automatic encoding/decoding of HTTP requests/responses. As an API consumer I would like as few modifications to my program as possible. A fully transparent implementation should be possible (I think?), which already gives you a middleware API – the only remaining step is to standardise it.

@pickfire

Nice idea, what about adaptive encoding? So far I have only seen libraries working mainly with JSON, where most of them even have a json::Value. I have never seen one handle other serializable formats such as CBOR.

@yoshuawuyts
Collaborator Author

@pickfire the lib I'm proposing here would work at the HTTP layer, making it compatible with JSON, CBOR, Protobuf and any other data encoding schemes.

@sfackler

Beyond HTTP, I don't think there's an async-compatible compression and decompression library yet, which would also be great to have!

@seanmonstar

you're entirely right that this would be perfect to expose as middleware. [...] we don't have any common middleware approach [...]

I'd think it'd make sense as a tower middleware. It could theoretically be useful for any protocol, though as a first attempt, it might make sense being a Service over http::Request<B1> and http::Response<B2>... I know tower-web recently experimented with making hyper's Payload trait generic as a BufStream...

@carllerche

To elaborate, tower-web has been experimenting with middleware. This will be extracted into various tower repositories.

The Middleware trait will be moved into tower proper. The HTTP specific middleware implementations will move into tower-h2.

The biggest unknown has been how to deal w/ the body, but as @seanmonstar mentioned, I have been experimenting w/ BufStream, which has been going really well. This trait will be moved into Tokio (probably a tokio-buf).

@ashfordneil

I think this crate would be more appropriate if it targeted the http crate rather than hyper. http is the lowest common denominator between various http frameworks and libraries, and so by supporting it we allow not just hyper but also other frameworks to adapt to take advantage of the new crate.

The exposure of http types as newtypes in hyper and actix-web is unfortunate (and for some reason a common issue in areas of the Rust ecosystem) but a generic compression crate doesn't seem like the sort of thing framework consumers are going to care about. Rather than writing something that an application developer can bolt on top of hyper or actix-web (or other frameworks), I think we should be writing something that hyper or actix-web can include themselves. Compression of http data should be something that happens by default, rather than something that every web application needs to include separately.

@carllerche

@ashfordneil @yoshuawuyts hyper does not "new type" types from the http crate. They are simply re-exported: https://github.com/hyperium/hyper/blob/master/src/lib.rs#L39-L48

@ashfordneil

Even better then I guess. Just to confirm, this means a function that takes http::Request could take hyper::Request instead without any changes?

@yoshuawuyts
Collaborator Author

yoshuawuyts commented Aug 21, 2018

Even better then I guess. Just to confirm, this means a function that takes http::Request could take hyper::Request instead without any changes?

Not completely sure, but... I think so? I'd be thrilled if that were the case! Looks like actix-web also doesn't use newtypes (oops), so that might mean that we can have a single crate that can support warp, tower-web and actix-web with minimal work!
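For anyone wondering why a re-export makes this work while a newtype would not, here is a small self-contained sketch. The modules `base`, `reexporting` and `wrapping` are stand-ins invented for illustration (think of `base` as the http crate); none of this is real hyper or actix-web code.

```rust
mod base {
    /// Stand-in for a shared request type such as `http::Request`.
    pub struct Request<B> {
        pub body: B,
    }
}

mod reexporting {
    // Re-export: `reexporting::Request` is the very same type as `base::Request`.
    pub use super::base::Request;
}

mod wrapping {
    // Newtype: a distinct type that merely contains a `base::Request`.
    pub struct Request<B>(pub super::base::Request<B>);
}

/// A function written only against the shared base type.
fn body_len(req: &base::Request<Vec<u8>>) -> usize {
    req.body.len()
}
```

With these definitions, `body_len(&reexporting::Request { body: vec![1, 2, 3] })` compiles as-is, whereas a `wrapping::Request` has to be unwrapped first (`body_len(&wrapped.0)`).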

I'd think it'd make sense as a tower middleware. It could theoretically be useful for any protocol, though as a first attempt, it might make sense being a Service over http::Request<B1> and http::Response<B2> (...)

@seanmonstar I agree it'd be very useful as a Tower middleware! -- but probably not as the only choice. A constraint we have to keep in mind here is that this must also work with the pedagogy-focused tide framework, which at the time of writing has not made any choice of middleware libraries.

That's why I'm proposing we start by making this crate as generic as possible, and then move to specialize it for more narrowly scoped abstractions later.

@rolftimmermans

rolftimmermans commented Aug 21, 2018

So, it seems right now this can be implemented roughly as:

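extern crate http;

// Returns a closure that decodes an incoming request body (per Content-Encoding).
fn create_decoder<B>(/* options */) -> impl Fn(http::Request<B>) -> http::Request<B> {
    |req: http::Request<B>| -> http::Request<B> {
        // ... decoding elided; the request is passed through unchanged for now.
        req
    }
}

// Returns a closure that encodes an outgoing response body (per Accept-Encoding).
fn create_encoder<B>(/* options */) -> impl Fn(http::Response<B>) -> http::Response<B> {
    |res: http::Response<B>| -> http::Response<B> {
        // ... encoding elided; the response is passed through unchanged for now.
        res
    }
}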

And then later we could take advantage of middleware patterns, perhaps https://github.com/tower-rs/tower?
