streams #30
Hi David, when the input image is on the filesystem, libvips only reads data as it is needed further down the processing pipeline. It's not "streaming" in the Node.js sense of the word but it does minimise memory usage. True streaming of compressed input data from a network socket would probably require modification to libvips. Summoning the vastly superior knowledge of @jcupitt, who may be able to offer help here.
btw you'd want to change the …
Hi @davidmarkclements, as @lovell says, this is currently sort-of supported. When you process a large jpeg image (for example), libvips never decompresses the whole thing all at once; it just decompresses and then recompresses the area being processed as it works down the image. If you want to work over unix pipes or network sockets, you'd need to add a new pair of load and save operations. You'd then need to expose that in sharp somehow; @lovell would be the expert there.
If you're curious, the load-jpeg-from-buffer class is defined here: https://github.com/jcupitt/libvips/blob/master/libvips/foreign/jpegload.c#L249 And it calls this set of functions to do the work: https://github.com/jcupitt/libvips/blob/master/libvips/foreign/jpeg2vips.c#L1042 So you'd probably need to add under 500 lines of C.
👍 for streams!
Sorry for the delay on this, guys. Look, I can happily get involved in the JS side of things, in terms of creating the stream interfaces and hooking them up to the bindings (although I'd have to research that; I haven't worked with bindings before). But as for actually editing the C code, it's not my area and I wouldn't want to screw it up. If someone else who's strong in C wants to team up, let me know and we'll get this done. I think it would be a huge win to have true stream image processing available in JS - I really don't know of anything else right now that offers that.
OK, I'll try doing jpeg as an experiment. I've not done much with sockets; I guess I just read() and don't seek(). Would you always be reading from streams with a known type? Sniffing jpeg/png/etc. from a stream would be fiddly. I guess you'd read a MIME header (something like that?) first to get the file type, or perhaps you'd be streaming from a source you controlled.
I've made an experimental branch with a quick hack to support load-jpeg-from-stream: https://github.com/jcupitt/libvips/tree/load-from-stream

For example, you can do this:

    $ cat k2.jpg | vips jpegload_fd 0 x.v

Call from C like this:

/**
* vips_jpegload_fd:
* @fd: descriptor to load from
* @out: image to write
* @...: %NULL-terminated list of optional named arguments
*
* Optional arguments:
*
* @shrink: shrink by this much on load
* @fail: fail on warnings
*
* Read a JPEG-formatted stream from @fd into a VIPS image. Exactly as
* vips_jpegload(), but read from a descriptor. This operation does not seek()
* and therefore can read from a network socket.
*
* See also: vips_jpegload().
*
* Returns: 0 on success, -1 on error.
*/
int
vips_jpegload_fd( int descriptor, VipsImage **out, ... )

We should try hooking this up to sharp somehow and see how it behaves.
I thought of a problem with this: it can only load a single image from a socket. If you want to send a series of images, you'll have to close and reopen the connection. I suppose this might be OK in some situations, but probably not over a slow network connection. The reason is buffering. As you can see, it reads as many bytes as it can each time, up to the buffer capacity, so it can read the end of one image and the first part of the next. When image1 is complete, we will have unused bytes buffered which would have to be saved somehow to be used for the next image. It's unclear how this could be done neatly. vips does not support image streams (reading many images from one source) and adding it would be hard, so that's probably out. We'd probably need to add some sort of intermediate buffer object which supported "unread", i.e. allowed unused bytes to be pushed back into it. How does node tackle this sort of issue? Do they have a socket object that supports pushback?
Thanks @jcupitt, Node.js' net.Socket object is our friend here. I'll try to make some time over the next few days to experiment with the new load-from-stream branch.
Using a socket would make the streams API super easy - essentially you just expose the socket directly, or add methods that act as proxies to the socket's methods. But is that really the right thing to do? The normative way, as I understand it, is to use C bindings (http://nodejs.org/api/addons.html) and then wrap a stream interface around the exposed parts of the underlying C lib. Using a socket will add unnecessary overhead IMO.
Sorry, on mobile I didn't see @jcupitt's comment, so what I said may or may not be relevant. Btw, pushback can be done when creating streams using the Stream module, but I don't know if that fits what we're talking about.
@jcupitt If libvips provided an API that accepted partial chunks of image data, then that would fit with @davidmarkclements' suggestion about managing the Streams in JavaScript. As soon as the chunk(s) that comprise the image header data have arrived, processing can begin, but a thread(s) may have to block waiting for further chunks of image data. This is essentially the approach taken by the Node.js wrapper around LAME, which takes advantage of the underlying library's chunk-aware methods such as lame_encode_buffer. Sockets and Streams both involve buffering data somewhere; something will block on a lack of data :) Shall we trial the Sockets method first, as that currently feels like the quickest way to a proof-of-concept?
I was thinking about this again. It would work if you're OK only being able to read a single image off a socket. That might be useful in some situations. The problem is that vips (or rather the libjpeg read system) HAS to have a buffer, and therefore you can't send more than one image down a socket, since the end of the first image will eat the first part of the second. We need a reusable intermediate read object to handle the buffering, and we need to be able to hand the libjpeg read system pointers into it. Therefore I think the buffer has to be a new vips object. In C, the API might look like:

int descriptor;
VipsInputStream *stream = vips_input_stream_new( descriptor );
for( i = 0; i < 10; i++ ) {
VipsImage *x;
if( vips_jpegload_stream( stream, &x, NULL ) )
return( -1 );
... do something with image x
g_object_unref( x );
}
g_object_unref( stream );

So I think we just need to get a file descriptor from the node.js stream.
After playing with Node.js Streams for the last couple of evenings I now believe the best approach is to avoid file descriptors and instead have sharp extend the stream.Transform class, buffering data in JavaScript-land first. This should then provide support for multiple images passing through the same pipe. I've got some sample code working locally that needs tuning, documenting and benchmarking, but the early signs are that we should be able to support JPEG, PNG and WebP (but not TIFF) as both input and output streams without any modification to libvips. Hooray!
YES! That's exactly the right approach.
In terms of buffering the data in JS land, it would be great to ensure the transform stream honours the highWaterMark option (i.e. the capacity of the buffer) - that way a dev can control the trade-off between speed and memory usage. That's one of the problems with GraphicsMagick/ImageMagick wrappers: lack of control over memory usage leads to crashing on under-resourced servers. A stream with a configurable highWaterMark, by contrast, allows the dev to operate on lower resources with slower delivery time - and vice versa, if they've plenty to spare they can up the watermark number (as long as the CPU can keep up). If there's any way I can help let me know (peer review?)
This sounds great, but I don't really understand how it can work :-( but please prove me wrong, of course. libjpeg's read system just has a callback for "I need more data"; it does not tell you how much more data it would like. So you always read 4096 bytes (or whatever) and hand over that buffer. After it has finished reading an image, there will be some unused bytes left in libjpeg's input buffer. These unused bytes will contain the beginning of the next image (perhaps part of a MIME header?). They can't be thrown away; they somehow have to be reused. They need to be extracted from libjpeg and given to the input system of whatever image reader you need next. So ... I still think we'll need a VipsInputStream-style object.
libpng's input system works in the same way, it asks for more data but won't tell you how much. libtiff needs to be able to seek on the input, so it can't read from a stream, unfortunately. webp, again, just asks for more data. |
In summary, we need to buffer image data somewhere, be that within the underlying decompression libraries themselves, in libvips or in sharp. The memory usage problems of *magick are usually due to them holding the entire uncompressed image in memory at once. If we buffer compressed image data only, that's a big improvement over existing solutions. What I'm suggesting is that, to start with, we have sharp manage Buffer concatenation of compressed image data as chunks arrive (I may move memory management to the C++ part of sharp if Node.js Buffer performance is poor). Then, when libvips provides support for John's suggested VipsInputStream, sharp can hand chunks over as they arrive. The external API of sharp will remain the same and can slowly get a bit more efficient over time, by pushing the responsibility for buffering upstream. Here's a usage example of how elegant the Stream-based API of sharp should hopefully be, in this case resizing to 100px wide whilst maintaining the same format.
This unlocks the ability to create memory-efficient command line tools that pipe/transform image data from stdin to stdout. |
Ah, I see. Sure, sounds great. This is using https://github.com/jcupitt/libvips/blob/master/libvips/iofuncs/image.c#L1916 - I'll add a quick hack of the input stream stuff; you should be able to subclass it. Edit: if you've not come across it, there's a write-to-buffer operation too you could maybe use for the output side: https://github.com/jcupitt/libvips/blob/master/libvips/iofuncs/image.c#L2205
@lovell I love how you've totally grokked the awesomeness. So. Something else we need to consider is handling backpressure. Since it's a transform stream, it's going to need to do two things in order to play nice within a stream chain (and indeed within the larger OS environment).
@davidmarkclements I think your 1 and 2 should happen automatically. They do on Linux at least.
@jcupitt But if the stream is being piped to another stream (say the other stream is a slow HTTP connection, for instance), that stream may need to apply back pressure to the sharp stream. As for the writing side, we're talking about a source Node stream writing to the sharp Node stream.
So say the read stream from disk is coming in too fast - I don't see how the OS can suspend the stream in this case. Further, the input stream doesn't even need to be a file; it could be, for instance, another kind of stream entirely.
Each stream implementation in Node has to manage and handle backpressure itself.
Given initially we'll be placing input data into a memory buffer, I think we can rely on the OS for read-side back-pressure. For write-side, sharp will have to manage any back-pressure applied to it by the OS. I've got a couple of ideas here, mostly involving reducing the output buffer data into exponentially smaller chunks. The rather handy brake module can be used to emulate back-pressure to test this.
Ah, I thought these were network sockets and local pipes. You're right - if you want it to work for JS software streams you'd need a bit more logic. This could go in the …
I think I've worked out how to correctly inherit from stream.Duplex with some minor API breakage. Currently writing lots of tests.
Hey @lovell, sounds awesome. Just to confirm, in case there's anything else that needs hooking up: the …
Yes, the pipeline will be triggered lazily only when …
I've updated the … Any final thoughts from @sebpiq, @jonathanong and @davidmarkclements before this hits master?
Thanks to everyone involved for all the help getting this far.
hey man, that's awesome - sorry for lack of final thoughts
Closing this feature request - thank you everyone for all the suggestions and feedback - I look forward to hearing all about your use of Streams. |
Hello, I've spent a few days implementing experimental support for true streaming. It seems to work. Any testing or feedback is very welcome.
Does the whole buffer have to load into runtime memory, or is it possible with the underlying C library to implement a true streaming API, so that a file can be streamed from disk, through a sharp stream, to a network socket or file write stream or whatever else?
ref: http://nodejs.org/api/stream.html
Neither GraphicsMagick nor ImageMagick wrappers are capable of streams - there's one module (gm, I think) that has a stream API, but it's faked somewhat; the module still has to load the full image into memory regardless. So a true streaming interface for sharp would be a massive USP.