Add core support for decoding from Python file-like objects #564
Conversation
", size=", | ||
dataContext->size); | ||
|
||
int64_t numBytesRead = std::min( |
Note that this avoids the negative buffer problem. Before, we were doing a narrowing conversion: we cast the `int64_t` value to an `int` in order to compare it to an `int`. That caused a large positive value to become a negative value. It's safer to cast the `int` to an `int64_t` and then compare both values as `int64_t`. This is fine because `std::memcpy()` takes a `size_t`, which the `int64_t` can safely convert to.
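A minimal sketch of the pattern, with illustrative names (the real context struct and fields in the PR may differ):

```cpp
#include <algorithm>
#include <cstdint>
#include <cstring>

// Illustrative stand-in for the decoder's real per-stream context.
struct DataContext {
  const uint8_t* data;
  int64_t size;     // total number of bytes available
  int64_t current;  // current read position
};

// FFmpeg-style read callback: buf_size arrives as an int, sizes are int64_t.
int read(void* opaque, uint8_t* buf, int buf_size) {
  auto* ctx = static_cast<DataContext*>(opaque);
  // Widen the int to int64_t and compare at 64 bits. Narrowing the int64_t
  // down to int instead could turn a large positive size into a negative one.
  int64_t numBytesRead =
      std::min(static_cast<int64_t>(buf_size), ctx->size - ctx->current);
  // std::memcpy() takes a size_t; a non-negative int64_t converts safely.
  std::memcpy(buf, ctx->data + ctx->current, static_cast<std::size_t>(numBytesRead));
  ctx->current += numBytesRead;
  return static_cast<int>(numBytesRead);
}
```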
Thanks a lot for the great work @scotts!
Since the PR is growing quite large, I'm happy to go ahead and merge this now if you'd like, and iterate on the tests as follow-ups.
src/torchcodec/decoders/_core/ops.py
```python
# 2. importlib: For pybind11 modules. We load them dynamically, rather
#    than using a plain import statement. A plain import statement only
#    works when the module name and file name match exactly, and the
#    shared library file is in the import path. Our shared libraries do
#    not meet those conditions.
```
Nit: I think we should remove this part from the sentence:

> and the shared library file is in the import path

Because the so/dylib files we produce are definitely in the "import path" (they're right next to other files we happily import). My point being that the above is not one of the conditions we fail to meet.
src/torchcodec/decoders/_core/ops.py
```python
#
# Note that we use two different methods for loading shared libraries:
#
# 1. torch.ops.load_library(): For PyTorch custom ops. Loading libraries
```
Nit: not just custom ops, it's also for the "pure C++" decoder library
@scotts has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
Latest changes LGTM still
The purpose of this PR is to provide functionality in the core API that allows users to supply a Python file-like object as the video for us to decode. Specifically, we're exposing:
Everything else in this PR is in direct support of this new function.
On the kind of file-like object, note that we accept and test both `io.RawIOBase` and `io.BytesIO`. I'm confident we should support `RawIOBase`, as it's unbuffered, byte-oriented reading. I'm less sure about `BytesIO`, because it is buffered, byte-oriented reading. The current tests pass, but we're not stressing it much.

On what we had to do to expose this new function:
1. We use pybind11 directly for the new function.
2. We now build three shared libraries: `libtorchcodec_decoderN.so`, `libtorchcodec_custom_opsN.so`, `libtorchcodec_pybind_opsN.so`.
3. We generalized how we handle the FFmpeg `AVIOContext`.

Going over each in turn:
Why directly use pybind11?
We're already using PyTorch C++ custom ops for our interface between Python and C++, and they already have some dependence on pybind11, so why do we need to create non-custom op pybind11 functions?
From what I can tell, the custom ops were not really intended for what we want to do, which is to call C++ code that will itself make callbacks back up to user-provided Python functions. That is, when the user passes us a file-like object, we want FFmpeg to call the read and seek methods on the Python file-like object for all reading and seeking. I don't think custom ops are designed for that kind of dynamic callback to Python code. (Custom ops definitely can call Python custom ops, but they need to be registered as such ahead of time. We want to call arbitrary Python functions.) I may be wrong here, and we can explore that in the future.
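To make the callback flow concrete, here is a minimal sketch (illustrative names, not the PR's actual code) of an FFmpeg-style read callback forwarding to an arbitrary Python file-like object via pybind11:

```cpp
#include <cstdint>
#include <cstring>
#include <string>

#include <pybind11/pybind11.h>

namespace py = pybind11;

// FFmpeg-style read callback that forwards to the read() method of an
// arbitrary Python file-like object. Names here are illustrative.
int readFromFileLike(void* opaque, uint8_t* buf, int bufSize) {
  auto* fileLike = static_cast<py::object*>(opaque);
  // FFmpeg may invoke callbacks from a non-Python thread; take the GIL.
  py::gil_scoped_acquire gil;
  // Calls back up into arbitrary Python code: file_like.read(bufSize).
  auto data = fileLike->attr("read")(bufSize).cast<std::string>();
  std::memcpy(buf, data.data(), data.size());
  return static_cast<int>(data.size());
}
```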
What I'm more confident of is that we need to store an actual reference to the Python file-like object on the C++ side. Using pybind11 directly, that's easy: we keep a pointer to a `py::object`. PyTorch custom ops only accept tensors as arguments. We're already smuggling pointers through tensors in the rest of the core API, but there we're going from C++ to Python and back to C++. When we store a pointer from C++ in a tensor to go back up to Python, we know it's the right thing. In this instance, we would want to smuggle a pointer from Python through a tensor to C++ - but Python doesn't have pointers. We may be able to get something that works most of the time by just asking for `id(file_like)`, but even then, I'm not sure how to reliably turn that into a `py::object*` on the C++ side.

If we just use a pybind11 function, all of this difficulty goes away. The cost is that using a file-like object is definitely not going to be compatible with torch.export and torch.compile, but I'm not sure how to make that known to the PyTorch custom ops. This warrants further investigation.
Why do we need multiple shared libraries?
I think things are simpler if we have a shared library just for the custom ops and a separate shared library just for the pybind11 ops. That then means we need a third library which holds the actual decoder logic. I was not able to get anything working until I made this division, as I am currently using `importlib.util.spec_from_file_location()` and `importlib.util.module_from_spec()` to load the pybind11 module. We just use `torch.ops.load_library()` for the custom ops; that function has machinery that then exposes the available functions as fields of the module.

What I'm currently doing works on Linux but is failing on Mac, so something is wrong. It may be that we don't need to do the split.
Generalization of handling AVIOContext
Custom reading and seeking is done in FFmpeg by setting up an `AVIOContext` object. You call `avio_alloc_context()`, where you provide a pointer to some external state, and then functions for read, seek and write. Then, during decoding, when FFmpeg needs to get more bytes from the file, it calls those callbacks, providing the pointer to the external state. You're responsible for managing that external state in your callbacks.

We were already using this for when users provided us the entire file as plain bytes. I generalized this handling into three classes (sketched in code after the list):
1. `AVIOContextHolder`, a base class that knows how to allocate an `AVIOContext`. It cannot be instantiated directly. The `VideoDecoder` can be instantiated with an `AVIOContextHolder`, and it uses it appropriately.
2. `AVIOBytesContext`, which is the existing functionality we already had. It derives from `AVIOContextHolder`.
3. `AVIOFileLikeContext`, which is the new functionality. It also derives from `AVIOContextHolder`.
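A condensed sketch of how such a base class can wrap `avio_alloc_context()` (simplified and with assumed details; the real classes carry more state and error handling):

```cpp
extern "C" {
#include <libavformat/avio.h>
#include <libavutil/mem.h>
}

// Simplified base class: owns the AVIOContext and wires FFmpeg's callbacks
// to opaque per-subclass state. Subclasses like AVIOBytesContext and
// AVIOFileLikeContext call createAVIOContext() with their own read/seek
// functions and their own state pointer.
class AVIOContextHolder {
 public:
  virtual ~AVIOContextHolder() {
    if (avioContext_ != nullptr) {
      av_freep(&avioContext_->buffer);  // FFmpeg may have reallocated it
      avio_context_free(&avioContext_);
    }
  }
  AVIOContext* getAVIOContext() { return avioContext_; }

 protected:
  AVIOContextHolder() = default;  // base class, not directly instantiable
  void createAVIOContext(
      int (*read)(void* opaque, uint8_t* buf, int bufSize),
      int64_t (*seek)(void* opaque, int64_t offset, int whence),
      void* state,
      int bufferSize = 64 * 1024) {
    auto* buffer = static_cast<uint8_t*>(av_malloc(bufferSize));
    avioContext_ = avio_alloc_context(
        buffer, bufferSize, /*write_flag=*/0, state, read,
        /*write_packet=*/nullptr, seek);
  }

 private:
  AVIOContext* avioContext_ = nullptr;
};
```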