
Add core support for decoding from Python file-like objects #564

Merged
merged 61 commits into pytorch:main from file_like on Mar 27, 2025

Conversation

@scotts (Contributor) commented Mar 14, 2025

The purpose of this PR is to provide functionality in the core API that allows users to provide a Python file-like object as the video for us to decode. Specifically, we're exposing:

def create_from_file_like(
    file_like: Union[io.RawIOBase, io.BytesIO], seek_mode: Optional[str] = None
) -> torch.Tensor:

Everything else in this PR is in direct support of what we need to do to support this new function.

On the kind of file-like object, note that we accept and test both io.RawIOBase and io.BytesIO. I'm confident we should support RawIOBase, as it provides unbuffered, byte-oriented reading. I'm less sure about BytesIO, because it provides buffered, byte-oriented reading. The current tests pass, but we're not stressing it much.
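To make the two accepted types concrete, here is a small stdlib-only sketch (no torchcodec involved) showing that an unbuffered io.RawIOBase stream and a buffered io.BytesIO expose the same read/seek interface the decoder's callbacks rely on:

```python
import io
import os
import tempfile

data = bytes(range(256)) * 4

# io.BytesIO: an in-memory, buffered byte stream.
buf = io.BytesIO(data)

# io.RawIOBase: opening with buffering=0 yields io.FileIO,
# an unbuffered RawIOBase subclass.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(data)
raw = open(path, "rb", buffering=0)
is_raw = isinstance(raw, io.RawIOBase)

# Both types provide the read/seek interface the callbacks need:
pos_raw = raw.seek(10, os.SEEK_SET)
pos_buf = buf.seek(10, os.SEEK_SET)
chunk_raw = raw.read(4)
chunk_buf = buf.read(4)

raw.close()
os.remove(path)
```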

On what we had to do to expose this new function:

  1. Implement pybind11 functions that don't go through PyTorch C++ custom ops.
  2. Split the shared libraries into three: libtorchcodec_decoderN.so, libtorchcodec_custom_opsN.so, libtorchcodec_pybind_opsN.so.
  3. Generalize how we handle AVIOContext objects in the C++ VideoDecoder.

Going over each in turn:

Why directly use pybind11?

We're already using PyTorch C++ custom ops for our interface between Python and C++, and they already have some dependence on pybind11, so why do we need to create non-custom op pybind11 functions?

From what I can tell, the custom ops were not really intended for what we want to do, which is call C++ code that will itself make callbacks back up to user-provided Python functions. That is, when the user passes us a file-like object, we want FFmpeg to call the read and seek methods on the Python file-like object for all reading and seeking. I don't think custom ops are designed for that kind of dynamic callback into Python code. (Custom ops can definitely call Python custom ops, but those need to be registered as such ahead of time. We want to call arbitrary Python functions.) I may be wrong here, and we can explore that in the future.
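As an illustration of the protocol those callbacks would exercise, here is a minimal duck-typed file-like class (the name LoggingFileLike and the call log are purely illustrative, not part of the PR) that records the read and seek calls a decoder would make:

```python
import io

class LoggingFileLike:
    """Hypothetical file-like object that logs the calls made to it."""

    def __init__(self, data: bytes):
        self._inner = io.BytesIO(data)
        self.calls = []

    def read(self, size: int) -> bytes:
        # FFmpeg's read callback would land here via the C++ glue.
        self.calls.append(("read", size))
        return self._inner.read(size)

    def seek(self, offset: int, whence: int = 0) -> int:
        # FFmpeg's seek callback would land here via the C++ glue.
        self.calls.append(("seek", offset, whence))
        return self._inner.seek(offset, whence)

f = LoggingFileLike(b"abcdefgh")
f.seek(4)
chunk = f.read(2)
```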

What I'm more confident of is that we need to store an actual reference to the Python file-like object on the C++ side. Using pybind11 directly, that's easy: we keep a py::object*. PyTorch custom ops only accept tensors as arguments. We're already smuggling pointers through tensors in the rest of the core API, but there we're going from C++ to Python and back to C++. When we store a pointer from C++ in a tensor to go back up to Python, we know it's the right thing. In this instance, we would want to smuggle a pointer from Python through a tensor to C++, but Python doesn't have pointers. We may be able to get something that works most of the time by just asking for id(file_like), but even then, I'm not sure how to reliably turn that into a py::object* on the C++ side.
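To see why the id(file_like) route would only work "most of the time", here is a CPython-specific sketch: the address round trip can be completed with ctypes, but nothing in it manages the object's lifetime. This demonstrates the fragility and is not a recommended technique:

```python
import ctypes

class FakeFileLike:
    """Hypothetical stand-in for a user's file-like object."""
    pass

obj = FakeFileLike()
addr = id(obj)  # In CPython, this happens to be the object's address.

# CPython-specific and unsafe: reinterpret the integer as a PyObject*.
recovered = ctypes.cast(addr, ctypes.py_object).value
same_object = recovered is obj

# Nothing in this round trip keeps obj alive: if the last Python
# reference died before C++ dereferenced the address, this would
# read freed memory. pybind11's py::object holds a real reference.
```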

If we just use a pybind11 function, all of this difficulty goes away. The cost is that using a file-like object is definitely not going to be compatible with torch.export and torch.compile, but I'm not sure how to make that known to the PyTorch custom ops. This warrants further investigation.

Why do we need multiple shared libraries?

I think things are simpler if we have a shared library just for the custom ops and a separate shared library just for the pybind11 ops. That then means we need a third library which holds the actual decoder logic. I was not able to get anything working until I made this division, as I am currently using importlib.util.spec_from_file_location() and importlib.util.module_from_spec() to load the pybind11 module. For the custom ops we just use torch.ops.load_library(); that function has machinery that then exposes the available functions as fields of the module.

What I'm currently doing works on Linux but fails on Mac, so something is wrong. It may be that we don't need to do the split at all.
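The importlib-based loading can be sketched with a plain Python file standing in for the compiled pybind11 module (the file name libexample_pybind_ops4.py is made up; the real library names follow the libtorchcodec_*N.so pattern above, but the loading mechanism is the same):

```python
import importlib.util
import os
import tempfile

# A module file whose name a plain `import` statement cannot resolve.
module_dir = tempfile.mkdtemp()
module_path = os.path.join(module_dir, "libexample_pybind_ops4.py")
with open(module_path, "w") as f:
    f.write("def answer():\n    return 42\n")

# Load it dynamically, decoupling the module name from the file name.
spec = importlib.util.spec_from_file_location("example_pybind_ops", module_path)
module = importlib.util.module_from_spec(spec)
spec.loader.exec_module(module)

result = module.answer()
```

For a real extension module, spec_from_file_location picks an ExtensionFileLoader based on the .so/.dylib suffix; the calls are otherwise identical.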

Generalization of handling AVIOContext

Custom reading and seeking is done in FFmpeg by setting up an AVIOContext object. You call avio_alloc_context(), providing a pointer to some external state along with callback functions for read, seek, and write. Then, during decoding, whenever FFmpeg needs more bytes from the file, it calls those callbacks, passing the pointer to the external state. You're responsible for managing that external state in your callbacks.

We were already using this mechanism when users provided the entire file as plain bytes. I generalized this handling into three classes:

  1. AVIOContextHolder, a base class that knows how to allocate an AVIOContext. It cannot be instantiated directly. The VideoDecoder can be constructed with an AVIOContextHolder, and it uses it appropriately.
  2. AVIOBytesContext, which holds the existing functionality we already had. It derives from AVIOContextHolder.
  3. AVIOFileLikeContext, which holds the new functionality. It also derives from AVIOContextHolder.
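The class relationships can be sketched in Python, with the caveat that the real classes are C++ and the base class's actual job is allocating the AVIOContext; the *Sketch names below are illustrative only:

```python
import io
from abc import ABC, abstractmethod

class AVIOContextHolderSketch(ABC):
    """Analogue of the C++ AVIOContextHolder base class, which in the
    real code allocates the AVIOContext. Cannot be instantiated directly."""

    @abstractmethod
    def read(self, size: int) -> bytes: ...

    @abstractmethod
    def seek(self, offset: int, whence: int = 0) -> int: ...

class AVIOBytesContextSketch(AVIOContextHolderSketch):
    """Analogue of AVIOBytesContext: the whole file given as plain bytes."""
    def __init__(self, data: bytes):
        self._buf = io.BytesIO(data)
    def read(self, size): return self._buf.read(size)
    def seek(self, offset, whence=0): return self._buf.seek(offset, whence)

class AVIOFileLikeContextSketch(AVIOContextHolderSketch):
    """Analogue of AVIOFileLikeContext: delegates to a user file-like object."""
    def __init__(self, file_like):
        self._f = file_like
    def read(self, size): return self._f.read(size)
    def seek(self, offset, whence=0): return self._f.seek(offset, whence)

# The base class cannot be instantiated directly:
try:
    AVIOContextHolderSketch()
    base_instantiable = True
except TypeError:
    base_instantiable = False

ctx = AVIOFileLikeContextSketch(io.BytesIO(b"header-bytes"))
first = ctx.read(6)
```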

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Meta Open Source bot. label Mar 14, 2025
@scotts scotts marked this pull request as ready for review March 20, 2025 18:54
", size=",
dataContext->size);

int64_t numBytesRead = std::min(
@scotts (Contributor, Author) commented:
Note that this avoids the negative buffer problem. Before, we were doing a narrowing conversion: we cast the int64_t value to an int in order to compare it to an int. That caused a large positive value to become a negative value. It's safer to cast the int to an int64_t and then compare both values as int64_t. This is fine because std::memcpy() takes a size_t, which the int64_t safely converts to.
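The narrowing bug can be reproduced in miniature with ctypes (the specific values below are made up for illustration):

```python
import ctypes

file_size = 2**31 + 5   # an int64_t value larger than INT_MAX
buffer_len = 4096       # a plain 32-bit int in the original C++

# Old behavior: narrowing int64_t to a 32-bit int flips large
# positive values negative, so a min() comparison at 32 bits
# would pick the wrong operand.
narrowed = ctypes.c_int32(file_size).value

# Fixed behavior: widen the int to int64_t and compare at 64 bits,
# which preserves both values.
widened_min = min(file_size, ctypes.c_int64(buffer_len).value)
```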

@NicolasHug (Member) left a comment:
Thanks a lot for the great work @scotts !

Since the PR is growing quite large, I'm happy to go ahead and merge this now if you'd like, and iterate on the tests as follow-ups.

Comment on lines 39 to 43
# 2. importlib: For pybind11 modules. We load them dynamically, rather
# than using a plain import statement. A plain import statement only
# works when the module name and file name match exactly, and the
# shared library file is in the import path. Our shared libraries do
# not meet those conditions.
@NicolasHug (Member) commented:
Nit: I think we should remove this part from the sentence:

and the shared library file is in the import path

Because the so/dylib files we produce are definitely in the "import path" (they're right next to other files we happily import). My point being that the above is not one of the conditions we fail to meet.

#
# Note that we use two different methods for loading shared libraries:
#
# 1. torch.ops.load_library(): For PyTorch custom ops. Loading libraries
@NicolasHug (Member) commented:
Nit: not just custom ops, it's also for the "pure C++" decoder library

@facebook-github-bot (Contributor) commented:
@scotts has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.


@NicolasHug (Member) left a comment:
Latest changes LGTM still

@scotts scotts merged commit d830af9 into pytorch:main Mar 27, 2025
47 of 48 checks passed
@scotts scotts deleted the file_like branch March 27, 2025 16:54