smolvlm #208
Conversation
Everything is reused except the input preparation.
This is a proof-of-concept example for video generation with smolvlm. It's much simpler than trying to integrate it into the existing video_generate script: the chat template differs, and processing is handled automatically by the processor.
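For reference, a minimal sketch of what the smolvlm-specific input preparation could look like. The "type": "video" / "path" message keys follow the SmolVLM2 chat-template format as I understand it, and video_path / prompt are placeholder names, not necessarily the script's actual variables:

```python
# Sketch only: build a conversation the SmolVLM2 chat template understands.
# The "type"/"path" keys are assumptions based on the SmolVLM2 release;
# video_path and prompt are illustrative placeholder names.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": video_path},
            {"type": "text", "text": prompt},
        ],
    }
]
```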
mlx_vlm/smolvlm_video_generate.py
```python
parser.add_argument(
    "--system", type=str, required=False, help="System prompt"
)
```
The system prompt is important, so it's added as an option.
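A hedged sketch of how the parsed option might be threaded into the conversation (the args name and the message layout are assumptions, not necessarily what the script does):

```python
# Sketch: prepend an optional system turn when --system was given.
# args.system and the content layout are illustrative assumptions.
if args.system:
    messages.insert(
        0,
        {"role": "system", "content": [{"type": "text", "text": args.system}]},
    )
```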
```python
inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="np",
)
```
This is a recent addition to transformers: apply_chat_template can now produce inputs in a single step, including for VLMs. We could potentially leverage this as well in the video_generate example (to be tested with various models).
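For illustration, a hedged sketch of how the single-step path might look in the generic example; the returned key names vary per processor, which is exactly why this needs testing across models:

```python
# Sketch: single-step tokenization + feature extraction, then hand the
# numpy tensors to MLX. Key names (input_ids, pixel_values, ...) are
# processor-dependent, so treat this as an untested assumption.
import mlx.core as mx

inputs = processor.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_dict=True,
    return_tensors="np",
)
inputs = {k: mx.array(v) for k, v in inputs.items()}
```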
This is great news!
I will refactor this in the next release and test the models.
From which version? We need to pin 📌 that version of transformers in requirements.txt.
I think 4.49.0, released yesterday. SmolVLM2 requires main or this tagged release: https://github.com/huggingface/transformers/releases/tag/v4.49.0-SmolVLM-2
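Not part of the PR, but a quick hedged sketch of a runtime guard that would make a stale install fail loudly; the packaging dependency and the message wording are my additions, while the versions and the pip command come from this thread:

```python
# Sketch: fail early if the installed transformers predates the single-step
# apply_chat_template support. Versions/pip command taken from this thread.
from packaging.version import Version
import transformers

if Version(transformers.__version__) < Version("4.49.0"):
    raise RuntimeError(
        "Needs transformers>=4.49.0; for SmolVLM2 install the tagged release: "
        "pip install git+https://github.com/huggingface/[email protected]"
    )
```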
Awesome 🙌🏽 Thanks Pedro! Just two things, whilst I review the code:
LGTM!
I will approve once the 3 requested changes are done.
```diff
@@ -2,7 +2,7 @@ mlx>=0.22.0
 datasets>=2.19.1
 tqdm>=4.66.2
 numpy>=1.23.4
-transformers>=4.47.1
+transformers>=4.49.0
```
This includes the processor changes, but SmolVLM2 will require main or this release: https://github.com/huggingface/transformers/releases
LGTM!
With the latest transformers I get:
One of the other models spits out
so there must be some fiddling to be done to align parameters / configs.
Hello! You need to install transformers from the tagged release: pip install git+https://github.com/huggingface/[email protected]