
smolvlm #208

Merged
merged 8 commits into from
Feb 20, 2025

Conversation

@pcuenca (Contributor) commented Feb 20, 2025

No description provided.

@pcuenca (Contributor, Author) commented:

This is a proof-of-concept example for video generation with smolvlm. It's much simpler than trying to integrate it into the existing video_generate script: the chat template differs, and processing is handled automatically by the processor.
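For orientation, here is a minimal sketch of the message structure a chat template like smolvlm's consumes; the video path and prompt are placeholders, and the schema is assumed from the SmolVLM2 examples in transformers rather than copied from this PR:

```python
# Hypothetical message list for a video request; "clip.mp4" and the
# prompt text are placeholders, not values from this PR.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "clip.mp4"},
            {"type": "text", "text": "Describe this video."},
        ],
    }
]
```

The processor's chat template turns a structure like this into model inputs directly, which is why no separate preprocessing step is needed here.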

Comment on lines 25 to 27
parser.add_argument(
"--system", type=str, required=False, help="System prompt"
)
@pcuenca (Contributor, Author) commented:

The system prompt is important, so I added it as an option.
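As a sketch of how the flag might be wired in (the message schema below is an assumption mirroring the transformers chat format, not this PR's exact code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--system", type=str, required=False, help="System prompt")
# Simulate passing the flag; on the real CLI this comes from sys.argv.
args = parser.parse_args(["--system", "You are a concise assistant."])

messages = []
if args.system:
    # Prepend a system turn only when the user supplied one.
    messages.append(
        {"role": "system", "content": [{"type": "text", "text": args.system}]}
    )
```

When the flag is omitted, args.system is None and no system turn is added.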

Comment on lines +69 to +75
inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="np",
)
@pcuenca (Contributor, Author) commented:

This is a recent addition to transformers: apply_chat_template can now produce inputs in a single step, including for VLMs.

We could potentially leverage this as well in the video_generate example (to be tested with various models).

@Blaizzy (Owner) commented:

This is great news!

I will refactor this in the next release and test the models.

From which version? We need to pin 📌 that version of transformers in requirements.txt.

@pcuenca (Contributor, Author) commented:

I think 4.49.0, released yesterday. SmolVLM2 requires main or this tagged release: https://github.com/huggingface/transformers/releases/tag/v4.49.0-SmolVLM-2
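Given the pin, a small runtime guard along these lines could fail early with a clear message; the helper below is only a suggestion, not code from this PR, and it compares just the numeric release segment so a tagged build like 4.49.0-SmolVLM-2 counts as 4.49.0:

```python
def parse_release(version: str) -> tuple:
    # Keep only the numeric release segment: "4.49.0-SmolVLM-2" -> (4, 49, 0).
    base = version.split("-")[0]
    return tuple(int(part) for part in base.split("."))


def supports_one_step_template(installed: str) -> bool:
    """True when transformers is new enough for one-step apply_chat_template."""
    return parse_release(installed) >= (4, 49, 0)
```

In practice you would call this with transformers.__version__ and raise a clear error if it returns False.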

@Blaizzy (Owner) commented Feb 20, 2025

Awesome 🙌🏽

Thanks Pedro!

Just two things, whilst I review the code:

  1. Run pre-commit
  2. Bump the version to v0.1.14

@Blaizzy (Owner) left a review comment:

LGTM!

I will approve once the 3 requested changes are done.

@@ -2,7 +2,7 @@ mlx>=0.22.0
datasets>=2.19.1
tqdm>=4.66.2
numpy>=1.23.4
transformers>=4.47.1
transformers>=4.49.0
@pcuenca (Contributor, Author) commented:

This includes the processor changes, but SmolVLM2 will require main or this release: https://github.com/huggingface/transformers/releases

@Blaizzy (Owner) left a review comment:

LGTM!

@Blaizzy Blaizzy merged commit ecefa36 into Blaizzy:main Feb 20, 2025
1 check passed
@jrp2014 commented Feb 21, 2025

With the latest transformers (Requirement already satisfied: transformers in /opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages (4.49.0)) I get:

Running mlx-community/SmolVLM2-2.2B-Instruct-mlx
Fetching 12 files: 100%|███████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 24093.66it/s]
Fetching 12 files: 100%|███████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 29995.02it/s]
==========
Files: ['/Users/xxx/Pictures/Processed/20250215-161252_DSC02535_openWith-1-Edit.jpg'] 

Prompt: <|im_start|>User:<image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<end_of_utterance>
Assistant:
Failed to generate output for model at mlx-community/SmolVLM2-2.2B-Instruct-mlx: GPT2TokenizerFast has no attribute tokenizer

One of the other models spits out

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.

so there must be some fiddling needed to align the parameters / configs.
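One knob to try for the slow-processor warning is the use_fast option the warning itself points at. The helper below is only a sketch, and whether it resolves the mismatch for these particular models is untested:

```python
def load_fast_processor(model_id: str):
    """Load the model's processor with the fast image processor opted in."""
    # Deferred import so the sketch stays self-contained.
    from transformers import AutoProcessor

    # use_fast=True silences the v4.48 default-change warning; note the
    # warning says outputs may differ slightly from the slow processor.
    return AutoProcessor.from_pretrained(model_id, use_fast=True)
```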

@pcuenca (Contributor, Author) commented Feb 22, 2025

Hello!

You need to install transformers from main, or from this stable branch:

pip install git+https://github.com/huggingface/[email protected]
