
smolvlm #208

Merged
merged 8 commits into from
Feb 20, 2025

Conversation

@pcuenca (Contributor) commented Feb 20, 2025

No description provided.

@pcuenca (Contributor, Author) commented:

This is a proof-of-concept example for video generation with smolvlm. It's much simpler than trying to integrate it into the existing video_generate script: the chat template differs, and processing is handled automatically by the processor.
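For orientation, here is a minimal sketch of the message structure a chat template like smolvlm's consumes; the video path and prompt are placeholders, and the schema is assumed from the SmolVLM2 examples in transformers rather than copied from this PR:

```python
# Hypothetical message list for a video request; "clip.mp4" and the
# prompt text are placeholders, not values from this PR.
messages = [
    {
        "role": "user",
        "content": [
            {"type": "video", "path": "clip.mp4"},
            {"type": "text", "text": "Describe this video."},
        ],
    }
]
```

The processor's chat template turns a structure like this into model inputs directly, which is why no separate preprocessing step is needed here.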

Comment on lines 25 to 27
parser.add_argument(
"--system", type=str, required=False, help="System prompt"
)
@pcuenca (Contributor, Author) commented:

The system prompt is important, so I added it as an option.
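As a sketch of how the flag might be wired in (the message schema below is an assumption mirroring the transformers chat format, not this PR's exact code):

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--system", type=str, required=False, help="System prompt")
# Simulate passing the flag; on the real CLI this comes from sys.argv.
args = parser.parse_args(["--system", "You are a concise assistant."])

messages = []
if args.system:
    # Prepend a system turn only when the user supplied one.
    messages.append(
        {"role": "system", "content": [{"type": "text", "text": args.system}]}
    )
```

When the flag is omitted, args.system is None and no system turn is added.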

Comment on lines +69 to +75
inputs = processor.apply_chat_template(
messages,
tokenize=True,
add_generation_prompt=True,
return_dict=True,
return_tensors="np",
)
@pcuenca (Contributor, Author) commented:

This is a recent addition to transformers: apply_chat_template can now produce inputs in a single step, including for VLMs.

We could potentially leverage this as well in the video_generate example (to be tested with various models).

@Blaizzy (Owner) commented:

This is great news!

I will refactor this in the next release and test the models.

From which version? We need to pin 📌 that version of transformers in requirements.txt.

@pcuenca (Contributor, Author) commented:

I think 4.49.0, released yesterday. SmolVLM2 requires main or this tagged release: https://github.com/huggingface/transformers/releases/tag/v4.49.0-SmolVLM-2
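Given the pin, a small runtime guard along these lines could fail early with a clear message; the helper below is only a suggestion, not code from this PR, and it compares just the numeric release segment so a tagged build like 4.49.0-SmolVLM-2 counts as 4.49.0:

```python
def parse_release(version: str) -> tuple:
    # Keep only the numeric release segment: "4.49.0-SmolVLM-2" -> (4, 49, 0).
    base = version.split("-")[0]
    return tuple(int(part) for part in base.split("."))


def supports_one_step_template(installed: str) -> bool:
    """True when transformers is new enough for one-step apply_chat_template."""
    return parse_release(installed) >= (4, 49, 0)
```

In practice you would call this with transformers.__version__ and raise a clear error if it returns False.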

@Blaizzy (Owner) commented Feb 20, 2025

Awesome 🙌🏽

Thanks Pedro!

Just two things, whilst I review the code:

  1. Run pre-commit
  2. Bump the version to v0.1.14

@Blaizzy (Owner) left a review comment:

LGTM!

I will approve once the 3 requested changes are done.

@@ -2,7 +2,7 @@ mlx>=0.22.0
datasets>=2.19.1
tqdm>=4.66.2
numpy>=1.23.4
transformers>=4.47.1
transformers>=4.49.0
@pcuenca (Contributor, Author) commented:

This includes the processor changes, but SmolVLM2 will require main or this release: https://github.com/huggingface/transformers/releases

@Blaizzy (Owner) left a review comment:

LGTM!

@Blaizzy Blaizzy merged commit ecefa36 into Blaizzy:main Feb 20, 2025
1 check passed
@jrp2014 commented Feb 21, 2025

With the latest transformers (Requirement already satisfied: transformers in /opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages (4.49.0)) I get:

Running mlx-community/SmolVLM2-2.2B-Instruct-mlx
Fetching 12 files: 100%|███████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 24093.66it/s]
Fetching 12 files: 100%|███████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 29995.02it/s]
==========
Files: ['/Users/xxx/Pictures/Processed/20250215-161252_DSC02535_openWith-1-Edit.jpg'] 

Prompt: <|im_start|>User:<image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<end_of_utterance>
Assistant:
Failed to generate output for model at mlx-community/SmolVLM2-2.2B-Instruct-mlx: GPT2TokenizerFast has no attribute tokenizer

One of the other models spits out

Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.

so there must be some fiddling needed to align the parameters / configs.
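One knob to try for the slow-processor warning is the use_fast option the warning itself points at. The helper below is only a sketch, and whether it resolves the mismatch for these particular models is untested:

```python
def load_fast_processor(model_id: str):
    """Load the model's processor with the fast image processor opted in."""
    # Deferred import so the sketch stays self-contained.
    from transformers import AutoProcessor

    # use_fast=True silences the v4.48 default-change warning; note the
    # warning says outputs may differ slightly from the slow processor.
    return AutoProcessor.from_pretrained(model_id, use_fast=True)
```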

@pcuenca (Contributor, Author) commented Feb 22, 2025

Hello!

You need to install transformers from main, or from this stable branch:

pip install git+https://github.com/huggingface/[email protected]
