Patch utils and models #167

Merged
merged 15 commits into from
Jan 3, 2025

Blaizzy (Owner) commented Jan 1, 2025

# Various fixes and improvements

Core Changes

  • Allow single value resize shape configuration
  • Fix generate step keyword arguments
  • Fix Dolphin Vision functionality
  • Fix language model configurations for DS-VL1 and PaLiGemma
  • Fix model quantization implementation
  • Remove assertion for DS-VL2
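To illustrate the first item, a single resize value could be expanded into a square (height, width) pair. This is only a sketch: the helper name `normalize_resize_shape` is hypothetical, and treating one value as a square target is an assumption rather than something the PR description states.

```python
from typing import Optional, Sequence, Tuple


def normalize_resize_shape(shape: Optional[Sequence[int]]) -> Optional[Tuple[int, int]]:
    """Turn a 1- or 2-element resize setting into a (height, width) pair.

    Hypothetical helper: assumes a single value means a square target.
    """
    if shape is None:
        return None  # no resizing requested
    values = tuple(int(v) for v in shape)
    if len(values) == 1:
        return (values[0], values[0])  # assumption: one value -> square
    if len(values) == 2:
        return (values[0], values[1])
    raise ValueError(f"resize shape must have 1 or 2 values, got {len(values)}")
```

With this shape of helper, `[448]` and `[448, 448]` would be interchangeable on the command line.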

Testing and Documentation

  • Add comprehensive smoke test suite for model validation
  • Update model card
  • Add system information

Smoke Test

This PR introduces a smoke test suite for validating model functionality. The suite verifies:

  • Model loading
  • Vision-language multimodal inference
  • Language-only inference

Inspired by @jrp2014's eval harness.

Usage

Run the test suite:

python test_smoke.py --models-file models.txt --image <path_to_image>

models.txt

mlx-community/Llama-3.2-11B-Vision-Instruct-4bit
mlx-community/Phi-3.5-vision-instruct-4bit
...
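The overall flow of such a suite can be sketched roughly as follows. This is not the code in `test_smoke.py`: `read_models` and `smoke_test` are hypothetical names, and the per-model checks are passed in as placeholder callables rather than real load or generate calls.

```python
from pathlib import Path
from typing import Callable, Dict, List


def read_models(path: str) -> List[str]:
    """Return non-empty, non-comment lines from a models file."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [ln.strip() for ln in lines if ln.strip() and not ln.strip().startswith("#")]


def smoke_test(
    models: List[str], checks: Dict[str, Callable[[str], None]]
) -> Dict[str, Dict[str, str]]:
    """Run the checks in order for each model and record the outcome.

    `checks` maps a label (e.g. "load") to a callable that raises on
    failure. The callables stand in for real load/generate steps.
    """
    results: Dict[str, Dict[str, str]] = {}
    for model in models:
        results[model] = {}
        for label, check in checks.items():
            try:
                check(model)
                results[model][label] = "ok"
            except Exception as exc:
                # Record the failure, skip the remaining checks for this
                # model, and move on to the next one.
                results[model][label] = f"failed: {exc}"
                break
    return results
```

Catching exceptions per model lets one unsupported model type be reported without stopping the run, although it cannot intercept a hard process abort.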
Screenshot 2025-01-03 at 6 22 25 AM

Closes #166

Blaizzy merged commit 593a5b8 into main on Jan 3, 2025
1 check passed

Blaizzy (Owner, Author) commented Jan 3, 2025

V0.1.10 smoke test
image

jrp2014 commented Jan 3, 2025

A Great Leap Forward!

My run of the script bails out (see end of transcript). My own script just ploughs on (but doesn't properly reveal the warning) for whatever reason.

It would be even better if it revealed some load/run timings, if only for performance tracking and relative performance comparisons.

Suggestion: get those reporting bugs to run the smoke test on the failing model (and allow a model to be specified as an alternative to a list of them in a file).
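A minimal sketch of what that suggestion could look like with `argparse`, assuming a mutually exclusive pair of options. `--models-file` and `--image` appear in the usage above; `--model` is a hypothetical flag introduced here for illustration, and the function names are made up.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Smoke-test one or more models.")
    source = parser.add_mutually_exclusive_group(required=True)
    # --models-file is taken from the documented usage; --model is hypothetical.
    source.add_argument("--models-file", help="text file with one model ID per line")
    source.add_argument("--model", help="a single model ID to test (hypothetical flag)")
    parser.add_argument("--image", help="path to the test image")
    return parser


def resolve_models(args: argparse.Namespace) -> List[str]:
    """Return the list of model IDs selected on the command line."""
    if args.model is not None:
        return [args.model]
    with open(args.models_file, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]
```

Someone reporting a bug could then pass only the failing model ID instead of preparing a file.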

 python tests/test_smoke.py --models-file models.txt --image /Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg
  0%|                                                                                            | 0/29 [00:00<?, ?it/s]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing HuggingFaceTB/SmolVLM-Instruct                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 12 files: 100%|██████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 9843.86it/s]
Some kwargs in processor config are unused and will not have any effect: image_seq_len.          | 0/12 [00:00<?, ?it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 28102.54it/s]
✓ Model loaded successfully                                                                      | 0/12 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: <|im_start|>User:<image>Describe this image.<end_of_utterance>
Assistant:
 A close-up shot of a gingerbread house sits on a wooden table. The house is decorated with white icing, colorful candies, and small Christmas trees. The house has a brown roof and a white chimney. The house is surrounded by a white fence.
==========
Prompt: 1560 tokens, 1104.527 tokens-per-sec
Generation: 54 tokens, 118.709 tokens-per-sec
Peak memory: 6.473 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: <|im_start|>User: Hi, how are you?<end_of_utterance>
Assistant:
 I am doing well, thank you. And you?
==========
Prompt: 14 tokens, 189.192 tokens-per-sec
Generation: 12 tokens, 135.624 tokens-per-sec
Peak memory: 4.770 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

  3%|██▉                                                                                 | 1/29 [00:03<01:46,  3.81s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing OpenGVLab/InternVL2_5-8B                                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 21 files: 100%|█████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 20666.44it/s]
ERROR:root:Model type internvl_chat not supported.                                               | 0/21 [00:00<?, ?it/s]
✗ Failed to load model: Model type internvl_chat not supported.
Cleaning up...
✓ Cleanup complete

  7%|█████▊                                                                              | 2/29 [00:03<00:44,  1.65s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing cognitivecomputations/dolphin-2.9.2-qwen2-72b                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 40 files: 100%|█████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 12190.97it/s]
ERROR:root:Model type qwen2 not supported.                                                       | 0/40 [00:00<?, ?it/s]
✗ Failed to load model: Model type qwen2 not supported.
Cleaning up...
✓ Cleanup complete

 10%|████████▋                                                                           | 3/29 [00:04<00:24,  1.05it/s]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing distilbert/distilbert-base-uncased-finetuned-sst-2-english                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 10 files: 100%|██████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 9920.30it/s]
ERROR:root:Model type distilbert not supported.                                                  | 0/10 [00:00<?, ?it/s]
✗ Failed to load model: Model type distilbert not supported.
Cleaning up...
✓ Cleanup complete

 14%|███████████▌                                                                        | 4/29 [00:04<00:15,  1.58it/s]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing google/siglip-so400m-patch14-384                                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 6 files: 100%|█████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 7169.75it/s]
ERROR:root:Model type siglip not supported.                                                       | 0/6 [00:00<?, ?it/s]
✗ Failed to load model: Model type siglip not supported.
Cleaning up...
✓ Cleanup complete

 17%|██████████████▍                                                                     | 5/29 [00:04<00:10,  2.21it/s]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing meta-llama/Llama-3.2-11B-Vision-Instruct                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 29468.18it/s]
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 30810.26it/s]
✓ Model loaded successfully                                                                      | 0/15 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Describe this image.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>


The image depicts a gingerbread house, a traditional Christmas treat, with a sparkler on top. The house is made of gingerbread and decorated with various candies and sweets, including gumdrops, candy canes, and sprinkles. The roof is covered in red and white icing, and the walls are adorned with colorful candies and sweets. The house is placed on a wooden table, and the background is dark, suggesting that it is nighttime or indoors. The overall atmosphere of the image is festive and celebratory, with the sparkler adding a touch of magic and wonder.
==========
Prompt: 16 tokens, 4.726 tokens-per-sec
Generation: 116 tokens, 3.774 tokens-per-sec
Peak memory: 31.532 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: <|begin_of_text|><|start_header_id|>system<|end_header_id|>

Cutting Knowledge Date: December 2023
Today Date: 03 Jan 2025

<|eot_id|><|start_header_id|>user<|end_header_id|>

Hi, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


I'm doing well, thanks for asking. I'm a large language model, so I don't have feelings like humans do, but I'm always happy to assist with any questions or tasks you may have. How can I help you today?
==========
Prompt: 42 tokens, 77.185 tokens-per-sec
Generation: 50 tokens, 25.589 tokens-per-sec
Peak memory: 25.400 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 21%|█████████████████▍                                                                  | 6/29 [00:45<05:27, 14.24s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing microsoft/Florence-2-large-ft                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 24059.11it/s]
ERROR:root:No safetensors found in /Users/jrp/.cache/huggingface/hub/models--microsoft--Florence-2-large-ft/snapshots/bb44b80c15e943b1bf7cec6e076359cec6e40178
✗ Failed to load model: 
No safetensors found in 
/Users/jrp/.cache/huggingface/hub/models--microsoft--Florence-2-large-ft/snapshots/bb44b80c15e943b1bf7cec6e076359cec6e40
178
Create safetensors using the following code:

from transformers import AutoModelForCausalLM, AutoProcessor

model_id= "<huggingface_model_id>"
model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

model.save_pretrained("<local_dir>")
processor.save_pretrained("<local_dir>")

Then use the <local_dir> as the --hf-path in the convert script.

python -m mlx_vlm.convert --hf-path <local_dir> --mlx-path <mlx_dir>

        
Cleaning up...
✓ Cleanup complete

 24%|████████████████████▎                                                               | 7/29 [00:45<03:32,  9.65s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing microsoft/Phi-3.5-mini-instruct                                                                              │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 30823.04it/s]
ERROR:root:Model type phi3 not supported.                                                        | 0/13 [00:00<?, ?it/s]
✗ Failed to load model: Model type phi3 not supported.
Cleaning up...
✓ Cleanup complete

 28%|███████████████████████▏                                                            | 8/29 [00:45<02:19,  6.62s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing microsoft/Phi-3.5-vision-instruct                                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 14 files: 100%|██████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 8810.24it/s]
/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:524: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
  warnings.warn(
Fetching 14 files: 100%|██████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 8615.06it/s]
✓ Model loaded successfully                                                                      | 0/14 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: <|user|>
<|image_1|>Describe this image.<|end|>
<|assistant|>

The image shows a gingerbread house on a wooden surface with a blurred background. The house is decorated with colorful candies and has a lit sparkler on top, creating a festive atmosphere.<|end|>
==========
Prompt: 771 tokens, 906.475 tokens-per-sec
Generation: 47 tokens, 10.583 tokens-per-sec
Peak memory: 10.535 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: <|user|>
Hi, how are you?<|end|>
<|assistant|>

Hello! I'm just a computer program, so I don't have feelings or emotions. How can I assist you today?<|end|>
==========
Prompt: 14 tokens, 57.486 tokens-per-sec
Generation: 30 tokens, 10.838 tokens-per-sec
Peak memory: 9.955 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 31%|██████████████████████████                                                          | 9/29 [00:56<02:37,  7.85s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mistral-community/pixtral-12b                                                                                │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 21769.74it/s]
✗ Failed to load model: Unsupported model type: pixtral                                          | 0/15 [00:00<?, ?it/s]
Cleaning up...
✓ Cleanup complete

 34%|████████████████████████████▌                                                      | 10/29 [00:56<01:44,  5.48s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/Florence-2-large-ft-bf16                                                                       │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 33244.15it/s]
Fetching 12 files: 100%|████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 187804.66it/s]
✓ Model loaded successfully                                                                      | 0/12 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: Describe this image.
<s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>
==========
Prompt: 7 tokens, 24.754 tokens-per-sec
Generation: 256 tokens, 173.900 tokens-per-sec
Peak memory: 3.149 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: Hi, how are you?
<s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>
==========
Prompt: 8 tokens, 77.538 tokens-per-sec
Generation: 256 tokens, 566.957 tokens-per-sec
Peak memory: 1.963 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 38%|███████████████████████████████▍                                                   | 11/29 [01:00<01:29,  4.96s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/Llama-3.2-11B-Vision-Instruct-8bit                                                             │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 14905.13it/s]
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 33447.40it/s]
✓ Model loaded successfully                                                                      | 0/10 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Describe this image.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>


The image depicts a gingerbread house adorned with an assortment of candies and sweets, including gumdrops, candy canes, and chocolate sticks. The house is situated on a wooden table, surrounded by a festive atmosphere.

*   **Gingerbread House:**
    *   The gingerbread house is the central focus of the image.
    *   It is decorated with a variety of candies and sweets, such as gumdrops, candy canes, and chocolate sticks.
    *   The house has a festive and colorful appearance, with a mix of red, green, yellow, and white candies.
*   **Candies and Sweets:**
    *   The gingerbread house is surrounded by an assortment of candies and sweets.
    *   The candies are arranged in a decorative pattern around the house, creating a festive and inviting atmosphere.
    *   The candies add to the overall festive and celebratory feel of the image.
*   **Wooden Table:**
    *   The gingerbread house is placed on a wooden table, which provides a warm and cozy background for the image.
    *   The table is made of dark wood, which complements the colors of the gingerbread house and adds to the overall festive atmosphere.
*   **Sparkler:
==========
Prompt: 15 tokens, 4.680 tokens-per-sec
Generation: 256 tokens, 9.111 tokens-per-sec
Peak memory: 21.585 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Hi, how are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>


I'm doing well, thank you for asking. I'm a large language model, so I don't have feelings like humans do, but I'm always happy to assist and chat with you. How about you? How's your day going so far?
==========
Prompt: 16 tokens, 98.871 tokens-per-sec
Generation: 52 tokens, 45.829 tokens-per-sec
Peak memory: 11.747 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 41%|██████████████████████████████████▎                                                | 12/29 [01:35<04:02, 14.28s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/Llama-3.3-70B-Instruct-8bit                                                                    │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 20 files: 100%|██████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 5632.21it/s]
ERROR:root:Model type llama not supported.                                                       | 0/20 [00:00<?, ?it/s]
✗ Failed to load model: Model type llama not supported.
Cleaning up...
✓ Cleanup complete

 45%|█████████████████████████████████████▏                                             | 13/29 [01:36<02:40, 10.02s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/Molmo-7B-D-0924-8bit                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 26493.83it/s]
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 31184.42it/s]
✓ Model loaded successfully                                                                      | 0/16 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: Describe this image.
 This photograph captures a festive gingerbread house set on a wooden table, with a dark background that accentuates the vibrant details of the house. The house itself is adorned with an array of colorful decorations, including red, yellow, and green candies, as well as white icing. The roof is particularly eye-catching, featuring a mix of red and white candies, and is topped with a sparkler that is lit, adding a magical touch to the scene. The house has two windows, each with green trim and red candies, and a door with a heart-shaped window. Surrounding the base of the house are small candies and what appears to be a chocolate fence, enhancing the whimsical charm. The overall effect is a delightful and detailed holiday display that brings joy and wonder to the viewer.
==========
Prompt: 981 tokens, 47.787 tokens-per-sec
Generation: 158 tokens, 40.735 tokens-per-sec
Peak memory: 29.856 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: Hi, how are you?
 I'm fine, thank you. What's the weather like today? It's quite warm and sunny. What's your favorite season? I like spring and summer best. What's your favorite food? I love pizza and ice cream. Do you have any pets? No, I don't have any pets. What's your favorite color? I like blue and purple. What's your dream job? I want to be a teacher when I grow up. What's your favorite movie? I can't really say, I like different movies. What's your favorite TV show? I like cartoons and animated series. What's your favorite sport? I like soccer and swimming. What's your favorite book? I like reading fairy tales and adventure books.
==========
Prompt: 6 tokens, 84.527 tokens-per-sec
Generation: 150 tokens, 41.747 tokens-per-sec
Peak memory: 10.147 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 48%|████████████████████████████████████████                                           | 14/29 [02:06<04:03, 16.25s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/Molmo-7B-D-0924-bf16                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 18 files: 100%|█████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 11930.70it/s]
Fetching 18 files: 100%|█████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 12296.01it/s]
✓ Model loaded successfully                                                                      | 0/18 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: Describe this image.
 This photograph captures a festive gingerbread house set on a dark wooden table, with a blurred background that hints at a cozy living room setting. The house, adorned with intricate decorations, features a roof covered in red and white icing, embellished with red swirls and white snowflake-like designs. The front of the house showcases green windows with red and white polka-dotted curtains, and a charming heart-shaped window. The base of the house is surrounded by a variety of candies, including chocolate-covered pretzels, M&Ms, and small candy canes, creating a colorful and textured "snow" effect. Adding to the whimsical scene, two sparklers are lit atop the roof, casting a bright, festive glow that illuminates the entire setup.
==========
Prompt: 981 tokens, 45.730 tokens-per-sec
Generation: 154 tokens, 26.283 tokens-per-sec
Peak memory: 36.653 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: Hi, how are you?
 I'm fine, thank you. What's the weather like today? It's quite warm and sunny here. What's your favorite season? I like spring and summer best. Do you have any pets? No, I don't have any pets. What's your favorite food? I love pizza and ice cream. What's your favorite color? I like blue and purple. What's your dream job? I want to be a teacher when I grow up. What's your favorite movie? I can't really say, I like different movies for different reasons. What's your favorite TV show? I enjoy animated series like SpongeBob SquarePants and Adventure Time.
==========
Prompt: 6 tokens, 76.795 tokens-per-sec
Generation: 134 tokens, 27.029 tokens-per-sec
Peak memory: 17.246 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 52%|██████████████████████████████████████████▉                                        | 15/29 [02:42<05:11, 22.24s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/Phi-3.5-vision-instruct-bf16                                                                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 52631.23it/s]
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 29811.89it/s]
✓ Model loaded successfully                                                                      | 0/13 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: <|user|>
<|image_1|>Describe this image.<|end|>
<|assistant|>

The image shows a gingerbread house on a wooden surface with a blurred background. The house is decorated with colorful candies and has a lit sparkler on top, creating a festive atmosphere.<|end|>
==========
Prompt: 771 tokens, 906.422 tokens-per-sec
Generation: 47 tokens, 10.556 tokens-per-sec
Peak memory: 13.657 GB
✓ vision-language generation successful


Testing language-only generation...
==========
Image: None 

Prompt: <|user|>
Hi, how are you?<|end|>
<|assistant|>

Hello! I'm just a computer program, so I don't have feelings or emotions. How can I assist you today?<|end|>
==========
Prompt: 14 tokens, 64.076 tokens-per-sec
Generation: 30 tokens, 10.788 tokens-per-sec
Peak memory: 9.955 GB
✓ language-only generation successful


Cleaning up...
✓ Cleanup complete

 55%|█████████████████████████████████████████████▊                                     | 16/29 [02:53<04:03, 18.72s/it]╭──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ Testing mlx-community/QVQ-72B-Preview-8bit                                                                           │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Loading model...
Fetching 25 files: 100%|█████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 14551.43it/s]
Fetching 25 files: 100%|█████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 13398.62it/s]
✓ Model loaded successfully                                                                      | 0/25 [00:00<?, ?it/s]


Testing vision-language generation...
==========
Image: ['/Users/jrp/Pictures/Processed/20250101-214540_DSC01741.jpg'] 

Prompt: <|im_start|>system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>
<|im_start|>user
Describe this image.<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant

libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 137438953472 bytes which is greater than the maximum allowed buffer size of 77309411328 bytes.
zsh: abort      python tests/test_smoke.py --models-file /tmp/models.txt --image 
(mlx) jrp@Johns-MacBook-Pro mlx_vlm % /opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '
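For scale, the two byte counts in the final `std::runtime_error` line can be converted to GiB. This is plain arithmetic on the numbers in the transcript and makes no assumption about how the limit is chosen.

```python
GIB = 2**30  # bytes per gibibyte

requested = 137_438_953_472  # "Attempting to allocate ... bytes"
limit = 77_309_411_328       # "maximum allowed buffer size of ... bytes"

print(f"requested: {requested / GIB:.0f} GiB")  # requested: 128 GiB
print(f"limit:     {limit / GIB:.0f} GiB")      # limit:     72 GiB
print(f"ratio:     {requested / limit:.2f}x")   # ratio:     1.78x
```

So a single allocation of 128 GiB was attempted against a 72 GiB buffer limit.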

Blaizzy (Owner, Author) commented Jan 3, 2025

Thank you very much!

My run of the script bails out (see end of transcript). My own script just ploughs on (but doesn't properly reveal the warning) for whatever reason.

I suspect the image is too big for a model that size. Even if you have 128GB. I will add a way to handle it and image resize shape to the smoke test.

It would be even better if it revealed some load/run timings, if only for performance tracking and relative performance comparisons.

I can add load time. But when it comes to run time, it's best measured by tokens-per-sec, which already exists.

Suggestion: get those reporting bugs to run the smoke test on the failing model (and allow a model to be specified as an alternative to a list of them in a file).

Could you elaborate? I don't understand what you mean.

@Blaizzy
Owner Author

Blaizzy commented Jan 3, 2025

libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 137438953472 bytes which is greater than the maximum allowed buffer size of 77309411328 bytes.
zsh: abort python tests/test_smoke.py --models-file /tmp/models.txt --image

Update on this error: I don't know of a way to handle it, because it's literally like having too many Chrome windows open. Your PC just freezes and the task manager wants to kill the culprit.

So the only solution is to reduce the size of the image, use a lower quant, or use a smaller model.
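
For a sense of scale, the two byte counts in the error message can be converted to GiB. This is a minimal sketch using only the Python standard library; the helper name `parse_allocation_error` is made up for illustration and is not part of mlx-vlm.

```python
import re

GIB = 1024 ** 3  # bytes per gibibyte


def parse_allocation_error(message: str) -> tuple[int, int]:
    """Return (requested_bytes, max_bytes) parsed from the allocation error text."""
    match = re.search(
        r"allocate (\d+) bytes .* maximum allowed buffer size of (\d+) bytes",
        message,
    )
    if match is None:
        raise ValueError("not an allocation-size error")
    return int(match.group(1)), int(match.group(2))


# Message text copied from the transcript in this thread.
message = (
    "Attempting to allocate 137438953472 bytes which is greater than "
    "the maximum allowed buffer size of 77309411328 bytes."
)
requested, limit = parse_allocation_error(message)
print(f"requested {requested / GIB:.1f} GiB, limit {limit / GIB:.1f} GiB")
# prints: requested 128.0 GiB, limit 72.0 GiB
```

In other words, the failing step asked for a single 128 GiB buffer against a 72 GiB cap.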

@Blaizzy
Owner Author

Blaizzy commented Jan 3, 2025

Added model load time and resize-shape in #172.

Screenshot 2025-01-03 at 11 25 45 PM

@jrp2014

jrp2014 commented Jan 3, 2025

Spoke too soon, this used not to break, but now does:

python check_models.py 
mlx version: 0.21.1
mlx-vlm version: 0.1.9
The most recently modified file is: /Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running HuggingFaceTB/SmolVLM-Instruct
Fetching 12 files: 100%|████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 168898.15it/s]
Some kwargs in processor config are unused and will not have any effect: image_seq_len. 
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 25216.26it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: <|im_start|>User:<image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<end_of_utterance>
Assistant:
 A woman in a black jacket and scarf is standing in front of a Christmas tree in a shopping mall. The tree is decorated with red, gold, and silver ornaments and is lit up with white lights. There is a white picket fence in front of the tree. There are people walking around in the background.
==========
Prompt: 1218 tokens, 1097.425 tokens-per-sec
Generation: 65 tokens, 119.983 tokens-per-sec
Peak memory: 6.007 GB
 A woman in a black jacket and scarf is standing in front of a Christmas tree in a shopping mall. The tree is decorated with red, gold, and silver ornaments and is lit up with white lights. There is a white picket fence in front of the tree. There are people walking around in the background.
Output generated in 2.31s
Memory used: 5.28 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running OpenGVLab/InternVL2_5-8B
Fetching 21 files: 100%|█████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 20073.01it/s]
ERROR:root:Model type internvl_chat not supported.
Failed to load model or config at OpenGVLab/InternVL2_5-8B: Model type internvl_chat not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running cognitivecomputations/dolphin-2.9.2-qwen2-72b
Fetching 40 files: 100%|██████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 8428.22it/s]
ERROR:root:Model type qwen2 not supported.
Failed to load model or config at cognitivecomputations/dolphin-2.9.2-qwen2-72b: Model type qwen2 not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running distilbert/distilbert-base-uncased-finetuned-sst-2-english
Fetching 10 files: 100%|██████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 5152.07it/s]
ERROR:root:Model type distilbert not supported.
Failed to load model or config at distilbert/distilbert-base-uncased-finetuned-sst-2-english: Model type distilbert not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running google/siglip-so400m-patch14-384
Fetching 6 files: 100%|█████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 5258.22it/s]
ERROR:root:Model type siglip not supported.
Failed to load model or config at google/siglip-so400m-patch14-384: Model type siglip not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running meta-llama/Llama-3.2-11B-Vision-Instruct
Fetching 15 files: 100%|██████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 7506.81it/s]
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 40329.85it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>


The image depicts a large, decorated Christmas tree in a shopping mall or public building. The tree is adorned with red, gold, and silver ornaments and is surrounded by a white picket fence. The tree is illuminated by yellow lights, which are also strung around the tree and the surrounding area. In the background, there are people walking around the tree, and a sign that reads "BURRITO" can be seen on the right side of the image.

The overall atmosphere of the image is festive and joyful, with the bright lights and decorations creating a warm and inviting ambiance. The presence of people walking around the tree suggests that it is a popular spot for holiday shopping or socializing. The sign for "BURRITO" adds a touch of casual, everyday life to the otherwise festive scene.
==========
Prompt: 36 tokens, 9.864 tokens-per-sec
Generation: 162 tokens, 3.710 tokens-per-sec
Peak memory: 31.532 GB
The image depicts a large, decorated Christmas tree in a shopping mall or public building. The tree is adorned with red, gold, and silver ornaments and is surrounded by a white picket fence. The tree is illuminated by yellow lights, which are also strung around the tree and the surrounding area. In the background, there are people walking around the tree, and a sign that reads "BURRITO" can be seen on the right side of the image.

The overall atmosphere of the image is festive and joyful, with the bright lights and decorations creating a warm and inviting ambiance. The presence of people walking around the tree suggests that it is a popular spot for holiday shopping or socializing. The sign for "BURRITO" adds a touch of casual, everyday life to the otherwise festive scene.
Output generated in 47.79s
Memory used: 18.62 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Florence-2-large-ft
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 25802.28it/s]
ERROR:root:No safetensors found in /Users/jrp/.cache/huggingface/hub/models--microsoft--Florence-2-large-ft/snapshots/bb44b80c15e943b1bf7cec6e076359cec6e40178
Failed to load model or config at microsoft/Florence-2-large-ft: 
No safetensors found in /Users/jrp/.cache/huggingface/hub/models--microsoft--Florence-2-large-ft/snapshots/bb44b80c15e943b1bf7cec6e076359cec6e40178
Create safetensors using the following code:

from transformers import AutoModelForCausalLM, AutoProcessor

model_id= "<huggingface_model_id>"
model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

model.save_pretrained("<local_dir>")
processor.save_pretrained("<local_dir>")

Then use the <local_dir> as the --hf-path in the convert script.

python -m mlx_vlm.convert --hf-path <local_dir> --mlx-path <mlx_dir>

        
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Phi-3.5-mini-instruct
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 34036.17it/s]
ERROR:root:Model type phi3 not supported.
Failed to load model or config at microsoft/Phi-3.5-mini-instruct: Model type phi3 not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Phi-3.5-vision-instruct
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 36540.30it/s]
/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:524: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
  warnings.warn(
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 17321.61it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: <|user|>
<|image_1|>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|end|>
<|assistant|>

A large Christmas tree adorned with colorful ornaments and lights is the centerpiece of a mall's holiday decoration. People are seen walking around and admiring the tree, with one person taking a photo. The mall has a high ceiling with a glass roof, and there are other Christmas trees and decorations in the background.<|end|>
==========
Prompt: 795 tokens, 895.238 tokens-per-sec
Generation: 76 tokens, 10.189 tokens-per-sec
Peak memory: 18.968 GB
A large Christmas tree adorned with colorful ornaments and lights is the centerpiece of a mall's holiday decoration. People are seen walking around and admiring the tree, with one person taking a photo. The mall has a high ceiling with a glass roof, and there are other Christmas trees and decorations in the background.<|end|>
Output generated in 8.81s
Memory used: 7.59 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mistral-community/pixtral-12b
Fetching 15 files: 100%|██████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 9470.81it/s]
Failed to load model or config at mistral-community/pixtral-12b: Unsupported model type: pixtral
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Florence-2-large-ft-bf16
Fetching 12 files: 100%|██████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 9203.08it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 50031.46it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
<s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>
==========
Prompt: 29 tokens, 102.917 tokens-per-sec
Generation: 256 tokens, 169.398 tokens-per-sec
Peak memory: 7.842 GB
<s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>
Output generated in 2.37s
Memory used: 1.54 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Llama-3.2-11B-Vision-Instruct-8bit
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 28320.76it/s]
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 80659.69it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>


The image depicts a large, decorated Christmas tree situated in a public space, likely a mall or shopping center. The tree is adorned with numerous ornaments in red, gold, and silver hues, and is surrounded by a white picket fence. The tree is illuminated by yellow lights, which are also reflected on the floor.

**Description:** A large, decorated Christmas tree in a public space, surrounded by a white picket fence and illuminated by yellow lights.

**Keywords:** Christmas tree, decorations, ornaments, lights, public space, mall, shopping center, festive, holiday, winter, season, celebration, joy, happiness, cheer, tradition, culture, community, social, gathering, event, activity, entertainment, leisure, relaxation, enjoyment, fun, excitement, surprise, gift, present, card, letter, wish, hope, love, kindness, generosity, charity, volunteer, help, support, care, compassion, empathy, understanding, tolerance, acceptance, inclusivity, diversity, equality, justice, fairness, freedom, human rights, dignity, respect, trust, honesty, integrity, responsibility, accountability, transparency, communication, collaboration, teamwork, leadership, management, organization, planning, strategy, innovation, creativity, problem-solving, critical thinking, decision-making, analysis,
==========
Prompt: 35 tokens, 10.801 tokens-per-sec
Generation: 256 tokens, 8.692 tokens-per-sec
Peak memory: 21.585 GB
The image depicts a large, decorated Christmas tree situated in a public space, likely a mall or shopping center. The tree is adorned with numerous ornaments in red, gold, and silver hues, and is surrounded by a white picket fence. The tree is illuminated by yellow lights, which are also reflected on the floor.

**Description:** A large, decorated Christmas tree in a public space, surrounded by a white picket fence and illuminated by yellow lights.

**Keywords:** Christmas tree, decorations, ornaments, lights, public space, mall, shopping center, festive, holiday, winter, season, celebration, joy, happiness, cheer, tradition, culture, community, social, gathering, event, activity, entertainment, leisure, relaxation, enjoyment, fun, excitement, surprise, gift, present, card, letter, wish, hope, love, kindness, generosity, charity, volunteer, help, support, care, compassion, empathy, understanding, tolerance, acceptance, inclusivity, diversity, equality, justice, fairness, freedom, human rights, dignity, respect, trust, honesty, integrity, responsibility, accountability, transparency, communication, collaboration, teamwork, leadership, management, organization, planning, strategy, innovation, creativity, problem-solving, critical thinking, decision-making, analysis,
Output generated in 33.22s
Memory used: 10.74 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Llama-3.3-70B-Instruct-8bit
Fetching 20 files: 100%|█████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 21687.20it/s]
ERROR:root:Model type llama not supported.
Failed to load model or config at mlx-community/Llama-3.3-70B-Instruct-8bit: Model type llama not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Molmo-7B-D-0924-8bit
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 21399.51it/s]
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 58406.32it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
 A beautifully decorated Christmas tree stands in a bustling shopping mall, surrounded by a white picket fence. The tree is adorned with gold, red, and silver ornaments and twinkling lights, creating a festive atmosphere. Two people are visible near the tree - a woman looking at her phone and a man holding a bag. The mall's interior features glass windows, a staircase, and a second-level balcony with plants. A "Burrito" sign is visible, suggesting nearby food options. The scene captures the holiday spirit in a modern commercial setting.

Christmas tree, shopping mall, festive decor, decorations, ornaments, lights, white picket fence, people, smartphone, burrito, glass windows, staircase, balcony, plants, holiday spirit, modern commercial setting
==========
Prompt: 1233 tokens, 33.261 tokens-per-sec
Generation: 151 tokens, 40.629 tokens-per-sec
Peak memory: 41.806 GB
 A beautifully decorated Christmas tree stands in a bustling shopping mall, surrounded by a white picket fence. The tree is adorned with gold, red, and silver ornaments and twinkling lights, creating a festive atmosphere. Two people are visible near the tree - a woman looking at her phone and a man holding a bag. The mall's interior features glass windows, a staircase, and a second-level balcony with plants. A "Burrito" sign is visible, suggesting nearby food options. The scene captures the holiday spirit in a modern commercial setting.

Christmas tree, shopping mall, festive decor, decorations, ornaments, lights, white picket fence, people, smartphone, burrito, glass windows, staircase, balcony, plants, holiday spirit, modern commercial setting
Output generated in 41.34s
Memory used: 8.43 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Molmo-7B-D-0924-bf16
Fetching 18 files: 100%|█████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 16201.17it/s]
Fetching 18 files: 100%|█████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 13371.85it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
 A beautifully decorated Christmas tree stands in a shopping mall, surrounded by a white picket fence. The tree is adorned with colorful ornaments and twinkling lights, creating a festive atmosphere. Two people are visible near the tree, one of whom is holding a shopping bag. The mall's interior features glass windows, a staircase, and a second-level balcony with a "Burrito" sign. The scene captures the holiday spirit within a bustling commercial setting. 

Christmas tree, shopping mall, festive decor, decorations, lights, ornaments, white picket fence, shoppers, holiday spirit, glass windows, staircase, balcony, burrito sign

#ChristmasTree #MallDecor #HolidayFestivities #FestiveDecorations #TwinklingLights #OrnamentedTree #WhitePicketFence #Shoppers #HolidaySpirit #GlassWindows #Staircase #Balcony #BurritoSign
==========
Prompt: 1233 tokens, 32.775 tokens-per-sec
Generation: 183 tokens, 26.384 tokens-per-sec
Peak memory: 48.669 GB
 A beautifully decorated Christmas tree stands in a shopping mall, surrounded by a white picket fence. The tree is adorned with colorful ornaments and twinkling lights, creating a festive atmosphere. Two people are visible near the tree, one of whom is holding a shopping bag. The mall's interior features glass windows, a staircase, and a second-level balcony with a "Burrito" sign. The scene captures the holiday spirit within a bustling commercial setting. 

Christmas tree, shopping mall, festive decor, decorations, lights, ornaments, white picket fence, shoppers, holiday spirit, glass windows, staircase, balcony, burrito sign

#ChristmasTree #MallDecor #HolidayFestivities #FestiveDecorations #TwinklingLights #OrnamentedTree #WhitePicketFence #Shoppers #HolidaySpirit #GlassWindows #Staircase #Balcony #BurritoSign
Output generated in 45.09s
Memory used: 11.66 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Phi-3.5-vision-instruct-bf16
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 18603.19it/s]
Fetching 13 files: 100%|██████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 6791.97it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: <|user|>
<|image_1|>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|end|>
<|assistant|>

A large Christmas tree adorned with colorful ornaments and lights is the centerpiece of a mall's holiday decoration. People are seen walking around and admiring the tree, with one person taking a photo. The mall has a high ceiling with a glass roof, and there are other Christmas trees and decorations in the background.<|end|>
==========
Prompt: 795 tokens, 914.372 tokens-per-sec
Generation: 76 tokens, 10.190 tokens-per-sec
Peak memory: 13.728 GB
A large Christmas tree adorned with colorful ornaments and lights is the centerpiece of a mall's holiday decoration. People are seen walking around and admiring the tree, with one person taking a photo. The mall has a high ceiling with a glass roof, and there are other Christmas trees and decorations in the background.<|end|>
Output generated in 8.78s
Memory used: 7.82 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/QVQ-72B-Preview-8bit
Fetching 25 files: 100%|█████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 21277.92it/s]
Fetching 25 files: 100%|█████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 15716.07it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250103-190713_DSC01812.jpg'] 

Prompt: <|im_start|>system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>
<|im_start|>user
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant

libc++abi: terminating due to uncaught exception of type std::runtime_error: Attempting to allocate 134767706112 bytes which is greater than the maximum allowed buffer size of 77309411328 bytes.
zsh: abort      python check_models.py
(mlx) jrp@Johns-MacBook-Pro vlm % /opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/multiprocessing/resource_tracker.py:254: UserWarning: resource_tracker: There appear to be 1 leaked semaphore objects to clean up at shutdown
  warnings.warn('resource_tracker: There appear to be %d '

(now just seems to die)
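
The abort comes from an uncaught C++ exception, so a Python `try`/`except` in the same process cannot intercept it. One way a harness could keep going to the next model is to run each model in its own child process and inspect the exit status. The sketch below assumes a POSIX system (macOS or Linux); `run_isolated` is a hypothetical helper, not something in `test_smoke.py` or `check_models.py`, and the snippet passed to it stands in for the real per-model code.

```python
import signal
import subprocess
import sys


def run_isolated(code: str, timeout: float = 600.0) -> tuple[bool, str]:
    """Run a Python snippet in a child process and report how it ended."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    if proc.returncode == 0:
        return True, "ok"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed by a signal.
        return False, f"killed by {signal.Signals(-proc.returncode).name}"
    return False, f"exit code {proc.returncode}"


# A child that aborts (as the allocation failure does) no longer takes
# the parent down with it; the loop can record the failure and move on.
for snippet in ("print('model ran')", "import os; os.abort()"):
    print(run_isolated(snippet))
```

The cost is that each model is loaded in a fresh interpreter, which is what a smoke test does anyway.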

@jrp2014

jrp2014 commented Jan 3, 2025

Thank you very much!

Don't mention it!

My run of the script bails out (see end of transcript). My own script just ploughs on (but doesn't properly reveal the warning) for whatever reason.

I suspect the image is too big for a model that size, even if you have 128GB. I will add a way to handle it, plus an image resize shape option, to the smoke test.

Big images always used to plough on, with my script, but no longer do ...

I don't want to have to fiddle with image sizes when submitting them to the oracle. This is supposed to be AI! The script knows how much memory I have, and may know how much the model needs? Why are different models capable of running the same image without balking?

It would be even better if it revealed some load/run timings, if only for performance tracking and relative performance comparisons.

I can add load time. But when it comes to run time, it's best measured by tokens-per-sec, which already exists.

OK, although if a model produces rubbish quickly, I might prefer a slower (token / sec) model.
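
For anyone who does want wall-clock figures, they can be derived from the statistics lines the generator already prints. This is a minimal sketch; the function name `phase_seconds` is made up for illustration, and the sample lines are copied from the SmolVLM run in the transcript above.

```python
import re

# Matches lines such as "Prompt: 1218 tokens, 1097.425 tokens-per-sec".
STAT_LINE = re.compile(
    r"^(Prompt|Generation): (\d+) tokens, ([\d.]+) tokens-per-sec$"
)


def phase_seconds(log_text: str) -> dict[str, float]:
    """Convert token counts and tokens-per-sec rates into seconds per phase."""
    seconds = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            phase = match.group(1)
            tokens = int(match.group(2))
            rate = float(match.group(3))
            seconds[phase] = tokens / rate
    return seconds


log = """Prompt: 1218 tokens, 1097.425 tokens-per-sec
Generation: 65 tokens, 119.983 tokens-per-sec"""

for phase, secs in phase_seconds(log).items():
    print(f"{phase}: {secs:.2f} s")
# prints:
# Prompt: 1.11 s
# Generation: 0.54 s
```

These two phases add up to less than the "Output generated in 2.31s" figure reported for the same run, since that figure covers more than prompt processing and token generation.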

Suggestion: get those reporting bugs to run the smoke test on the failing model (and allow a model to be specified as an alternative to a list of them in a file).

Could you elaborate? I don't understand what you mean.

In either your README, or CONTRIBUTOR instructions, ask those reporting bugs with particular models to provide the output from running this script (with instructions on how to do so).

@Blaizzy
Owner Author

Blaizzy commented Jan 3, 2025

Big images always used to plough on, with my script, but no longer do ...

Could you share a reproducible example with 1 to 2 models? Please include the image.

I don't want to have to fiddle with image sizes when submitting them to the oracle. This is supposed to be AI! The script knows how much memory I have, and may know how much the model needs? Why are different models capable of running the same image without balking?

Hey, I get where you're coming from with automatic image sizing - it would be super convenient! But here's the tricky part: these AI models are surprisingly quirky with how they handle images.

You might have two models that look similar on paper (same size, same number of parameters), but one could be way more memory-hungry just because of how its vision system is built. It's kind of like how two cars might have the same horsepower but totally different fuel efficiency.

Sure, we could try to guess how much memory each model needs, but it'd be like throwing darts blindfolded. Some users would end up with slower performance and might not even realize why. Plus, keeping it working with new hardware would be a constant headache.

So while I'd love to make this work, I think for now it's better to let users control their own image sizes. That way, you know exactly what you're getting and can tune it to what works best for your setup.

@Blaizzy
Owner Author

Blaizzy commented Jan 3, 2025

I'm working on other features that might bring the resource usage down, such as cache quantization, a rotating cache, and image+prompt caching.

But even with these features, if the image is too big you will need to resize it manually to a size at which the models you prefer run with good accuracy. Or you can write your own heuristic to handle resizing.
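
One possible resizing heuristic, as a minimal sketch: cap the longer side at a chosen limit and keep the aspect ratio. The function name `fit_within` and the 1024-pixel limit are assumptions for illustration, not mlx-vlm settings. The sketch only computes the target dimensions; the pixel resize itself would be done with whatever image library you already use.

```python
def fit_within(width: int, height: int, max_side: int) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.

    Images that already fit are returned unchanged (no upscaling).
    """
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    # Guard against a zero-sized side on extreme aspect ratios.
    return max(1, round(width * scale)), max(1, round(height * scale))


# Example: a 6000x4000 photo capped at 1024 pixels on the longer side.
print(fit_within(6000, 4000, 1024))  # prints: (1024, 683)
print(fit_within(800, 600, 1024))    # prints: (800, 600)
```

The right limit depends on the model, so it is worth choosing per model rather than globally.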

@jrp2014

jrp2014 commented Jan 4, 2025

One thing that could help with the tuning would be to build on your smoke test app and apply a model + prompt to a given image, resized to max 512 pixels in length / width, then 1024, 2048, 4096 and 8192, in each case recording memory usage and memory usage per megapixel. This is, in any case, a useful thing to know about a model.
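
A rough sketch of what that sweep could look like. The `measure_peak_gb` callback is hypothetical: it stands for "resize the image to this size, run the model and prompt, and return peak memory in GB", and nothing like it exists in the smoke test as far as this thread shows. The `fake_measure` function below is a placeholder with made-up numbers so the sketch runs on its own; it is not real measurement data.

```python
from typing import Callable, Sequence


def memory_sweep(
    width: int,
    height: int,
    measure_peak_gb: Callable[[int, int], float],
    max_sides: Sequence[int] = (512, 1024, 2048, 4096, 8192),
) -> list[dict]:
    """Record peak memory and memory per megapixel at several image sizes."""
    rows = []
    for side in max_sides:
        # Never upscale: cap the scale factor at 1.0.
        scale = min(1.0, side / max(width, height))
        w = max(1, round(width * scale))
        h = max(1, round(height * scale))
        megapixels = (w * h) / 1_000_000
        peak_gb = measure_peak_gb(w, h)
        rows.append(
            {
                "max_side": side,
                "width": w,
                "height": h,
                "megapixels": megapixels,
                "peak_gb": peak_gb,
                "gb_per_megapixel": peak_gb / megapixels,
            }
        )
    return rows


def fake_measure(w: int, h: int) -> float:
    """Placeholder only: pretends memory is a fixed cost plus a per-pixel cost."""
    return 5.0 + 2.0 * (w * h) / 1_000_000


for row in memory_sweep(8000, 6000, fake_measure):
    print(
        f"{row['max_side']:>5} px  {row['width']}x{row['height']}  "
        f"{row['megapixels']:.2f} MP  {row['peak_gb']:.2f} GB  "
        f"{row['gb_per_megapixel']:.2f} GB/MP"
    )
```

Since a failed allocation aborts the process, a real sweep would probably want each measurement to run in its own process.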

@jrp2014

jrp2014 commented Jan 4, 2025

But it seems to be a bit more complicated. With a different image (from the same camera) I can get

https://live.staticflickr.com/65535/54245767632_324aaa7699_c.jpg

python check_models.py
mlx version: 0.21.1.dev20250104+eab93985b
mlx-vlm version: 0.1.10
The most recently modified file is: /Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg
vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running HuggingFaceTB/SmolVLM-Instruct
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 35746.91it/s]
Some kwargs in processor config are unused and will not have any effect: image_seq_len. 
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 18490.69it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|im_start|>User:<image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<end_of_utterance>
Assistant:
 A night view of a pub with a sign that says "The Three Stags" above the entrance. The pub is located in a town, and there are several other buildings in the background. The pub is lit up, and there are several people standing outside. There is a red light on the street in front of the pub.
==========
Prompt: 1217 tokens, 1091.187 tokens-per-sec
Generation: 68 tokens, 118.809 tokens-per-sec
Peak memory: 6.007 GB
 A night view of a pub with a sign that says "The Three Stags" above the entrance. The pub is located in a town, and there are several other buildings in the background. The pub is lit up, and there are several people standing outside. There is a red light on the street in front of the pub.
Output generated in 2.30s
Memory used: 5.30 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running OpenGVLab/InternVL2_5-8B
Fetching 21 files: 100%|█████████████████████████████████████████████████████████████| 21/21 [00:00<00:00, 12802.38it/s]
ERROR:root:Model type internvl_chat not supported.
Failed to load model or config at OpenGVLab/InternVL2_5-8B: Model type internvl_chat not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running cognitivecomputations/dolphin-2.9.2-qwen2-72b
Fetching 40 files: 100%|██████████████████████████████████████████████████████████████| 40/40 [00:00<00:00, 6385.00it/s]
ERROR:root:Model type qwen2 not supported.
Failed to load model or config at cognitivecomputations/dolphin-2.9.2-qwen2-72b: Model type qwen2 not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running distilbert/distilbert-base-uncased-finetuned-sst-2-english
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 19301.91it/s]
ERROR:root:Model type distilbert not supported.
Failed to load model or config at distilbert/distilbert-base-uncased-finetuned-sst-2-english: Model type distilbert not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running google/siglip-so400m-patch14-384
Fetching 6 files: 100%|█████████████████████████████████████████████████████████████████| 6/6 [00:00<00:00, 4808.14it/s]
ERROR:root:Model type siglip not supported.
Failed to load model or config at google/siglip-so400m-patch14-384: Model type siglip not supported.
================================================================================
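Several of the entries above fail with `Model type ... not supported` because they are text-only or encoder-only checkpoints. One way to keep the transcript shorter is to check the `model_type` field of each snapshot's `config.json` before attempting a load. The sketch below is only an illustration: `read_model_type`, `partition_by_model_type`, and the `supported` set are hypothetical names, not part of mlx-vlm, and the set of supported types would have to come from the library itself.

```python
import json
from pathlib import Path
from typing import Dict, List, Set, Tuple


def read_model_type(snapshot_dir: Path) -> str:
    """Return the `model_type` field from a snapshot's config.json ("" if absent)."""
    config = json.loads((snapshot_dir / "config.json").read_text(encoding="utf-8"))
    return config.get("model_type", "")


def partition_by_model_type(
    snapshots: Dict[str, Path], supported: Set[str]
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split model ids into (runnable, skipped) based on their model_type.

    `snapshots` maps a model id to a local snapshot directory.
    `supported` is an assumption supplied by the caller, not a list
    taken from mlx-vlm.
    """
    runnable: List[str] = []
    skipped: List[Tuple[str, str]] = []
    for model_id, snapshot_dir in snapshots.items():
        model_type = read_model_type(snapshot_dir)
        if model_type in supported:
            runnable.append(model_id)
        else:
            # Keep the model_type so the skip reason can be reported.
            skipped.append((model_id, model_type))
    return runnable, skipped
```

The skipped list keeps each `model_type`, so a harness could print one summary line per skipped model instead of a full error block.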

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running meta-llama/Llama-3.2-11B-Vision-Instruct
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 30977.13it/s]
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 22978.29it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>


The image depicts a city street at night, with a large building in the center that appears to be a pub or restaurant. The building is well-lit and has a sign that reads "THE HIDE & STAYS" in blue letters. The street is wet, suggesting recent rain, and there are several streetlights and traffic lights visible. A red double-decker bus can be seen in the background, and a person is walking on the sidewalk.

The overall atmosphere of the image is one of a quiet, rainy night in a bustling city. The street is empty, except for the person walking and the bus, and the only sounds are the sound of raindrops hitting the pavement and the hum of the bus's engine.

**Keywords:** city street, nighttime, rain, pub, restaurant, streetlights, traffic lights, double-decker bus, pedestrian, rainy, urban, nighttime, atmospheric, moody, rainy night, cityscape, urban landscape, street scene, nighttime scene, rainy city, city at night, nighttime city, urban atmosphere, rainy atmosphere, moody atmosphere, atmospheric city, cityscape at night, urban nighttime, rainy urban, city street at night, nighttime urban, rainy urban scene, city at night, urban nighttime scene, rainy cityscape, cityscape at night, urban cityscape, nighttime urban scene, rainy city scene, cityscape at night, urban nighttime atmosphere, rainy urban atmosphere, city street at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy cityscape at night, cityscape at night, urban nighttime scene, rainy urban atmosphere, city street at night, nighttime urban scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban nighttime scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban atmosphere, city street at night, nighttime urban scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban nighttime scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban atmosphere, city street at night, nighttime urban scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban nighttime scene, rainy cityscape at night, urban cityscape at night,
==========
Prompt: 36 tokens, 8.835 tokens-per-sec
Generation: 500 tokens, 3.660 tokens-per-sec
Peak memory: 31.532 GB
The image depicts a city street at night, with a large building in the center that appears to be a pub or restaurant. The building is well-lit and has a sign that reads "THE HIDE & STAYS" in blue letters. The street is wet, suggesting recent rain, and there are several streetlights and traffic lights visible. A red double-decker bus can be seen in the background, and a person is walking on the sidewalk.

The overall atmosphere of the image is one of a quiet, rainy night in a bustling city. The street is empty, except for the person walking and the bus, and the only sounds are the sound of raindrops hitting the pavement and the hum of the bus's engine.

**Keywords:** city street, nighttime, rain, pub, restaurant, streetlights, traffic lights, double-decker bus, pedestrian, rainy, urban, nighttime, atmospheric, moody, rainy night, cityscape, urban landscape, street scene, nighttime scene, rainy city, city at night, nighttime city, urban atmosphere, rainy atmosphere, moody atmosphere, atmospheric city, cityscape at night, urban nighttime, rainy urban, city street at night, nighttime urban, rainy urban scene, city at night, urban nighttime scene, rainy cityscape, cityscape at night, urban cityscape, nighttime urban scene, rainy city scene, cityscape at night, urban nighttime atmosphere, rainy urban atmosphere, city street at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy cityscape at night, cityscape at night, urban nighttime scene, rainy urban atmosphere, city street at night, nighttime urban scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban nighttime scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban atmosphere, city street at night, nighttime urban scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban nighttime scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban atmosphere, city street at night, nighttime urban scene, rainy cityscape at night, urban cityscape at night, nighttime urban atmosphere, rainy urban scene, cityscape at night, urban nighttime scene, rainy cityscape at night, urban cityscape at night,
Output generated in 141.13s
Memory used: 18.47 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Florence-2-large-ft
Fetching 9 files: 100%|████████████████████████████████████████████████████████████████| 9/9 [00:00<00:00, 20925.02it/s]
ERROR:root:No safetensors found in /Users/jrp/.cache/huggingface/hub/models--microsoft--Florence-2-large-ft/snapshots/bb44b80c15e943b1bf7cec6e076359cec6e40178
Failed to load model or config at microsoft/Florence-2-large-ft: 
No safetensors found in /Users/jrp/.cache/huggingface/hub/models--microsoft--Florence-2-large-ft/snapshots/bb44b80c15e943b1bf7cec6e076359cec6e40178
Create safetensors using the following code:

from transformers import AutoModelForCausalLM, AutoProcessor

model_id= "<huggingface_model_id>"
model = AutoModelForCausalLM.from_pretrained(model_id)
processor = AutoProcessor.from_pretrained(model_id)

model.save_pretrained("<local_dir>")
processor.save_pretrained("<local_dir>")

Then use the <local_dir> as the --hf-path in the convert script.

python -m mlx_vlm.convert --hf-path <local_dir> --mlx-path <mlx_dir>

        
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Phi-3.5-mini-instruct
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 77894.22it/s]
ERROR:root:Model type phi3 not supported.
Failed to load model or config at microsoft/Phi-3.5-mini-instruct: Model type phi3 not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running microsoft/Phi-3.5-vision-instruct
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 15075.80it/s]
/opt/homebrew/Caskroom/miniconda/base/envs/mlx/lib/python3.12/site-packages/transformers/models/auto/image_processing_auto.py:524: FutureWarning: The image_processor_class argument is deprecated and will be removed in v4.42. Please use `slow_image_processor_class`, or `fast_image_processor_class` instead
  warnings.warn(
Fetching 14 files: 100%|█████████████████████████████████████████████████████████████| 14/14 [00:00<00:00, 22183.70it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|user|>
<|image_1|>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|end|>
<|assistant|>

Caption: A nighttime street scene with a wet road, a bus stop, and a building with lit windows. Description: The image captures a rainy evening on a city street. Keywords: rain, street, night, wet road, bus stop, building, windows, lights, traffic, pedestrians.<|end|>
==========
Prompt: 795 tokens, 910.675 tokens-per-sec
Generation: 69 tokens, 10.112 tokens-per-sec
Peak memory: 19.116 GB
Caption: A nighttime street scene with a wet road, a bus stop, and a building with lit windows. Description: The image captures a rainy evening on a city street. Keywords: rain, street, night, wet road, bus stop, building, windows, lights, traffic, pedestrians.<|end|>
Output generated in 8.09s
Memory used: 7.47 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mistral-community/pixtral-12b
Fetching 15 files: 100%|█████████████████████████████████████████████████████████████| 15/15 [00:00<00:00, 18635.83it/s]
Failed to load model or config at mistral-community/pixtral-12b: Unsupported model type: pixtral
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Florence-2-large-ft-bf16
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12282.00it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 32346.82it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
<s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>
==========
Prompt: 29 tokens, 102.084 tokens-per-sec
Generation: 500 tokens, 168.511 tokens-per-sec
Peak memory: 7.314 GB
<s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s><s>
Output generated in 3.77s
Memory used: 1.55 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Llama-3.2-11B-Vision-Instruct-8bit
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 11046.36it/s]
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 15268.67it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|begin_of_text|><|start_header_id|>user<|end_header_id|>

Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|image|><|eot_id|><|start_header_id|>assistant<|end_header_id|>


The image depicts a city street at night, with a large building in the center and a bus on the left side. The building is illuminated by streetlights and has a sign that reads "THE HIDEAWAY" in blue letters.

* A city street at night:
	+ The street is wet and reflects the light from the streetlights.
	+ There are several trees and benches along the sidewalk.
	+ The street is lined with buildings, including the large building in the center of the image.
* A large building with a sign that reads "THE HIDEAWAY":
	+ The building is three stories tall and has a balcony on the second floor.
	+ The sign is located above the entrance to the building and is lit up at night.
	+ The building appears to be a restaurant or bar, as there are tables and chairs outside.
* A bus on the left side of the image:
	+ The bus is red and has a white number on the front.
	+ It is stopped at a bus stop and appears to be waiting for passengers.
	+ The bus is partially obscured by the trees and other buildings.

Overall, the image captures a quiet and peaceful scene of a city street at night, with a large building and a bus providing a sense of activity and movement.
==========
Prompt: 35 tokens, 10.716 tokens-per-sec
Generation: 264 tokens, 8.739 tokens-per-sec
Peak memory: 21.585 GB
The image depicts a city street at night, with a large building in the center and a bus on the left side. The building is illuminated by streetlights and has a sign that reads "THE HIDEAWAY" in blue letters.

* A city street at night:
	+ The street is wet and reflects the light from the streetlights.
	+ There are several trees and benches along the sidewalk.
	+ The street is lined with buildings, including the large building in the center of the image.
* A large building with a sign that reads "THE HIDEAWAY":
	+ The building is three stories tall and has a balcony on the second floor.
	+ The sign is located above the entrance to the building and is lit up at night.
	+ The building appears to be a restaurant or bar, as there are tables and chairs outside.
* A bus on the left side of the image:
	+ The bus is red and has a white number on the front.
	+ It is stopped at a bus stop and appears to be waiting for passengers.
	+ The bus is partially obscured by the trees and other buildings.

Overall, the image captures a quiet and peaceful scene of a city street at night, with a large building and a bus providing a sense of activity and movement.
Output generated in 33.94s
Memory used: 10.70 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Llama-3.3-70B-Instruct-8bit
Fetching 20 files: 100%|█████████████████████████████████████████████████████████████| 20/20 [00:00<00:00, 27324.46it/s]
ERROR:root:Model type llama not supported.
Failed to load model or config at mlx-community/Llama-3.3-70B-Instruct-8bit: Model type llama not supported.
================================================================================

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Molmo-7B-D-0924-8bit
Fetching 16 files: 100%|█████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 14481.84it/s]
Fetching 16 files: 100%|██████████████████████████████████████████████████████████████| 16/16 [00:00<00:00, 6788.27it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
 A nighttime street scene in London featuring "The Three Stags" pub, a three-story building with illuminated windows and outdoor seating. The wet road reflects the streetlights, and a red double-decker bus is visible in the background. A pedestrian stands on the sidewalk, and a tree is present in the scene. The atmosphere is quiet and serene, with no visible activity.

Keywords: London, pub, Three Stags, nighttime, wet street, double-decker bus, pedestrian, tree, quiet atmosphere
==========
Prompt: 1225 tokens, 34.641 tokens-per-sec
Generation: 102 tokens, 40.610 tokens-per-sec
Peak memory: 41.812 GB
 A nighttime street scene in London featuring "The Three Stags" pub, a three-story building with illuminated windows and outdoor seating. The wet road reflects the streetlights, and a red double-decker bus is visible in the background. A pedestrian stands on the sidewalk, and a tree is present in the scene. The atmosphere is quiet and serene, with no visible activity.

Keywords: London, pub, Three Stags, nighttime, wet street, double-decker bus, pedestrian, tree, quiet atmosphere
Output generated in 38.50s
Memory used: 7.43 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Molmo-7B-D-0924-bf16
Fetching 18 files: 100%|█████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 16423.20it/s]
Fetching 18 files: 100%|██████████████████████████████████████████████████████████████| 18/18 [00:00<00:00, 5659.06it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
 A nighttime street scene in London featuring "The Three Stags" pub, a three-story building with illuminated windows and outdoor seating. The wet road reflects the streetlights, and a red double-decker bus is visible in the background. A pedestrian stands on the sidewalk, and a tree is present in the scene. The atmosphere is quiet and serene, with no visible activity. #London#Pub#NightScene#TheThreeStags#DoubleDeckerBus#StreetLamp#Rainy#Serene
==========
Prompt: 1225 tokens, 34.653 tokens-per-sec
Generation: 103 tokens, 26.325 tokens-per-sec
Peak memory: 48.666 GB
 A nighttime street scene in London featuring "The Three Stags" pub, a three-story building with illuminated windows and outdoor seating. The wet road reflects the streetlights, and a red double-decker bus is visible in the background. A pedestrian stands on the sidewalk, and a tree is present in the scene. The atmosphere is quiet and serene, with no visible activity. #London#Pub#NightScene#TheThreeStags#DoubleDeckerBus#StreetLamp#Rainy#Serene
Output generated in 39.79s
Memory used: 12.04 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Phi-3.5-vision-instruct-bf16
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 35919.60it/s]
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 20262.34it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|user|>
<|image_1|>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|end|>
<|assistant|>

Caption: A nighttime street scene with a wet road, a bus stop, and a building with lit windows. Description: The image captures a rainy evening on a city street. Keywords: rain, street, night, wet road, bus stop, building, windows, lights, traffic, pedestrians.<|end|>
==========
Prompt: 795 tokens, 915.170 tokens-per-sec
Generation: 69 tokens, 10.194 tokens-per-sec
Peak memory: 13.728 GB
Caption: A nighttime street scene with a wet road, a bus stop, and a building with lit windows. Description: The image captures a rainy evening on a city street. Keywords: rain, street, night, wet road, bus stop, building, windows, lights, traffic, pedestrians.<|end|>
Output generated in 8.02s
Memory used: 7.97 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/QVQ-72B-Preview-8bit
Fetching 25 files: 100%|█████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 17349.04it/s]
Fetching 25 files: 100%|█████████████████████████████████████████████████████████████| 25/25 [00:00<00:00, 22854.75it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|im_start|>system
You are a helpful and harmless assistant. You are Qwen developed by Alibaba. You should think step-by-step.<|im_end|>
<|im_start|>user
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant

Failed to generate output for model at mlx-community/QVQ-72B-Preview-8bit: arange(): incompatible function arguments. The following argument types are supported:
    1. arange(start : Union[int, float], stop : Union[int, float], step : Union[None, int, float], dtype: Optional[Dtype] = None, *, stream: Union[None, Stream, Device] = None) -> array
    2. arange(stop : Union[int, float], step : Union[None, int, float] = None, dtype: Optional[Dtype] = None, *, stream: Union[None, Stream, Device] = None) -> array

Invoked with types: mlx.core.array, kwargs = { dtype: mlx.core.Dtype }
********************************************************************************

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/Qwen2-VL-7B-Instruct-8bit
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 31595.51it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 36819.05it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|im_start|>system
You are a helpful assistant.<|im_end|>
<|im_start|>user
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|vision_start|><|image_pad|><|vision_end|><|im_end|>
<|im_start|>assistant

Failed to generate output for model at mlx-community/Qwen2-VL-7B-Instruct-8bit: arange(): incompatible function arguments. The following argument types are supported:
    1. arange(start : Union[int, float], stop : Union[int, float], step : Union[None, int, float], dtype: Optional[Dtype] = None, *, stream: Union[None, Stream, Device] = None) -> array
    2. arange(stop : Union[int, float], step : Union[None, int, float] = None, dtype: Optional[Dtype] = None, *, stream: Union[None, Stream, Device] = None) -> array

Invoked with types: mlx.core.array, kwargs = { dtype: mlx.core.Dtype }
********************************************************************************
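Both Qwen2-VL failures above report that `arange()` was invoked with an `mlx.core.array` where the listed signatures expect a Python `int` or `float`. A possible workaround, sketched here under that assumption, is to coerce the value to a plain Python number before the call. `to_python_scalar` is a hypothetical helper, not an mlx or mlx-vlm function; it relies only on the array exposing an `item()` method, which `mlx.core.array` does. This is not necessarily how the library fixes the issue.

```python
from typing import Any, Union

Number = Union[int, float]


def to_python_scalar(value: Any) -> Number:
    """Coerce a single-element array-like value to a plain Python number.

    Plain ints and floats pass through unchanged. Anything else must
    expose an `item()` method that returns an int or float.
    """
    if isinstance(value, (int, float)):
        return value
    item = getattr(value, "item", None)
    if callable(item):
        result = item()
        if isinstance(result, (int, float)):
            return result
    raise TypeError(f"cannot convert {type(value).__name__} to a Python scalar")


# Hypothetical call site (requires mlx, so it is left as a comment):
#   import mlx.core as mx
#   positions = mx.arange(to_python_scalar(seq_len), dtype=mx.int32)
```

The helper raises `TypeError` instead of guessing, so a value that cannot be coerced fails at the conversion step and not inside `arange`.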

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/SmolVLM-Instruct-bf16
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 90362.03it/s]
Some kwargs in processor config are unused and will not have any effect: image_seq_len. 
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 41838.44it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|im_start|>User:<image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<end_of_utterance>
Assistant:
 A night view of a pub with a sign that says "The Three Stags" above the entrance. The pub is located in a town, and there are several other buildings in the background. The pub is lit up, and there are several people standing outside. There is a red light on the street in front of the pub.
==========
Prompt: 1217 tokens, 1052.705 tokens-per-sec
Generation: 68 tokens, 116.451 tokens-per-sec
Peak memory: 86.818 GB
 A night view of a pub with a sign that says "The Three Stags" above the entrance. The pub is located in a town, and there are several other buildings in the background. The pub is lit up, and there are several people standing outside. There is a red light on the street in front of the pub.
Output generated in 2.44s
Memory used: 4.41 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/deepseek-vl2-8bit
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 50347.14it/s]
Some kwargs in processor config are unused and will not have any effect: mask_prompt, sft_format, pad_token, image_token, ignore_id, image_std, candidate_resolutions, patch_size, image_mean, add_special_token, downsample_ratio, normalize. 
Add pad token = ['<|▁pad▁|>'] to the tokenizer
<|▁pad▁|>:2
Add image token = ['<image>'] to the tokenizer
<image>:128815
Added grounding-related tokens
Added chat tokens
Fetching 13 files: 100%|█████████████████████████████████████████████████████████████| 13/13 [00:00<00:00, 47913.84it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|User|>: <image>
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.

<|Assistant|>:
A rainy night scene in the city with a lit building in the background. The building has a sign that reads "The Three Stags." The street is wet from the rain, and there are streetlights and a traffic light visible. The atmosphere is dark and moody, with reflections on the wet pavement.
==========
Prompt: 1448 tokens, 377.810 tokens-per-sec
Generation: 63 tokens, 46.852 tokens-per-sec
Peak memory: 31.117 GB
A rainy night scene in the city with a lit building in the background. The building has a sign that reads "The Three Stags." The street is wet from the rain, and there are streetlights and a traffic light visible. The atmosphere is dark and moody, with reflections on the wet pavement.
Output generated in 5.71s
Memory used: 27.38 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/dolphin-vision-72b-4bit
Fetching 19 files: 100%|█████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 20565.62it/s]
Fetching 19 files: 100%|█████████████████████████████████████████████████████████████| 19/19 [00:00<00:00, 18430.11it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|im_start|>system
Answer the questions.<|im_end|><|im_start|>user
<image>
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|im_end|><|im_start|>assistant

The image depicts a "Rainy Night at the Pub," showcasing a traditional British pub with a warm, inviting glow amidst a rainy evening. The scene is characterized by the pub's exterior, the wet pavement reflecting the lights, and the overall ambiance of a cozy retreat from the rain. The keywords or tags for this image could include: "British pub," "rainy night," "cozy atmosphere," "wet pavement," "reflective lights," "traditional architecture," "evening scene," "urban setting," "overcast sky."
==========
Prompt: 41 tokens, 3.705 tokens-per-sec
Generation: 110 tokens, 7.003 tokens-per-sec
Peak memory: 43.775 GB
The image depicts a "Rainy Night at the Pub," showcasing a traditional British pub with a warm, inviting glow amidst a rainy evening. The scene is characterized by the pub's exterior, the wet pavement reflecting the lights, and the overall ambiance of a cozy retreat from the rain. The keywords or tags for this image could include: "British pub," "rainy night," "cozy atmosphere," "wet pavement," "reflective lights," "traditional architecture," "evening scene," "urban setting," "overcast sky."
Output generated in 27.37s
Memory used: 29.19 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/idefics2-8b-chatty-8bit
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 78889.73it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 17672.63it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: User: Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<image><end_of_utterance>
Assistant:
In the heart of a bustling city, a large, ornate building stands as a beacon of elegance amidst the urban landscape. The building, bathed in the soft glow of white lights, is adorned with a grand archway that serves as an inviting entrance. A balcony graces the second floor, offering a vantage point over the city below. The building is nestled on a street corner, with a tree standing guard in front, its leaves rustling gently in the night breeze. The street itself is slick with rain, reflecting the myriad lights from the surrounding buildings and cars. The scene is a symphony of urban life, captured in a single, captivating image.

Keywords:
City, building, ornate, street, balcony, tree, lights, rain, night, urban, street corner, reflection, architecture, nightlife, cityscape, nocturnal, illumination, city life, urban design, city architecture, night photography, cityscape photography, city nightlife, city night, city street, city street corner, city street view, city street scene, city street view at night, city street view in the rain, city street view in the rain at night, city street view in the rain at night with lights, city street view in the rain at night with lights and trees, city street view in the rain at night with lights and trees and buildings, city street view in the rain at night with lights and trees and buildings and cars.<end_of_utterance>
==========
Prompt: 105 tokens, 125.909 tokens-per-sec
Generation: 317 tokens, 47.789 tokens-per-sec
Peak memory: 37.520 GB
In the heart of a bustling city, a large, ornate building stands as a beacon of elegance amidst the urban landscape. The building, bathed in the soft glow of white lights, is adorned with a grand archway that serves as an inviting entrance. A balcony graces the second floor, offering a vantage point over the city below. The building is nestled on a street corner, with a tree standing guard in front, its leaves rustling gently in the night breeze. The street itself is slick with rain, reflecting the myriad lights from the surrounding buildings and cars. The scene is a symphony of urban life, captured in a single, captivating image.

Keywords:
City, building, ornate, street, balcony, tree, lights, rain, night, urban, street corner, reflection, architecture, nightlife, cityscape, nocturnal, illumination, city life, urban design, city architecture, night photography, cityscape photography, city nightlife, city night, city street, city street corner, city street view, city street scene, city street view at night, city street view in the rain, city street view in the rain at night, city street view in the rain at night with lights, city street view in the rain at night with lights and trees, city street view in the rain at night with lights and trees and buildings, city street view in the rain at night with lights and trees and buildings and cars.<end_of_utterance>
Output generated in 7.98s
Memory used: 7.80 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/llava-v1.6-34b-8bit
Fetching 17 files: 100%|██████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 6252.47it/s]
Fetching 17 files: 100%|█████████████████████████████████████████████████████████████| 17/17 [00:00<00:00, 13956.38it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <|im_start|>user
<image>
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.<|im_end|>
<|im_start|>assistant

Expanding inputs for image tokens in LLaVa-NeXT should be done in processing. Please add `patch_size` and `vision_feature_select_strategy` to the model's processing config or set directly with `processor.patch_size = {{patch_size}}` and processor.vision_feature_select_strategy = {{vision_feature_select_strategy}}`. Using processors without these attributes in the config is deprecated and will throw an error in v4.50.
Caption: A bustling city street at night.

Description: The image captures a vibrant city street at night, illuminated by the glow of streetlights and the warm lights of the buildings. The wet pavement reflects the myriad lights, adding a sense of depth and atmosphere to the scene. The buildings, bathed in a soft light, stand tall against the night sky, their windows twinkling like stars. The street is alive with activity, with cars and buses moving along the road, their headlights piercing the darkness. The traffic lights, glowing red, green, and yellow, add a splash of color to the scene. The overall mood of the image is one of energy and excitement, a snapshot of city life after dark.

Keywords: city, night, street, lights, buildings, traffic, cars, buses, pavement, wet, reflection, activity, energy, excitement, urban, life, after dark, glow, atmosphere, depth, vibrant, busy, illuminated, headlights, traffic lights, red, green, yellow, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, warm, lights, buildings, bathed, soft, light, stand, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness
==========
Prompt: 37 tokens, 8.634 tokens-per-sec
Generation: 500 tokens, 10.024 tokens-per-sec
Peak memory: 38.346 GB
Caption: A bustling city street at night.

Description: The image captures a vibrant city street at night, illuminated by the glow of streetlights and the warm lights of the buildings. The wet pavement reflects the myriad lights, adding a sense of depth and atmosphere to the scene. The buildings, bathed in a soft light, stand tall against the night sky, their windows twinkling like stars. The street is alive with activity, with cars and buses moving along the road, their headlights piercing the darkness. The traffic lights, glowing red, green, and yellow, add a splash of color to the scene. The overall mood of the image is one of energy and excitement, a snapshot of city life after dark.

Keywords: city, night, street, lights, buildings, traffic, cars, buses, pavement, wet, reflection, activity, energy, excitement, urban, life, after dark, glow, atmosphere, depth, vibrant, busy, illuminated, headlights, traffic lights, red, green, yellow, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, warm, lights, buildings, bathed, soft, light, stand, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness, buildings, tall, against, night, sky, windows, twinkling, stars, tall, sky, darkness, activity, cars, moving, along, road, piercing, darkness
Output generated in 54.85s
Memory used: 34.29 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/llava-v1.6-mistral-7b-8bit
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 32078.81it/s]
Fetching 12 files: 100%|█████████████████████████████████████████████████████████████| 12/12 [00:00<00:00, 12567.20it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: [INST] <image>
Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily. [/INST]
Factual Caption: A city street at night, with a large building illuminated by streetlights and a rainy atmosphere.

Description: The image depicts a city street at night, with a large building prominently featured in the background. The building is illuminated by streetlights, creating a warm glow against the dark sky. The street is wet, reflecting the lights and adding to the rainy ambiance. There are traffic lights visible, indicating a busy intersection. A bus is also present on the street, suggesting public transportation in the area. The overall scene conveys a sense of urban life and activity, even in inclement weather.

Keywords or Tags: city, night, streetlights, rain, building, traffic lights, bus, urban, public transportation, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy, life, activity, inclement weather, public transit, transportation, streetlights, nighttime, urban setting, cityscape, streetlights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections
==========
Prompt: 42 tokens, 39.754 tokens-per-sec
Generation: 500 tokens, 46.996 tokens-per-sec
Peak memory: 34.931 GB
Factual Caption: A city street at night, with a large building illuminated by streetlights and a rainy atmosphere.

Description: The image depicts a city street at night, with a large building prominently featured in the background. The building is illuminated by streetlights, creating a warm glow against the dark sky. The street is wet, reflecting the lights and adding to the rainy ambiance. There are traffic lights visible, indicating a busy intersection. A bus is also present on the street, suggesting public transportation in the area. The overall scene conveys a sense of urban life and activity, even in inclement weather.

Keywords or Tags: city, night, streetlights, rain, building, traffic lights, bus, urban, public transportation, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy, life, activity, inclement weather, public transit, transportation, streetlights, nighttime, urban setting, cityscape, streetlights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections, glow, dark sky, intersection, atmosphere, illuminated, street, lights, rainy night, city street, wet street, reflections
Output generated in 12.40s
Memory used: 6.53 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/paligemma2-10b-ft-docci-448-6bit
Fetching 8 files: 100%|████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 30311.14it/s]
Fetching 8 files: 100%|████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 12797.27it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
An outdoor, zoomed out, eye level view of the front of a pub on a rainy night. The pub is a corner pub with a large number of windows. The pub has a balcony on the second floor. The pub has a blue neon light strip going across the top of the windows. The pub has a black awning over the balcony. The pub has a black metal fence in front of it. A black metal pole with a bright white light is on the left side of the pub. A red double decker bus is visible on the right side of the pub. The bus is facing to the left. The bus has its brake lights on. The bus is casting a shadow on the road. The road is wet and has a reflection of the pub on it.
==========
Prompt: 1051 tokens, 513.638 tokens-per-sec
Generation: 155 tokens, 38.609 tokens-per-sec
Peak memory: 10.539 GB
An outdoor, zoomed out, eye level view of the front of a pub on a rainy night. The pub is a corner pub with a large number of windows. The pub has a balcony on the second floor. The pub has a blue neon light strip going across the top of the windows. The pub has a black awning over the balcony. The pub has a black metal fence in front of it. A black metal pole with a bright white light is on the left side of the pub. A red double decker bus is visible on the right side of the pub. The bus is facing to the left. The bus has its brake lights on. The bus is casting a shadow on the road. The road is wet and has a reflection of the pub on it.
Output generated in 6.61s
Memory used: 7.64 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/paligemma2-10b-ft-docci-448-bf16
Fetching 10 files: 100%|█████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 11795.01it/s]
Fetching 10 files: 100%|██████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 7342.97it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
A nighttime view of a corner pub. The pub is on the right side of the image, the building is two stories tall. The first floor has a patio area with tables and chairs. The second floor has a balcony area with a railing. The pub has a blue neon light going across the top of the building. The street in front of the pub is wet and dark. There is a streetlight on the left side of the image shining on the street. There is a traffic light on the right side of the image. There is a tree on the right side of the image with no leaves on it. There is a car on the road in the bottom left corner of the image.
==========
Prompt: 1051 tokens, 463.616 tokens-per-sec
Generation: 140 tokens, 4.631 tokens-per-sec
Peak memory: 26.484 GB
A nighttime view of a corner pub. The pub is on the right side of the image, the building is two stories tall. The first floor has a patio area with tables and chairs. The second floor has a balcony area with a railing. The pub has a blue neon light going across the top of the building. The street in front of the pub is wet and dark. There is a streetlight on the left side of the image shining on the street. There is a traffic light on the right side of the image. There is a tree on the right side of the image with no leaves on it. There is a car on the road in the bottom left corner of the image.
Output generated in 33.04s
Memory used: 18.06 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/paligemma2-3b-ft-docci-448-bf16
Fetching 8 files: 100%|████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 27324.46it/s]
Fetching 8 files: 100%|█████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 3701.95it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
A medium-close-up view of a building that is made up of cement and glass. The front portion of the building is rectangular, and along the top portion of the building there are two large rectangular windows. The building is being lit up by white lights, and along the top portion of the building there is a balcony that is being lit up by white lights. On the near side of the building, there is a road that is wet, and along the road there are black metal poles that are holding up a black sign. On the near side of the road, there is a red traffic light that is lit up. On the right side of the building, there is a tree that is dried up, and along the branches of the tree there are small leaves. It is nighttime, as everything can be seen clearly.
==========
Prompt: 1051 tokens, 1287.276 tokens-per-sec
Generation: 167 tokens, 16.043 tokens-per-sec
Peak memory: 18.686 GB
A medium-close-up view of a building that is made up of cement and glass. The front portion of the building is rectangular, and along the top portion of the building there are two large rectangular windows. The building is being lit up by white lights, and along the top portion of the building there is a balcony that is being lit up by white lights. On the near side of the building, there is a road that is wet, and along the road there are black metal poles that are holding up a black sign. On the near side of the road, there is a red traffic light that is lit up. On the right side of the building, there is a tree that is dried up, and along the branches of the tree there are small leaves. It is nighttime, as everything can be seen clearly.
Output generated in 11.75s
Memory used: 5.27 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/paligemma2-3b-pt-896-4bit
Fetching 7 files: 100%|████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 41645.57it/s]
Fetching 7 files: 100%|████████████████████████████████████████████████████████████████| 7/7 [00:00<00:00, 24839.36it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <image>Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.
the three stages
==========
Prompt: 4123 tokens, 1281.568 tokens-per-sec
Generation: 4 tokens, 83.638 tokens-per-sec
Peak memory: 8.834 GB
the three stages
Output generated in 3.80s
Memory used: 1.68 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

vvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvvv
Running mlx-community/pixtral-12b-8bit
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 16109.41it/s]
Fetching 11 files: 100%|█████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 22043.64it/s]
==========
Image: ['/Users/jrp/Pictures/Processed/20250104-211523_DSC01896.jpg'] 

Prompt: <s>[INST][IMG]Provide a factual caption, description and comma-separated keywords or tags for this image so that it can be searched for easily.[/INST]
**Caption:** A bustling city street at night during a rainy evening, featuring a well-lit pub with outdoor seating.

**Description:** The image captures a lively urban scene at night, with a prominent pub illuminated by warm lights. The street is wet from rain, and a red double-decker bus is visible in the background. Pedestrians and vehicles navigate the busy intersection, with traffic lights and street signs adding to the urban atmosphere.

**Keywords:** Night, Rain, City, Pub, Bus, Traffic, Street, Lights, Urban, People, Buildings, Wet, Neon, Signs, Traffic Lights, Neon Lights, Restaurant, Outdoor Seating, Weather, Atmosphere, City Life.
==========
Prompt: 2824 tokens, 379.769 tokens-per-sec
Generation: 142 tokens, 26.672 tokens-per-sec
Peak memory: 19.786 GB
**Caption:** A bustling city street at night during a rainy evening, featuring a well-lit pub with outdoor seating.

**Description:** The image captures a lively urban scene at night, with a prominent pub illuminated by warm lights. The street is wet from rain, and a red double-decker bus is visible in the background. Pedestrians and vehicles navigate the busy intersection, with traffic lights and street signs adding to the urban atmosphere.

**Keywords:** Night, Rain, City, Pub, Bus, Traffic, Street, Lights, Urban, People, Buildings, Wet, Neon, Signs, Traffic Lights, Neon Lights, Restaurant, Outdoor Seating, Weather, Atmosphere, City Life.
Output generated in 13.31s
Memory used: 12.74 GB
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
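For performance tracking across runs, the per-model summary lines in the transcript above (`Running ...`, `Prompt: ...`, `Generation: ...`, `Peak memory: ...`, `Output generated in ...`) could be collected into rows. The following is only a rough sketch: the function name `parse_transcript` and the dictionary keys are made up for illustration, and the regular expressions assume the line formats stay exactly as they appear in this log.

```python
import re

# Patterns for the summary lines as they appear in the transcript above.
# These formats are assumed to be stable; adjust if the log output changes.
RUNNING = re.compile(r"^Running (\S+)$")
RATE = re.compile(r"^(Prompt|Generation): (\d+) tokens, ([\d.]+) tokens-per-sec$")
PEAK = re.compile(r"^Peak memory: ([\d.]+) GB$")
ELAPSED = re.compile(r"^Output generated in ([\d.]+)s$")


def parse_transcript(text):
    """Return one dict of stats per 'Running <model>' block (hypothetical helper)."""
    rows = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if m := RUNNING.match(line):
            # A new model block starts here.
            current = {"model": m.group(1)}
            rows.append(current)
        elif current is None:
            # Ignore anything before the first 'Running' line.
            continue
        elif m := RATE.match(line):
            key = m.group(1).lower()  # "prompt" or "generation"
            current[f"{key}_tokens"] = int(m.group(2))
            current[f"{key}_tps"] = float(m.group(3))
        elif m := PEAK.match(line):
            current["peak_gb"] = float(m.group(1))
        elif m := ELAPSED.match(line):
            current["seconds"] = float(m.group(1))
    return rows


sample = """\
Running mlx-community/pixtral-12b-8bit
Prompt: 2824 tokens, 379.769 tokens-per-sec
Generation: 142 tokens, 26.672 tokens-per-sec
Peak memory: 19.786 GB
Output generated in 13.31s
"""

rows = parse_transcript(sample)
for row in rows:
    print(row)
```

Lines such as `Prompt: <|im_start|>system` are skipped because they do not match the `<n> tokens, <rate> tokens-per-sec` shape.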

(i.e., my script runs all the way through)
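The "runs all the way through" behaviour, together with the load/run timings asked for earlier, can be sketched roughly as below. This is not the code of either script: `run_all` and the `run_model` callable are hypothetical names, and the real model loading and generation calls are deliberately left to the caller rather than guessed at.

```python
import time
import traceback


def run_all(model_ids, run_model):
    """Call run_model(model_id) for every model, recording time and errors.

    run_model is a hypothetical callable supplied by the caller; it stands
    in for whatever loads the model and generates output.
    """
    results = []
    for model_id in model_ids:
        start = time.perf_counter()
        try:
            output = run_model(model_id)
            error = None
        except Exception:
            # Keep the full traceback so the failure stays visible,
            # then carry on with the next model instead of bailing out.
            output = None
            error = traceback.format_exc()
        results.append({
            "model": model_id,
            "seconds": time.perf_counter() - start,
            "output": output,
            "error": error,
        })
    return results


def fake_run(model_id):
    """Stand-in runner for demonstration only."""
    if "broken" in model_id:
        raise ValueError("simulated load failure")
    return f"caption from {model_id}"


results = run_all(["model-a", "model-broken", "model-c"], fake_run)
for r in results:
    status = "FAILED" if r["error"] else "ok"
    print(f"{r['model']}: {status} in {r['seconds']:.2f}s")
```

Catching `Exception` per model is a deliberate trade-off here: one failing checkpoint does not hide the results for the rest, and the stored traceback can be printed in a summary at the end.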

@Blaizzy deleted the pc/patch-utils-and-models branch February 2, 2025 19:23
Development

Successfully merging this pull request may close these issues.

mlx_vlm.generate Ignores --max-tokens
2 participants