🌔 Moondream2 Plugin

This plugin integrates Moondream2, a compact vision-language model, into FiftyOne. It supports image captioning, visual question answering, object detection, and point localization.

Plugin Overview

The plugin provides a seamless interface to Moondream2's capabilities within FiftyOne, offering:

- Multiple vision-language tasks:
  - Image captioning (short or detailed)
  - Visual question answering
  - Object detection
  - Point localization
- Hardware acceleration (CUDA/MPS) when available
- Dynamic version selection from HuggingFace
- Full integration with FiftyOne's Dataset and UI
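
The hardware acceleration item can be illustrated with a short sketch of device selection. This is not the plugin's own code: the function names pick_device and detect_device are hypothetical, and the CUDA, then MPS, then CPU preference order is an assumption based on the feature list above.

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Return the preferred device name: CUDA first, then MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"


def detect_device() -> str:
    """Probe PyTorch for accelerators; fall back to CPU if torch is missing."""
    try:
        import torch
    except ImportError:
        return "cpu"
    cuda = torch.cuda.is_available()
    # torch.backends.mps is only present in newer PyTorch releases, so guard it.
    mps_backend = getattr(torch.backends, "mps", None)
    mps = bool(mps_backend is not None and mps_backend.is_available())
    return pick_device(cuda, mps)


print(detect_device())
```

Keeping the preference logic in pick_device, separate from the PyTorch probing, makes the ordering easy to check on a machine without a GPU.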

Installation

If you haven't already, install FiftyOne and the required dependencies:

pip install -U fiftyone transformers torch pyvips

On Ubuntu, you may also need to install the system libraries that pyvips depends on:

sudo apt install imagemagick libvips

Then, install the plugin:

fiftyone plugins download https://github.com/harpreetsahota204/moondream2-plugin

Usage in FiftyOne App

You can use Moondream2 directly through the FiftyOne App:

1. Launch the FiftyOne App with your dataset
2. Open the "Operators Browser" (click the icon or press the backtick key, `)
3. Search for "Run Moondream2"
4. Configure the parameters based on your chosen task:

Available Tasks:

Image Captioning

- Choose between short or detailed captions
- Select model revision
- Specify output field name

Visual Question Answering

- Enter your question about the image
- Select model revision
- Specify output field name

Object Detection

- Specify the object type to detect
- Select model revision
- Specify output field name

Point Localization

- Specify the object to locate
- Select model revision
- Specify output field name

Operator Usage via SDK

Once installed, you can use the operator programmatically:

import fiftyone.operators as foo

moondream_operator = foo.get_operator("@harpreetsahota/moondream2/moondream")

For image captioning

moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="caption",
    output_field="moondream_caption",
    length="normal"  # or "short"
)

For visual question answering

moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="query",
    output_field="moondream_answer",
    query_text="What color is the car?"
)

For object detection

moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="detect",
    output_field="moondream_detections",
    object_type="car"
)

For point localization

moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="point",
    output_field="moondream_points",
    object_type="car"
)
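
The four calls above share revision, operation, and output_field, and differ only in one task-specific argument. As a minimal sketch, a small helper can assemble and check those arguments before calling the operator. The helper build_moondream_kwargs and the REQUIRED_ARG table are hypothetical and not part of the plugin; treating the task-specific argument as mandatory is an assumption drawn from the examples, not from the operator's documented behavior.

```python
# Task-specific argument passed for each operation in the examples above.
REQUIRED_ARG = {
    "caption": "length",
    "query": "query_text",
    "detect": "object_type",
    "point": "object_type",
}


def build_moondream_kwargs(operation, output_field, revision="2025-01-09", **task_args):
    """Hypothetical helper: validate and assemble keyword arguments for one task."""
    if operation not in REQUIRED_ARG:
        raise ValueError(
            f"Unknown operation {operation!r}; expected one of {sorted(REQUIRED_ARG)}"
        )
    required = REQUIRED_ARG[operation]
    if required not in task_args:
        raise ValueError(f"Operation {operation!r} requires the {required!r} argument")
    return {
        "revision": revision,
        "operation": operation,
        "output_field": output_field,
        **task_args,
    }


kwargs = build_moondream_kwargs("detect", "moondream_detections", object_type="car")
print(kwargs)
# With FiftyOne and the plugin installed, the call would then be:
# moondream_operator(dataset, **kwargs)
```

Checking the arguments up front surfaces a missing query_text or object_type before the model is loaded, which matters when a run covers a large dataset.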

If you are using delegated operations in a notebook, first run fiftyone delegated launch, then pass delegate=True and use await with any of the operations:

await moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="caption",
    output_field="moondream_caption",
    length="normal",
    delegate=True
)

Citation

Model weights are pulled from the Moondream2 Hugging Face model card.

For additional information, visit the original GitHub repository (https://github.com/vikhyat/moondream) or the Moondream website.

@misc{moondream2024,
    author = {Korrapati, Vikhyat and others},
    title = {Moondream: A Tiny Vision Language Model},
    year = {2024},
    publisher = {GitHub},
    journal = {GitHub repository},
    url = {https://github.com/vikhyat/moondream},
    commit = {main}
}
