This plugin integrates Moondream2, a powerful vision-language model, into FiftyOne, enabling various visual AI capabilities like image captioning, visual question answering, object detection, and point localization.
The plugin provides a seamless interface to Moondream2's capabilities within FiftyOne, offering:
- Multiple vision-language tasks:
  - Image captioning (short or detailed)
  - Visual question answering
  - Object detection
  - Point localization
- Hardware acceleration (CUDA/MPS) when available
- Dynamic version selection from HuggingFace
- Full integration with FiftyOne's Dataset and UI
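The hardware-acceleration behavior follows the usual PyTorch device checks. A minimal sketch of the kind of selection logic involved (the function name `pick_device` is illustrative, not part of the plugin's API):

```python
import torch

def pick_device() -> str:
    # Prefer NVIDIA GPUs, then Apple Silicon (MPS), then fall back to CPU
    if torch.cuda.is_available():
        return "cuda"
    if torch.backends.mps.is_available():
        return "mps"
    return "cpu"
```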
If you haven't already, install FiftyOne and the required dependencies:

```shell
pip install -U fiftyone transformers torch pyvips
```

On Ubuntu, you may also need to install the system libraries that `pyvips` depends on:

```shell
sudo apt install imagemagick libvips
```
Then, install the plugin:

```shell
fiftyone plugins download https://github.com/harpreetsahota204/moondream2-plugin
```
You can use Moondream2 directly through the FiftyOne App:

- Launch the FiftyOne App with your dataset
- Open the "Operators Browser" (click the icon or press the backtick key `` ` ``)
- Search for "Run Moondream2"
- Configure the parameters based on your chosen task:
  - Captioning: choose between short or detailed captions, select the model revision, and specify the output field name
  - Visual question answering: enter your question about the image, select the model revision, and specify the output field name
  - Object detection: specify the object type to detect, select the model revision, and specify the output field name
  - Point localization: specify the object to locate, select the model revision, and specify the output field name
Once installed, you can use the operator programmatically:

```python
import fiftyone.operators as foo

moondream_operator = foo.get_operator("@harpreetsahota/moondream2/moondream")

# Image captioning
moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="caption",
    output_field="moondream_caption",
    length="normal",  # or "short"
)

# Visual question answering
moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="query",
    output_field="moondream_answer",
    query_text="What color is the car?",
)

# Object detection
moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="detect",
    output_field="moondream_detections",
    object_type="car",
)

# Point localization
moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="point",
    output_field="moondream_points",
    object_type="car",
)
```
If using delegated operations in a notebook, first run:

```shell
fiftyone delegated launch
```

and then use `await` with any of the operations:

```python
await moondream_operator(
    dataset,
    revision="2025-01-09",
    operation="caption",
    output_field="moondream_caption",
    length="normal",
    delegate=True,
)
```
Model weights are pulled from the Moondream2 Hugging Face model card.
You can visit the original GitHub repository or the Moondream website for additional information.
```bibtex
@misc{moondream2024,
  author = {Korrapati, Vikhyat and others},
  title = {Moondream: A Tiny Vision Language Model},
  year = {2024},
  publisher = {GitHub},
  journal = {GitHub repository},
  url = {https://github.com/vikhyat/moondream},
  commit = {main}
}
```