
English Français 中文 日本語

Understanding Visual Scenes with Logic Tensor Networks 🚀🤖

Python 3.12 CUDA 12.4 LTNTorch Visual Genome YOLO OneFormer

This project combines a segmentation model with a Logic Tensor Network to reason about object relationships in images, improving image content analysis through first-order logic formulas and a multi-layer perceptron network. ✨


Overall Architecture and Module Division

  1. ✨ Image segmentation and feature extraction: the YOLO-Seg model from Ultralytics or the OneFormer model from SHI-Labs segments the input image and extracts per-object features.
  2. ✨ Object relation detection: each detected object is converted into a logical predicate, which a Logic Tensor Network from LTNTorch then reasons over (a minimal sketch follows this list).
  3. ✨ Logical relationship training: the Logic Tensor Networks are trained on relationship data from the Visual Genome dataset.
  4. ✨ Output of reasoning results: the system reads the relations queried by the user as (subject, predicate, object) triples and outputs the reasoning results.
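
To make step 2 concrete, here is a minimal sketch of how a relation predicate can be grounded with LTNTorch. The MLP architecture and the 4-dimensional normalized box features are illustrative assumptions, not the project's actual model.

import torch
import ltn

class RelationMLP(torch.nn.Module):
    """Hypothetical MLP that grounds a binary relation predicate."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 1),
            torch.nn.Sigmoid(),  # output is a fuzzy truth value in [0, 1]
        )

    def forward(self, subj, obj):
        # Concatenate subject and object features into one input vector
        return self.net(torch.cat([subj, obj], dim=-1)).squeeze(-1)

# Wrap the network as the logical predicate On(subject, object)
On = ltn.Predicate(RelationMLP(in_dim=8))

# Dummy normalized box features (x, y, w, h) for four segmented objects
subj = ltn.Variable("subj", torch.rand(4, 4))
obj = ltn.Variable("obj", torch.rand(4, 4))

# LTNTorch broadcasts the two variables and returns a grid of truth values
print(On(subj, obj).value)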

Installation Guide

Training environment (Ubuntu 22.04)

pip install -r requirements.train.txt

Inference environment (macOS 15.3)

pip install -r requirements.inference.txt

Pre-trained models for YOLO and OneFormer are automatically downloaded when the program is run.
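
For reference, here is a sketch of how such weights are typically fetched on first use; the checkpoint names below are assumptions, not the project's pinned models.

from ultralytics import YOLO
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# Ultralytics downloads the weights on first run if they are not cached
seg_model = YOLO("yolo11n-seg.pt")

# Hugging Face Transformers does the same for a SHI-Labs OneFormer checkpoint
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
oneformer = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")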

Usage

Training example

from utils.Trainer import trainer

# Train each predicate one-vs-rest: the remaining predicates act as negatives
predicates = ["in", "on", "next to"]
for pred in predicates:
    print(f"🚂 Training {pred} ...")
    trainer(
        pos_predicate=pred,
        neg_predicates=[p for p in predicates if p != pred],
        epoches=50,
        batch_size=32,
        lr=1e-4
    )

Inference example

from utils.Inferencer import Inferencer

# Initialize the inferencer
analyzer = Inferencer(
    subj_class="person",
    obj_class="bicycle",
    predicate="near"
)

# Perform inference on a single image
result = analyzer.inference_single("demo.jpg")
print(f"🔎 Get :{result['relation']} (Confidence:{result['confidence']:.2f})")

# Perform inference on a folder of images
analyzer.process_folder("input_images/")

Database

The relationship and image metadata files from the Visual Genome dataset are used to extract image information and object-pair features.

Visual Genome example

The project extracts object pairs and their bounding-box locations from the relationship data, and uses the image metadata to normalize those locations to the image size.
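
A hedged sketch of that preprocessing, assuming the field names of the public Visual Genome relationships.json and image_data.json dumps (the project's own loaders may differ):

import json

def normalized_box(region, width, height):
    """Scale a Visual Genome bounding box into [0, 1] relative to image size."""
    return (region["x"] / width, region["y"] / height,
            region["w"] / width, region["h"] / height)

# Per-image width and height come from the image metadata file
with open("image_data.json") as f:
    meta = {img["image_id"]: img for img in json.load(f)}

with open("relationships.json") as f:
    for entry in json.load(f):
        img = meta[entry["image_id"]]
        for rel in entry["relationships"]:
            subj = normalized_box(rel["subject"], img["width"], img["height"])
            obj = normalized_box(rel["object"], img["width"], img["height"])
            print(rel["predicate"], subj, obj)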

Code Style and Documentation

This project uses black and isort to automatically enforce a consistent code style. All code comments and documentation follow the Google Python Style Guide for clarity and consistency.

Run the following command to format the code before committing:

black . && isort . 

Acknowledgements

This project builds on the LTNTorch project and uses the Visual Genome dataset for data extraction. The YOLO and OneFormer models are used for object detection and segmentation.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
