
English Français 中文 日本語

Understanding Visual Scenes with Logic Tensor Networks 🚀🤖

Python 3.12 CUDA 12.4 LTNTorch Visual Genome YOLO OneFormer

This project combines a segmentation model with a Logic Tensor Network to reason about object relationships in images, improving image content analysis through first-order logic formulas and a multi-layer perceptron network. ✨


Overall Architecture and Module Division

  1. ✨ Image segmentation and feature extraction: the YOLO-Seg model from Ultralytics or the OneFormer model from SHI-Labs segments the input image and extracts per-object features.
  2. ✨ Object relation detection: each detected object is converted into a logical predicate, which a Logic Tensor Network from LTNTorch then reasons over (a minimal sketch follows this list).
  3. ✨ Logical relationship training: the Logic Tensor Networks are trained on relationship data from the Visual Genome dataset.
  4. ✨ Output of reasoning results: the system reads the relations queried by the user as (subject, predicate, object) triples and outputs the reasoning results.
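
To make step 2 concrete, here is a minimal sketch of how a relation predicate can be grounded with LTNTorch. The MLP architecture and the 4-dimensional normalized box features are illustrative assumptions, not the project's actual model.

import torch
import ltn

class RelationMLP(torch.nn.Module):
    """Hypothetical MLP that grounds a binary relation predicate."""
    def __init__(self, in_dim):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(in_dim, 64),
            torch.nn.ReLU(),
            torch.nn.Linear(64, 1),
            torch.nn.Sigmoid(),  # output is a fuzzy truth value in [0, 1]
        )

    def forward(self, subj, obj):
        # Concatenate subject and object features into one input vector
        return self.net(torch.cat([subj, obj], dim=-1)).squeeze(-1)

# Wrap the network as the logical predicate On(subject, object)
On = ltn.Predicate(RelationMLP(in_dim=8))

# Dummy normalized box features (x, y, w, h) for four segmented objects
subj = ltn.Variable("subj", torch.rand(4, 4))
obj = ltn.Variable("obj", torch.rand(4, 4))

# LTNTorch broadcasts the two variables and returns a grid of truth values
print(On(subj, obj).value)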

Installation Guide

Training environment (Ubuntu 22.04)

pip install -r requirements.train.txt

Inference environment (macOS 15.3)

pip install -r requirements.inference.txt

Pre-trained models for YOLO and OneFormer are automatically downloaded when the program is run.
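
For reference, here is a sketch of how such weights are typically fetched on first use; the checkpoint names below are assumptions, not the project's pinned models.

from ultralytics import YOLO
from transformers import OneFormerProcessor, OneFormerForUniversalSegmentation

# Ultralytics downloads the weights on first run if they are not cached
seg_model = YOLO("yolo11n-seg.pt")

# Hugging Face Transformers does the same for a SHI-Labs OneFormer checkpoint
processor = OneFormerProcessor.from_pretrained("shi-labs/oneformer_coco_swin_large")
oneformer = OneFormerForUniversalSegmentation.from_pretrained("shi-labs/oneformer_coco_swin_large")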

Usage

Training example

from utils.Trainer import trainer

# Train each predicate one-vs-rest: the remaining predicates act as negatives
predicates = ["in", "on", "next to"]
for pred in predicates:
    print(f"🚂 Training {pred} ...")
    trainer(
        pos_predicate=pred,
        neg_predicates=[p for p in predicates if p != pred],
        epoches=50,
        batch_size=32,
        lr=1e-4
    )

Inference example

from utils.Inferencer import Inferencer

# Initialize the inferencer
analyzer = Inferencer(
    subj_class="person",
    obj_class="bicycle",
    predicate="near"
)

# Perform inference on a single image
result = analyzer.inference_single("demo.jpg")
print(f"🔎 Get :{result['relation']} (Confidence:{result['confidence']:.2f})")

# Perform inference on a folder of images
analyzer.process_folder("input_images/")

Database

The relationship and image metadata files from the Visual Genome dataset are used to extract image information and object-pair features.

Visual Genome example

The project extracts object pairs and their bounding-box locations from the relationship data, and uses the image metadata to normalize those locations to the image size.
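
A hedged sketch of that preprocessing, assuming the field names of the public Visual Genome relationships.json and image_data.json dumps (the project's own loaders may differ):

import json

def normalized_box(region, width, height):
    """Scale a Visual Genome bounding box into [0, 1] relative to image size."""
    return (region["x"] / width, region["y"] / height,
            region["w"] / width, region["h"] / height)

# Per-image width and height come from the image metadata file
with open("image_data.json") as f:
    meta = {img["image_id"]: img for img in json.load(f)}

with open("relationships.json") as f:
    for entry in json.load(f):
        img = meta[entry["image_id"]]
        for rel in entry["relationships"]:
            subj = normalized_box(rel["subject"], img["width"], img["height"])
            obj = normalized_box(rel["object"], img["width"], img["height"])
            print(rel["predicate"], subj, obj)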

Code Style and Documentation

This project uses black and isort to automatically enforce a consistent code style. All code comments and documentation follow the Google Python Style Guide for clarity and consistency.

Run the following command to format the code before committing:

black . && isort . 

Acknowledgements

This project builds on the LTNTorch project and uses the Visual Genome dataset for data extraction. The YOLO and OneFormer models are used for object detection and segmentation.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
