This project combines a segmentation model with a logic tensor network to reason about object relationships in images, improving image content analysis through first-order logic formulas and a multi-layer perceptron network. ✨
- ✨ Image segmentation and feature extraction: the YOLO-Seg model from Ultralytics or the OneFormer model from SHI-Labs segments the input image and extracts per-object features.
- ✨ Object relation detection: using a logic tensor network from LTNtorch, each detected object is converted into a logical predicate, which the logic tensor network then reasons over.
- ✨ Logical relationship training: the logic tensor networks are trained on relationship data from the Visual Genome dataset.
- ✨ Output of reasoning results: reads the relations specified by the user as (subject, predicate, object) triples and outputs the reasoning results.
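The triple format mentioned above can be sketched as follows. This is an illustrative example only: the `make_triple` helper, its parameters, and the way detector and predicate confidences are combined are assumptions for demonstration, not the project's actual API.

```python
# Hypothetical sketch: turning two detected objects and a reasoned
# predicate into a (subject, predicate, object) triple with a single
# confidence score. The score combination rule is an assumption.

def make_triple(subj, pred, obj, subj_conf, obj_conf, pred_conf):
    """Combine detector and predicate confidences into one triple."""
    confidence = subj_conf * obj_conf * pred_conf
    return {"triple": (subj, pred, obj), "confidence": round(confidence, 4)}

triple = make_triple("person", "next to", "bicycle", 0.92, 0.88, 0.75)
print(triple["triple"])  # ('person', 'next to', 'bicycle')
```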
```bash
pip install -r requirements.train.txt
pip install -r requirements.inference.txt
```
Pre-trained models for YOLO and OneFormer are automatically downloaded when the program is run.
```python
from utils.Trainer import trainer

predicate = ["in", "on", "next to"]
for pred in predicate:
    print(f"🚂 Training {pred} ...")
    trainer(
        pos_predicate=pred,
        neg_predicates=[p for p in predicate if p != pred],
        epoches=50,
        batch_size=32,
        lr=1e-4,
    )
```
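The loop above trains each predicate one-vs-rest: pairs labeled with the positive predicate act as positive examples and pairs from every other predicate as negatives. A minimal sketch of the underlying objective, assuming a standard binary cross-entropy over the predicate's fuzzy truth scores (the actual loss inside `trainer` may differ):

```python
import math

# Illustrative one-vs-rest objective, not the project's Trainer code:
# scores for the positive predicate get label 1, scores for all other
# predicates get label 0, and we minimize binary cross-entropy.

def bce(scores, labels, eps=1e-7):
    """Binary cross-entropy over fuzzy truth scores in (0, 1)."""
    total = 0.0
    for s, y in zip(scores, labels):
        s = min(max(s, eps), 1 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(s) + (1 - y) * math.log(1 - s))
    return total / len(scores)

# Scores the "on" predicate assigns to "on" pairs (positive) and to
# "in" / "next to" pairs (negative) in one batch.
scores = [0.9, 0.8, 0.2, 0.1]
labels = [1, 1, 0, 0]
print(f"batch loss: {bce(scores, labels):.4f}")
```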
```python
from utils.Inferencer import Inferencer

# Initialize the inferencer
analyzer = Inferencer(
    subj_class="person",
    obj_class="bicycle",
    predicate="near",
)

# Perform inference on a single image
result = analyzer.inference_single("demo.jpg")
print(f"🔎 Result: {result['relation']} (confidence: {result['confidence']:.2f})")

# Perform inference on a folder of images
analyzer.process_folder("input_images/")
```
Relationship data and image metadata from the Visual Genome dataset are used to extract image information and object-pair features. The project extracts object pairs and their bounding-box locations from the relationship data, then reads the corresponding images and normalizes the object locations.
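The normalization step can be sketched as below, assuming boxes are given in pixel coordinates as `(x, y, w, h)`, which is how Visual Genome stores them. The function name and signature are illustrative, not the project's actual code.

```python
# Hedged sketch: dividing pixel coordinates by the image dimensions
# yields resolution-independent box coordinates in [0, 1], so boxes
# from images of different sizes become comparable.

def normalize_bbox(x, y, w, h, img_w, img_h):
    """Scale a pixel-space (x, y, w, h) box to [0, 1] coordinates."""
    return (x / img_w, y / img_h, w / img_w, h / img_h)

print(normalize_bbox(100, 50, 200, 100, img_w=800, img_h=400))
# (0.125, 0.125, 0.25, 0.25)
```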
This project uses black and isort to automatically enforce a consistent code style. All code comments and documentation follow the Google Python Style Guide to maintain clarity and consistency. Run the following command to format the code before committing:
```bash
black . && isort .
```
This project builds on LTNtorch and uses the Visual Genome dataset for data extraction, together with the YOLO and OneFormer models for object detection and segmentation.