feat: added graph rag section #1

Merged
merged 1 commit into from
Feb 12, 2025
6 changes: 5 additions & 1 deletion .gitignore
@@ -2,4 +2,8 @@
.venv
__pycache__
outputs
_unsloth*
_unsloth*
cache_claims/
cache_embeddings/
cache_graphs/
.env
2 changes: 1 addition & 1 deletion README.md
@@ -21,7 +21,7 @@ This workshop provides a hands-on exploration of applying Large Language Models
- Problem statement and challenges
- Dataset introduction and exploration

### 2. Retrieval-Augmented Generation (RAG)
### 2. [Retrieval-Augmented Generation (RAG)](graph_rag/README.md)
- Understanding RAG architecture
- Building a RAG pipeline for claim coverage verification
- Implementation considerations and results analysis
950 changes: 950 additions & 0 deletions data/AMLD2025.cypherl

Large diffs are not rendered by default.

3,227 changes: 3,227 additions & 0 deletions data/llamaparse/motor_2021.json

Large diffs are not rendered by default.

1,008 changes: 1,008 additions & 0 deletions data/llamaparse/motor_2021.md

Large diffs are not rendered by default.

138 changes: 138 additions & 0 deletions data/llamaparse/motor_2021_short.md
@@ -0,0 +1,138 @@
# Part A: Loss and damage

## What is covered

1. Loss of or damage to your car or spare parts

If your car, accessories or spare parts are lost, stolen or damaged, we will:

- repair the damage;
- replace what is lost or damaged and is too expensive to repair; or
- pay you the cost of the loss or damage.

We can choose which of these actions we will take for any claim we agree to and the repairer can use parts that have not been produced by the vehicle manufacturer.

If your car is damaged, we will use one of our recommended repairers to repair it. If you choose not to use them, we may not pay more than our recommended repairer would have charged and we may choose to settle the claim by a financial payment. Following damage to your car, we may move your car to a place of safe and free storage pending settlement of any claim.

If you cannot use your car because of loss or damage that is insured under this policy, we will also pay the reasonable cost of protecting your car and taking it to our nearest recommended repairer. After the repair, we will pay the reasonable cost of delivering your car to your address in the UK.

Where your car is not recovered following a theft or is beyond economical repair we will pay you the market value of your car, including accessories and spare parts at the time they are lost, stolen or damaged.

If we settle a claim as a total loss, we will then take ownership of your car.

Accessories and spare parts of your car, which are in your private garage at the time of their loss or damage, will also be covered.

## New car replacement

If during the period of one year after the first registration as new your car is:

- stolen and not recovered; or
- damaged so that repairs will cost more than 60% of the manufacturer's price list (including taxes and the cost of accessories) at the time of the loss or damage;

and provided your car is owned by you then we will replace your car with a new one of the same make, model and specification.

Provided that:
- one is available
- you and anyone else we know who has a financial interest in your car agree.

If your car is recovered before a new replacement is ordered and the cost of repairs are less than 60% of the manufacturers list price, we will do one of the following:

- repair the damage
- replace what is lost or damaged beyond economical repair or
- pay you cash for the amount of the loss or damage.

## Courtesy car

Following a claim under Part A – Loss and damage, you will be provided with the use of a courtesy car whilst your car is undergoing repair, subject to the repairer's terms and conditions. A courtesy car is not available in respect of:

- claims where your car is identified as being beyond economical repair
- claims where your car has been stolen and has not been recovered
- claims where a recommended repairer has not been used
- losses which occur outside of the UK.

2 Glass damage

We will pay for the repair or replacement of glass in windows or windscreens (including panoramic windscreens) in your car and scratching of the bodywork caused by the glass breaking.

If this is the only damage you claim for, your no claim discount will not be affected.

Our windscreen supplier can use parts that have not been produced by the vehicle manufacturer.

If you choose not to use one of our approved repairers we will limit the amount we pay under this section to £175.

3 Audio – visual equipment and in-car entertainment systems

We will pay for loss or damage to your car's permanently fitted in-car navigational equipment, car phones, radios, CD players, cassette players, games consoles or any other audio or visual equipment. Removable equipment is covered if it can only be used whilst it is attached to your car and is designed to be totally or partially removed.

- If the equipment was fitted by the manufacturer of your car and was part of the standard specification of your car when it was first registered then we will provide unlimited cover for the loss or damage of the equipment.
- If the equipment was not fitted by the manufacturer of your car or the equipment was not part of the standard specification of your car when it was first registered then the maximum we will pay for the loss or damage of the equipment is £500.

4 Replacement locks

If the keys, lock transmitter or entry card for the keyless entry system of your car are lost or stolen, we will pay up to £1,000 towards the cost of replacing:

- The door and boot locks
- The ignition and steering locks
- The lock transmitter; and
- The entry card
- Any other device designed and made by the manufacturer to access and start your car

Providing you report the loss to the police within 24 hours of discovering the loss.

5 Medical expenses

If you, your driver or any of your passengers are injured in an accident involving your car, we will pay medical expenses, which can include physiotherapy if you ask us to and we agree to provide the treatment, of up to £250 for each injured person.

6 Hotel expenses and alternative transport

In the event that your car is not road worthy following an accident and you have reported a claim under Part A – Loss and damage (subsection 1), we will pay up to a maximum of £250 in the event that you can not complete your planned journey to cover:

- overnight accommodation, including the cost of meals and drinks, for the driver and passengers of your car; or
- public transport for the driver and the passengers of your car to return to your home or your original planned destination.

## 7 Child car seats

If your car is fitted with any child car seats, we will pay up to £300 for their replacement with the same or similar model following an accident covered by this policy. We will pay for the replacement whether or not visible damage has been caused to the child car seat.

You should purchase the replacement seat and we will reimburse you on presentation of the receipt.

## 8 Misfuelling

If you or any named driver accidentally fill your car with the wrong fuel please do not start the engine. Please call us on our claims line as soon as possible. If your car is subject to misfuelling during the period of insurance we will pay up to a maximum of £250 per claim for:

- Drainage and flushing of the fuel tank on site using a specialist roadside vehicle. Or
- Recovery of your car, the driver and up to 6 passengers to the nearest repairer to drain and flush the fuel tank.
- Replenishing the fuel tank with 10 litres of the correct fuel.
- Damage to your car engine caused solely and directly by misfuelling.

For damage to the engine, the excess shown in your schedule under accidental damage will apply.

A £75 excess applies in respect of claims for draining and flushing the fuel tank.

Claims for misfuelling should be supported by original receipts and a written report from the specialist who drained or recovered your car.

### Driver excesses

If your car or any of its accessories or spare parts are damaged while your car is being driven by a driver as shown in the table below, you will have to pay this additional amount, on top of any other excess shown in your schedule, towards any claim.

An inexperienced driver is someone who holds a provisional driving licence, or has held a full driving licence for less than 12 months.

If we pay the inexperienced driver excess, you will have to repay that amount to us as soon as possible.

| Age of driver | Level of experience | Excess |
| ------------------------------ | ------------------- | ------ |
| 25 years and over | Inexperienced | £100 |
| 21 years to 24 years inclusive | Experienced | £150 |
| 21 years to 24 years inclusive | Inexperienced | £200 |
| 17 years to 20 years inclusive | All drivers | £500 |
35 changes: 35 additions & 0 deletions graph_rag/.env.example
@@ -0,0 +1,35 @@
# LlamaIndex Configuration
LLAMA_INDEX_API_KEY=your_llama_index_key

# Neo4j Database Configuration
NEO4J_URI="bolt://localhost:7687"
NEO4J_USER="your_neo4j_username"
NEO4J_PASS="your_neo4j_password"

CLEAR_DATABASE=false

# LLM Provider Selection
# Choose your LLM provider: "openai", "azure", or "deepseek"
LLM_PROVIDER=openai

# OpenAI Configuration
OPENAI_API_KEY=your_openai_api_key
OPENAI_API_BASE=https://api.openai.com/v1
OPENAI_MODEL=gpt-4o
OPENAI_COMPLETION_MODEL=gpt-4
OPENAI_EMBEDDING_MODEL=text-embedding-3-small

# Azure OpenAI Configuration
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
AZURE_OPENAI_KEY=your_azure_openai_key
AZURE_OPENAI_API_VERSION=2023-05-15
AZURE_OPENAI_COMPLETION_MODEL=your_deployment_name
AZURE_OPENAI_DEPLOYMENT=your_deployment_name

# Deepseek Configuration
DEEPSEEK_API_KEY=your_deepseek_api_key
DEEPSEEK_API_BASE=https://api.deepseek.com
DEEPSEEK_MODEL=deepseek-reasoner # deepseek-chat

# Embedding Configuration
EMBEDDING_DIMENSION=1536 # 1536 for text-embedding-3-small, 3072 for text-embedding-3-large
190 changes: 190 additions & 0 deletions graph_rag/README.md
@@ -0,0 +1,190 @@
# Graph RAG 🌟

A beginner-friendly, graph-based Retrieval Augmented Generation (RAG) system for insurance policy analysis. It stores policy concepts and the relationships between them in a graph database, then uses an LLM to answer coverage questions against that graph.

[Hosted retriever notebook](https://colab.research.google.com/drive/1QrPbC3a9dlSjNsMoWkSl4cRuPGjopVZ2?usp=sharing)

## What is RAG? 🤔

RAG (Retrieval Augmented Generation) is a technique that helps AI models give more accurate answers by:
1. First finding relevant information from a database
2. Then using that information to generate accurate responses

Think of it like an AI assistant that first looks up information in a reference book before answering your question!
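
To make the two steps concrete, here is a minimal sketch in plain Python. It is not the code used in this repository: the word-overlap scoring stands in for real embedding search, and the names `tokenize`, `retrieve`, and `build_prompt` are hypothetical helpers made up for this example.

```python
import re


def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, passages: list[str], k: int = 2) -> list[str]:
    """Step 1: rank passages by word overlap with the question (toy scoring)."""
    q_words = tokenize(question)
    ranked = sorted(passages, key=lambda p: len(q_words & tokenize(p)), reverse=True)
    return ranked[:k]


def build_prompt(question: str, context: list[str]) -> str:
    """Step 2: place the retrieved text in front of the question for the model."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this policy text:\n{joined}\n\nQuestion: {question}"


# Toy passages paraphrased from the sample motor policy.
passages = [
    "We will pay for the repair or replacement of glass in windows or windscreens.",
    "We will pay up to £300 for the replacement of child car seats.",
    "A courtesy car is not available for losses which occur outside of the UK.",
]
question = "Is windscreen glass repair covered?"
top = retrieve(question, passages, k=1)
prompt = build_prompt(question, top)
print(prompt)
```

In a real pipeline the prompt would then be sent to a language model; that call is left out here so the sketch runs without an API key.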

## How This System Works 🔄

Our system works in two main phases:

### 1. Learning Phase (Ingestion) 📚
- Takes insurance policy documents (in markdown format)
- Breaks them into smaller, manageable pieces
- Uses AI (GPT-4) to understand and extract important concepts
- Creates connections between related concepts
- Stores everything in a graph database (like a smart mind map!)
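
As an illustration of the "breaks them into smaller pieces" step, the sketch below splits a markdown document into one chunk per heading. This is an assumption about how chunking could work, not a description of the repository's actual chunker; `chunk_markdown` is a hypothetical helper.

```python
def chunk_markdown(text: str) -> list[dict]:
    """Split markdown into one chunk per heading, keeping the heading as a title.

    Headings that have no body text under them are skipped.
    """
    chunks: list[dict] = []
    title, lines = "Preamble", []
    for line in text.splitlines():
        if line.startswith("#"):
            # A new heading closes the previous chunk, if it has any content.
            if any(l.strip() for l in lines):
                chunks.append({"title": title, "text": "\n".join(lines).strip()})
            title, lines = line.lstrip("#").strip(), []
        else:
            lines.append(line)
    if any(l.strip() for l in lines):
        chunks.append({"title": title, "text": "\n".join(lines).strip()})
    return chunks


sample = """# Part A: Loss and damage

## Courtesy car

A courtesy car is provided whilst your car is undergoing repair.

## 7 Child car seats

We will pay up to £300 for their replacement.
"""
chunks = chunk_markdown(sample)
for chunk in chunks:
    print(chunk["title"], "->", chunk["text"])
```

Splitting on headings keeps each policy clause together with its title, which tends to give the later extraction step more useful context than fixed-size windows would.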

### 2. Question-Answering Phase (Retrieval) 💡
- Takes your question about insurance coverage
- Finds the most relevant parts of the policy
- Follows connections to related information
- Uses AI to analyze your question against the policy
- Gives you a clear, structured answer
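
The "find relevant parts, then follow connections" idea can be sketched with in-memory data. Everything below is a toy stand-in under stated assumptions: the three-dimensional vectors, the node names, and the `retrieve_context` helper are invented for illustration, and the real system queries a graph database instead of Python dictionaries.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Hypothetical node embeddings (real ones would have e.g. 1536 dimensions).
embeddings = {
    "glass_damage": [0.9, 0.1, 0.0],
    "courtesy_car": [0.1, 0.9, 0.0],
    "misfuelling": [0.0, 0.2, 0.9],
}
# Hypothetical relationships between policy concepts.
edges = {
    "glass_damage": ["approved_repairer_limit"],
    "courtesy_car": ["recommended_repairer"],
    "misfuelling": ["fuel_tank_excess"],
}


def retrieve_context(query_vec: list[float], k: int = 1) -> list[str]:
    """Pick the k most similar nodes, then add their direct neighbours."""
    ranked = sorted(embeddings, key=lambda n: cosine(query_vec, embeddings[n]), reverse=True)
    seeds = ranked[:k]
    context = list(seeds)
    for node in seeds:
        for neighbour in edges.get(node, []):
            if neighbour not in context:
                context.append(neighbour)
    return context


context = retrieve_context([1.0, 0.0, 0.1], k=1)
print(context)
```

The collected node names would then be turned into text and passed to the model together with the question.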

## Getting Started 🚀

### 1. Set Up Your Environment 🛠️

First, make sure you have Python 3.9+ installed. Then:

```bash
# Install uv if you haven't already
curl -LsSf https://astral.sh/uv/install.sh | sh

# Create and activate a virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`

# Install dependencies using uv
uv pip install -r requirements.txt
```

### 2. Configuration ⚙️
1. Copy `.env.example` to `.env`:
```bash
cp .env.example .env
```
2. Open `.env` and add your settings:
```env
OPENAI_API_KEY=your-api-key-here
OPENAI_MODEL=gpt-4-turbo-preview
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
```
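
If you are curious how settings like these can be read from Python, here is a small sketch using only the standard library. The `Settings` class and `load_settings` function are hypothetical and may differ from what lives in `config/`; the fallback values simply mirror the example above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    api_key: str
    model: str
    embedding_model: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the OpenAI settings from the environment and fail early if the key is missing."""
    env = os.environ if env is None else env
    api_key = env.get("OPENAI_API_KEY", "")
    if not api_key:
        raise ValueError("OPENAI_API_KEY is not set; check your .env file")
    return Settings(
        api_key=api_key,
        # Assumed fallbacks, taken from the example values shown above.
        model=env.get("OPENAI_MODEL", "gpt-4-turbo-preview"),
        embedding_model=env.get("OPENAI_EMBEDDING_MODEL", "text-embedding-3-small"),
    )


# Passing a dict keeps the example runnable without touching real variables.
settings = load_settings({"OPENAI_API_KEY": "your-api-key-here"})
print(settings.model, settings.embedding_model)
```

Failing early on a missing key gives a clearer error than a rejected API request later in the pipeline.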

### 3. Start the Database 🗄️
```bash
docker-compose --env-file .env -f memgraph-platform/docker-compose.yml up
```

## Using the System 🎯

Our system provides three main commands:

### 1. Load Policy Documents 📝
```bash
python -m graph_rag.cli ingest --verbose
```
This command:
- Reads the policy document from `data/llamaparse/motor_2021.md`
- Processes it and stores it in the database
- Creates special indexes for quick searching

### 2. Ask Questions ❓
```bash
python -m graph_rag.cli query --query "Your question here"
```

Example:
```bash
python -m graph_rag.cli query --query "If I hit a deer and damaged my front bumper, am I covered?"
```

### 3. Process Multiple Questions 📊
```bash
python -m graph_rag.cli batch --claims_path PATH_TO_CLAIMS_FILE --verbose
```
This is useful for running the benchmark.

Need help? Try:
```bash
python -m graph_rag.cli --help
python -m graph_rag.cli <command> --help
```

## How Answers Are Formatted 📋

The system gives answers in a clear, structured format:
```json
{
  "covered": true/false,
  "explanation": "Clear explanation of why",
  "limits": ["Any coverage limits that apply"],
  "exclusions": ["Any relevant exclusions"]
}
```
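
If you want to consume an answer of this shape from your own code, one possible approach is to parse it into a small dataclass. The `CoverageAnswer` class and `parse_answer` function below are hypothetical and not part of the package; the example answer text is made up.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CoverageAnswer:
    covered: bool
    explanation: str
    limits: list[str] = field(default_factory=list)
    exclusions: list[str] = field(default_factory=list)


def parse_answer(raw: str) -> CoverageAnswer:
    """Parse the JSON answer and check that 'covered' is a real boolean."""
    data = json.loads(raw)
    if not isinstance(data.get("covered"), bool):
        raise ValueError("'covered' must be true or false")
    return CoverageAnswer(
        covered=data["covered"],
        explanation=str(data.get("explanation", "")),
        limits=list(data.get("limits", [])),
        exclusions=list(data.get("exclusions", [])),
    )


raw = (
    '{"covered": true, "explanation": "Glass damage is covered.", '
    '"limits": ["£175 if an approved repairer is not used"], "exclusions": []}'
)
answer = parse_answer(raw)
print(answer.covered, answer.limits)
```

Checking the type of `covered` catches the case where a model returns a string such as `"yes"` instead of a boolean.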

## System Components 🏗️

### Core Parts
- `main.py`: Handles document processing
- `retrieve.py`: Manages question answering
- `retriever/`: Core search logic
- `database/`: Database operations
- `services/`: AI integration
- `utils/`: Helper functions
- `config/`: System settings
- `experiments/`: Testing and evaluation

### Visual Overview
```
                        ┌─────────────────┐
                        │   Policy Docs   │
                        └────────┬────────┘
                        ┌────────▼────────┐
                        │  Text Chunking  │
                        └────────┬────────┘
         ┌───────────────────────┴──────────────────────┐
         │                                              │
┌────────▼────────┐                            ┌────────▼────────┐
│  KG Extraction  │                            │   Embeddings    │
│     (GPT-4)     │                            │   Generation    │
└────────┬────────┘                            └────────┬────────┘
         │                                              │
         │                                              │
┌────────▼───────────────────────────────────────┐      │
│                    Memgraph                    │◄─────┘
│      (Nodes, Relationships, Vector Index)      │
└───────────────────┬────────────────────────────┘
                    │         ┌─────────────┐
                    │         │    Query    │
                    │         └──────┬──────┘
                    │                │
         ┌──────────▼────────────────▼──────────┐
         │          Retrieval Pipeline          │
         │    1. Vector Similarity Search       │
         │    2. Graph Relationship Traversal   │
         │    3. Context Building               │
         │    4. GPT Analysis                   │
         └─────────────────┬────────────────────┘
                    ┌──────▼──────┐
                    │   Answer    │
                    └─────────────┘
```

## Important Notes 📌

- This is a proof of concept - great for learning but not for production use
- Requires an OpenAI API key
- Works best with insurance policy documents
- Performance depends on your database settings


## Caching 💾

To work faster (and cheaper), the system remembers:
- Document embeddings
- AI responses
- Knowledge graph data

Cache location: `cache/` directory
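
To show the general idea behind this kind of caching, here is a rough sketch that stores results on disk under a hash of the input text. It is an illustration only: `cached_call` and `fake_embed` are hypothetical names, and the real cache format in this project may look different.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable


def cached_call(cache_dir: Path, text: str, compute: Callable[[str], list]) -> list:
    """Return the cached result for `text`, computing and saving it on a miss."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(text.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    result = compute(text)
    path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls = []


def fake_embed(text: str) -> list:
    """Stand-in for an expensive embedding request; records each real call."""
    calls.append(text)
    return [float(len(text)), 0.0]


with tempfile.TemporaryDirectory() as tmp:
    cache_dir = Path(tmp) / "cache"
    first = cached_call(cache_dir, "Glass damage", fake_embed)
    second = cached_call(cache_dir, "Glass damage", fake_embed)

print(first == second, len(calls))
```

The second call is served from disk, so the expensive function runs only once per distinct input.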

## Need Help? 🆘

- Check the error messages - they're designed to be helpful
- Look at the examples in the code
- Make sure your API keys are set correctly
- Verify your database is running

Remember: This is a learning tool - feel free to experiment and learn from how it works! 🌟
1 change: 1 addition & 0 deletions graph_rag/__init__.py
@@ -0,0 +1 @@
from .retriever.retriever import PolicyRetriever