
Commit 933cb99

idealleygrll authored and committed
feat: added graph rag section
1 parent 78c47fe commit 933cb99

30 files changed

+23001 -54 lines changed

Diff for: .gitignore

+5 -1
@@ -2,4 +2,8 @@
2 2
.venv
3 3
__pycache__
4 4
outputs
5-
_unsloth*
5+
_unsloth*
6+
cache_claims/
7+
cache_embeddings/
8+
cache_graphs/
9+
.env

Diff for: README.md

+1 -1
@@ -21,7 +21,7 @@ This workshop provides a hands-on exploration of applying Large Language Models
21 21
- Problem statement and challenges
22 22
- Dataset introduction and exploration
23 23

24-
### 2. Retrieval-Augmented Generation (RAG)
24+
### 2. [Retrieval-Augmented Generation (RAG)](graph_rag/README.md)
25 25
- Understanding RAG architecture
26 26
- Building a RAG pipeline for claim coverage verification
27 27
- Implementation considerations and results analysis

Diff for: data/AMLD2025.cypherl

+950
Large diffs are not rendered by default.

Diff for: data/llamaparse/motor_2021.json

+3,227
Large diffs are not rendered by default.

Diff for: data/llamaparse/motor_2021.md

+1,008
Large diffs are not rendered by default.

Diff for: data/llamaparse/motor_2021_short.md

+138
@@ -0,0 +1,138 @@
1+
# Part A: Loss and damage
2+
3+
## What is covered
4+
5+
1. Loss of or damage to your car or spare parts
6+
7+
If your car, accessories or spare parts are lost, stolen or damaged, we will:
8+
9+
- repair the damage;
10+
- replace what is lost or damaged and is too expensive to repair; or
11+
- pay you the cost of the loss or damage.
12+
13+
We can choose which of these actions we will take for any claim we agree to and the repairer can use parts that have not been produced by the vehicle manufacturer.
14+
15+
If your car is damaged, we will use one of our recommended repairers to repair it. If you choose not to use them, we may not pay more than our recommended repairer would have charged and we may choose to settle the claim by a financial payment. Following damage to your car, we may move your car to a place of safe and free storage pending settlement of any claim.
16+
17+
If you cannot use your car because of loss or damage that is insured under this policy, we will also pay the reasonable cost of protecting your car and taking it to our nearest recommended repairer. After the repair, we will pay the reasonable cost of delivering your car to your address in the UK.
18+
19+
Where your car is not recovered following a theft or is beyond economical repair we will pay you the market value of your car, including accessories and spare parts at the time they are lost, stolen or damaged.
20+
21+
If we settle a claim as a total loss, we will then take ownership of your car.
22+
23+
Accessories and spare parts of your car, which are in your private garage at the time of their loss or damage, will also be covered.
24+
25+
## New car replacement
26+
27+
If during the period of one year after the first registration as new your car is:
28+
29+
- stolen and not recovered; or
30+
- damaged so that repairs will cost more than 60% of the manufacturer's price list (including taxes and the cost of accessories) at the time of the loss or damage;
31+
32+
and provided your car is owned by you then we will replace your car with a new one of the same make, model and specification.
33+
34+
Provided that:
35+
- one is available
36+
- you and anyone else we know who has a financial interest in your car agree.
37+
38+
If your car is recovered before a new replacement is ordered and the cost of repairs are less than 60% of the manufacturers list price, we will do one of the following:
39+
40+
- repair the damage
41+
- replace what is lost or damaged beyond economical repair or
42+
- pay you cash for the amount of the loss or damage.
43+
44+
## Courtesy car
45+
46+
Following a claim under Part A – Loss and damage, you will be provided with the use of a courtesy car whilst your car is
47+
---
48+
Part A: Loss and damage continued
49+
50+
undergoing repair, subject to the repairer's terms and conditions. A courtesy car is not available in respect of:
51+
52+
- claims where your car is identified as being beyond economical repair
53+
- claims where your car has been stolen and has not been recovered
54+
- claims where a recommended repairer has not been used
55+
- losses which occur outside of the UK.
56+
57+
2 Glass damage
58+
59+
We will pay for the repair or replacement of glass in windows or windscreens (including panoramic windscreens) in your car and scratching of the bodywork caused by the glass breaking.
60+
61+
If this is the only damage you claim for, your no claim discount will not be affected.
62+
63+
Our windscreen supplier can use parts that have not been produced by the vehicle manufacturer.
64+
65+
If you choose not to use one of our approved repairers we will limit the amount we pay under this section to £175.
66+
67+
3 Audio – visual equipment and in-car entertainment systems
68+
69+
We will pay for loss or damage to your car's permanently fitted in-car navigational equipment, car phones, radios, CD players, cassette players, games consoles or any other audio or visual equipment. Removable equipment is covered if it can only be used whilst it is attached to your car and is designed to be totally or partially removed.
70+
71+
- If the equipment was fitted by the manufacturer of your car and was part of the standard specification of your car when it was first registered then we will provide unlimited cover for the loss or damage of the equipment.
72+
- If the equipment was not fitted by the manufacturer of your car or the equipment was not part of the standard specification of your car when it was first registered then the maximum we will pay for the loss or damage of the equipment is £500.
73+
74+
4 Replacement locks
75+
76+
If the keys, lock transmitter or entry card for the keyless entry system of your car are lost or stolen, we will pay up to £1,000 towards the cost of replacing:
77+
78+
- The door and boot locks
79+
- The ignition and steering locks
80+
- The lock transmitter; and
81+
- The entry card
82+
- Any other device designed and made by the manufacturer to access and start your car
83+
84+
Providing you report the loss to the police within 24 hours of discovering the loss.
85+
86+
5 Medical expenses
87+
88+
If you, your driver or any of your passengers are injured in an accident involving your car, we will pay medical expenses, which can include physiotherapy if you ask us to and we agree to provide the treatment, of up to £250 for each injured person.
89+
90+
6 Hotel expenses and alternative transport
91+
92+
In the event that your car is not road worthy following an accident and you have reported a claim under Part A – Loss and damage (subsection 1), we will pay up to a maximum of £250 in the event
93+
94+
15
95+
---
96+
# Part A: Loss and damage continued
97+
98+
that you can not complete your planned journey to cover:
99+
100+
- overnight accommodation, including the cost of meals and drinks, for the driver and passengers of your car; or
101+
- public transport for the driver and the passengers of your car to return to your home or your original planned destination.
102+
103+
- Recovery of your car, the driver and up to 6 passengers to the nearest repairer to drain and flush the fuel tank.
104+
- Replenishing the fuel tank with 10 litres of the correct fuel.
105+
- Damage to your car engine caused solely and directly by misfuelling.
106+
107+
For damage to the engine, the excess shown in your schedule under accidental damage will apply.
108+
109+
## 7 Child car seats
110+
111+
If your car is fitted with any child car seats, we will pay up to £300 for their replacement with the same or similar model following an accident covered by this policy. We will pay for the replacement whether or not visible damage has been caused to the child car seat.
112+
113+
You should purchase the replacement seat and we will reimburse you on presentation of the receipt.
114+
115+
A £75 excess applies in respect of claims for draining and flushing the fuel tank.
116+
117+
Claims for misfuelling should be supported by original receipts and a written report from the specialist who drained or recovered your car.
118+
119+
## 8 Misfuelling
120+
121+
If you or any named driver accidentally fill your car with the wrong fuel please do not start the engine. Please call us on our claims line as soon as possible. If your car is subject to misfuelling during the period of insurance we will pay up to a maximum of £250 per claim for:
122+
123+
- Drainage and flushing of the fuel tank on site using a specialist roadside vehicle. Or
124+
125+
### Driver excesses
126+
127+
If your car or any of its accessories or spare parts are damaged while your car is being driven by a driver as shown in the table below, you will have to pay this additional amount, on top of any other excess shown in your schedule, towards any claim.
128+
129+
An inexperienced driver is someone who holds a provisional driving licence, or has held a full driving licence for less than 12 months.
130+
131+
If we pay the inexperienced driver excess, you will have to repay that amount to us as soon as possible.
132+
133+
| Age of driver | Level of experience | Excess |
134+
| ------------------------------ | ------------------- | ------ |
135+
| 25 years and over | Inexperienced | £100 |
136+
| 21 years to 24 years inclusive | Experienced | £150 |
137+
| 21 years to 24 years inclusive | Inexperienced | £200 |
138+
| 17 years to 20 years inclusive | All drivers | £500 |
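
For automated claim checks, the driver excess table above can be read as a simple lookup on age band and experience. The sketch below is only an illustration: the function name `driver_excess` is hypothetical, and the £0 result for experienced drivers aged 25 and over is an assumption, since the table has no row for them.

```python
def driver_excess(age: int, inexperienced: bool) -> int:
    """Return the additional driver excess in GBP for one driver.

    Mirrors the four rows of the policy table. Combinations the table
    does not list (experienced drivers aged 25+) are ASSUMED to be 0.
    """
    if age < 17:
        raise ValueError("the table has no row for drivers under 17")
    if age <= 20:
        return 500  # 17 to 20 years inclusive: all drivers
    if age <= 24:
        return 200 if inexperienced else 150  # 21 to 24 years inclusive
    return 100 if inexperienced else 0  # 25 and over; 0 is an assumption
```

Per the policy text, this amount is paid on top of any other excess shown in the schedule, so a caller would add it to the schedule excess rather than replace it.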

Diff for: graph_rag/.env.example

+35
@@ -0,0 +1,35 @@
1+
# LlamaIndex Configuration
2+
LLAMA_INDEX_API_KEY=your_llama_index_key
3+
4+
# Neo4j Database Configuration
5+
NEO4J_URI="bolt://localhost:7687"
6+
NEO4J_USER="your_neo4j_username"
7+
NEO4J_PASS="your_neo4j_password"
8+
9+
CLEAR_DATABASE=false
10+
11+
# LLM Provider Selection
12+
# Choose your LLM provider: "openai", "azure", or "deepseek"
13+
LLM_PROVIDER=openai
14+
15+
# OpenAI Configuration
16+
OPENAI_API_KEY=your_openai_api_key
17+
OPENAI_API_BASE=https://api.openai.com/v1
18+
OPENAI_MODEL=gpt-4o
19+
OPENAI_COMPLETION_MODEL=gpt-4
20+
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
21+
22+
# Azure OpenAI Configuration
23+
AZURE_OPENAI_ENDPOINT=https://your-resource.openai.azure.com
24+
AZURE_OPENAI_KEY=your_azure_openai_key
25+
AZURE_OPENAI_API_VERSION=2023-05-15
26+
AZURE_OPENAI_COMPLETION_MODEL=your_deployment_name
27+
AZURE_OPENAI_DEPLOYMENT=your_deployment_name
28+
29+
# Deepseek Configuration
30+
DEEPSEEK_API_KEY=your_deepseek_api_key
31+
DEEPSEEK_API_BASE=https://api.deepseek.com
32+
DEEPSEEK_MODEL=deepseek-reasoner # alternative: deepseek-chat
33+
34+
# Embedding Configuration
35+
EMBEDDING_DIMENSION=1536 # 1536 for text-embedding-3-small, 3072 for text-embedding-3-large
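
The settings above imply that `LLM_PROVIDER` decides which group of variables is actually read. The sketch below shows one way that selection might look; the variable names come from `.env.example`, but `LLMSettings`, `load_llm_settings`, and the choice of which three variables represent each provider are assumptions for illustration, not the repository's code. Defaulting to `openai` when `LLM_PROVIDER` is unset is also an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# (api key, base URL / endpoint, model or deployment) per provider.
# Names are copied from .env.example; the grouping itself is hypothetical.
_PROVIDER_VARS = {
    "openai": ("OPENAI_API_KEY", "OPENAI_API_BASE", "OPENAI_MODEL"),
    "azure": ("AZURE_OPENAI_KEY", "AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_DEPLOYMENT"),
    "deepseek": ("DEEPSEEK_API_KEY", "DEEPSEEK_API_BASE", "DEEPSEEK_MODEL"),
}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    api_key: str
    base_url: str
    model: str


def load_llm_settings(env: Optional[Mapping[str, str]] = None) -> LLMSettings:
    """Pick the settings group named by LLM_PROVIDER and fail early if incomplete."""
    env = os.environ if env is None else env
    provider = env.get("LLM_PROVIDER", "openai").strip().lower()
    if provider not in _PROVIDER_VARS:
        raise ValueError(f"unsupported LLM_PROVIDER: {provider!r}")
    key_var, base_var, model_var = _PROVIDER_VARS[provider]
    missing = [name for name in (key_var, base_var, model_var) if not env.get(name)]
    if missing:
        raise KeyError(f"missing settings for {provider}: {', '.join(missing)}")
    return LLMSettings(provider, env[key_var], env[base_var], env[model_var])
```

Passing a plain mapping instead of `os.environ` keeps the function easy to test, and failing on missing values surfaces a misconfigured `.env` before any API call is made.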

Diff for: graph_rag/README.md

+190
@@ -0,0 +1,190 @@
1+
# Graph RAG 🌟
2+
3+
A beginner-friendly, graph-based Retrieval-Augmented Generation (RAG) system for insurance policy analysis. It stores policy content in a graph database and uses an LLM to extract concepts from the policy and to answer coverage questions against it.
4+
5+
[Hosted retriever notebook](https://colab.research.google.com/drive/1QrPbC3a9dlSjNsMoWkSl4cRuPGjopVZ2?usp=sharing)
6+
7+
## What is RAG? 🤔
8+
9+
RAG (Retrieval Augmented Generation) is a technique that helps AI models give more accurate answers by:
10+
1. First finding relevant information from a database
11+
2. Then using that information to generate accurate responses
12+
13+
Think of it like an AI assistant that first looks up information in a reference book before answering your question!
14+
15+
## How This System Works 🔄
16+
17+
Our system works in two main phases:
18+
19+
### 1. Learning Phase (Ingestion) 📚
20+
- Takes insurance policy documents (in markdown format)
21+
- Breaks them into smaller, manageable pieces
22+
- Uses AI (GPT-4) to understand and extract important concepts
23+
- Creates connections between related concepts
24+
- Stores everything in a graph database (like a smart mind map!)
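
To make the "breaks them into smaller, manageable pieces" step concrete, here is a minimal sketch that splits a markdown policy into one chunk per heading. This diff does not show how the repository actually chunks documents, so the function name `chunk_markdown` and the heading-based strategy are assumptions.

```python
import re
from typing import Dict, List

# Matches ATX headings such as "# Part A" or "## Courtesy car".
_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def chunk_markdown(text: str) -> List[Dict[str, str]]:
    """Split markdown into chunks, one per heading, keeping the heading as title."""
    chunks: List[Dict[str, str]] = []
    title, body = "", []

    def flush() -> None:
        # Emit the current chunk unless it has neither a title nor any text.
        if title or any(part.strip() for part in body):
            chunks.append({"title": title, "text": "\n".join(body).strip()})

    for line in text.splitlines():
        match = _HEADING.match(line)
        if match:
            flush()
            title, body = match.group(2).strip(), []
        else:
            body.append(line)
    flush()
    return chunks
```

Each chunk could then be sent to the extraction and embedding steps independently; keeping the heading as a title preserves which policy section a passage came from.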
25+
26+
### 2. Question-Answering Phase (Retrieval) 💡
27+
- Takes your question about insurance coverage
28+
- Finds the most relevant parts of the policy
29+
- Follows connections to related information
30+
- Uses AI to analyze your question against the policy
31+
- Gives you a clear, structured answer
32+
33+
## Getting Started 🚀
34+
35+
### 1. Set Up Your Environment 🛠️
36+
37+
First, make sure you have Python 3.9+ installed. Then:
38+
39+
```bash
40+
# Install uv if you haven't already
41+
curl -LsSf https://astral.sh/uv/install.sh | sh
42+
43+
# Create and activate a virtual environment
44+
python -m venv .venv
45+
source .venv/bin/activate # On Windows, use `.venv\Scripts\activate`
46+
47+
# Install dependencies using uv
48+
uv pip install -r requirements.txt
49+
```
50+
51+
### 2. Configuration ⚙️
52+
1. Copy `.env.example` to `.env`:
53+
```bash
54+
cp .env.example .env
55+
```
56+
2. Open `.env` and add your settings:
57+
```env
58+
OPENAI_API_KEY=your-api-key-here
59+
OPENAI_MODEL=gpt-4-turbo-preview
60+
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
61+
```
62+
63+
### 3. Start the Database 🗄️
64+
```bash
65+
docker-compose --env-file .env -f memgraph-platform/docker-compose.yml up
66+
```
67+
68+
## Using the System 🎯
69+
70+
Our system provides three main commands:
71+
72+
### 1. Load Policy Documents 📝
73+
```bash
74+
python -m graph_rag.cli ingest --verbose
75+
```
76+
This command:
77+
- Reads the policy document from `data/llamaparse/motor_2021.md`
78+
- Processes it and stores it in the database
79+
- Creates special indexes for quick searching
80+
81+
### 2. Ask Questions ❓
82+
```bash
83+
python -m graph_rag.cli query --query "Your question here"
84+
```
85+
Example:
86+
```bash
87+
python -m graph_rag.cli query --query "If I hit a deer and damaged my front bumper, am I covered?"
88+
```
89+
90+
### 3. Process Multiple Questions 📊
91+
```bash
92+
python -m graph_rag.cli batch --claims_path PATH_TO_CLAIMS_FILE --verbose
93+
```
94+
This is useful for running the benchmark.
95+
96+
Need help? Try:
97+
```bash
98+
python -m graph_rag.cli --help
99+
python -m graph_rag.cli <command> --help
100+
```
101+
102+
## How Answers Are Formatted 📋
103+
104+
The system gives answers in a clear, structured format:
105+
```json
106+
{
107+
"covered": true/false,
108+
"explanation": "Clear explanation of why",
109+
"limits": ["Any coverage limits that apply"],
110+
"exclusions": ["Any relevant exclusions"]
111+
}
112+
```
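
A caller consuming this format will usually want to validate it before trusting the `covered` flag. The sketch below parses the structure shown above, reading `true/false` as "either `true` or `false`". `CoverageAnswer` and `parse_answer` are hypothetical names, not part of the repository's API.

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class CoverageAnswer:
    covered: bool
    explanation: str
    limits: List[str] = field(default_factory=list)
    exclusions: List[str] = field(default_factory=list)


def parse_answer(raw: str) -> CoverageAnswer:
    """Parse a JSON answer and reject it if "covered" is not a real boolean."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("answer must be a JSON object")
    if not isinstance(data.get("covered"), bool):
        raise ValueError('"covered" must be a JSON boolean (true or false)')
    return CoverageAnswer(
        covered=data["covered"],
        explanation=str(data.get("explanation", "")),
        limits=[str(item) for item in data.get("limits", [])],
        exclusions=[str(item) for item in data.get("exclusions", [])],
    )
```

Rejecting non-boolean values such as `"yes"` guards against a model reply that is valid JSON but does not follow the expected schema.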
113+
114+
## System Components 🏗️
115+
116+
### Core Parts
117+
- `main.py`: Handles document processing
118+
- `retrieve.py`: Manages question answering
119+
- `retriever/`: Core search logic
120+
- `database/`: Database operations
121+
- `services/`: AI integration
122+
- `utils/`: Helper functions
123+
- `config/`: System settings
124+
- `experiments/`: Testing and evaluation
125+
126+
### Visual Overview
127+
```
128+
┌─────────────────┐
129+
│ Policy Docs │
130+
└────────┬────────┘
131+
132+
┌────────▼────────┐
133+
│ Text Chunking │
134+
└────────┬────────┘
135+
136+
┌───────────────────────┴──────────────────────┐
137+
│ │
138+
┌────────▼────────┐ ┌────────▼────────┐
139+
│ KG Extraction │ │ Embeddings │
140+
│ (GPT-4) │ │ Generation │
141+
└────────┬────────┘ └────────┬────────┘
142+
│ │
143+
│ │
144+
┌────────▼───────────────────────────────────────┐ │
145+
│ Memgraph │◄─────┘
146+
│ (Nodes, Relationships, Vector Index) │
147+
└───────────────────┬────────────────────────────┘
148+
149+
│ ┌─────────────┐
150+
│ │ Query │
151+
│ └──────┬──────┘
152+
│ │
153+
┌──────────▼────────────────▼──────────┐
154+
│ Retrieval Pipeline │
155+
│ 1. Vector Similarity Search │
156+
│ 2. Graph Relationship Traversal │
157+
│ 3. Context Building │
158+
│ 4. GPT Analysis │
159+
└─────────────────┬────────────────────┘
160+
161+
┌──────▼──────┐
162+
│ Answer │
163+
└─────────────┘
164+
```
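
The first three steps of the retrieval pipeline in the diagram (vector similarity search, graph relationship traversal, context building) can be sketched in plain Python with in-memory dictionaries. This is a toy illustration under stated assumptions, not the repository's implementation: the real system queries Memgraph, and `cosine`, `retrieve_context`, and the one-hop traversal are hypothetical choices. The final LLM analysis step is left out.

```python
import math
from typing import Dict, List, Sequence, Set


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve_context(
    query_vec: Sequence[float],
    embeddings: Dict[str, Sequence[float]],  # node id -> embedding
    edges: Dict[str, Set[str]],              # node id -> related node ids
    texts: Dict[str, str],                   # node id -> policy text
    top_k: int = 2,
) -> str:
    # 1. Vector similarity search: rank embedded nodes against the query.
    ranked = sorted(
        embeddings, key=lambda node: cosine(query_vec, embeddings[node]), reverse=True
    )
    seeds: List[str] = ranked[:top_k]

    # 2. Graph traversal: follow one hop from each seed to related nodes.
    selected = list(seeds)
    for node in seeds:
        for neighbour in sorted(edges.get(node, ())):
            if neighbour not in selected:
                selected.append(neighbour)

    # 3. Context building: join the selected passages for the LLM prompt.
    return "\n\n".join(texts[node] for node in selected)
```

The graph hop is what distinguishes this from plain vector RAG: a limit or exclusion that is linked to a matching clause is pulled into the context even if its own text is not similar to the question.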
165+
166+
## Important Notes 📌
167+
168+
- This is a proof of concept - great for learning but not for production use
169+
- Requires an OpenAI API key
170+
- Works best with insurance policy documents
171+
- Performance depends on your database settings
172+
173+
174+
## Caching 💾
175+
176+
To work faster (and cheaper), the system remembers:
177+
- Document embeddings
178+
- AI responses
179+
- Knowledge graph data
180+
181+
Cache location: `cache/` directory
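
As an illustration of how caching embeddings saves both time and API cost, the sketch below stores each vector on disk under a hash of the model name and text. The function name `cached_embedding`, the JSON file layout, and the `embed` callable are assumptions; the repository's actual cache format is not shown in this diff.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List


def cached_embedding(
    text: str,
    embed: Callable[[str], List[float]],  # hypothetical: calls the embedding API
    cache_dir: Path,
    model: str = "text-embedding-3-small",
) -> List[float]:
    """Return the embedding for text, calling embed() only on a cache miss."""
    # Include the model name in the key so switching models never reuses stale vectors.
    key = hashlib.sha256(f"{model}\n{text}".encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    vector = embed(text)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(vector), encoding="utf-8")
    return vector
```

Deleting the cache directory forces everything to be recomputed, which is the simplest way to invalidate it after changing the policy document.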
182+
183+
## Need Help? 🆘
184+
185+
- Check the error messages - they're designed to be helpful
186+
- Look at the examples in the code
187+
- Make sure your API keys are set correctly
188+
- Verify your database is running
189+
190+
Remember: This is a learning tool - feel free to experiment and learn from how it works! 🌟

Diff for: graph_rag/__init__.py

+1
@@ -0,0 +1 @@
1+
from .retriever.retriever import PolicyRetriever
