System Design Document

1. Introduction

This document outlines the system design for the Veracity Evaluation Backend, a platform that detects and mitigates misinformation by combining language model analysis with web search.

1.1 Purpose

The purpose of this system is to provide a backend service that can analyze claims, evaluate their veracity, and return detailed responses to combat misinformation.

1.2 Scope

This system will handle user queries, interact with language models, perform web searches, store and retrieve data, and manage user interactions.

2. System Architecture

2.1 High-Level Architecture Diagram

graph TD
    A[Frontend/Browser Extension] -->|RESTful API| B[FastAPI Backend]
    B --> C[LLM - Llama 3.1 70B]
    B --> D[Google Web Search API]
    B --> E[(Cloud SQL PostgreSQL)]
    F[Google Kubernetes Engine] -->|Orchestration| B
    F -->|Orchestration| C
    F -->|Orchestration| D
    G[Google Cloud Platform] -->|Hosts| F
    G -->|Hosts| E
    H[Memorystore for Redis] -->|Fast Data Access| B
    I[Cloud Load Balancing] -->|Load Distribution| B
    J[Secret Manager] -->|Secrets Management| B
    K[Certificate Manager] -->|SSL/TLS| I
    L[Auth0] -->|Authentication| B
    M[Cloud CDN] -->|Static Assets| A

2.2 Component Description

FastAPI Backend: Core application logic, API endpoints, and request handling.
LLM (Llama 3.1 70B): Large language model for advanced text analysis and generation.
Google Web Search API: Provides real-time web search results to enrich responses.
Cloud SQL (PostgreSQL): Main database for storing user data, claims, and analysis results.
Google Kubernetes Engine: Orchestrates and manages containerized application components.
Memorystore for Redis: In-memory cache for fast data retrieval and temporary storage.
Cloud Load Balancing: Distributes incoming traffic across multiple backend instances.
Secret Manager: Securely stores and manages sensitive information like API keys.
Certificate Manager: Handles SSL/TLS certificates for secure communications.
Auth0: Manages user authentication and authorization.
Cloud CDN: Delivers static assets with low latency.

3. Data Flow

3.1 Query Processing Flow

User submits a query through the frontend or browser extension.
Request is received by FastAPI backend via RESTful API.
Backend validates the request and authenticates the user.
Query is sent to the LLM for initial analysis.
Relevant keywords are extracted and sent to Google Web Search API.
Search results are processed and combined with LLM analysis.
Final response is generated and sent back to the user.
Query and results are stored in the database for future reference.
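
The steps above, from validation onward, can be sketched as a single orchestration function. This is a minimal illustration rather than the actual backend code: the names `process_query`, `SearchResult`, and `VeracityResponse` are hypothetical, the LLM, keyword extraction, web search, and storage steps are passed in as plain callables, and authentication is assumed to have already happened. It also assumes keywords are extracted from the claim text itself, which the flow above does not specify.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class SearchResult:
    """One web search hit (hypothetical shape, mirrors the Sources table)."""
    url: str
    title: str
    snippet: str


@dataclass
class VeracityResponse:
    """Final response returned to the user (hypothetical shape)."""
    claim_text: str
    analysis_text: str
    sources: List[SearchResult] = field(default_factory=list)


def process_query(
    claim_text: str,
    analyze: Callable[[str], str],
    extract_keywords: Callable[[str], List[str]],
    search: Callable[[List[str]], List[SearchResult]],
    store: Callable[[VeracityResponse], None],
) -> VeracityResponse:
    """Run the query flow in order, using injected collaborators."""
    # Validate the request before spending LLM or search quota.
    if not claim_text.strip():
        raise ValueError("claim_text must not be empty")

    initial_analysis = analyze(claim_text)   # initial LLM analysis
    keywords = extract_keywords(claim_text)  # keywords for the web search
    results = search(keywords)               # web search results

    # Combine the LLM analysis with the search evidence (placeholder logic).
    combined = f"{initial_analysis} (supported by {len(results)} source(s))"
    response = VeracityResponse(claim_text, combined, results)

    # Persist the query and results for future reference.
    store(response)
    return response
```

Passing the collaborators in as callables keeps the flow testable without a running LLM or search API; a real implementation would likely make these steps asynchronous.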

3.2 User Authentication Flow

User initiates login or registration process.
Request is sent to Auth0 for authentication.
Upon successful authentication, Auth0 issues a JSON Web Token (JWT).
Token is sent back to the client for use in subsequent API calls.
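
To illustrate what the backend has to do with the token on subsequent API calls, the sketch below verifies a JWT's signature and expiry using only the standard library. It assumes an HS256 token signed with a shared secret purely to stay self-contained; a production deployment would more likely use an asymmetric algorithm such as RS256 and a maintained JWT library. The function names `sign_hs256` and `verify_hs256` are hypothetical and not part of Auth0 or this system.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def _b64url_decode(segment: str) -> bytes:
    # JWT segments omit base64 padding, so restore it before decoding.
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _mac(signing_input: str, secret: bytes) -> bytes:
    return hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()


def sign_hs256(claims: dict, secret: bytes) -> str:
    """Build an HS256 token; included only so the sketch can be exercised."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    return f"{signing_input}.{_b64url_encode(_mac(signing_input, secret))}"


def verify_hs256(token: str, secret: bytes, now: Optional[float] = None) -> dict:
    """Return the claims if the signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
        header = json.loads(_b64url_decode(header_b64))
        claims = json.loads(_b64url_decode(payload_b64))
        signature = _b64url_decode(signature_b64)
    except (ValueError, TypeError) as exc:
        raise ValueError("malformed token") from exc
    if not isinstance(header, dict) or not isinstance(claims, dict):
        raise ValueError("malformed token")

    # Pin the algorithm so a token cannot choose its own verification method.
    if header.get("alg") != "HS256":
        raise ValueError("unexpected signing algorithm")

    expected = _mac(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(signature, expected):
        raise ValueError("invalid signature")

    current = time.time() if now is None else now
    if "exp" in claims and current >= claims["exp"]:
        raise ValueError("token expired")
    return claims
```

A request handler would call `verify_hs256` on the bearer token and map the `sub` claim to the `auth0_id` column of the Users table; that mapping is an assumption based on the schema below.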

4. Database Schema (High-Level)

4.1 Users Table

id (UUID)
username (String)
email (String)
auth0_id (String)
created_at (Timestamp)
last_login (Timestamp)

4.2 Claims Table

id (UUID)
user_id (UUID, foreign key to Users)
claim_text (Text)
context (Text)
created_at (Timestamp)

4.3 Analysis Table

id (UUID)
claim_id (UUID, foreign key to Claims)
veracity_score (Float)
confidence_score (Float)
analysis_text (Text)
created_at (Timestamp)

4.4 Sources Table

id (UUID)
analysis_id (UUID, foreign key to Analysis)
url (String)
title (String)
snippet (Text)
credibility_score (Float)

4.5 Feedback Table

id (UUID)
analysis_id (UUID, foreign key to Analysis)
user_id (UUID, foreign key to Users)
rating (Float, between 1 and 5 inclusive)
comment (Text)
created_at (Timestamp)

4.6 Conversations Table

id (UUID)
user_id (UUID, foreign key to Users)
start_time (Timestamp)
end_time (Timestamp, nullable)
status (String, default 'active')

4.7 Messages Table

Note: exactly one of conversation_id and claim_conversation_id must be non-null.

id (UUID)
conversation_id (UUID, nullable, foreign key to Conversations)
claim_conversation_id (UUID, nullable, foreign key to Claim_Conversations)
sender_type (String, enum: 'user' or 'bot')
content (Text)
timestamp (Timestamp)
claim_id (UUID, foreign key to Claims, nullable)
analysis_id (UUID, foreign key to Analysis, nullable)
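
The rule that a message belongs to exactly one of a Conversation or a Claim_Conversation can be enforced in the database with a CHECK constraint. The sketch below demonstrates the idea with the standard-library `sqlite3` module so that it runs anywhere; the production database is Cloud SQL for PostgreSQL, where the column types would differ (for example `UUID` and `TIMESTAMP`). Foreign keys are omitted for brevity, and `create_messages_table` and `insert_message` are hypothetical helper names.

```python
import sqlite3
import uuid
from typing import Optional

MESSAGES_DDL = """
CREATE TABLE messages (
    id TEXT PRIMARY KEY,
    conversation_id TEXT,
    claim_conversation_id TEXT,
    sender_type TEXT CHECK (sender_type IN ('user', 'bot')),
    content TEXT,
    timestamp TEXT DEFAULT CURRENT_TIMESTAMP,
    claim_id TEXT,
    analysis_id TEXT,
    -- Exactly one parent: the two IS NULL tests must disagree.
    CHECK ((conversation_id IS NULL) <> (claim_conversation_id IS NULL))
)
"""


def create_messages_table(conn: sqlite3.Connection) -> None:
    conn.execute(MESSAGES_DDL)


def insert_message(
    conn: sqlite3.Connection,
    sender_type: str,
    content: str,
    conversation_id: Optional[str] = None,
    claim_conversation_id: Optional[str] = None,
) -> str:
    """Insert a message; raises sqlite3.IntegrityError if a CHECK fails."""
    message_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO messages "
        "(id, conversation_id, claim_conversation_id, sender_type, content) "
        "VALUES (?, ?, ?, ?, ?)",
        (message_id, conversation_id, claim_conversation_id, sender_type, content),
    )
    return message_id
```

With this constraint, an insert that sets both parent columns, or neither, is rejected by the database rather than relying on application code alone.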

4.8 Domains Table

id (UUID)
domain_name (String, Unique)
credibility_score (Float)
is_reliable (Boolean)
description (Text, Nullable)
created_at (Timestamp)
updated_at (Timestamp)
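
One way the Domains table could feed a Source's credibility_score is to normalize the source URL to a domain name and look it up. The sketch below is an assumption about how that lookup might work, not a description of the implemented logic: `domain_of` and `source_credibility` are hypothetical names, the `domains` dictionary stands in for a query against the Domains table, and stripping a leading `www.` is a simplification that ignores other subdomains.

```python
from typing import Dict, Optional
from urllib.parse import urlsplit


def domain_of(url: str) -> Optional[str]:
    """Return the lowercased host of a URL without a leading 'www.', or None."""
    host = urlsplit(url).hostname  # already lowercased by urllib
    if not host:
        return None
    return host[4:] if host.startswith("www.") else host


def source_credibility(
    url: str,
    domains: Dict[str, float],
    default: Optional[float] = None,
) -> Optional[float]:
    """Look up a source URL's domain score; fall back to `default` if unknown."""
    domain = domain_of(url)
    if domain is None:
        return default
    return domains.get(domain, default)
```

Returning `None` for unknown domains, rather than a neutral number, keeps "not yet rated" distinguishable from "rated as average" when scores are later aggregated.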

4.9 Claim_Conversations Table

id (UUID, primary key)
conversation_id (UUID, foreign key to Conversations)
claim_id (UUID, foreign key to Claims)
start_time (Timestamp)
end_time (Timestamp, nullable)
status (String, default 'active')

5. API Endpoints

Refer to the API Specification for detailed information on available endpoints.

6. Security Considerations

All communications are encrypted using SSL/TLS.
API keys and sensitive configurations are stored in Secret Manager.
Regular security audits and penetration testing will be conducted.

7. Scalability and Performance

Kubernetes allows for easy horizontal scaling of application components.
Redis cache reduces database load for frequently accessed data.
Cloud Load Balancing ensures efficient distribution of incoming requests.
Cloud CDN minimizes latency for static asset delivery.
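
The Redis point above typically takes the form of a cache-aside read: check the cache, fall back to the database on a miss, then populate the cache with an expiry. The sketch below shows that pattern with a small in-process `TTLCache` class standing in for Memorystore for Redis so the example is self-contained. The class, the `get_analysis` function, the `analysis:` key prefix, and the 300-second default TTL are all illustrative assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """In-process stand-in for a Redis-style get / set-with-expiry store."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: Dict[str, Tuple[float, str]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_analysis(
    claim_id: str,
    cache: TTLCache,
    load_from_db: Callable[[str], str],
    ttl_seconds: float = 300.0,
) -> str:
    """Cache-aside read: serve from cache, else load from the DB and cache it."""
    key = f"analysis:{claim_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(claim_id)
    cache.set(key, value, ttl_seconds)
    return value
```

Repeated requests for the same claim within the TTL then reach the database only once; the TTL bounds how stale a cached analysis can be.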

8. Monitoring and Logging

Application logs will be centralized and analyzed for performance and error tracking.
Key metrics (response times, error rates, etc.) will be monitored and alerted on.
Regular performance reviews will be conducted to identify optimization opportunities.

9. Disaster Recovery and Backup

Regular database backups will be performed and stored in a separate geographic region.
A disaster recovery plan will be developed and tested periodically.

10. Relationships

Each User can create multiple Claims, provide multiple Feedback entries, and initiate multiple Conversations.
Each Claim can have one Analysis.
Each Analysis can have multiple Sources and Feedback.
Each Conversation can have multiple Messages.
Messages can optionally be associated with a Claim and an Analysis.
Domains are standalone entities used to evaluate the credibility of Sources.

flowchart TD
    USER --> |"initiates (1:n)"| CONVERSATION
    USER --> |"sends (1:n)"| MESSAGE
    USER --> |"provides (1:n)"| FEEDBACK
    
    CONVERSATION --> |"contains (1:n)"| CLAIM_CONVERSATION
    CONVERSATION --> |"has general (1:n)"| MESSAGE
    
    CLAIM_CONVERSATION --> |"has specific (1:n)"| MESSAGE
    CLAIM_CONVERSATION --> |"is about (1:1)"| CLAIM
    
    CLAIM --> |"has (1:1)"| ANALYSIS
    
    ANALYSIS --> |"cites (1:n)"| SOURCE
    ANALYSIS --> |"receives (1:n)"| FEEDBACK

11. Future Considerations

Implementing a feedback loop for continuous improvement of the system's accuracy.

This document serves as a high-level overview of the system design and will be updated as the project evolves.
