System Design Document
This document outlines the system design for the Veracity Evaluation Backend, a platform that detects and mitigates misinformation using large language models (LLMs) and web search.
The purpose of this system is to provide a backend service that analyzes claims, evaluates their veracity, and returns detailed responses to combat misinformation.
The system handles user queries, calls the language model, performs web searches, stores and retrieves data, and manages user interactions.

```mermaid
graph TD
    A[Frontend/Browser Extension] -->|RESTful API| B[FastAPI Backend]
    B --> C[LLM - Llama 3.1 70B]
    B --> D[Google Web Search API]
    B --> E[(Cloud SQL PostgreSQL)]
    F[Google Kubernetes Engine] -->|Orchestration| B
    F -->|Orchestration| C
    F -->|Orchestration| D
    G[Google Cloud Platform] -->|Hosts| F
    G -->|Hosts| E
    H[Memorystore for Redis] -->|Fast Data Access| B
    I[Cloud Load Balancing] -->|Load Distribution| B
    J[Secret Manager] -->|Secrets Management| B
    K[Certificate Manager] -->|SSL/TLS| I
    L[Auth0] -->|Authentication| B
    M[Cloud CDN] -->|Static Assets| A
```

- FastAPI Backend: Core application logic, API endpoints, and request handling.
- LLM (Llama 3.1 70B): Large language model for advanced text analysis and generation.
- Google Web Search API: Provides real-time web search results to enrich responses.
- Cloud SQL (PostgreSQL): Main database for storing user data, claims, and analysis results.
- Google Kubernetes Engine: Orchestrates and manages containerized application components.
- Memorystore for Redis: In-memory cache for fast data retrieval and temporary storage.
- Cloud Load Balancing: Distributes incoming traffic across multiple backend instances.
- Secret Manager: Securely stores and manages sensitive information like API keys.
- Certificate Manager: Handles SSL/TLS certificates for secure communications.
- Auth0: Manages user authentication and authorization.
- Cloud CDN: Delivers static assets with low latency.

Query processing flow:
- The user submits a query through the frontend or browser extension.
- The FastAPI backend receives the request via the RESTful API.
- The backend validates the request and authenticates the user.
- The query is sent to the LLM for initial analysis.
- Relevant keywords are extracted and sent to the Google Web Search API.
- Search results are processed and combined with the LLM analysis.
- The final response is generated and sent back to the user.
- The query and results are stored in the database for future reference.
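The query processing steps above can be sketched as a single orchestration function. This is a minimal illustration, not the backend's actual code: `process_query` and the injected callables (`analyze`, `extract_keywords`, `search`, `compose`, `store`) are hypothetical stand-ins for the Llama 3.1 70B client, the Google Web Search API client, the response generator, and the Cloud SQL persistence layer. Passing them in as functions keeps the flow testable without network access.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class SearchResult:
    url: str
    title: str
    snippet: str


@dataclass
class VeracityResponse:
    claim: str
    llm_analysis: str
    sources: list[SearchResult] = field(default_factory=list)
    answer: str = ""


def process_query(
    claim: str,
    analyze: Callable[[str], str],                           # hypothetical LLM call
    extract_keywords: Callable[[str, str], list[str]],       # hypothetical keyword step
    search: Callable[[list[str]], list[SearchResult]],       # hypothetical web search client
    compose: Callable[[str, str, list[SearchResult]], str],  # hypothetical response generator
    store: Callable[[VeracityResponse], None],               # hypothetical DB write
) -> VeracityResponse:
    # 1. Initial analysis of the claim by the LLM.
    analysis = analyze(claim)
    # 2. Derive search keywords from the claim and the LLM's analysis.
    keywords = extract_keywords(claim, analysis)
    # 3. Fetch web search results for those keywords.
    results = search(keywords)
    # 4. Combine the LLM analysis with the search evidence into a final answer.
    answer = compose(claim, analysis, results)
    response = VeracityResponse(claim=claim, llm_analysis=analysis, sources=results, answer=answer)
    # 5. Persist the query and result for future reference, then return it.
    store(response)
    return response
```

In this sketch the result is persisted before it is returned, so a failed write surfaces as an error; a production service might instead persist asynchronously to keep response latency low.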

Authentication flow:
- The user initiates the login or registration process.
- The request is sent to Auth0 for authentication.
- Upon successful authentication, Auth0 issues a JWT.
- The token is returned to the client, which includes it in subsequent API calls (typically as a Bearer token in the Authorization header).
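When the client presents the JWT, the backend has to check more than the signature. The sketch below assumes the signature has already been verified by a JWT library (typically RS256 against the Auth0 tenant's published JWKS) and shows only the registered-claim checks defined in RFC 7519. `check_claims`, `TokenError`, the issuer URL, and the audience value are hypothetical, and mapping the returned `sub` value to `Users.auth0_id` is an assumption.

```python
import time
from typing import Any, Optional


class TokenError(Exception):
    """Raised when a token's claims are not acceptable (hypothetical error type)."""


def check_claims(
    payload: dict[str, Any],
    issuer: str,
    audience: str,
    now: Optional[float] = None,
    leeway_seconds: int = 60,
) -> str:
    """Check registered claims of a JWT whose signature was ALREADY verified.

    Returns the 'sub' claim (the Auth0 user identifier) on success.
    """
    now = time.time() if now is None else now
    # iss: the token must come from our Auth0 tenant.
    if payload.get("iss") != issuer:
        raise TokenError("unexpected issuer")
    # aud: may be a single string or a list of strings (RFC 7519, section 4.1.3).
    aud = payload.get("aud")
    audiences = aud if isinstance(aud, list) else [aud]
    if audience not in audiences:
        raise TokenError("token was not issued for this API")
    # exp: seconds since the epoch; allow a small clock-skew leeway.
    exp = payload.get("exp")
    if not isinstance(exp, (int, float)) or now > exp + leeway_seconds:
        raise TokenError("token is expired or has no exp claim")
    sub = payload.get("sub")
    if not isinstance(sub, str) or not sub:
        raise TokenError("token has no subject")
    return sub
```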

Users table:
- id (UUID)
- username (String)
- email (String)
- auth0_id (String)
- created_at (Timestamp)
- last_login (Timestamp)

Claims table:
- id (UUID)
- user_id (UUID, foreign key to Users)
- claim_text (Text)
- context (Text)
- created_at (Timestamp)

Analysis table:
- id (UUID)
- claim_id (UUID, foreign key to Claims)
- veracity_score (Float)
- confidence_score (Float)
- analysis_text (Text)
- created_at (Timestamp)

Sources table:
- id (UUID)
- analysis_id (UUID, foreign key to Analysis)
- url (String)
- title (String)
- snippet (Text)
- credibility_score (Float)

Feedback table:
- id (UUID)
- analysis_id (UUID, foreign key to Analysis)
- user_id (UUID, foreign key to Users)
- rating (Float, between 1 and 5 inclusive)
- comment (Text)
- created_at (Timestamp)

Conversations table:
- id (UUID)
- user_id (UUID, foreign key to Users)
- start_time (Timestamp)
- end_time (Timestamp, nullable)
- status (String, default 'active')

Messages table (exactly one of conversation_id and claim_conversation_id must be non-null):
- id (UUID)
- conversation_id (UUID, nullable, foreign key to Conversations)
- claim_conversation_id (UUID, nullable, foreign key to Claim_Conversations)
- sender_type (String, enum: 'user' or 'bot')
- content (Text)
- timestamp (Timestamp)
- claim_id (UUID, nullable, foreign key to Claims)
- analysis_id (UUID, nullable, foreign key to Analysis)

Domains table:
- id (UUID)
- domain_name (String, unique)
- credibility_score (Float)
- is_reliable (Boolean)
- description (Text, nullable)
- created_at (Timestamp)
- updated_at (Timestamp)

Claim_Conversations table:
- id (UUID, primary key)
- conversation_id (UUID, foreign key to Conversations)
- claim_id (UUID, foreign key to Claims)
- start_time (Timestamp)
- end_time (Timestamp, nullable)
- status (String, default 'active')
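Two of the schema rules above, the Feedback rating range and the requirement that a Message belongs to exactly one kind of conversation, can be enforced by the database rather than by application code. The sketch below uses Python's built-in `sqlite3` module so that it runs without a database server. The real target is Cloud SQL for PostgreSQL, where these `CHECK` expressions are also valid, but here column types (`UUID`, `TIMESTAMP`) are simplified to `TEXT`/`REAL` and foreign keys are omitted. Table and column names follow the schema above; `make_db` is hypothetical.

```python
import sqlite3

# SQLite stand-in for part of the Cloud SQL (PostgreSQL) schema.
# Types are simplified and foreign keys omitted to keep the sketch short.
SCHEMA = """
CREATE TABLE feedback (
    id          TEXT PRIMARY KEY,
    analysis_id TEXT NOT NULL,
    user_id     TEXT NOT NULL,
    rating      REAL NOT NULL CHECK (rating >= 1 AND rating <= 5),
    comment     TEXT,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE messages (
    id                    TEXT PRIMARY KEY,
    conversation_id       TEXT,
    claim_conversation_id TEXT,
    sender_type           TEXT NOT NULL CHECK (sender_type IN ('user', 'bot')),
    content               TEXT NOT NULL,
    timestamp             TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    claim_id              TEXT,
    analysis_id           TEXT,
    -- Exactly one parent: a general conversation or a claim conversation.
    CHECK ((conversation_id IS NULL) <> (claim_conversation_id IS NULL))
);
"""


def make_db() -> sqlite3.Connection:
    """Create an in-memory database with the constrained tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```

Violating either rule raises `sqlite3.IntegrityError`, so invalid rows are rejected even if a code path forgets to validate them.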
Refer to the API Specification for detailed information on available endpoints.

Security:
- All communications are encrypted in transit using SSL/TLS.
- API keys and sensitive configurations are stored in Secret Manager.
- Regular security audits and penetration testing will be conducted.

Scalability and performance:
- Kubernetes allows application components to scale horizontally.
- Redis cache reduces database load for frequently accessed data.
- Cloud Load Balancing ensures efficient distribution of incoming requests.
- Cloud CDN minimizes latency for static asset delivery.
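The Redis caching bullet describes a read path where frequently requested data is served from memory instead of Cloud SQL. A common way to do this is the cache-aside pattern, sketched below. `TTLCache` is an in-memory stand-in for Memorystore for Redis so the example runs on its own; its `get` and `setex` methods mirror the redis-py client's `get` and `setex(name, time, value)` calls. The helper `get_analysis`, the key format `analysis:<claim_id>`, and the 300-second TTL are assumptions.

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Memorystore for Redis (hypothetical, for illustration)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: dict[str, tuple[float, str]] = {}
        self._clock = clock

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._data[key]  # lazily evict expired entries on read
            return None
        return value

    def setex(self, key: str, ttl_seconds: float, value: str) -> None:
        self._data[key] = (self._clock() + ttl_seconds, value)


def get_analysis(
    claim_id: str,
    cache: TTLCache,
    load_from_db: Callable[[str], str],  # hypothetical Cloud SQL query
    ttl_seconds: float = 300,
) -> str:
    """Cache-aside read: try the cache, fall back to the database, then populate the cache."""
    key = f"analysis:{claim_id}"
    cached = cache.get(key)
    if cached is not None:
        return cached
    value = load_from_db(claim_id)
    cache.setex(key, ttl_seconds, value)
    return value
```

Because entries only expire after the TTL, re-analyzing a claim would also need to delete the corresponding key so readers do not see a stale result.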

Monitoring and logging:
- Application logs will be centralized and analyzed for performance and error tracking.
- Key metrics (response times, error rates, etc.) will be monitored and alerted on.
- Regular performance reviews will be conducted to identify optimization opportunities.
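Alerting on response times and error rates usually means evaluating each metric over a time window against a threshold. The sketch below does this for one window of request samples; in the deployed system this logic would more likely live in managed alerting policies (for example, in Google Cloud Monitoring) than in application code. `RequestSample`, `evaluate_window`, treating only HTTP 5xx responses as errors, and the thresholds are all assumptions for illustration.

```python
from dataclasses import dataclass
from statistics import quantiles


@dataclass
class RequestSample:
    latency_ms: float
    status_code: int


def evaluate_window(
    samples: list[RequestSample],
    p95_threshold_ms: float,
    error_rate_threshold: float,
) -> list[str]:
    """Return alert messages for one time window of request samples."""
    alerts: list[str] = []
    if not samples:
        return alerts
    # Count only server-side failures (HTTP 5xx) as errors in this sketch.
    errors = sum(1 for s in samples if s.status_code >= 500)
    error_rate = errors / len(samples)
    if error_rate > error_rate_threshold:
        alerts.append(f"error rate {error_rate:.1%} exceeds {error_rate_threshold:.1%}")
    # statistics.quantiles needs at least two data points on older Python versions.
    if len(samples) >= 2:
        # n=20 yields 19 cut points; the last one is the 95th percentile.
        p95 = quantiles([s.latency_ms for s in samples], n=20)[-1]
        if p95 > p95_threshold_ms:
            alerts.append(f"p95 latency {p95:.0f} ms exceeds {p95_threshold_ms:.0f} ms")
    return alerts
```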

Backup and disaster recovery:
- Regular database backups will be performed and stored in a separate geographic region.
- A disaster recovery plan will be developed and tested periodically.

Entity relationships:
- Users can create multiple Claims, submit multiple Feedback entries, and initiate multiple Conversations.
- Each Claim can have one Analysis.
- Each Analysis can have multiple Sources and Feedback entries.
- Each Conversation can have multiple Messages.
- Messages can optionally be associated with a Claim and an Analysis.
- Domains are standalone entities used to evaluate the credibility of Sources.
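Because Domains are used to evaluate the credibility of Sources, each Source URL has to be mapped to a Domain record. The sketch below assumes that `domain_name` stores a bare host name such as `example.com` and that unknown domains receive a neutral default score of 0.5; both are assumptions, as are the helper names `domain_of` and `source_credibility`. A plain `dict` stands in for the Domains table.

```python
from urllib.parse import urlsplit


def domain_of(url: str) -> str:
    """Return the URL's host name without a leading 'www.' (urlsplit already lower-cases it)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def source_credibility(url: str, domains: dict[str, float], default: float = 0.5) -> float:
    """Look up a Source's credibility score via the Domains table (a dict here).

    Tries the full host first, then each parent domain, e.g.
    'news.example.com' -> 'example.com'. Bare top-level labels are never matched.
    """
    labels = domain_of(url).split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in domains:
            return domains[candidate]
    return default
```

Walking up parent labels lets `news.example.com` inherit the score of `example.com`. The approach is deliberately naive: it does not consult the Public Suffix List, so a suffix such as `co.uk` is treated like any other parent domain.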

```mermaid
flowchart TD
    USER --> |"initiates (1:n)"| CONVERSATION
    USER --> |"sends (1:n)"| MESSAGE
    USER --> |"provides (1:n)"| FEEDBACK
    CONVERSATION --> |"contains (1:n)"| CLAIM_CONVERSATION
    CONVERSATION --> |"has general (1:n)"| MESSAGE
    CLAIM_CONVERSATION --> |"has specific (1:n)"| MESSAGE
    CLAIM_CONVERSATION --> |"is about (1:1)"| CLAIM
    CLAIM --> |"has (1:1)"| ANALYSIS
    ANALYSIS --> |"cites (1:n)"| SOURCE
    ANALYSIS --> |"receives (1:n)"| FEEDBACK
```

- Implementing a feedback loop for continuous improvement of the system's accuracy.
This document serves as a high-level overview of the system design and will be updated as the project evolves.