# System Architecture
This document provides a comprehensive overview of the LLM-Powered Analytics Backend architecture, including component relationships, data flow, and design principles.
## System Overview
The LLM-Powered Analytics Backend is designed to process natural language queries about data, perform appropriate analysis, and return meaningful insights in various formats (charts, descriptions, or comprehensive reports). The system leverages Large Language Models (LLMs) to understand user queries and guide the analytics process.
## High-Level Architecture

## Core Components
### API Layer (`app.py`)

The API layer provides HTTP endpoints for client applications to interact with the system:

- `/api/query` (POST): Processes analytical queries and returns results
- `/api/health` (GET): Checks system health, including database connectivity
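A framework-agnostic sketch of the two handlers' logic (the real `app.py` presumably wires these into a web framework; the function names and signatures here are illustrative, not the actual API):

```python
def handle_health(db_ping=lambda: True) -> dict:
    """GET /api/health: report service status, including database reachability."""
    try:
        db_ok = bool(db_ping())          # injected database ping, stubbed here
    except Exception:
        db_ok = False
    return {"status": "ok" if db_ok else "degraded", "database": db_ok}

def handle_query(payload: dict,
                 pipeline=lambda q: {"type": "description", "result": ""}) -> dict:
    """POST /api/query: check the request body, then hand off to the pipeline."""
    query = payload.get("query", "").strip()
    if not query:
        return {"error": "field 'query' is required"}
    return pipeline(query)
```

Injecting `db_ping` and `pipeline` as parameters keeps the handlers testable in isolation, matching the document's testability principle.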
### Pipeline Orchestration (`pipeline.py`)
The main pipeline orchestrates the data flow through the system, handling:
- Query validation
- Query classification
- Collection selection
- Data processing
- Result generation
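The stages above can be sketched as a single chain in which any stage failure is caught and reported rather than propagated (the stage bodies below are stand-ins for the real modules):

```python
def run_pipeline(query: str) -> dict:
    """Pass a query through each stage; a failure in one stage
    yields a structured error instead of crashing the service."""
    try:
        if not query.strip():                                    # query validation
            raise ValueError("empty query")
        kind = "report" if "report" in query.lower() else "chart"  # classification (stub)
        collection = "sales" if "sales" in query.lower() else "events"  # selection (stub)
        data = [{"collection": collection}]                      # data processing (stub)
        return {"type": kind, "data": data}                      # result generation
    except Exception as exc:                                     # error isolation
        return {"type": "error", "message": str(exc)}
```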
### Processing Modules
#### Query Processor (`a_query_processor`)

Handles query validation and classification:

- `query_validator.py`: Ensures queries are well-formed and processable
- `query_classifier.py`: Categorizes queries as chart, description, or report requests
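Per the LLM Integration section, classification is done by an LLM; a keyword heuristic is enough to illustrate the three output categories (this sketch is not the actual classifier):

```python
CATEGORIES = ("chart", "description", "report")

def classify_query(query: str) -> str:
    """Heuristic stand-in for the LLM classifier: map a query
    to one of the three request categories."""
    q = query.lower()
    if any(w in q for w in ("report", "comprehensive", "full analysis")):
        return "report"
    if any(w in q for w in ("chart", "plot", "graph", "visualize")):
        return "chart"
    return "description"                     # default: textual analysis
```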
#### Data Processor (`b_data_processor`)

Manages data selection and processing:

- `collection_selector.py`: Determines which data collection to use for a query
- `collection_processor.py`: Retrieves and processes data from MongoDB
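One plausible shape for collection selection is scoring each collection's schema against the query (the real selector uses an LLM per the LLM Integration section; the scoring rule and schema format here are assumptions):

```python
def select_collection(query: str, schemas: dict) -> str:
    """Score each collection by how many of its field names
    appear in the query; reject queries that match nothing."""
    q = query.lower()
    scores = {name: sum(field in q for field in fields)
              for name, fields in schemas.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        raise LookupError("no collection matches the query")
    return best
```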
#### Regular Generator (`c_regular_generator`)

Generates standard outputs for simple queries:

- `chart_generator.py`: Creates visual representations of data
- `description_generator.py`: Produces textual analysis of data
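A description generator typically reduces the retrieved records to headline figures before an LLM (or template) turns them into prose; a sketch of that summarization step, with an assumed record shape:

```python
from statistics import mean

def summarize(records: list, field: str) -> str:
    """Reduce numeric records to the figures a textual
    description would be built from."""
    values = [r[field] for r in records if field in r]
    if not values:
        return f"No values found for '{field}'."
    return (f"{field}: n={len(values)}, min={min(values)}, "
            f"max={max(values)}, mean={mean(values):.2f}")
```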
#### Report Generator (`d_report_generator`)

Handles complex queries requiring multi-step analysis:

- `report_generator.py`: Orchestrates the report generation process
- `generate_analysis_queries.py`: Breaks down complex queries into individual analysis tasks
- `truncated_pipeline.py`: Processes individual analysis queries within a report
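The fan-out/fan-in flow above can be sketched as follows; in the real system the decomposition is LLM-driven and `run_one` is the truncated pipeline, so the fixed facet list and function signatures here are illustrative only:

```python
def generate_analysis_queries(report_query: str,
                              facets=("trend", "breakdown", "summary")) -> list:
    """Stand-in for the LLM-backed decomposition: fan a report
    request out into one analysis query per facet."""
    topic = report_query.strip().rstrip("?.")
    return [f"{facet} of {topic}" for facet in facets]

def run_report(report_query: str,
               run_one=lambda q: {"query": q, "type": "description"}) -> dict:
    """Run each sub-query through a (truncated) pipeline and
    collect the results into one report."""
    sections = [run_one(q) for q in generate_analysis_queries(report_query)]
    return {"type": "report", "sections": sections}
```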
### Utilities (`utils`)

Supporting components and services:

- `database.py`: MongoDB connection and operations
- `llm_config.py`: LLM service configuration
- `logging_config.py`: Centralized logging setup
- `schema.py`: Data schema definitions
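A centralized logging setup like `logging_config.py` usually amounts to one root configuration that every module inherits; a minimal stdlib sketch (the logger name and format are assumptions):

```python
import logging

def setup_logging(level=logging.INFO) -> logging.Logger:
    """One root handler and a shared format for all modules."""
    logging.basicConfig(
        level=level,
        format="%(asctime)s %(name)s %(levelname)s %(message)s",
        force=True,  # replace any handlers configured earlier
    )
    return logging.getLogger("analytics")
```

Modules then call `logging.getLogger(__name__)` and inherit this configuration.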
## Data Flow

## LLM Integration
The system uses LLMs provided by Groq for several critical tasks:
- Query classification
- Determining relevant data collections
- Breaking down complex report queries
- Generating natural language descriptions
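Each of these tasks comes down to building a chat prompt and sending it through the configured client. A sketch of the prompt-building half for the classification task (message wording is illustrative; the call itself, commented out, assumes an OpenAI-compatible chat interface such as the Groq client's):

```python
def build_classifier_prompt(query: str) -> list:
    """Chat messages asking the LLM to classify an analytics query."""
    return [
        {"role": "system",
         "content": ("Classify the user's analytics query as exactly "
                     "one of: chart, description, report.")},
        {"role": "user", "content": query},
    ]

# The request itself would go through the client configured in llm_config.py,
# roughly (untested sketch):
#   client.chat.completions.create(model=MODEL,
#                                  messages=build_classifier_prompt(query))
```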
## Security Model
The system implements several security measures:
- Collection access restrictions (RESTRICTED_COLLECTIONS in database.py)
- Input validation using Pydantic models
- Environment-based configuration for sensitive parameters
- Error handling to prevent information leakage
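Two of these measures are easy to sketch: the collection access guard and leak-free error handling (the collection names below are illustrative, not the actual contents of `RESTRICTED_COLLECTIONS`):

```python
RESTRICTED_COLLECTIONS = {"users", "api_keys"}  # illustrative names

def safe_collection(name: str) -> str:
    """Refuse access to restricted collections before any query runs."""
    if name in RESTRICTED_COLLECTIONS:
        raise PermissionError(f"collection '{name}' is restricted")
    return name

def safe_error(exc: Exception) -> dict:
    """Return a generic message to the client; the exception
    details belong in the server-side logs only."""
    return {"error": "query failed", "detail_logged": True}
```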
## Design Principles
The architecture follows these key design principles:
- Modularity: Each component has a clear, distinct responsibility
- Pipeline Processing: Data flows through well-defined stages
- Error Isolation: Errors in one stage don't crash the entire system
- Configurability: System behavior can be adjusted through configuration
- Testability: Components can be tested in isolation