Utilities
The Utilities module provides shared functionality used throughout the LLM-Powered Analytics Backend system. It includes database handling, logging configuration, LLM setup, and common data structures.
Module Structure
Core Components
Database Handling (database.py)
The Database module provides a centralized interface for MongoDB operations, implementing a singleton pattern to maintain a single database connection throughout the application.
Key Classes and Functions
from typing import Dict, List

class Database:
    """
    Singleton class for managing MongoDB database connections and operations.
    """

    client = None
    db = None

    @classmethod
    def initialize(cls) -> bool:
        """
        Initialize the MongoDB connection.

        Returns:
            bool: True if connection was successful, False otherwise
        """

    @classmethod
    def get_collection(cls, collection_name: str):
        """
        Get a reference to a MongoDB collection, if it is not restricted.

        Args:
            collection_name: Name of the collection to retrieve

        Returns:
            Collection: MongoDB collection reference, or None if the collection is restricted
        """

    @classmethod
    def list_collections(cls) -> List[str]:
        """
        List all accessible (non-restricted) collections in the database.

        Returns:
            list: List of accessible collection names
        """

    @classmethod
    def analyze_collections(cls) -> Dict[str, Dict]:
        """
        Analyze all accessible collections to extract field information and statistics.

        Returns:
            dict: Nested dictionary with collection metadata
        """
Security Features
The database module implements security features to protect sensitive data:
# Restricted collections that should not be accessible
RESTRICTED_COLLECTIONS = ["users", "prophet_predictions"]

def is_collection_accessible(collection_name: str) -> bool:
    """
    Check if a collection is accessible or restricted.

    Args:
        collection_name: The name of the collection to check

    Returns:
        bool: True if the collection is accessible, False if restricted
    """
    return collection_name not in RESTRICTED_COLLECTIONS
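As a quick usage check, the helper can be called directly to see which collection names would be served (the printed results are shown as comments):

print(is_collection_accessible("campaign_performance"))  # True
print(is_collection_accessible("users"))                 # False, restricted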
Collection Analysis
The analyze_collections method provides rich metadata about data collections, which is used by the Collection Selector module to match queries with appropriate data sources.
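Illustratively, the nested dictionary returned by analyze_collections might look like the following; the collection name, field names, and statistics here are hypothetical examples, not actual output:

example_metadata = {
    "campaign_performance": {                       # hypothetical collection
        "document_count": 1250,                     # illustrative statistics
        "fields": {
            "channel": {"type": "str", "unique_values": ["Facebook", "Google"]},
            "spend": {"type": "float", "min": 0.0, "max": 15000.0},
            "date": {"type": "datetime"},
        },
    },
}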
Logging Configuration (logging_config.py)
The Logging Configuration module provides centralized, consistent logging setup for all components in the system, ensuring proper log formatting and appropriate verbosity levels.
Key Functions
import logging

def setup_logging(logger_name: str, log_level: str = None) -> logging.Logger:
    """
    Configure a logger with consistent formatting and level settings.

    Args:
        logger_name: The name of the logger to configure
        log_level: Optional override for the default log level

    Returns:
        Logger: Configured logger instance
    """
Logging Levels
The logging system supports various verbosity levels:
- DEBUG: Detailed information for debugging and troubleshooting
- INFO: General operational information
- WARNING: Issues that might cause problems but don't prevent operation
- ERROR: Serious problems that prevent specific operations
- CRITICAL: Critical issues that may cause system failure
The default level is set via the LOG_LEVEL environment variable or falls back to INFO.
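A minimal sketch of what setup_logging could look like under these conventions; the format string and handler choice are assumptions, not the exact implementation:

import logging
import os

def setup_logging(logger_name: str, log_level: str = None) -> logging.Logger:
    # Resolve the level: explicit argument > LOG_LEVEL env var > INFO
    level_name = (log_level or os.getenv("LOG_LEVEL", "INFO")).upper()
    level = getattr(logging, level_name, logging.INFO)

    logger = logging.getLogger(logger_name)
    logger.setLevel(level)

    # Attach a single stream handler with a consistent format
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger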
LLM Configuration (llm_config.py)
The LLM Configuration module centralizes settings for Large Language Model services, providing a factory function for creating properly configured LLM instances.
Key Variables and Functions
import os

# Groq API configuration
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# Model configurations
CLASSIFIER_MODEL = "llama3-8b-8192"
VALIDATOR_MODEL = "llama3-8b-8192"
COLLECTION_SELECTOR_MODEL = "llama3-8b-8192"
COLLECTION_PROCESSOR_MODEL = "qwen-2.5-coder-32b"
DESCRIPTION_GENERATOR_MODEL = "deepseek-r1-distill-llama-70b"
ANALYSIS_QUERIES_MODEL = "llama3-8b-8192"
CHART_DATA_MODEL = "llama3-8b-8192"

def get_groq_llm(model_name=None):
    """
    Get a configured Groq LLM instance.

    Args:
        model_name: The model name to use (defaults to CLASSIFIER_MODEL)

    Returns:
        ChatGroq: A configured ChatGroq instance

    Raises:
        ValueError: If GROQ_API_KEY is not found
    """
Model Selection Strategy
Different parts of the system use different LLM models based on their specific requirements. The system follows these principles for model selection:
- Efficiency: Smaller models for simpler tasks that need faster response times
- Capability: Larger models for complex reasoning or generation tasks
- Consistency: Same model for similar tasks to ensure consistent behavior
Schema Definitions (schema.py)
The Schema module defines data structures and validation schemas used throughout the application, using Pydantic models for type safety and validation.
Key Models
from typing import Any, Dict, List

from pydantic import BaseModel

class ColumnMetadata(BaseModel):
    """
    Metadata about a column/field in a collection.
    """
    name: str
    type: str
    stats: Dict[str, Any]

class CollectionMetadata(BaseModel):
    """
    Metadata about a MongoDB collection.
    """
    name: str
    fields: List[ColumnMetadata]
    sample_size: int

class QueryParameter(BaseModel):
    """
    Parameter extracted from a user query.
    """
    type: str
    value: Any
    confidence: float
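As a quick usage illustration (the values below are made up), Pydantic validates field types on construction and rejects malformed data:

column = ColumnMetadata(name="spend", type="float", stats={"min": 0.0, "max": 15000.0})
collection = CollectionMetadata(name="campaign_performance", fields=[column], sample_size=100)

try:
    # Fails validation because confidence cannot be parsed as a float
    QueryParameter(type="date_range", value="last month", confidence="high")
except Exception as exc:
    print(f"Validation failed: {exc}")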
Schema Validation
The Schema module provides validation functions to ensure data conforms to expected formats:
def validate_chart_parameters(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """
    Validate and normalize chart generation parameters.

    Args:
        parameters: Dictionary of chart parameters

    Returns:
        Validated and normalized parameters

    Raises:
        ValueError: If required parameters are missing or invalid
    """
Integration with Other Components
The Utilities are used throughout the system:
Database Usage
from mypackage.utils.database import Database
# Initialize database connection
Database.initialize()
# Get a reference to a collection
collection = Database.get_collection("campaign_performance")
# Query the database
results = collection.find({"channel": "Facebook"})
Logging Usage
from mypackage.utils.logging_config import setup_logging
# Set up a logger for a module
logger = setup_logging("my_module")
# Log messages at different levels
logger.debug("Detailed debug information")
logger.info("General operational information")
logger.warning("Something might be wrong")
logger.error("Something is definitely wrong")
LLM Configuration Usage
from mypackage.utils.llm_config import get_groq_llm
# Get a configured LLM instance
llm = get_groq_llm("llama3-8b-8192")
# Use the LLM
response = llm.invoke("Classify this query: What was our ad spend last month?")
Error Handling
The Utilities module implements comprehensive error handling:
- Database Connection Errors: Graceful handling of connection failures
- Authentication Errors: Clear error messages for API key issues
- Configuration Errors: Validation of environment variables with sensible defaults
- Schema Validation Errors: Descriptive error messages for invalid data
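For example, callers can combine the boolean return of Database.initialize with the ValueError raised by get_groq_llm to fail fast at startup; this is an illustrative pattern, not code from the module itself:

from mypackage.utils.database import Database
from mypackage.utils.llm_config import get_groq_llm
from mypackage.utils.logging_config import setup_logging

logger = setup_logging("startup")

# Database connection errors: initialize() reports failure instead of raising
if not Database.initialize():
    logger.critical("Could not connect to MongoDB; check MONGO_URI and MONGO_DB_NAME")
    raise SystemExit(1)

# Authentication/configuration errors: missing API key surfaces as ValueError
try:
    llm = get_groq_llm()
except ValueError as exc:
    logger.critical(f"LLM configuration error: {exc}")
    raise SystemExit(1)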
Configuration
Configuration for the Utilities is primarily through environment variables:
# MongoDB Configuration
MONGO_URI="mongodb://root:example@mongodb:27017/"
MONGO_DB_NAME="test_database"
# Logging Configuration
LOG_LEVEL="INFO"
# LLM Configuration
GROQ_API_KEY="your-api-key-here"
These can be set directly in the environment or through a .env file loaded at startup.
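If a .env file is used, it is typically loaded with python-dotenv before any of the utilities read their settings; a small sketch, assuming python-dotenv is installed:

import os
from dotenv import load_dotenv

# Load variables from a .env file in the working directory, if present
load_dotenv()

print(os.getenv("MONGO_DB_NAME"))        # e.g. "test_database"
print(os.getenv("LOG_LEVEL", "INFO"))    # falls back to INFO when unset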
Performance Considerations
The Utilities module includes several performance optimizations:
- Connection Pooling: The Database class uses MongoDB's connection pooling
- Singleton Pattern: Ensures a single database connection is shared
- Lazy Initialization: Database connections are established only when needed
- Caching: Collection metadata is cached to reduce repeated analysis
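The lazy-initialization and caching points can be illustrated together. In the sketch below, the _metadata_cache attribute and stand-in values are hypothetical and only show the pattern, not the module's actual code:

class Database:
    db = None
    _metadata_cache = None  # hypothetical attribute, not part of the documented interface

    @classmethod
    def initialize(cls) -> bool:
        cls.db = cls.db or {"campaign_performance": []}  # stand-in for a real MongoDB handle
        return True

    @classmethod
    def analyze_collections(cls):
        # Lazy initialization: connect only when the data is first needed
        if cls.db is None:
            cls.initialize()
        # Caching: reuse the previous analysis instead of re-scanning on every call
        if cls._metadata_cache is None:
            cls._metadata_cache = {name: {"fields": {}} for name in cls.db}
        return cls._metadata_cache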
Security Considerations
Security features in the Utilities module include:
- Restricted Collections: Prevents access to sensitive collections
- Environment Variables: Secrets are loaded from environment, not hardcoded
- Error Messages: Careful crafting of error messages to avoid information leakage
- Validation: Input validation to prevent injection attacks