Utilities
The Utilities module provides shared functionality used throughout the LLM-Powered Analytics Backend system. It includes database handling, logging configuration, LLM setup, and common data structures.
Module Structure
Core Components
Database Handling (database.py)
The Database module provides a centralized interface for MongoDB operations, implementing a singleton pattern to maintain a single database connection throughout the application.
Key Classes and Functions
from typing import Dict, List

class Database:
    """
    Singleton class for managing MongoDB database connections and operations.
    """

    client = None
    db = None

    @classmethod
    def initialize(cls) -> bool:
        """
        Initialize the MongoDB connection.

        Returns:
            bool: True if connection was successful, False otherwise
        """

    @classmethod
    def get_collection(cls, collection_name: str):
        """
        Get a reference to a MongoDB collection, if it is not restricted.

        Args:
            collection_name: Name of the collection to retrieve

        Returns:
            Collection: MongoDB collection reference, or None if the collection is restricted
        """

    @classmethod
    def list_collections(cls) -> List[str]:
        """
        List all accessible (non-restricted) collections in the database.

        Returns:
            list: List of accessible collection names
        """

    @classmethod
    def analyze_collections(cls) -> Dict[str, Dict]:
        """
        Analyze all accessible collections to extract field information and statistics.

        Returns:
            dict: Nested dictionary with collection metadata
        """
Security Features
The database module implements security features to protect sensitive data:
# Restricted collections that should not be accessible
RESTRICTED_COLLECTIONS = ["users", "prophet_predictions"]

def is_collection_accessible(collection_name: str) -> bool:
    """
    Check if a collection is accessible or restricted.

    Args:
        collection_name: The name of the collection to check

    Returns:
        bool: True if the collection is accessible, False if restricted
    """
    return collection_name not in RESTRICTED_COLLECTIONS
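As a quick usage check, the helper can be called directly to see which collection names would be served (the printed results are shown as comments):

print(is_collection_accessible("campaign_performance"))  # True
print(is_collection_accessible("users"))                 # False, restricted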
Collection Analysis
The analyze_collections method provides rich metadata about data collections, which is used by the Collection Selector module to match queries with appropriate data sources.
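Illustratively, the nested dictionary returned by analyze_collections might look like the following; the collection name, field names, and statistics here are hypothetical examples, not actual output:

example_metadata = {
    "campaign_performance": {                       # hypothetical collection
        "document_count": 1250,                     # illustrative statistics
        "fields": {
            "channel": {"type": "str", "unique_values": ["Facebook", "Google"]},
            "spend": {"type": "float", "min": 0.0, "max": 15000.0},
            "date": {"type": "datetime"},
        },
    },
}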
Logging Configuration (logging_config.py)
The Logging Configuration module provides centralized, consistent logging setup for all components in the system, ensuring proper log formatting and appropriate verbosity levels.
Key Functions
import logging

def setup_logging(logger_name: str, log_level: str = None) -> logging.Logger:
    """
    Configure a logger with consistent formatting and level settings.

    Args:
        logger_name: The name of the logger to configure
        log_level: Optional override for the default log level

    Returns:
        Logger: Configured logger instance
    """
Logging Levels
The logging system supports various verbosity levels:
- DEBUG: Detailed information for debugging and troubleshooting
- INFO: General operational information
- WARNING: Issues that might cause problems but don't prevent operation
- ERROR: Serious problems that prevent specific operations
- CRITICAL: Critical issues that may cause system failure
The default level is set via the LOG_LEVEL environment variable or falls back to INFO.
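A minimal sketch of what setup_logging could look like under these conventions; the format string and handler choice are assumptions, not the exact implementation:

import logging
import os

def setup_logging(logger_name: str, log_level: str = None) -> logging.Logger:
    # Resolve the level: explicit argument > LOG_LEVEL env var > INFO
    level_name = (log_level or os.getenv("LOG_LEVEL", "INFO")).upper()
    level = getattr(logging, level_name, logging.INFO)

    logger = logging.getLogger(logger_name)
    logger.setLevel(level)

    # Attach a single stream handler with a consistent format
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s - %(name)s - %(levelname)s - %(message)s")
        )
        logger.addHandler(handler)
    return logger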
LLM Configuration (llm_config.py)
The LLM Configuration module centralizes settings for Large Language Model services, providing a factory function for creating properly configured LLM instances.
Key Variables and Functions
import os

# Groq API configuration
GROQ_API_KEY = os.getenv("GROQ_API_KEY")

# Model configurations
CLASSIFIER_MODEL = "llama3-8b-8192"
VALIDATOR_MODEL = "llama3-8b-8192"
COLLECTION_SELECTOR_MODEL = "llama3-8b-8192"
COLLECTION_PROCESSOR_MODEL = "qwen-2.5-coder-32b"
DESCRIPTION_GENERATOR_MODEL = "deepseek-r1-distill-llama-70b"
ANALYSIS_QUERIES_MODEL = "llama3-8b-8192"
CHART_DATA_MODEL = "llama3-8b-8192"

def get_groq_llm(model_name=None):
    """
    Get a configured Groq LLM instance.

    Args:
        model_name: The model name to use (defaults to CLASSIFIER_MODEL)

    Returns:
        ChatGroq: A configured ChatGroq instance

    Raises:
        ValueError: If GROQ_API_KEY is not found
    """
Model Selection Strategy
Different parts of the system use different LLM models based on their specific requirements. The system follows these principles for model selection:
- Efficiency: Smaller models for simpler tasks that need faster response times
- Capability: Larger models for complex reasoning or generation tasks
- Consistency: Same model for similar tasks to ensure consistent behavior
Schema Definitions (schema.py)
The Schema module defines data structures and validation schemas used throughout the application, using Pydantic models for type safety and validation.
Key Models
from typing import Any, Dict, List

from pydantic import BaseModel

class ColumnMetadata(BaseModel):
    """
    Metadata about a column/field in a collection.
    """
    name: str
    type: str
    stats: Dict[str, Any]

class CollectionMetadata(BaseModel):
    """
    Metadata about a MongoDB collection.
    """
    name: str
    fields: List[ColumnMetadata]
    sample_size: int

class QueryParameter(BaseModel):
    """
    Parameter extracted from a user query.
    """
    type: str
    value: Any
    confidence: float
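As a quick usage illustration (the values below are made up), Pydantic validates field types on construction and rejects malformed data:

column = ColumnMetadata(name="spend", type="float", stats={"min": 0.0, "max": 15000.0})
collection = CollectionMetadata(name="campaign_performance", fields=[column], sample_size=100)

try:
    # Fails validation because confidence cannot be parsed as a float
    QueryParameter(type="date_range", value="last month", confidence="high")
except Exception as exc:
    print(f"Validation failed: {exc}")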
Schema Validation
The Schema module provides validation functions to ensure data conforms to expected formats:
def validate_chart_parameters(parameters: Dict[str, Any]) -> Dict[str, Any]:
    """
    Validate and normalize chart generation parameters.

    Args:
        parameters: Dictionary of chart parameters

    Returns:
        Validated and normalized parameters

    Raises:
        ValueError: If required parameters are missing or invalid
    """
Integration with Other Components
The Utilities are used throughout the system:
Database Usage
from mypackage.utils.database import Database
# Initialize database connection
Database.initialize()
# Get a reference to a collection
collection = Database.get_collection("campaign_performance")
# Query the database
results = collection.find({"channel": "Facebook"})
Logging Usage
from mypackage.utils.logging_config import setup_logging
# Set up a logger for a module
logger = setup_logging("my_module")
# Log messages at different levels
logger.debug("Detailed debug information")
logger.info("General operational information")
logger.warning("Something might be wrong")
logger.error("Something is definitely wrong")
LLM Configuration Usage
from mypackage.utils.llm_config import get_groq_llm
# Get a configured LLM instance
llm = get_groq_llm("llama3-8b-8192")
# Use the LLM
response = llm.invoke("Classify this query: What was our ad spend last month?")
Error Handling
The Utilities module implements comprehensive error handling:
- Database Connection Errors: Graceful handling of connection failures
- Authentication Errors: Clear error messages for API key issues
- Configuration Errors: Validation of environment variables with sensible defaults
- Schema Validation Errors: Descriptive error messages for invalid data
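For example, callers can combine the boolean return of Database.initialize with the ValueError raised by get_groq_llm to fail fast at startup; this is an illustrative pattern, not code from the module itself:

from mypackage.utils.database import Database
from mypackage.utils.llm_config import get_groq_llm
from mypackage.utils.logging_config import setup_logging

logger = setup_logging("startup")

# Database connection errors: initialize() reports failure instead of raising
if not Database.initialize():
    logger.critical("Could not connect to MongoDB; check MONGO_URI and MONGO_DB_NAME")
    raise SystemExit(1)

# Authentication/configuration errors: missing API key surfaces as ValueError
try:
    llm = get_groq_llm()
except ValueError as exc:
    logger.critical(f"LLM configuration error: {exc}")
    raise SystemExit(1)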
Configuration
Configuration for the Utilities is primarily through environment variables:
# MongoDB Configuration
MONGO_URI="mongodb://root:example@mongodb:27017/"
MONGO_DB_NAME="test_database"
# Logging Configuration
LOG_LEVEL="INFO"
# LLM Configuration
GROQ_API_KEY="your-api-key-here"
These can be set directly in the environment or through a .env file loaded at startup.
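If a .env file is used, it is typically loaded with python-dotenv before any of the utilities read their settings; a small sketch, assuming python-dotenv is installed:

import os
from dotenv import load_dotenv

# Load variables from a .env file in the working directory, if present
load_dotenv()

print(os.getenv("MONGO_DB_NAME"))        # e.g. "test_database"
print(os.getenv("LOG_LEVEL", "INFO"))    # falls back to INFO when unset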
Performance Considerations
The Utilities module includes several performance optimizations:
- Connection Pooling: The Database class uses MongoDB's connection pooling
- Singleton Pattern: Ensures a single database connection is shared
- Lazy Initialization: Database connections are established only when needed
- Caching: Collection metadata is cached to reduce repeated analysis
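The lazy-initialization and caching points can be illustrated together. In the sketch below, the _metadata_cache attribute and stand-in values are hypothetical and only show the pattern, not the module's actual code:

class Database:
    db = None
    _metadata_cache = None  # hypothetical attribute, not part of the documented interface

    @classmethod
    def initialize(cls) -> bool:
        cls.db = cls.db or {"campaign_performance": []}  # stand-in for a real MongoDB handle
        return True

    @classmethod
    def analyze_collections(cls):
        # Lazy initialization: connect only when the data is first needed
        if cls.db is None:
            cls.initialize()
        # Caching: reuse the previous analysis instead of re-scanning on every call
        if cls._metadata_cache is None:
            cls._metadata_cache = {name: {"fields": {}} for name in cls.db}
        return cls._metadata_cache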
Security Considerations
Security features in the Utilities module include:
- Restricted Collections: Prevents access to sensitive collections
- Environment Variables: Secrets are loaded from environment, not hardcoded
- Error Messages: Careful crafting of error messages to avoid information leakage
- Validation: Input validation to prevent injection attacks