Query Processor

The Query Processor module validates and classifies user queries in the LLM-Powered Analytics Backend system. It checks that incoming queries are well-formed and assigns each one a type that determines how it is processed downstream.

Module Structure

Core Components

Query Validator (`query_validator.py`)

The Query Validator ensures that incoming queries are well-formed, meaningful, and processable. It rejects queries that are ambiguous, malformed, or outside the system's capabilities.

Key Functions

python

def get_valid_query(query: str) -> str:
    """
    Validates and normalizes a user query.

    Args:
        query: The raw query string from the user

    Returns:
        A normalized, validated query string

    Raises:
        ValueError: If the query is invalid or cannot be processed
    """

Validation Process

Implementation Details

The validator uses a Groq LLM to assess query validity. It sends the query to the LLM along with validation criteria and parses the response to determine if the query is valid.

The prompt instructs the LLM to check for:

- Relevance to marketing/analytics data
- Clarity and specificity
- Feasibility given available data
- Appropriate scope

If a query fails validation, the system returns a clear error message explaining why the query cannot be processed.
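
The validation flow described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the module's actual code: the prompt wording, the `VALID` / `INVALID: <reason>` reply format, the name `get_valid_query_sketch`, and the `llm` callable (standing in for the Groq client call) are all hypothetical.

```python
from typing import Callable

# Hypothetical prompt; the real prompt text is not shown in this document.
VALIDATION_PROMPT = """You validate queries for a marketing analytics system.
Check the query for:
- relevance to marketing/analytics data
- clarity and specificity
- feasibility given available data
- appropriate scope
Reply with exactly "VALID" or "INVALID: <reason>".

Query: {query}"""


def get_valid_query_sketch(query: str, llm: Callable[[str], str]) -> str:
    """Return the normalized query, or raise ValueError with the LLM's reason."""
    normalized = " ".join(query.split())  # trim and collapse whitespace
    if not normalized:
        raise ValueError("Query is empty")

    reply = llm(VALIDATION_PROMPT.format(query=normalized)).strip()
    if reply.upper() == "VALID":
        return normalized
    if reply.upper().startswith("INVALID"):
        # Everything after the first colon is treated as the explanation.
        reason = reply.partition(":")[2].strip() or "no reason given"
        raise ValueError(f"Invalid query: {reason}")
    # Any other reply is treated as a failure rather than guessed at.
    raise ValueError(f"Unexpected validator response: {reply!r}")
```

Passing the LLM call in as a plain callable keeps the sketch independent of any particular client library and makes the parsing logic easy to test with a stub.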

Query Classifier (`query_classifier.py`)

The Query Classifier categorizes valid queries into specific types that determine how they will be processed by the system.

Key Classes and Functions

python

from enum import Enum

from pydantic import BaseModel


class QueryTypeEnum(str, Enum):
    """
    Enumeration of possible query types that can be classified.
    """
    DESCRIPTION = "description"
    REPORT = "report"
    CHART = "chart"
    ERROR = "error"

class QueryType(BaseModel):
    """
    Pydantic model representing the classification result.
    """
    query_type: QueryTypeEnum

def classify_query(user_query: str) -> str:
    """
    Public function to classify a user query into one of the predefined types.

    Args:
        user_query: The raw query text from the user

    Returns:
        String representation of the query type (description, report, chart, or error)

    Raises:
        Exception: If there is an error in the classification process
    """

Classification Process

Query Types

- Description: Queries asking for specific details, explanations, or summaries about particular aspects of the data.
  - Example: "What is the total ad spend by country?"
  - Processing: Generate a textual analysis of the specific data point
- Chart: Queries specifically requesting visual representation or graphs of data.
  - Example: "Show me a bar chart of revenue by channel"
  - Processing: Generate a chart visualization of the relevant data
- Report: Queries requesting comprehensive analysis across multiple datasets.
  - Example: "Create a full marketing report for all channels in Q2"
  - Processing: Break down into multiple analysis tasks and generate a combined report
- Error: Ambiguous or unclear queries that passed validation but cannot be assigned to one of the other types.
  - Example: "Show me data in a nice way"
  - Processing: Return an error response asking for clarification
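
To illustrate how the returned type string could select a processing path, here is a minimal dispatch sketch. The handler functions and the `route` helper are hypothetical placeholders; the real downstream components are not described in this document.

```python
from enum import Enum
from typing import Callable, Dict


class QueryTypeEnum(str, Enum):
    DESCRIPTION = "description"
    REPORT = "report"
    CHART = "chart"
    ERROR = "error"


# Hypothetical handlers standing in for the real downstream components.
def describe(query: str) -> str:
    return f"[text analysis] {query}"


def chart(query: str) -> str:
    return f"[chart] {query}"


def report(query: str) -> str:
    return f"[combined report] {query}"


def ask_for_clarification(query: str) -> str:
    return f"[clarification needed] {query}"


HANDLERS: Dict[QueryTypeEnum, Callable[[str], str]] = {
    QueryTypeEnum.DESCRIPTION: describe,
    QueryTypeEnum.CHART: chart,
    QueryTypeEnum.REPORT: report,
    QueryTypeEnum.ERROR: ask_for_clarification,
}


def route(query: str, query_type: str) -> str:
    """Dispatch a classified query; unknown labels fall back to the error path."""
    try:
        key = QueryTypeEnum(query_type)
    except ValueError:
        key = QueryTypeEnum.ERROR
    return HANDLERS[key](query)
```

Looking up the enum by value means an unexpected label is caught in one place and routed to the clarification path instead of raising a `KeyError` downstream.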

Implementation Details

The classifier uses a Groq LLM with few-shot prompting: the prompt includes examples of each query type to help the model classify new queries accurately.

The implementation:

1. Prepares the query and example prompt
2. Sends the prompt to the LLM
3. Extracts and normalizes the classification from the response
4. Returns the classification as a string
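
The four steps above can be sketched as follows. This is an illustration only: the prompt layout, the names `build_prompt` and `classify_query_sketch`, and the `llm` callable (standing in for the Groq client call) are assumptions. The few-shot examples are the ones listed under Query Types.

```python
from typing import Callable

VALID_TYPES = ("description", "report", "chart", "error")

# Few-shot examples taken from the Query Types section.
FEW_SHOT_EXAMPLES = [
    ("What is the total ad spend by country?", "description"),
    ("Show me a bar chart of revenue by channel", "chart"),
    ("Create a full marketing report for all channels in Q2", "report"),
    ("Show me data in a nice way", "error"),
]


def build_prompt(user_query: str) -> str:
    """Step 1: combine the instruction, the examples, and the new query."""
    lines = ["Classify the query as one of: " + ", ".join(VALID_TYPES) + ".", ""]
    for example, label in FEW_SHOT_EXAMPLES:
        lines += [f"Query: {example}", f"Type: {label}", ""]
    lines += [f"Query: {user_query}", "Type:"]
    return "\n".join(lines)


def classify_query_sketch(user_query: str, llm: Callable[[str], str]) -> str:
    # Step 2: send the prompt to the LLM.
    reply = llm(build_prompt(user_query))
    # Step 3: strip whitespace, quotes, and trailing punctuation, then lowercase.
    label = reply.strip().strip("\"'.").lower()
    # Step 4: return a known label, or "error" if the reply is not one.
    return label if label in VALID_TYPES else "error"
```

Restricting the result to a fixed set of labels means a verbose or malformed reply degrades to the "error" type instead of leaking an arbitrary string to callers.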

Interaction with Other Components

The Query Processor components interact with other parts of the system as follows:

Error Handling

The Query Processor handles three kinds of errors:

- Validation Errors: Clearly communicates why a query is invalid
- Classification Errors: Defaults to the "error" type when classification is uncertain
- Processing Errors: Captures and logs exceptions during processing
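
One way these three cases could fit together is sketched below. The `process_query` function, the result dictionary shape, and the logger name are hypothetical; `validate` and `classify` stand in for `get_valid_query` and `classify_query`.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("query_processor")  # hypothetical logger name


def process_query(
    raw_query: str,
    validate: Callable[[str], str],
    classify: Callable[[str], str],
) -> Dict[str, str]:
    """Run validation, then classification, mapping each failure to a result."""
    try:
        query = validate(raw_query)
    except ValueError as exc:
        # Validation error: pass the reason back to the caller.
        return {"status": "invalid", "message": str(exc)}

    try:
        query_type = classify(query)
    except Exception:
        # Processing error: log with traceback, then fall back to "error".
        logger.exception("Classification failed for query %r", query)
        query_type = "error"

    if query_type == "error":
        # Classification error: ask the user to clarify.
        return {"status": "error", "message": "Could not classify the query; please clarify."}
    return {"status": "ok", "query": query, "query_type": query_type}
```

Returning a structured result instead of re-raising keeps the caller's control flow simple, at the cost of hiding the original exception type; whether that trade-off suits the real system is a design decision this sketch does not settle.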

Configuration

The Query Processor uses models specified in `llm_config.py`:

python

VALIDATOR_MODEL = "llama3-8b-8192"
CLASSIFIER_MODEL = "llama3-8b-8192"

Either constant can be pointed at a different LLM, depending on performance needs.
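
If the models need to change without editing the file, one option is to read them from the environment. The sketch below is an assumption, not part of `llm_config.py` as documented: the `load_models` helper and the use of environment variables named `VALIDATOR_MODEL` and `CLASSIFIER_MODEL` are hypothetical.

```python
import os
from typing import Dict, Mapping

DEFAULT_MODEL = "llama3-8b-8192"


def load_models(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Return the model name for each component, falling back to the default."""
    return {
        "validator": env.get("VALIDATOR_MODEL", DEFAULT_MODEL),
        "classifier": env.get("CLASSIFIER_MODEL", DEFAULT_MODEL),
    }
```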

Example Usage

python

from mypackage.a_query_processor import query_validator, query_classifier

# Validate a query
try:
    validated_query = query_validator.get_valid_query("Show me ad spend by country")
    print(f"Query is valid: {validated_query}")
except ValueError as e:
    print(f"Query validation failed: {e}")

# Classify a query
try:
    query_type = query_classifier.classify_query("Generate a bar chart of revenue by month")
    print(f"Query classified as: {query_type}")
except Exception as e:
    print(f"Classification failed: {e}")

Performance Considerations

- Caching: Consider implementing a cache for common query validations and classifications
- Timeout Handling: Include proper timeout handling for LLM API calls
- Error Resilience: Implement retries for transient errors in LLM API calls
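
A minimal sketch of the caching and retry suggestions, using only the standard library. The helper names, the retry counts, and the choice of `TimeoutError` and `ConnectionError` as "transient" are assumptions; the exceptions raised by the actual LLM client may differ, and the request timeout itself would be set on that client.

```python
import time
from functools import lru_cache
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3, base_delay: float = 0.5) -> T:
    """Run `call`, retrying transient failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return call()
        except (TimeoutError, ConnectionError):
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be at least 1")


def make_cached_classifier(
    classify: Callable[[str], str], maxsize: int = 1024
) -> Callable[[str], str]:
    """Wrap a classifier with an in-memory LRU cache and retries."""

    @lru_cache(maxsize=maxsize)
    def cached(normalized_query: str) -> str:
        return with_retries(lambda: classify(normalized_query))

    def classify_cached(user_query: str) -> str:
        # Normalize case and whitespace so trivially different queries share an entry.
        return cached(" ".join(user_query.lower().split()))

    return classify_cached
```

`lru_cache` does not store results for calls that raise, so a query that ultimately fails is retried on the next request. Note that the normalized (lowercased) text is what reaches the classifier in this sketch.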
