Regular Generator
The Regular Generator module is responsible for producing standard outputs from processed data, including visual charts and textual descriptions. It handles the final stage of processing for non-report queries, transforming structured data into human-friendly insights.
Module Structure
Core Components
Chart Generator (chart_generator.py)
The Chart Generator creates visual representations of data in response to chart-specific queries, using matplotlib and seaborn to generate appropriate visualizations based on the query intent and data characteristics.
Key Functions
def generate_chart(df: pd.DataFrame, query: str) -> bytes:
    """
    Generate a chart visualization from a DataFrame based on the query.

    Args:
        df: The processed pandas DataFrame containing the data to visualize
        query: The validated user query

    Returns:
        Bytes containing the PNG image of the generated chart

    Raises:
        ValueError: If the chart type cannot be determined or generated
    """
Chart Generation Process
Supported Chart Types
The Chart Generator can produce various visualization types:
- Bar Charts: For comparing categories (vertical or horizontal)
- Line Charts: For time series and trend analysis
- Pie/Donut Charts: For part-to-whole relationships
- Scatter Plots: For correlation analysis
- Heatmaps: For multi-dimensional data visualization
- Box Plots: For distribution analysis
- Area Charts: For cumulative totals over time
- Combined Charts: Multiple chart types on the same visualization
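One way to organise the chart types listed above is a registry that maps a chart-type name to a builder function. The sketch below is an assumption about structure, not the module's actual code: `CHART_BUILDERS`, `register_chart`, `build_chart`, and the two example builders are hypothetical names introduced for illustration.

```python
from typing import Any, Callable, Dict

import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt
import pandas as pd

ChartBuilder = Callable[[pd.DataFrame, Dict[str, Any]], plt.Figure]

# Hypothetical registry: chart-type name -> builder function
CHART_BUILDERS: Dict[str, ChartBuilder] = {}


def register_chart(name: str) -> Callable[[ChartBuilder], ChartBuilder]:
    """Decorator that registers a builder under a chart-type name."""
    def decorator(func: ChartBuilder) -> ChartBuilder:
        CHART_BUILDERS[name] = func
        return func
    return decorator


@register_chart("line")
def build_line(df: pd.DataFrame, params: Dict[str, Any]) -> plt.Figure:
    fig, ax = plt.subplots()
    ax.plot(df[params["x_column"]], df[params["y_column"]])
    return fig


@register_chart("scatter")
def build_scatter(df: pd.DataFrame, params: Dict[str, Any]) -> plt.Figure:
    fig, ax = plt.subplots()
    ax.scatter(df[params["x_column"]], df[params["y_column"]])
    return fig


def build_chart(chart_type: str, df: pd.DataFrame, params: Dict[str, Any]) -> plt.Figure:
    """Look up the builder for chart_type; raise ValueError if it is unknown."""
    try:
        builder = CHART_BUILDERS[chart_type]
    except KeyError:
        raise ValueError(f"Unsupported chart type: {chart_type!r}") from None
    return builder(df, params)
```

With this layout, supporting a new chart type means adding one decorated function, and an unknown type surfaces as the `ValueError` that `generate_chart` documents.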
Chart Parameter Extraction
The generator uses an LLM to extract chart-specific parameters from the query:
- Chart Type: The type of visualization to generate
- X-Axis/Y-Axis: Which fields to use for the axes
- Grouping: How to segment the data
- Aggregation: How to aggregate the data (sum, average, etc.)
- Color Scheme: Which colors to use for the visualization
- Title/Labels: Text for the chart title and axis labels
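The parameters above have to be validated before they reach a plotting function, since an LLM may return malformed output or reference columns that do not exist. The following is a minimal sketch that assumes the LLM is asked to reply with a JSON object; the `ChartParams` dataclass, the key names, and the allowed-value sets are illustrative assumptions, not the module's confirmed schema.

```python
import json
from dataclasses import dataclass
from typing import List, Optional

# Assumed vocabularies; the real module may accept different values
ALLOWED_CHART_TYPES = {"bar", "line", "pie", "scatter", "heatmap", "box", "area"}
ALLOWED_AGGREGATIONS = {"sum", "mean", "count", "min", "max"}


@dataclass
class ChartParams:
    chart_type: str
    x_column: str
    y_column: str
    group_by: Optional[str] = None
    aggregation: Optional[str] = None
    title: Optional[str] = None


def parse_chart_params(llm_response: str, columns: List[str]) -> ChartParams:
    """Parse a JSON reply from the LLM and check it against the DataFrame columns."""
    try:
        raw = json.loads(llm_response)
    except json.JSONDecodeError as exc:
        raise ValueError(f"LLM response is not valid JSON: {exc}") from exc
    if not isinstance(raw, dict):
        raise ValueError("LLM response must be a JSON object")

    chart_type = str(raw.get("chart_type", "")).lower()
    if chart_type not in ALLOWED_CHART_TYPES:
        raise ValueError(f"Unknown chart type: {chart_type!r}")

    # Axis columns are mandatory and must exist in the data
    for key in ("x_column", "y_column"):
        if raw.get(key) not in columns:
            raise ValueError(f"{key} {raw.get(key)!r} is not a column in the data")

    group_by = raw.get("group_by")
    if group_by is not None and group_by not in columns:
        raise ValueError(f"group_by {group_by!r} is not a column in the data")

    aggregation = raw.get("aggregation")
    if aggregation is not None and aggregation not in ALLOWED_AGGREGATIONS:
        raise ValueError(f"Unknown aggregation: {aggregation!r}")

    return ChartParams(
        chart_type=chart_type,
        x_column=raw["x_column"],
        y_column=raw["y_column"],
        group_by=group_by,
        aggregation=aggregation,
        title=raw.get("title"),
    )
```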
Implementation Examples
from typing import Any, Dict

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def create_bar_chart(df: pd.DataFrame, params: Dict[str, Any]) -> plt.Figure:
    """
    Create a bar chart based on the specified parameters.

    Args:
        df: The DataFrame containing the data
        params: Dictionary of chart parameters

    Returns:
        Matplotlib Figure object containing the chart
    """
    fig, ax = plt.subplots(figsize=(10, 6))

    # Extract parameters
    x_column = params.get("x_column")
    y_column = params.get("y_column")
    title = params.get("title", f"{y_column} by {x_column}")

    # Create the bar chart
    sns.barplot(data=df, x=x_column, y=y_column, ax=ax)

    # Set title and labels, defaulting to the column names
    ax.set_title(title)
    ax.set_xlabel(params.get("x_label", x_column))
    ax.set_ylabel(params.get("y_label", y_column))

    # Apply the layout to this figure rather than to pyplot's current figure
    fig.tight_layout()
    return fig
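Since `generate_chart` returns PNG bytes while a builder such as `create_bar_chart` returns a Matplotlib Figure, some conversion step is needed in between. A minimal sketch of that step follows; the helper name `figure_to_png_bytes` is hypothetical, and the default `dpi` of 100 is an assumption.

```python
import io

import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt


def figure_to_png_bytes(fig, dpi: int = 100) -> bytes:
    """Render a Matplotlib figure to PNG and return the raw bytes."""
    buffer = io.BytesIO()
    fig.savefig(buffer, format="png", dpi=dpi)
    plt.close(fig)  # release the figure so repeated calls do not accumulate memory
    return buffer.getvalue()
```

Closing the figure after rendering matters in a long-running service, where pyplot would otherwise keep every figure alive.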
Description Generator (description_generator.py)
The Description Generator creates textual narratives and insights from data in response to description-specific queries, using statistical analysis and LLM-based natural language generation.
Key Functions
def generate_description(df: pd.DataFrame, query: str) -> str:
    """
    Generate a textual description of the data based on the query.

    Args:
        df: The processed pandas DataFrame containing the data to analyze
        query: The validated user query

    Returns:
        A string containing the generated description

    Raises:
        ValueError: If the description cannot be generated
    """
Description Generation Process
Analysis Types
The Description Generator supports various analysis types:
- Trend Analysis: Identifying patterns over time
- Distribution Analysis: Describing data distributions and outliers
- Correlation Analysis: Identifying relationships between variables
- Comparison Analysis: Comparing different categories or segments
- Summary Analysis: Providing a high-level overview of the data
Statistical Analysis
For each analysis type, the generator performs appropriate statistical calculations:
- Descriptive Statistics: Mean, median, mode, min, max, standard deviation
- Trend Detection: Linear regression, moving averages, seasonality analysis
- Outlier Detection: Z-score analysis, IQR-based detection
- Correlation Analysis: Pearson/Spearman correlation coefficients
- Significance Testing: T-tests, chi-square tests, ANOVA
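As an illustration of the outlier-detection step, the sketch below implements both methods named above on a pandas Series. The function names, the 1.5 IQR multiplier, and the z-score threshold of 3 are conventional defaults chosen for this example, not values confirmed by the module.

```python
import pandas as pd


def iqr_outliers(values: pd.Series, k: float = 1.5) -> pd.Series:
    """Return the values lying more than k * IQR outside the quartiles."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    iqr = q3 - q1
    mask = (values < q1 - k * iqr) | (values > q3 + k * iqr)
    return values[mask]


def zscore_outliers(values: pd.Series, threshold: float = 3.0) -> pd.Series:
    """Return the values whose absolute z-score exceeds the threshold."""
    std = values.std(ddof=0)  # population standard deviation
    if std == 0:
        # A constant series has no outliers; avoid dividing by zero
        return values.iloc[0:0]
    z_scores = (values - values.mean()) / std
    return values[z_scores.abs() > threshold]
```

The two methods can disagree on small samples: with n points, no z-score computed from the population standard deviation can exceed sqrt(n - 1), so a threshold of 3 never fires for fewer than 11 points, whereas the IQR rule still can.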
LLM-Based Narrative Generation
After performing statistical analysis, the generator uses an LLM to convert the statistical insights into natural language:
- The statistical results are formatted as a structured context
- This context is sent to the LLM along with the original query
- The LLM generates a coherent narrative describing the insights
- The narrative is post-processed to ensure accuracy and readability
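The first two steps above can be sketched as a function that serialises the statistical results and combines them with the query into a prompt. The prompt wording and the name `build_narrative_prompt` are assumptions for illustration; the actual LLM call is omitted because the client API is not described here.

```python
import json
from typing import Any, Dict


def build_narrative_prompt(query: str, stats_results: Dict[str, Any]) -> str:
    """Format statistical results as JSON context and attach the user query."""
    # default=str converts values json cannot encode (e.g. timestamps) to strings
    context = json.dumps(stats_results, indent=2, sort_keys=True, default=str)
    return (
        "You are a data analyst. Using only the statistics below, "
        "answer the user's request in plain language.\n\n"
        f"User request: {query}\n\n"
        f"Statistics (JSON):\n{context}\n"
    )
```

Restricting the model to the supplied statistics, rather than the raw data, keeps the prompt small and makes the narrative easier to check against the computed values during post-processing.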
Implementation Example
from typing import Any, Dict

import numpy as np
import pandas as pd
from scipy import stats


def analyze_trend(df: pd.DataFrame, time_column: str, value_column: str) -> Dict[str, Any]:
    """
    Perform trend analysis on a time series.

    Args:
        df: The DataFrame containing the data
        time_column: The column containing time/date values
        value_column: The column containing the values to analyze

    Returns:
        Dictionary of trend analysis results
    """
    # Ensure data is sorted by time
    df = df.sort_values(by=time_column)

    # Calculate basic trend metrics
    first_value = df[value_column].iloc[0]
    last_value = df[value_column].iloc[-1]
    change = last_value - first_value
    # Percent change is undefined when the series starts at zero
    pct_change = (change / first_value) * 100 if first_value != 0 else None

    # Linear regression of the values against their position in time order
    x = np.arange(len(df))
    slope, intercept, r_value, p_value, std_err = stats.linregress(x, df[value_column])

    # A zero slope is neither upward nor downward
    if slope > 0:
        direction = "upward"
    elif slope < 0:
        direction = "downward"
    else:
        direction = "flat"

    return {
        "first_value": first_value,
        "last_value": last_value,
        "change": change,
        "percent_change": pct_change,
        "trend_direction": direction,
        "trend_strength": abs(r_value),
        "p_value": p_value,
        "significant": bool(p_value < 0.05),
    }
Integration with the Pipeline
The Regular Generator integrates with the main pipeline as follows:
Error Handling
The Regular Generator implements comprehensive error handling:
- Data Validation: Checks that the DataFrame contains the required columns
- Parameter Validation: Ensures that extracted parameters are valid
- Chart Type Selection: Falls back to appropriate chart types if the requested type is unsuitable
- Error Messages: Provides clear error messages when generation fails
- Fallback Generation: Implements fallback strategies for partial failures
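Two of the checks above, data validation and chart type fallback, might look like the following sketch. Both helper names are hypothetical, and the fallback rule (switching from a pie chart to a bar chart when there are more than eight categories) is an illustrative assumption rather than the module's documented behaviour.

```python
from typing import List

import pandas as pd


def validate_columns(df: pd.DataFrame, required: List[str]) -> None:
    """Raise ValueError naming every required column missing from df."""
    missing = [col for col in required if col not in df.columns]
    if missing:
        raise ValueError(f"DataFrame is missing required columns: {missing}")


def choose_chart_type(requested: str, df: pd.DataFrame, x_column: str,
                      max_pie_slices: int = 8) -> str:
    """Return the requested chart type, or a fallback if it is unsuitable."""
    # Assumed rule: a pie chart with many slices is unreadable, so use bars
    if requested == "pie" and df[x_column].nunique() > max_pie_slices:
        return "bar"
    return requested
```

Reporting all missing columns at once, instead of failing on the first, gives the caller a complete error message in a single round trip.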
Visualization Styling
The Chart Generator applies consistent styling to ensure readability and visual appeal:
- Color Schemes: Uses color-blind friendly palettes
- Font Sizes: Ensures text is legible at various display sizes
- Legends: Includes informative legends for multi-series charts
- Annotations: Adds annotations for key data points where appropriate
- Grid Lines: Configures grid lines for better readability
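A styling policy like the one above can be applied once, before any chart is drawn, through seaborn's theme and Matplotlib's rcParams. This is a minimal sketch: `apply_default_style` is a hypothetical helper, and the "colorblind" palette and relative font sizes are assumptions chosen to match the goals listed, not the module's confirmed settings.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so no display is needed
import matplotlib.pyplot as plt
import seaborn as sns


def apply_default_style(font_size: int = 12) -> None:
    """Set a grid style, a color-blind friendly palette, and font sizes globally."""
    # "whitegrid" enables grid lines; "colorblind" is a built-in seaborn palette
    sns.set_theme(style="whitegrid", palette="colorblind")
    # Override font sizes after set_theme, which applies its own defaults
    plt.rcParams.update({
        "font.size": font_size,
        "axes.titlesize": font_size + 2,
        "axes.labelsize": font_size,
        "legend.fontsize": font_size - 1,
    })
```

Setting the style globally keeps individual chart builders free of styling code, at the cost of affecting every figure created in the same process.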
Example Usage
from mypackage.c_regular_generator import chart_generator, description_generator
import pandas as pd

# Sample DataFrame
data = {
    # 'MS' = month start; the 'M' alias is deprecated in recent pandas releases
    'date': pd.date_range(start='2023-01-01', periods=12, freq='MS'),
    'channel': ['Facebook'] * 6 + ['Google'] * 6,
    'ad_spend': [1000, 1200, 1100, 1300, 1500, 1600, 800, 900, 950, 1000, 1100, 1200],
    'revenue': [3000, 3600, 3300, 3900, 4500, 4800, 2400, 2700, 2850, 3000, 3300, 3600]
}
df = pd.DataFrame(data)

# Generate a chart
try:
    chart_bytes = chart_generator.generate_chart(
        df,
        "Create a line chart showing ad spend by channel over time"
    )
    with open('ad_spend_chart.png', 'wb') as f:
        f.write(chart_bytes)
    print("Chart saved to ad_spend_chart.png")
except Exception as e:
    print(f"Error generating chart: {e}")

# Generate a description
try:
    description = description_generator.generate_description(
        df,
        "Analyze the trend of ad spend and revenue by channel"
    )
    print(description)
except Exception as e:
    print(f"Error generating description: {e}")
Configuration
The Regular Generator uses the following configurations:
# From llm_config.py
DESCRIPTION_GENERATOR_MODEL = "deepseek-r1-distill-llama-70b"
CHART_DATA_MODEL = "llama3-8b-8192"
# Default chart configurations
DEFAULT_FIGURE_SIZE = (10, 6) # inches
DEFAULT_DPI = 100
DEFAULT_FONT_SIZE = 12
COLOR_PALETTE = "Set2" # seaborn color palette
These settings can be adjusted based on output quality requirements and performance considerations.