TicketWorld: Synthetic Customer Service Dataset Generator

TicketWorld generates realistic customer service datasets and environments for training and evaluating LLM systems. It creates customer support scenarios with interconnected databases, policy documents, and resolution plans.

🎯 Project Goals

TicketWorld generates synthetic customer service data that challenges LLM systems with:

Multi-hop policy reasoning: Tickets require understanding interactions between multiple company policies
Tool use and effective lookup: Every ticket requires access to customer information, product information, order information, and the company policy document to produce an accurate resolution. These assets are stored separately, in a database and a standalone .txt file, so resolving a ticket requires effective multi-hop queries and search.
Realistic customer scenarios: Edge cases, partial information, and complex situations
Policy compliance validation: Resolutions must reference and apply specific policy clauses
Authentic data relationships: Customers, orders, and products with realistic transaction histories

🏗️ How Tickets Are Generated

The system uses a carefully orchestrated synthetic data pipeline that respects asset dependencies and provides targeted information access:

Policy Graph Creation: Company policies are modeled as interconnected clauses with relationships (overrides, modifies, requires)
Scenario Templates: Pre-built templates define customer situations (returns, exchanges, warranty claims, etc.) with varying conditions that require combining and reasoning over multiple policy rules
Asset Generation Pipeline:
- Generate customers with realistic profiles and contact information
- Generate products with pricing, categories, and specifications
- Generate orders using both customers and products, creating authentic transaction relationships
Email Generation: Using scenario templates and specific customer/order context, an LLM writes each email from the customer's perspective. Emails vary in how much information the customer provides (missing order numbers, misspelled order numbers, messages sent from a secondary email address, and so on), so resolving them requires database lookups, best-guess inference, or follow-up clarification requests.
Resolution Generation: Using all previous assets plus metadata, an LLM acts as a customer service representative to create policy-compliant resolutions
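
The dependency order of the asset pipeline can be sketched as follows. This is a minimal illustration and not the code in factory.py: the helper names (make_customers, make_products, make_orders) and the record fields are assumptions, and only the ID formats are modeled on the example ticket further down.

```python
import random
from datetime import date, timedelta


def make_customers(n, rng):
    # Hypothetical customer records; only the ID format mirrors the example ticket.
    tiers = ["standard", "premium", "vip"]
    return [
        {"customer_id": f"CUST-{i:04d}", "tier": rng.choice(tiers)}
        for i in range(1, n + 1)
    ]


def make_products(n, rng):
    # Hypothetical product records with a random price.
    return [
        {"product_id": f"PROD-{1000 + i}", "price": round(rng.uniform(10, 900), 2)}
        for i in range(1, n + 1)
    ]


def make_orders(n, customers, products, rng, today=date(2025, 6, 18)):
    # Orders are built last: each one must point at an existing customer
    # and an existing product, which keeps the relationships consistent.
    orders = []
    for i in range(1, n + 1):
        customer = rng.choice(customers)
        product = rng.choice(products)
        order_date = today - timedelta(days=rng.randint(0, 400))
        orders.append({
            "order_id": f"ORD-{order_date:%Y%m%d}-{1000 + i}",
            "customer_id": customer["customer_id"],
            "product_id": product["product_id"],
            "order_date": order_date.isoformat(),
            "total_amount": product["price"],
        })
    return orders


rng = random.Random(0)  # fixed seed so the sketch is reproducible
customers = make_customers(50, rng)
products = make_products(35, rng)
orders = make_orders(70, customers, products, rng)
```

Because orders only ever sample from the already generated customers and products, every foreign key in an order resolves to a real record.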

The Key Innovation: This synthetic data pipeline addresses the core challenge of generating high-quality datasets that nevertheless remain difficult for LLMs to solve. During generation, we provide targeted information access (specific customer records, relevant policies) and deterministic metadata to ensure consistency and minimize hallucination. During evaluation, however, these scaffolds are removed: the LLM must accurately retrieve information from large databases and reason over numerous, possibly irrelevant, pieces of data.

This approach generates datasets with minimal errors and maximum consistency while creating genuinely challenging multi-hop reasoning scenarios that require effective tool use and lookup capabilities.
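
To illustrate the kind of lookup an evaluated system has to perform once the scaffolds are gone, here is a hedged sketch of resolving a possibly misspelled order number. It is not part of TicketWorld: resolve_order_id, the 0.85 cutoff, and the sample order IDs (other than the one from the example ticket) are assumptions. It relies only on difflib.get_close_matches from the Python standard library.

```python
import difflib


def resolve_order_id(claimed_id, known_ids, cutoff=0.85):
    """Map an order number from an email onto a known order ID.

    Returns (order_id or None, method). The cutoff is an arbitrary
    assumption and would need tuning against real data.
    """
    if claimed_id in known_ids:
        return claimed_id, "exact_match"
    candidates = difflib.get_close_matches(claimed_id, known_ids, n=2, cutoff=cutoff)
    if len(candidates) == 1:
        # Exactly one plausible candidate: accept it as a best guess.
        return candidates[0], "fuzzy_match"
    # Zero or several candidates: ask the customer instead of guessing.
    return None, "needs_clarification"


known_ids = ["ORD-20250609-1002", "ORD-20250301-1044", "ORD-20241120-1017"]

exact = resolve_order_id("ORD-20250609-1002", known_ids)
typo = resolve_order_id("ORD-20250609-1020", known_ids)   # transposed digits
vague = resolve_order_id("ORD-2025", known_ids)           # too little to go on
```

Returning the lookup method alongside the result mirrors the lookup_method field in the example ticket's resolution plan and makes it easy to tell guesses apart from exact matches.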

In addition, this setup allows fresh batches of ticket data to be generated on demand. This helps mitigate the overfitting, data contamination, and staleness often seen with static train and test sets, similar to FreshStack.

🚀 Setup & Installation

Prerequisites

Python 3.11+
uv for dependency management
Google Gemini API access

Installation

# Clone the repository
cd ticketworld

# Install dependencies
uv sync

# Set up environment variables
cp .env.example .env  # Create this file

Environment Configuration

Create a .env file in the project root:

# Required: Google Gemini API key
GEMINI_API_KEY=your-gemini-api-key-here

Get your API key from Google AI Studio.

📋 Usage

Basic Usage

Generate a dataset with default settings:

# Run with test configuration (100 tickets, 50 customers, 35 products, 70 orders)
uv run python factory.py

Custom Configuration

# Generate larger dataset
uv run python factory.py --tickets 500 --customers 200 --products 100 --orders 300

# Append to existing dataset
uv run python factory.py --mode append --tickets 100

# Custom output directory
uv run python factory.py --output-dir ./my_dataset --tickets 200

# Exclude debug metadata (for clean training data)
uv run python factory.py --no-debug --tickets 1000

Complete Workflow

For a full dataset with all enhancements:

# 1. Generate core dataset
uv run python factory.py --tickets 500 --customers 200

# 2. Add policy dilution (makes policy document more realistic)
uv run python utils/policy_dilution_script.py

# 3. Convert to SQLite for easier querying
uv run python utils/convert_to_sqlite.py

🛠️ Utilities (`utils/` directory)

Core Workflow Utils

Script	Purpose	When to Run
`policy_dilution_script.py`	Adds irrelevant content to policy document to simulate real-world policy complexity	After factory.py
`convert_to_sqlite.py`	Converts JSON customer database to SQLite for easier querying and analysis	After factory.py

Development & Analysis Utils

Script	Purpose	Use Case
`audit_tickets.py`	Reviews generated tickets for policy compliance and errors	Quality assurance, debugging
`validate_templates.py`	Analyzes scenario templates and discovers policy interactions	Template development, validation

Running Utilities

# Add policy dilution
cd utils && python policy_dilution_script.py

# Convert to SQLite
cd utils && python convert_to_sqlite.py

# Audit ticket quality (optional)
cd utils && python audit_tickets.py

# Validate templates (development tool)
cd utils && python validate_templates.py

📁 Generated Assets

After running the factory, the assets/ directory contains:

Core Dataset Files

File	Description	Size (typical)
`support_tickets.json`	Complete ticket dataset with customer emails and resolutions	~350KB (100 tickets)
`customer_database.json`	Customer profiles, orders, and product catalog	~80KB (50 customers)
`company_policy.txt`	Clean company policy document	~3KB

Enhanced Files (after utils)

File	Description	Generated By
`company_policy_full.txt`	Policy document with realistic dilution content	`policy_dilution_script.py`
`customer_database.db`	SQLite version of customer database	`convert_to_sqlite.py`

Analysis Files

File	Description	Contents
`policy_graph.json`	Policy interaction structure and metadata	Policy relationships, complexity analysis
`ticket_audit_results.json`	Quality analysis of generated tickets	Compliance scores, error detection
`ticket_audit_report.txt`	Human-readable audit summary	Policy violations, recommendations

Example ticket (JSON)

The example below shows a customer email, its resolution plan, and the metadata (product, customer, scenario template, policy interactions, etc.) that the system used to generate both.

{
  "ticket_id": "TK-20250618-2052",
  "customer_email": "[email protected]",
  "subject": "Defective Tablet - Order ORD-20250609-1002 - Exchange Request",
  "body": "Dear Customer Support,\n\nI am writing to you today because I received a defective item in my recent order, ORD-20250609-1002. I ordered the Tablet Basic 10-inch on June 9, 2025, so it's only been a little over a week since it arrived.\n\nUnfortunately, the tablet is not working correctly. The screen frequently flickers and freezes, making it impossible to use. I've tried restarting it several times, but the problem persists. It's really frustrating to receive a brand new item that's already faulty.\n\nI would like to request an exchange for a working Tablet Basic 10-inch. I really need this specific model and would prefer to get a replacement rather than a refund. Could you please let me know the process for exchanging a defective item?\n\nThank you for your time and assistance.\n\nSincerely,\nDavid Chen",
  "timestamp": "2025-06-18T10:30:00",
  "customer_id": "CUST-0002",
  "order_id": "ORD-20250609-1002",
  "resolution_plan": {
    "order_id": "ORD-20250609-1002",
    "order_date": "2025-06-09",
    "customer_lookup": {
      "status": "found",
      "customer_id": "CUST-0002",
      "lookup_method": "email_match",
      "notes": "Customer found in database"
    },
    "policy_references": [
      "POL-EXCHANGE-002",
      "POL-RETURN-001",
      "POL-EXCHANGE-001",
      "POL-SHIP-006"
    ],
    "policy_reasoning": "The customer reported receiving a defective Tablet Basic 10-inch within 9 days of purchase. This falls within the 30-day return/exchange window as per POL-RETURN-001 and POL-EXCHANGE-001. According to POL-EXCHANGE-002, defective items are to be exchanged for the same item at no cost. Since the item's value is $249.99, which is under $500, an immediate replacement can be authorized based on POL-SHIP-006 (Damaged items under $500).",
    "actions": [
      {
        "type": "process_exchange",
        "reason": "Customer is requesting an exchange for a defective item received within the exchange window, as per POL-EXCHANGE-002 and POL-RETURN-001. The item's value is under $500, allowing for immediate replacement per POL-SHIP-006.",
        "value": 249.99,
        "details": "Exchange for one (1) Tablet Basic 10-inch (PROD-1031) due to defect. No additional cost to customer."
      },
      {
        "type": "send_replacement",
        "reason": "Replacement authorized for a defective item under $500 as per POL-SHIP-006.",
        "value": 249.99,
        "details": "Ship one (1) new Tablet Basic 10-inch (PROD-1031) to customer David Chen. Provide return label for the defective unit."
      }
    ],
    "escalation_required": false,
    "escalation_reason": null,
    "priority": "medium",
    "total_resolution_value": 249.99
  },
  "_scenario_dimensions": {
    "query_type": "exchange_request",
    "information_completeness": "complete",
    "complexity": "requires_lookup",
    "customer_sentiment": "pleading"
  },
  "_scenario_template": {
    "scenario_id": "EXCHANGE-002",
    "name": "exchange_defective_product",
    "primary_policy": "POL-EXCHANGE-002",
    "complexity_level": 2,
    "expected_outcome": "approve"
  },
  "_policy_analysis": {
    "all_relevant_policies": [
      "POL-EXCHANGE-002",
      "POL-RETURN-004"
    ],
    "applicable_policies": [
      "POL-EXCHANGE-002",
      "POL-RETURN-004"
    ],
    "context_used": {
      "has_receipt": true,
      "customer_tier": "standard",
      "days_since_purchase": 9,
      "months_since_purchase": 0.2956636005256242,
      "order_status": "delivered",
      "total_order_value": 249.99,
      "item_value": 249.99,
      "product_warranty_days": 365,
      "item_condition": "defective",
      "exchange_reason": "defective",
      "purchase_month": 6
    },
    "policy_interactions": "Multi-hop reasoning required"
  }
}
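
A ticket like this can be split into what a model is shown, the reference answer, and the debug metadata. The sketch below is one possible split, not something the repository prescribes: split_ticket is a hypothetical helper, and the choice to withhold the top-level customer_id and order_id from the model input (so that they must be looked up) is an assumption.

```python
def split_ticket(ticket):
    """Separate model input, gold resolution, and debug metadata."""
    # Fields an evaluated model would see (an assumption; adjust as needed).
    visible = ("ticket_id", "customer_email", "subject", "body", "timestamp")
    model_input = {key: ticket[key] for key in visible if key in ticket}
    # The resolution plan serves as the reference answer.
    gold = ticket.get("resolution_plan")
    # Underscore-prefixed keys carry generation metadata.
    debug = {key: value for key, value in ticket.items() if key.startswith("_")}
    return model_input, gold, debug


# Abbreviated version of the example ticket above (body shortened for brevity).
ticket = {
    "ticket_id": "TK-20250618-2052",
    "subject": "Defective Tablet - Order ORD-20250609-1002 - Exchange Request",
    "body": "Dear Customer Support, I received a defective item in my recent order.",
    "timestamp": "2025-06-18T10:30:00",
    "customer_id": "CUST-0002",
    "order_id": "ORD-20250609-1002",
    "resolution_plan": {"priority": "medium", "total_resolution_value": 249.99},
    "_scenario_dimensions": {"query_type": "exchange_request"},
    "_policy_analysis": {"applicable_policies": ["POL-EXCHANGE-002", "POL-RETURN-004"]},
}

model_input, gold, debug = split_ticket(ticket)
```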

🎛️ Configuration Options

Factory Parameters

--tickets N          # Number of tickets to generate (default: 100)
--customers N        # Number of customers (default: 50)  
--products N         # Number of products (default: 35)
--orders N           # Number of orders (default: 70)
--mode MODE          # "create" or "append" (default: create)
--output-dir DIR     # Output directory (default: ./assets)
--company-name NAME  # Company name for policies (default: TechNest)
--no-debug           # Exclude debug metadata for clean training data

Dataset Composition

The generator creates realistic distributions:

Ticket Types: Returns (25%), Shipping Issues (20%), Billing Disputes (20%), Warranty Claims (15%), etc.
Complexity Levels: Simple (40%), Requires Lookup (35%), Edge Cases (20%), Escalation Required (5%)
Customer Tiers: Standard (70%), Premium (20%), VIP (10%)
Information Completeness: Complete (30%), Missing Details (40%), Wrong Info (30%)
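
Drawing scenario dimensions from distributions like these can be sketched with random.choices. The probabilities are copied from the percentages listed above; the label spellings, the draw and sample_dimensions helpers, and the idea that the dimensions are sampled independently are assumptions rather than the behavior of factory.py. Ticket types are left out because the listed percentages end in "etc." and do not sum to 100%.

```python
import random
from collections import Counter

# Probabilities taken from the listed percentages; label names are illustrative.
COMPLEXITY = {
    "simple": 0.40,
    "requires_lookup": 0.35,
    "edge_case": 0.20,
    "escalation_required": 0.05,
}
CUSTOMER_TIER = {"standard": 0.70, "premium": 0.20, "vip": 0.10}
INFO_COMPLETENESS = {"complete": 0.30, "missing_details": 0.40, "wrong_info": 0.30}


def draw(dist, rng):
    # Weighted draw of a single label from a {label: probability} mapping.
    labels = list(dist)
    return rng.choices(labels, weights=[dist[label] for label in labels], k=1)[0]


def sample_dimensions(rng):
    # Assumes the three dimensions are independent of one another.
    return {
        "complexity": draw(COMPLEXITY, rng),
        "customer_tier": draw(CUSTOMER_TIER, rng),
        "information_completeness": draw(INFO_COMPLETENESS, rng),
    }


rng = random.Random(42)
samples = [sample_dimensions(rng) for _ in range(10_000)]
tier_counts = Counter(s["customer_tier"] for s in samples)
```

Comparing tier_counts against the target percentages is a quick way to check that a generated batch has the intended mix.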

🎯 Use Cases

LLM Training & Evaluation

Policy Reasoning: Test multi-hop policy application
Customer Service: Train on realistic support scenarios
Edge Case Handling: Challenge models with incomplete information
Business Logic: Validate understanding of complex rules

Dataset Analysis

import json

# Load tickets
with open('assets/support_tickets.json') as f:
    tickets = json.load(f)

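# Analyze policy complexity: tickets with more than two applicable policies.
# .get() guards against datasets generated with --no-debug, which may omit
# the `_policy_analysis` metadata.
complex_tickets = [
    t for t in tickets
    if len(t.get('_policy_analysis', {}).get('applicable_policies', [])) > 2
]
print(f"Tickets with more than two applicable policies: {len(complex_tickets)}")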

SQL Querying (after SQLite conversion)

-- Find high-value orders (total above $500)
SELECT c.name, o.order_id, o.total_amount 
FROM customers c 
JOIN orders o ON c.customer_id = o.customer_id 
WHERE o.total_amount > 500;

-- Customer purchase patterns
SELECT customer_id, COUNT(*) as order_count, AVG(total_amount) as avg_order
FROM orders 
GROUP BY customer_id 
ORDER BY order_count DESC;
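
The same kind of query can be run from Python with the standard-library sqlite3 module. The sketch below uses a small in-memory database as a stand-in, because the real schema written by convert_to_sqlite.py is not shown here: the table and column names follow the SQL above, while the column types, the sample rows, and the high_value_orders helper are assumptions.

```python
import sqlite3

HIGH_VALUE_SQL = """
SELECT c.name, o.order_id, o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount > ?
ORDER BY o.total_amount DESC
"""


def high_value_orders(conn, threshold=500):
    # Parameterized query: the threshold is bound, not interpolated.
    return conn.execute(HIGH_VALUE_SQL, (threshold,)).fetchall()


# Toy in-memory database standing in for assets/customer_database.db.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT, total_amount REAL);
INSERT INTO customers VALUES ('CUST-0001', 'Ada Example'), ('CUST-0002', 'David Chen');
INSERT INTO orders VALUES
  ('ORD-A', 'CUST-0001', 799.00),
  ('ORD-B', 'CUST-0002', 249.99),
  ('ORD-C', 'CUST-0002', 1200.50);
""")

rows = high_value_orders(conn)
```

To query a generated dataset instead, replace the in-memory connection with sqlite3.connect("assets/customer_database.db"), assuming the converted file lives at that path and its tables match the names used here.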

📊 Quality Features

Policy Compliance: All resolutions reference specific policy clauses
Realistic Timing: Email timestamps align with customer descriptions ("last week", "a few months ago")
Data Consistency: Customer/order relationships are maintained across all tickets
Edge Cases: Wrong emails, missing information, partial customer matches
Multi-hop Reasoning: Complex scenarios requiring multiple policy interactions
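
Two of these properties can be spot-checked mechanically. The following sketch is not the logic of audit_tickets.py: check_ticket is a hypothetical helper, and the policy ID pattern is inferred from the IDs in the example ticket. It verifies that every policy ID cited in free text is also listed in policy_references, and that the order date does not fall after the ticket timestamp.

```python
import re
from datetime import date, datetime

# Pattern inferred from IDs such as POL-EXCHANGE-002 (an assumption).
POLICY_ID = re.compile(r"POL-[A-Z]+-\d{3}")


def check_ticket(ticket):
    """Return a list of problems; an empty list means nothing was flagged."""
    problems = []
    plan = ticket["resolution_plan"]
    listed = set(plan.get("policy_references", []))

    # Collect policy IDs mentioned in the reasoning and in each action's reason.
    texts = [plan.get("policy_reasoning", "")]
    texts += [action.get("reason", "") for action in plan.get("actions", [])]
    cited = {policy_id for text in texts for policy_id in POLICY_ID.findall(text)}
    for policy_id in sorted(cited - listed):
        problems.append(f"{policy_id} cited but not in policy_references")

    # An order cannot be placed after the email about it was sent.
    sent = datetime.fromisoformat(ticket["timestamp"]).date()
    ordered = date.fromisoformat(plan["order_date"])
    if ordered > sent:
        problems.append("order_date is later than ticket timestamp")
    return problems


good = {
    "timestamp": "2025-06-18T10:30:00",
    "resolution_plan": {
        "order_date": "2025-06-09",
        "policy_references": ["POL-EXCHANGE-002", "POL-SHIP-006"],
        "policy_reasoning": "Exchange per POL-EXCHANGE-002; replacement per POL-SHIP-006.",
        "actions": [{"type": "process_exchange", "reason": "Per POL-EXCHANGE-002."}],
    },
}
bad = {
    "timestamp": "2025-06-01T09:00:00",
    "resolution_plan": {
        "order_date": "2025-06-09",
        "policy_references": ["POL-EXCHANGE-002"],
        "policy_reasoning": "Replacement per POL-SHIP-006.",
        "actions": [],
    },
}

good_problems = check_ticket(good)
bad_problems = check_ticket(bad)
```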

🔧 Development

Adding New Scenarios

Edit scenario templates in factory.py (create_scenario_templates())
Run utils/validate_templates.py to discover policy interactions
Test with utils/audit_tickets.py for compliance

Extending Policies

Add new policy clauses in create_policy_graph()
Define relationships (overrides, modifies, requires)
Update scenario templates to reference new policies
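
One way to picture clauses and their relationships is a small graph structure. The sketch below is an illustration only and does not reflect the actual data model inside create_policy_graph(): the PolicyGraph class, its methods, the edges between clauses, and the POL-EXAMPLE-000 clause are hypothetical. The clause summaries paraphrase the policy reasoning in the example ticket.

```python
from dataclasses import dataclass, field

RELATIONS = {"overrides", "modifies", "requires"}


@dataclass
class PolicyGraph:
    clauses: dict = field(default_factory=dict)  # policy_id -> summary text
    edges: list = field(default_factory=list)    # (source_id, relation, target_id)

    def add_clause(self, policy_id, text):
        self.clauses[policy_id] = text

    def relate(self, source, relation, target):
        # Reject unknown relation types and dangling references early.
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        for policy_id in (source, target):
            if policy_id not in self.clauses:
                raise KeyError(f"unknown clause: {policy_id}")
        self.edges.append((source, relation, target))

    def related(self, policy_id):
        """Clauses connected to policy_id through any chain of relations."""
        seen, stack = {policy_id}, [policy_id]
        while stack:
            current = stack.pop()
            for source, _, target in self.edges:
                # Follow each edge in both directions.
                for a, b in ((source, target), (target, source)):
                    if a == current and b not in seen:
                        seen.add(b)
                        stack.append(b)
        return seen - {policy_id}


graph = PolicyGraph()
graph.add_clause("POL-RETURN-001", "30-day return/exchange window")
graph.add_clause("POL-EXCHANGE-002", "Defective items are exchanged at no cost")
graph.add_clause("POL-SHIP-006", "Damaged items under $500: immediate replacement")
graph.add_clause("POL-EXAMPLE-000", "Hypothetical clause with no relationships")

# These edges are invented for illustration.
graph.relate("POL-EXCHANGE-002", "requires", "POL-RETURN-001")
graph.relate("POL-SHIP-006", "modifies", "POL-EXCHANGE-002")

connected = graph.related("POL-RETURN-001")
```

A traversal like related() gives a rough way to see which existing clauses a new clause interacts with before updating scenario templates.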

📈 Performance

Generation Speed: ~1-2 tickets/second (depends on LLM response time)
Memory Usage: ~100MB for typical datasets
Output Size: ~5MB for 1000 tickets with full metadata

TicketWorld creates comprehensive testing environments for customer service AI systems, ensuring robust handling of real-world complexity and multi-policy reasoning scenarios.


nickcdryan/ticketworld
