nickcdryan/ticketworld

TicketWorld: Synthetic Customer Service Dataset Generator

TicketWorld generates realistic customer service datasets and environments for training and evaluating LLM systems. It creates customer support scenarios with interconnected databases, policy documents, and resolution plans.

TicketWorld System Overview

🎯 Project Goals

TicketWorld generates synthetic customer service data that challenges LLM systems with:

  • Multi-hop policy reasoning: Tickets require understanding interactions between multiple company policies
  • Tool use and effective lookup: Every ticket requires access to customer, product, and order information and to the company policy document in order to produce an accurate resolution. These assets are stored separately, in a database and a standalone .txt file, so solving a ticket takes effective multi-hop queries and search.
  • Realistic customer scenarios: Edge cases, partial information, and complex situations
  • Policy compliance validation: Resolutions must reference and apply specific policy clauses
  • Authentic data relationships: Customers, orders, and products with realistic transaction histories

🏗️ How Tickets Are Generated

The system uses a carefully orchestrated synthetic data pipeline that respects asset dependencies and provides targeted information access:

  1. Policy Graph Creation: Company policies are modeled as interconnected clauses with relationships (overrides, modifies, requires)
  2. Scenario Templates: Pre-built templates define customer situations (returns, exchanges, warranty claims, etc.) with varying conditions that require combining and reasoning over multiple policy rules
  3. Asset Generation Pipeline:
    • Generate customers with realistic profiles and contact information
    • Generate products with pricing, categories, and specifications
    • Generate orders using both customers and products, creating authentic transaction relationships
  4. Email Generation: Using scenario templates and the specific customer/order context, the LLM writes customer emails from the customer's perspective. Emails vary in how much information the customer provides (missing order numbers, misspelled order numbers, emails sent from a secondary email address, and so on), so resolving them requires database lookups, best-guess inference, or follow-up clarification requests.
  5. Resolution Generation: Using all previous assets plus metadata, the LLM acts as a customer service rep to create policy-compliant resolutions
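
The dependency order in step 3 can be sketched as follows. This is a minimal illustration, not the code in factory.py: the helper names (`make_customers`, `make_products`, `make_orders`, `build_assets`), the record fields, and the simplified order ID format are all assumptions. The point it shows is that orders are created last so that every order references a customer and a product that already exist.

```python
import random


def make_customers(n):
    # Hypothetical customer records; IDs mimic the CUST-0002 style of the example ticket.
    return [{"customer_id": f"CUST-{i:04d}"} for i in range(1, n + 1)]


def make_products(n, rng):
    # Hypothetical products with a random price; IDs mimic the PROD-1031 style.
    return [
        {"product_id": f"PROD-{1000 + i}", "price": round(rng.uniform(10, 900), 2)}
        for i in range(1, n + 1)
    ]


def make_orders(n, customers, products, rng):
    # Orders are built from existing customers and products, never from new IDs.
    orders = []
    for i in range(1, n + 1):
        customer = rng.choice(customers)
        product = rng.choice(products)
        orders.append({
            "order_id": f"ORD-{i:04d}",  # simplified, assumed format
            "customer_id": customer["customer_id"],
            "product_id": product["product_id"],
            "total_amount": product["price"],
        })
    return orders


def build_assets(n_customers=50, n_products=35, n_orders=70, seed=0):
    # Customers and products first, then orders that depend on both.
    rng = random.Random(seed)
    customers = make_customers(n_customers)
    products = make_products(n_products, rng)
    orders = make_orders(n_orders, customers, products, rng)
    return customers, products, orders
```

The default counts match the test configuration described under Usage; a fixed seed keeps the sketch reproducible.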

The Key Innovation: This synthetic data pipeline addresses the core challenge of generating high-quality datasets that nevertheless remain difficult for LLMs to solve. During generation, we provide targeted information access (specific customer records, relevant policies) and deterministic metadata to ensure consistency and minimize hallucination. During evaluation, these scaffolds are removed: the LLM must accurately retrieve information from large databases and reason over numerous, possibly irrelevant pieces of data.

This approach generates datasets with minimal errors and maximum consistency while creating genuinely challenging multi-hop reasoning scenarios that require effective tool use and lookup capabilities.

In addition, this setup allows fresh batches of ticket data to be generated on demand. This helps mitigate the overfitting, data contamination, and staleness often seen with static train and test sets, similar to FreshStack.

🚀 Setup & Installation

Prerequisites

  • Python 3.11+
  • uv for dependency management
  • Google Gemini API access

Installation

# Clone the repository
git clone https://github.com/nickcdryan/ticketworld.git
cd ticketworld

# Install dependencies
uv sync

# Set up environment variables
cp .env.example .env  # Create this file

Environment Configuration

Create a .env file in the project root:

# Required: Google Gemini API key
GEMINI_API_KEY=your-gemini-api-key-here

Get your API key from Google AI Studio.
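
To confirm the key is visible before starting a long generation run, a small check like the one below can help. It is a sketch using only the standard library; how factory.py itself loads the key is not shown in this README, and the function names `parse_env_text` and `get_gemini_key` are hypothetical.

```python
import os


def parse_env_text(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(env_text=""):
    """Prefer the process environment, then fall back to .env file contents."""
    key = os.environ.get("GEMINI_API_KEY") or parse_env_text(env_text).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set")
    return key
```

For example, `get_gemini_key(open(".env").read())` raises a clear error when the variable is missing instead of failing partway through generation.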

📋 Usage

Basic Usage

Generate a dataset with default settings:

# Run with test configuration (100 tickets, 50 customers, 35 products, 70 orders)
uv run python factory.py

Custom Configuration

# Generate larger dataset
uv run python factory.py --tickets 500 --customers 200 --products 100 --orders 300

# Append to existing dataset
uv run python factory.py --mode append --tickets 100

# Custom output directory
uv run python factory.py --output-dir ./my_dataset --tickets 200

# Exclude debug metadata (for clean training data)
uv run python factory.py --no-debug --tickets 1000

Complete Workflow

For a full dataset with all enhancements:

# 1. Generate core dataset
uv run python factory.py --tickets 500 --customers 200

# 2. Add policy dilution (makes policy document more realistic)
uv run python utils/policy_dilution_script.py

# 3. Convert to SQLite for easier querying
uv run python utils/convert_to_sqlite.py

🛠️ Utilities (utils/ directory)

Core Workflow Utils

| Script | Purpose | When to Run |
| --- | --- | --- |
| policy_dilution_script.py | Adds irrelevant content to policy document to simulate real-world policy complexity | After factory.py |
| convert_to_sqlite.py | Converts JSON customer database to SQLite for easier querying and analysis | After factory.py |

Development & Analysis Utils

| Script | Purpose | Use Case |
| --- | --- | --- |
| audit_tickets.py | Reviews generated tickets for policy compliance and errors | Quality assurance, debugging |
| validate_templates.py | Analyzes scenario templates and discovers policy interactions | Template development, validation |

Running Utilities

# Add policy dilution
cd utils && python policy_dilution_script.py

# Convert to SQLite
cd utils && python convert_to_sqlite.py

# Audit ticket quality (optional)
cd utils && python audit_tickets.py

# Validate templates (development tool)
cd utils && python validate_templates.py

📁 Generated Assets

After running the factory, the assets/ directory contains:

Core Dataset Files

| File | Description | Size (typical) |
| --- | --- | --- |
| support_tickets.json | Complete ticket dataset with customer emails and resolutions | ~350KB (100 tickets) |
| customer_database.json | Customer profiles, orders, and product catalog | ~80KB (50 customers) |
| company_policy.txt | Clean company policy document | ~3KB |

Enhanced Files (after utils)

| File | Description | Generated By |
| --- | --- | --- |
| company_policy_full.txt | Policy document with realistic dilution content | policy_dilution_script.py |
| customer_database.db | SQLite version of customer database | convert_to_sqlite.py |

Analysis Files

| File | Description | Contents |
| --- | --- | --- |
| policy_graph.json | Policy interaction structure and metadata | Policy relationships, complexity analysis |
| ticket_audit_results.json | Quality analysis of generated tickets | Compliance scores, error detection |
| ticket_audit_report.txt | Human-readable audit summary | Policy violations, recommendations |

Example ticket (JSON)

Shows the customer email, the resolution plan, and the metadata (product, customer, scenario template, policy interactions, etc.) used by the system to generate both.

{
  "ticket_id": "TK-20250618-2052",
  "customer_email": "[email protected]",
  "subject": "Defective Tablet - Order ORD-20250609-1002 - Exchange Request",
  "body": "Dear Customer Support,\n\nI am writing to you today because I received a defective item in my recent order, ORD-20250609-1002. I ordered the Tablet Basic 10-inch on June 9, 2025, so it's only been a little over a week since it arrived.\n\nUnfortunately, the tablet is not working correctly. The screen frequently flickers and freezes, making it impossible to use. I've tried restarting it several times, but the problem persists. It's really frustrating to receive a brand new item that's already faulty.\n\nI would like to request an exchange for a working Tablet Basic 10-inch. I really need this specific model and would prefer to get a replacement rather than a refund. Could you please let me know the process for exchanging a defective item?\n\nThank you for your time and assistance.\n\nSincerely,\nDavid Chen",
  "timestamp": "2025-06-18T10:30:00",
  "customer_id": "CUST-0002",
  "order_id": "ORD-20250609-1002",
  "resolution_plan": {
    "order_id": "ORD-20250609-1002",
    "order_date": "2025-06-09",
    "customer_lookup": {
      "status": "found",
      "customer_id": "CUST-0002",
      "lookup_method": "email_match",
      "notes": "Customer found in database"
    },
    "policy_references": [
      "POL-EXCHANGE-002",
      "POL-RETURN-001",
      "POL-EXCHANGE-001",
      "POL-SHIP-006"
    ],
    "policy_reasoning": "The customer reported receiving a defective Tablet Basic 10-inch within 9 days of purchase. This falls within the 30-day return/exchange window as per POL-RETURN-001 and POL-EXCHANGE-001. According to POL-EXCHANGE-002, defective items are to be exchanged for the same item at no cost. Since the item's value is $249.99, which is under $500, an immediate replacement can be authorized based on POL-SHIP-006 (Damaged items under $500).",
    "actions": [
      {
        "type": "process_exchange",
        "reason": "Customer is requesting an exchange for a defective item received within the exchange window, as per POL-EXCHANGE-002 and POL-RETURN-001. The item's value is under $500, allowing for immediate replacement per POL-SHIP-006.",
        "value": 249.99,
        "details": "Exchange for one (1) Tablet Basic 10-inch (PROD-1031) due to defect. No additional cost to customer."
      },
      {
        "type": "send_replacement",
        "reason": "Replacement authorized for a defective item under $500 as per POL-SHIP-006.",
        "value": 249.99,
        "details": "Ship one (1) new Tablet Basic 10-inch (PROD-1031) to customer David Chen. Provide return label for the defective unit."
      }
    ],
    "escalation_required": false,
    "escalation_reason": null,
    "priority": "medium",
    "total_resolution_value": 249.99
  },
  "_scenario_dimensions": {
    "query_type": "exchange_request",
    "information_completeness": "complete",
    "complexity": "requires_lookup",
    "customer_sentiment": "pleading"
  },
  "_scenario_template": {
    "scenario_id": "EXCHANGE-002",
    "name": "exchange_defective_product",
    "primary_policy": "POL-EXCHANGE-002",
    "complexity_level": 2,
    "expected_outcome": "approve"
  },
  "_policy_analysis": {
    "all_relevant_policies": [
      "POL-EXCHANGE-002",
      "POL-RETURN-004"
    ],
    "applicable_policies": [
      "POL-EXCHANGE-002",
      "POL-RETURN-004"
    ],
    "context_used": {
      "has_receipt": true,
      "customer_tier": "standard",
      "days_since_purchase": 9,
      "months_since_purchase": 0.2956636005256242,
      "order_status": "delivered",
      "total_order_value": 249.99,
      "item_value": 249.99,
      "product_warranty_days": 365,
      "item_condition": "defective",
      "exchange_reason": "defective",
      "purchase_month": 6
    },
    "policy_interactions": "Multi-hop reasoning required"
  }
}
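
For evaluation, the generation scaffolds in a ticket like this one need to be separated from what the model is allowed to see. The sketch below is one way to do that, under stated assumptions: the choice of visible fields is ours, and `split_ticket` is a hypothetical helper, not part of the repository. The top-level `customer_id` and `order_id` are withheld because looking them up is part of the task.

```python
import copy

# Fields assumed safe to show the model: only what the customer actually sent.
VISIBLE_FIELDS = ("ticket_id", "customer_email", "subject", "body", "timestamp")


def split_ticket(ticket):
    """Return (model_input, ground_truth) for one ticket dict."""
    model_input = {k: ticket[k] for k in VISIBLE_FIELDS if k in ticket}
    # The resolution plan is the reference answer; copy it so later edits
    # to the ticket do not leak into the ground truth.
    ground_truth = copy.deepcopy(ticket.get("resolution_plan", {}))
    return model_input, ground_truth
```

Keys prefixed with an underscore (`_scenario_dimensions`, `_scenario_template`, `_policy_analysis`) never reach the model input, since they are not in the visible-field list.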

🎛️ Configuration Options

Factory Parameters

--tickets N          # Number of tickets to generate (default: 100)
--customers N        # Number of customers (default: 50)  
--products N         # Number of products (default: 35)
--orders N           # Number of orders (default: 70)
--mode MODE          # "create" or "append" (default: create)
--output-dir DIR     # Output directory (default: ./assets)
--company-name NAME  # Company name for policies (default: TechNest)
--no-debug           # Exclude debug metadata for clean training data

Dataset Composition

The generator creates realistic distributions:

  • Ticket Types: Returns (25%), Shipping Issues (20%), Billing Disputes (20%), Warranty Claims (15%), etc.
  • Complexity Levels: Simple (40%), Requires Lookup (35%), Edge Cases (20%), Escalation Required (5%)
  • Customer Tiers: Standard (70%), Premium (20%), VIP (10%)
  • Information Completeness: Complete (30%), Missing Details (40%), Wrong Info (30%)
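
Target percentages like these can be realized by weighted sampling. The following is a minimal sketch of that idea, not the generator's actual code. The labels `requires_lookup`, `standard`, and `complete` appear in the example ticket; the other label spellings and the function names are assumptions.

```python
import random

# Weights are the percentages listed above; label spellings are partly assumed.
COMPLEXITY = {"simple": 40, "requires_lookup": 35, "edge_case": 20, "escalation_required": 5}
CUSTOMER_TIER = {"standard": 70, "premium": 20, "vip": 10}
COMPLETENESS = {"complete": 30, "missing_details": 40, "wrong_info": 30}


def sample_dimension(weights, rng):
    """Draw one label with probability proportional to its weight."""
    labels = list(weights)
    return rng.choices(labels, weights=[weights[label] for label in labels], k=1)[0]


def sample_scenarios(n, seed=0):
    """Draw n independent scenario descriptions."""
    rng = random.Random(seed)
    return [
        {
            "complexity": sample_dimension(COMPLEXITY, rng),
            "customer_tier": sample_dimension(CUSTOMER_TIER, rng),
            "information_completeness": sample_dimension(COMPLETENESS, rng),
        }
        for _ in range(n)
    ]
```

With small batches the observed shares will drift from the targets; they converge only as the number of tickets grows.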

🎯 Use Cases

LLM Training & Evaluation

  • Policy Reasoning: Test multi-hop policy application
  • Customer Service: Train on realistic support scenarios
  • Edge Case Handling: Challenge models with incomplete information
  • Business Logic: Validate understanding of complex rules

Dataset Analysis

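import json

# Load tickets (the file is assumed to hold a JSON list of ticket objects)
with open('assets/support_tickets.json', encoding='utf-8') as f:
    tickets = json.load(f)

# Count tickets with more than two applicable policies.
# .get() is used because --no-debug output may omit the _policy_analysis metadata.
complex_tickets = [
    t for t in tickets
    if len(t.get('_policy_analysis', {}).get('applicable_policies', [])) > 2
]
print(f"Multi-policy tickets: {len(complex_tickets)}")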
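
Beyond the policy count, the ticket structure supports simple aggregate summaries. The helper below is a sketch that relies on the field names visible in the example ticket (`_scenario_dimensions.query_type`, `resolution_plan.escalation_required`, `resolution_plan.total_resolution_value`); `summarize` itself is a hypothetical name, and missing fields are tolerated so that it also runs on output produced with --no-debug.

```python
from collections import Counter


def summarize(tickets):
    """Aggregate query types, escalations, and resolution value over tickets."""
    by_type = Counter(
        t.get("_scenario_dimensions", {}).get("query_type", "unknown") for t in tickets
    )
    escalated = sum(
        1 for t in tickets if t.get("resolution_plan", {}).get("escalation_required")
    )
    total_value = sum(
        t.get("resolution_plan", {}).get("total_resolution_value") or 0 for t in tickets
    )
    return {
        "by_query_type": dict(by_type),
        "escalated": escalated,
        "total_resolution_value": round(total_value, 2),
    }
```

Calling `summarize(tickets)` on the list loaded above returns a plain dict that can be printed or dumped back to JSON.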

SQL Querying (after SQLite conversion)

-- Find high-value orders (total over $500)
SELECT c.name, o.order_id, o.total_amount 
FROM customers c 
JOIN orders o ON c.customer_id = o.customer_id 
WHERE o.total_amount > 500;

-- Customer purchase patterns
SELECT customer_id, COUNT(*) as order_count, AVG(total_amount) as avg_order
FROM orders 
GROUP BY customer_id 
ORDER BY order_count DESC;
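
The same queries can be run from Python with the standard sqlite3 module. The sketch below uses an in-memory database so it runs without generated assets. The table and column names are taken from the queries above; the column types, the demo rows, and the function names are assumptions, and the real schema produced by convert_to_sqlite.py may contain more columns.

```python
import sqlite3


def demo_connection():
    """Build a tiny in-memory database with the columns the queries above use."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (customer_id TEXT PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id TEXT PRIMARY KEY, customer_id TEXT, total_amount REAL)"
    )
    # Illustrative rows only, not real generated data.
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)",
        [("CUST-0001", "Example Customer"), ("CUST-0002", "David Chen")],
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [
            ("ORD-A", "CUST-0001", 100.0),
            ("ORD-B", "CUST-0001", 300.0),
            ("ORD-C", "CUST-0002", 249.99),
        ],
    )
    conn.commit()
    return conn


def order_stats(conn):
    """Per-customer order count and average order total, busiest customers first."""
    return conn.execute(
        "SELECT customer_id, COUNT(*) AS order_count, AVG(total_amount) AS avg_order "
        "FROM orders GROUP BY customer_id ORDER BY order_count DESC"
    ).fetchall()
```

Against a generated dataset, replace `demo_connection()` with `sqlite3.connect("assets/customer_database.db")`.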

📊 Quality Features

  • Policy Compliance: All resolutions reference specific policy clauses
  • Realistic Timing: Email timestamps align with customer descriptions ("last week", "a few months ago")
  • Data Consistency: Customer/order relationships are maintained across all tickets
  • Edge Cases: Wrong emails, missing information, partial customer matches
  • Multi-hop Reasoning: Complex scenarios requiring multiple policy interactions
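
The timing and consistency properties can be spot-checked on any ticket that still carries its debug metadata. The sketch below compares the gap between `resolution_plan.order_date` and the email `timestamp` with `_policy_analysis.context_used.days_since_purchase`. It is an illustrative check with hypothetical function names, not the logic of audit_tickets.py.

```python
from datetime import date, datetime


def days_between(order_date, ticket_timestamp):
    """Whole days from the order date to the day the email was sent."""
    ordered = date.fromisoformat(order_date)
    sent = datetime.fromisoformat(ticket_timestamp).date()
    return (sent - ordered).days


def timing_is_consistent(ticket):
    """True if the email follows the order and matches the recorded day count."""
    plan = ticket["resolution_plan"]
    context = ticket.get("_policy_analysis", {}).get("context_used", {})
    actual = days_between(plan["order_date"], ticket["timestamp"])
    recorded = context.get("days_since_purchase")
    # Without debug metadata, only the ordering of the two dates is checked.
    return actual >= 0 and (recorded is None or recorded == actual)
```

For the example ticket, the order date 2025-06-09 and the timestamp 2025-06-18T10:30:00 are 9 days apart, which agrees with the recorded `days_since_purchase` of 9.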

🔧 Development

Adding New Scenarios

  1. Edit scenario templates in factory.py (create_scenario_templates())
  2. Run utils/validate_templates.py to discover policy interactions
  3. Test with utils/audit_tickets.py for compliance

Extending Policies

  1. Add new policy clauses in create_policy_graph()
  2. Define relationships (overrides, modifies, requires)
  3. Update scenario templates to reference new policies
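
As a mental model for these steps, a policy graph can be pictured as clauses plus typed edges. The sketch below is an assumption-heavy illustration and does not reproduce create_policy_graph(): the `PolicyGraph` class and its methods are hypothetical, the clause texts paraphrase the example ticket's policy reasoning, and the two edges between clauses are invented purely to show the three relationship types in use.

```python
from collections import deque
from dataclasses import dataclass, field

RELATIONS = ("overrides", "modifies", "requires")


@dataclass
class PolicyGraph:
    clauses: dict = field(default_factory=dict)  # clause_id -> clause text
    edges: list = field(default_factory=list)    # (source, relation, target)

    def add_clause(self, clause_id, text):
        self.clauses[clause_id] = text

    def relate(self, source, relation, target):
        # Reject unknown relation types and edges to undefined clauses.
        if relation not in RELATIONS:
            raise ValueError(f"unknown relation: {relation}")
        for clause_id in (source, target):
            if clause_id not in self.clauses:
                raise KeyError(clause_id)
        self.edges.append((source, relation, target))

    def reachable(self, start):
        """All clauses reachable from start by following outgoing edges."""
        seen, queue = set(), deque([start])
        while queue:
            current = queue.popleft()
            for source, _, target in self.edges:
                if source == current and target not in seen:
                    seen.add(target)
                    queue.append(target)
        seen.discard(start)
        return seen


def build_demo_graph():
    graph = PolicyGraph()
    graph.add_clause("POL-RETURN-001", "30-day return window")
    graph.add_clause("POL-EXCHANGE-002", "Defective items are exchanged for the same item at no cost")
    graph.add_clause("POL-SHIP-006", "Damaged items under $500 get an immediate replacement")
    # Hypothetical relationships, for illustration only.
    graph.relate("POL-EXCHANGE-002", "requires", "POL-RETURN-001")
    graph.relate("POL-SHIP-006", "modifies", "POL-EXCHANGE-002")
    return graph
```

A scenario whose primary policy reaches several other clauses is a candidate multi-hop ticket, which is the kind of interaction validate_templates.py is described as discovering.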

📈 Performance

  • Generation Speed: ~1-2 tickets/second (depends on LLM response time)
  • Memory Usage: ~100MB for typical datasets
  • Output Size: ~5MB for 1000 tickets with full metadata
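
These figures allow a rough back-of-the-envelope estimate before launching a large run. The helper below is a hypothetical sketch: it assumes the stated 1-2 tickets/second range and about 5KB per ticket (5MB divided by 1000 tickets), both of which will vary with LLM latency and metadata settings.

```python
def estimate_run(n_tickets, tickets_per_second=(1.0, 2.0), kb_per_ticket=5.0):
    """Rough time range in seconds and output size in MB for a generation run."""
    slow, fast = tickets_per_second
    return {
        "min_seconds": n_tickets / fast,   # best case: fastest stated rate
        "max_seconds": n_tickets / slow,   # worst case: slowest stated rate
        "approx_mb": n_tickets * kb_per_ticket / 1000,
    }
```

Under these assumptions, 1000 tickets take roughly 500 to 1000 seconds and produce about 5MB of output.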

TicketWorld creates comprehensive testing environments for customer service AI systems, ensuring robust handling of real-world complexity and multi-policy reasoning scenarios.
