AI JSON Cleanroom (PHP)

Your AI returns broken JSON? Put this in between.

Works with any AI model: ChatGPT, Claude, Gemini, Llama. No third-party dependencies; it needs only PHP with the mbstring and json extensions.

Automatically extracts JSON from markdown/text, repairs common AI mistakes, validates structure. Returns clean data when successful, detailed feedback for retries when not.

This is the PHP port of AI JSON Cleanroom.

Fast Track: Integration in 3 Steps

Want to start using this right away? Here's how:

1. Download the ai_json_cleanroom.php file to your project
2. Include it in your code: require_once 'ai_json_cleanroom.php';
3. Done. Start processing AI responses through validate_ai_json()

Ready in 2 minutes. Works immediately.

Why You Need This

The situation: You request JSON from your AI. Sometimes you receive:

| What you get | What breaks |
| --- | --- |
| `Sure! Here's the JSON: {"name": "Alice"}` | Extra text makes `json_decode()` return null |
| `{'name': 'Alice'}` | Single quotes instead of double quotes |
| `{"users": [{"id": 1}, {"i` | Truncated mid-response (token limit) |

Current solution: Try/catch blocks, regex patterns, manual fixes, repeated API calls.

This tool: Handles all cases automatically. One function call.
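
For reference, this is roughly how PHP's native decoder reacts to the three inputs in the table. It is a minimal sketch that uses only json_decode(); the variable names are illustrative and nothing here comes from the library.

```php
<?php
// Three typical model outputs that plain json_decode() rejects.
$samples = [
    'extra_text'    => 'Sure! Here\'s the JSON: {"name": "Alice"}',
    'single_quotes' => "{'name': 'Alice'}",
    'truncated'     => '{"users": [{"id": 1}, {"i',
];

$failures = [];
foreach ($samples as $label => $raw) {
    $decoded = json_decode($raw, true);
    if ($decoded === null && json_last_error() !== JSON_ERROR_NONE) {
        // json_last_error_msg() is all the detail native decoding provides.
        $failures[$label] = json_last_error_msg();
    }
}

print_r($failures);
```

All three samples end up in $failures, and the native error message does not say whether the input was wrapped in text, used the wrong quotes, or was cut off.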

Installation

Via Composer (Recommended)

composer require jordicor/ai-json-cleanroom-php

Manual Installation

Download ai_json_cleanroom.php to your project:

wget https://raw.githubusercontent.com/jordicor/ai-json-cleanroom-php/main/ai_json_cleanroom.php

Then include it:

<?php
require_once 'ai_json_cleanroom.php';

Requirements:

PHP 8.1 or higher
ext-mbstring (for proper UTF-8 handling)
ext-json

Ready. Start using: validate_ai_json($response)

Quick Start

<?php
require_once 'ai_json_cleanroom.php';

// Anything your AI returns (messy, wrapped, incomplete)
$aiResponse = "Here's your data:\n```json\n{'name': 'Alice', age: 30}  // Invalid JSON syntax\n```\n";

// One line to clean and validate
$result = validate_ai_json($aiResponse);

if ($result->jsonValid) {
    print_r($result->data);  // Clean: ['name' => 'Alice', 'age' => 30]
} else {
    print_r($result->errors);  // Detailed error information
}

Done. No configuration needed. It works out of the box.

Check $result->warnings to see what was fixed automatically.

What Just Happened?

The cleaner automatically:

Found the JSON inside markdown code fence
Fixed single quotes to double quotes
Added quotes to the unquoted key age
Removed the inline comment
Validated the final structure

Processing time: ~1ms. Zero configuration required.

Useful tip: Check $result->likelyTruncated to detect when the AI hit its token limit. This saves unnecessary retry API calls.

You're All Set

That's everything you need. The tool works immediately with smart defaults.

Everything below is optional documentation for:

Understanding how the tool works internally
Advanced configuration options
Framework integrations (Laravel, Symfony, etc.)
Your AI assistant to read and understand the full API

For most users: The sections above are sufficient. Start building.

Want to learn more? Continue reading below.

💡 Found this useful? Star the repo ⭐ to help others discover it!

Features Overview

1. Smart Extraction

Automatically extracts JSON from various formats:

// From markdown code fence
$markdown = "Here is the data:\n```json\n{\"status\": \"success\"}\n```\n";
$result = validate_ai_json($markdown);
// Extracted: ["status" => "success"]

// From mixed text
$mixed = 'The result is {"status": "success"} as requested.';
$result = validate_ai_json($mixed);
// Extracted: ["status" => "success"]
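
To make the two extraction paths concrete, here is a minimal sketch of how a code-fence lookup followed by a balanced-block scan could work. The helper name sketchExtractJson() is hypothetical and this is not the library's implementation, only an illustration of the idea using standard PHP functions.

```php
<?php
/**
 * Illustrative extraction sketch (hypothetical helper, not the library's code).
 * Returns the first candidate JSON substring, or null if none is found.
 */
function sketchExtractJson(string $text): ?string
{
    // The fence marker is built with str_repeat() so this example can itself
    // be embedded in a markdown code block.
    $fence = str_repeat('`', 3);

    // 1) Prefer the body of a markdown code fence, with or without a "json" tag.
    $pattern = '/' . $fence . '(?:json)?[ \t]*\r?\n(.*?)' . $fence . '/s';
    if (preg_match($pattern, $text, $m) === 1) {
        return trim($m[1]);
    }

    // 2) Otherwise scan for the first balanced {...} or [...] block.
    $start = strcspn($text, '{[');
    if ($start === strlen($text)) {
        return null; // no opening brace or bracket at all
    }

    $depth = 0;
    $inString = false;
    $escaped = false;
    for ($i = $start, $n = strlen($text); $i < $n; $i++) {
        $ch = $text[$i];
        if ($inString) {
            // Inside a string only escapes and the closing quote matter.
            if ($escaped) {
                $escaped = false;
            } elseif ($ch === '\\') {
                $escaped = true;
            } elseif ($ch === '"') {
                $inString = false;
            }
            continue;
        }
        if ($ch === '"') {
            $inString = true;
        } elseif ($ch === '{' || $ch === '[') {
            $depth++;
        } elseif ($ch === '}' || $ch === ']') {
            $depth--;
            if ($depth === 0) {
                return substr($text, $start, $i - $start + 1);
            }
        }
    }

    return null; // the block never closed
}

$fence  = str_repeat('`', 3);
$fenced = "Here is the data:\n{$fence}json\n{\"status\": \"success\"}\n{$fence}\n";
$mixed  = 'The result is {"status": "success"} as requested.';

echo sketchExtractJson($fenced), "\n";
echo sketchExtractJson($mixed), "\n";
```

Tracking string state matters here: a brace inside a string value must not change the nesting depth, otherwise text such as {"note": "use } carefully"} would be cut short.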

2. Conservative Repair

Fixes common AI mistakes with configurable safeguards:

// Single quotes → double quotes
$result = validate_ai_json("{'name': 'Alice'}");
// Repaired: ["name" => "Alice"]

// Python-style constants (True/False/None) → JSON literals
$result = validate_ai_json('{"active": True, "value": None}');
// Repaired: ["active" => true, "value" => null]

// Unquoted keys → quoted keys
$result = validate_ai_json('{name: "Alice", age: 30}');
// Repaired: ["name" => "Alice", "age" => 30]

// Comments removal
$result = validate_ai_json('{
  "name": "Alice",  // user name
  /* age field */ "age": 30
}');
// Repaired: ["name" => "Alice", "age" => 30]

Safeguards:

Maximum modifications limit (default: 200 changes or 2% of input size)
Disabled if truncation detected
Incremental parse-check after each repair pass
Detailed repair metadata in $result->info
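
As an illustration of what a conservative repair pass with a parse-check might look like, the sketch below rewrites Python-style constants only outside string values and counts each change. The helper sketchReplaceConstants() is hypothetical; how the library actually implements and limits its repairs is not shown in this document.

```php
<?php
/**
 * Illustrative repair pass (hypothetical helper, not the library's code):
 * rewrites True/False/None to JSON literals, but only outside string values,
 * and reports how many replacements were made.
 *
 * @return array{0: string, 1: int} repaired text and number of changes
 */
function sketchReplaceConstants(string $json): array
{
    $map = ['True' => 'true', 'False' => 'false', 'None' => 'null'];
    $wordChars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_';
    $out = '';
    $count = 0;
    $inString = false;
    $n = strlen($json);

    for ($i = 0; $i < $n; $i++) {
        $ch = $json[$i];

        if ($inString) {
            $out .= $ch;
            if ($ch === '\\' && $i + 1 < $n) {
                $out .= $json[++$i]; // copy the escaped character untouched
            } elseif ($ch === '"') {
                $inString = false;
            }
            continue;
        }

        if ($ch === '"') {
            $inString = true;
            $out .= $ch;
            continue;
        }

        // Outside strings: consume a whole bare word and translate it if known.
        if (ctype_alpha($ch)) {
            $len = strspn($json, $wordChars, $i);
            $word = substr($json, $i, $len);
            if (isset($map[$word])) {
                $out .= $map[$word];
                $count++;
            } else {
                $out .= $word;
            }
            $i += $len - 1;
            continue;
        }

        $out .= $ch;
    }

    return [$out, $count];
}

[$fixed, $changes] = sketchReplaceConstants('{"active": True, "note": "True story", "value": None}');
$data = json_decode($fixed, true); // parse-check after the repair pass

echo $fixed, "\n";
var_dump($changes, $data !== null);
```

The change counter is what a modification limit would be compared against, and the string-aware scan is why the value "True story" is left alone.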

3. Truncation Detection

Identifies incomplete outputs before wasting retries:

$truncated = '{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age":';

$result = validate_ai_json($truncated);
var_dump($result->likelyTruncated);  // bool(true)
echo $result->errors[0]->message;
// "No JSON payload found in input."
print_r($result->errors[0]->detail);
// ['truncation_reasons' => ['unclosed_braces_or_brackets', 'suspicious_trailing_character']]

Detection signals:

Unclosed strings
Unbalanced braces/brackets
Suspicious trailing characters (,, :, {, [)
Ellipsis at end (...)
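
The signals above can be approximated with a single scan over the text. The following is a rough sketch under stated assumptions: sketchTruncationReasons() is a hypothetical helper, the first three reason labels reuse strings that appear in this README, and 'trailing_ellipsis' is a label invented for the example. The library's real heuristics may differ.

```php
<?php
/**
 * Illustrative truncation heuristics (hypothetical helper, not the library's code).
 *
 * @return string[] list of reasons; empty when nothing looks truncated
 */
function sketchTruncationReasons(string $text): array
{
    $depth = 0;
    $inString = false;
    $escaped = false;

    // Track nesting depth and string state across the whole input.
    for ($i = 0, $n = strlen($text); $i < $n; $i++) {
        $ch = $text[$i];
        if ($inString) {
            if ($escaped) {
                $escaped = false;
            } elseif ($ch === '\\') {
                $escaped = true;
            } elseif ($ch === '"') {
                $inString = false;
            }
            continue;
        }
        if ($ch === '"') {
            $inString = true;
        } elseif ($ch === '{' || $ch === '[') {
            $depth++;
        } elseif ($ch === '}' || $ch === ']') {
            $depth--;
        }
    }

    $reasons = [];
    if ($depth > 0) {
        $reasons[] = 'unclosed_braces_or_brackets';
    }
    if ($inString) {
        $reasons[] = 'unterminated_string';
    }

    $trimmed = rtrim($text);
    $last = $trimmed === '' ? '' : substr($trimmed, -1);
    if (!$inString && $last !== '' && strpbrk($last, ',:{[') !== false) {
        $reasons[] = 'suspicious_trailing_character';
    }
    if (str_ends_with($trimmed, '...')) {
        $reasons[] = 'trailing_ellipsis'; // label made up for this sketch
    }

    return $reasons;
}

$cutAfterColon = '{"users": [{"name": "Alice", "age": 30}, {"name": "Bob", "age":';
$cutInString   = '{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "na';

print_r(sketchTruncationReasons($cutAfterColon));
print_r(sketchTruncationReasons($cutInString));
print_r(sketchTruncationReasons('{"ok": true}'));
```

A check like this is cheap compared with another API call, which is why it is worth running before any repair or retry decision.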

4. Schema Validation

Validate against JSON Schema subset:

$schema = [
    "type" => "object",
    "required" => ["name", "email"],
    "properties" => [
        "name" => [
            "type" => "string",
            "minLength" => 1,
            "maxLength" => 100
        ],
        "email" => [
            "type" => "string",
            "pattern" => '/^[\w\.-]+@[\w\.-]+\.\w+$/'
        ],
        "age" => [
            "type" => "integer",
            "minimum" => 0,
            "maximum" => 150
        ]
    ],
    "additionalProperties" => false
];

$result = validate_ai_json($aiOutput, schema: $schema);

if (!$result->jsonValid) {
    foreach ($result->errors as $error) {
        echo "{$error->code}: {$error->message} at {$error->path}\n";
    }
}

Supported schema keywords:

Types: object, array, string, number, integer, boolean, null
Object: required, properties, patternProperties, additionalProperties
Array: items, additionalItems, minItems, maxItems, uniqueItems
String: minLength, maxLength, pattern
Number: minimum, maximum, exclusiveMinimum, exclusiveMaximum, multipleOf
Combinators: anyOf, oneOf, allOf
Constraints: enum, const, allow_empty

5. Path-Based Expectations

Validate specific paths with wildcard support:

$expectations = [
    [
        "path" => "users[*].email",
        "required" => true,
        "pattern" => '/^[\w\.-]+@[\w\.-]+\.\w+$/'
    ],
    [
        "path" => "users[*].status",
        "required" => true,
        "in" => ["active", "pending", "inactive"]
    ],
    [
        "path" => "metadata.version",
        "required" => true,
        "type" => "string",
        "pattern" => '/^\d+\.\d+\.\d+$/'
    ]
];

$result = validate_ai_json($aiOutput, expectations: $expectations);
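
To show what a wildcard path such as users[*].email selects, here is a small lookup sketch over a decoded array. sketchResolvePath() is a hypothetical helper written for this example; it only illustrates the path semantics and is not how the library resolves expectations.

```php
<?php
/**
 * Illustrative wildcard path lookup (hypothetical helper, not the library's code).
 * Supports dotted keys plus [*] and [n] suffixes, e.g. "users[*].email".
 *
 * @return array<int, mixed> every value the path matches (empty if none)
 */
function sketchResolvePath(mixed $data, string $path): array
{
    // Split "users[*].email" into: key "users", bracket "*", key "email".
    preg_match_all('/([^.\[\]]+)|\[(\*|\d+)\]/', $path, $matches, PREG_SET_ORDER);

    $current = [$data];
    foreach ($matches as $m) {
        $isBracket = isset($m[2]);          // group 2 only exists for [..] tokens
        $token = $isBracket ? $m[2] : $m[1];
        $next = [];

        foreach ($current as $node) {
            if (!is_array($node)) {
                continue; // scalars have no children
            }
            if ($isBracket && $token === '*') {
                foreach ($node as $child) {
                    $next[] = $child;       // wildcard: fan out to every element
                }
            } else {
                $key = $isBracket ? (int) $token : $token;
                if (array_key_exists($key, $node)) {
                    $next[] = $node[$key];
                }
            }
        }
        $current = $next;
    }

    return $current;
}

$data = [
    'users' => [
        ['email' => 'alice@example.com', 'status' => 'active'],
        ['email' => 'bob@example.com'],
    ],
    'metadata' => ['version' => '1.2.3'],
];

print_r(sketchResolvePath($data, 'users[*].email'));
print_r(sketchResolvePath($data, 'users[*].status'));
print_r(sketchResolvePath($data, 'metadata.version'));
```

In this sketch a "required" rule would amount to comparing the number of matched values with the number of elements the wildcard expanded to; the second lookup returns one value for two users, so one user is missing a status.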

6. Non-Throwing API

Always returns a ValidationResult - never crashes:

$result = validate_ai_json($anyInput);

// Always safe to access
echo "Valid: " . ($result->jsonValid ? 'yes' : 'no') . "\n";
echo "Truncated: " . ($result->likelyTruncated ? 'yes' : 'no') . "\n";
echo "Errors: " . count($result->errors) . "\n";
echo "Warnings: " . count($result->warnings) . "\n";
print_r($result->data);  // null if invalid
print_r($result->info);  // Extraction/parsing metadata

// Structured error handling
foreach ($result->errors as $error) {
    echo "Code: {$error->code}\n";
    echo "Path: {$error->path}\n";
    echo "Message: {$error->message}\n";
    print_r($error->detail);
}
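
The non-throwing pattern itself is easy to picture with plain json_decode(). The sketch below is a simplified stand-in with a hypothetical helper name, sketchSafeDecode(); it shows the general idea of always returning one result shape, not the library's actual ValidationResult.

```php
<?php
/**
 * Illustrative non-throwing wrapper (hypothetical helper, not the library's code).
 * Always returns the same array shape, whatever the input.
 *
 * @return array{valid: bool, data: mixed, error: ?string}
 */
function sketchSafeDecode(string $raw): array
{
    try {
        // JSON_THROW_ON_ERROR turns decode failures into JsonException,
        // which is caught here so callers never see an exception.
        $data = json_decode($raw, true, 512, JSON_THROW_ON_ERROR);
        return ['valid' => true, 'data' => $data, 'error' => null];
    } catch (JsonException $e) {
        return ['valid' => false, 'data' => null, 'error' => $e->getMessage()];
    }
}

$good = sketchSafeDecode('{"name": "Alice"}');
$bad  = sketchSafeDecode("{'name': 'Alice'}");

var_dump($good['valid'], $bad['valid']);
echo $bad['error'], "\n";
```

Callers branch on a flag instead of wrapping every call in try/catch, which keeps retry loops short.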

Understanding the Configuration Options

Not sure which options to enable? This guide explains each repair strategy with practical examples.

When to Use Each Repair Strategy

`fixSingleQuotes` (Default: true)

What it does: Converts single quotes 'text' to JSON-compliant double quotes "text"

When to keep it ON:

Working with AI models that output single-quoted strings
Processing outputs from code-generation models
General use - this is safe and commonly needed

When to turn it OFF:

Your AI model never uses single quotes (rare)
You're processing pure JSON from a non-AI source

Example scenario:

// GPT often returns this mix:
$input = "{'name': 'Alice', \"age\": 30}";  // Mixed quotes

// With fixSingleQuotes = true:
// ✅ Becomes: {"name": "Alice", "age": 30}

// With fixSingleQuotes = false:
// ❌ Parse fails on single quotes

`quoteUnquotedKeys` (Default: true)

What it does: Adds quotes to JavaScript-style unquoted object keys

When to keep it ON:

Working with models trained on JavaScript/TypeScript code
Processing outputs that might include object literals
Claude models (sometimes output JS-style objects)

When to turn it OFF:

Strict JSON-only environment
You want to detect and reject JS-style syntax

Real-world example:

// Claude sometimes returns:
$input = "{name: 'Alice', age: 30, active: true}";

// With quoteUnquotedKeys = true:
// ✅ Becomes: {"name": "Alice", "age": 30, "active": true}

`replaceConstants` (Default: true)

What it does: Converts Python-style constants (True/False/None) to their JSON equivalents (true/false/null)

When to keep it ON:

Always, unless you have a specific reason not to
Essential for AI models that output capitalized booleans

Example:

// AI models sometimes output capitalized booleans:
$input = '{"active": True, "deleted": False, "parent": None}';

// With replaceConstants = true:
// ✅ Becomes: {"active": true, "deleted": false, "parent": null}

`stripJsComments` (Default: true)

What it does: Removes JavaScript-style comments (// and /* */)

When to keep it ON:

Models that explain their JSON with comments
When processing configuration-style outputs

Example:

$input = <<<'JSON'
{
  "name": "Alice",  // user name
  /* age field */ "age": 30
}
JSON;
// ✅ Comments are safely removed

`normalizeCurlyQuotes` (Default: "always")

What it does: Handles smart/typographic quotes that break JSON parsing

Options:

"always" - Convert smart quotes before parsing (safest)
"auto" - Only convert if initial parse fails (balanced approach)
"never" - Keep smart quotes as-is (when you want to preserve them)

When to use each:

"always": Default choice, handles copy-paste from documents
"auto": When performance matters and smart quotes are rare
"never": When processing content where quote style matters

Example:

// From copy-paste or models trained on web text:
$input = '{"text": "She said “hello” to me"}';  // Smart quotes around hello

// With normalizeCurlyQuotes = "always":
// ✅ Becomes: {"text": "She said \"hello\" to me"}
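
The difference between the modes is easier to see with a small experiment. The sketch below imitates the "auto" idea (parse first, normalize only on failure) with standard PHP functions. sketchDecodeWithCurlyFallback() is a hypothetical helper and the library's own normalization may work differently, for example by escaping quotes inside string values.

```php
<?php
/**
 * Illustrative "parse first, normalize on failure" flow
 * (hypothetical helper, not the library's code).
 *
 * @return array{data: mixed, normalized: bool}
 */
function sketchDecodeWithCurlyFallback(string $json): array
{
    $data = json_decode($json, true);
    if (json_last_error() === JSON_ERROR_NONE) {
        return ['data' => $data, 'normalized' => false];
    }

    // Map typographic double quotes to the ASCII quote and try again.
    $straight = strtr($json, ["\u{201C}" => '"', "\u{201D}" => '"']);
    $data = json_decode($straight, true);

    return [
        'data' => json_last_error() === JSON_ERROR_NONE ? $data : null,
        'normalized' => true,
    ];
}

// Curly quotes used as delimiters: invalid JSON until normalized.
$broken = "{\u{201C}name\u{201D}: \u{201C}Alice\u{201D}}";
// Curly quotes inside a regular string value: already valid JSON.
$valid = "{\"text\": \"She said \u{201C}hello\u{201D} to me\"}";

print_r(sketchDecodeWithCurlyFallback($broken));
print_r(sketchDecodeWithCurlyFallback($valid));
```

The second input parses without any normalization and keeps its typographic quotes, which is the situation where "auto" or "never" preserves content that an unconditional conversion would change.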

`enableSafeRepairs` (Default: true)

What it does: Master toggle for all repair strategies

When to turn OFF:

You want to validate only, not repair
Debugging to see raw parsing errors
You have your own repair logic

`maxTotalRepairs` and `maxRepairsPercent` (Defaults: 200, 0.02)

What they do: Safety limits to prevent over-correction

When to increase:

Very messy outputs from older models
Known high-error scenarios

When to decrease:

You want stricter validation
Suspicious of too many modifications

Example configuration:

// For very messy outputs:
$options = new ValidateOptions();
$options->maxTotalRepairs = 500;      // Allow more fixes
$options->maxRepairsPercent = 0.05;   // Allow 5% of content to be modified

// For strict validation:
$options = new ValidateOptions();
$options->maxTotalRepairs = 10;       // Minimal fixes only
$options->maxRepairsPercent = 0.001;  // Less than 0.1% modifications

📝 Note: Start with defaults. They're battle-tested on thousands of real AI outputs. Only adjust if you have specific issues.

Common Scenarios & Solutions

Scenario 1: "My AI model keeps adding explanations"

The Problem: You explicitly ask for JSON only, but get:

I'll help you with that! Here's the JSON data:
{"status": "success"}
Let me know if you need anything else!

The Solution:

// Cleanroom automatically extracts the JSON part
$result = validate_ai_json($chattyResponse);
print_r($result->data);  // Just the JSON: ["status" => "success"]
echo $result->info['source'];  // Tells you where it found it: 'balanced_block'

Scenario 2: "Token limits are cutting off my JSON"

The Problem: Large responses get truncated:

{"users": [{"id": 1, "name": "Alice"}, {"id": 2, "na

The Solution:

$result = validate_ai_json($truncatedResponse);

if ($result->likelyTruncated) {
    // You know exactly what happened
    echo "Response truncated - reasons: ";
    print_r($result->errors[0]->detail['truncation_reasons']);
    // Output: ['unclosed_braces_or_brackets', 'unterminated_string']

    // Smart retry with higher token limit
    retryWithHigherLimit();  // Placeholder for your own retry logic
}

Scenario 3: "Mixed quote styles are breaking everything"

The Problem: Your AI model uses single quotes instead of valid JSON double quotes:

$output = "{'users': [\"Alice\", \"Bob\"], 'count': 2}";

The Solution:

$result = validate_ai_json($output);
// Automatically fixes to: ["users" => ["Alice", "Bob"], "count" => 2]

Scenario 4: "I need to validate specific fields exist"

The Problem: You need certain fields but don't want full schema validation.

The Solution: Use path expectations:

$expectations = [
    ["path" => "users[*].email", "required" => true],
    ["path" => "metadata.version", "pattern" => '/^\d+\.\d+\.\d+$/']
];

$result = validate_ai_json($aiOutput, expectations: $expectations);
// Validates that all users have emails and version is semver

Scenario 5: "The JSON has comments and I want to keep the information"

The Problem: AI model adds helpful comments that contain important context:

{
  "temperature": 0.7,  // Higher for creativity
  "max_tokens": 100   // Keep responses concise
}

The Solution:

// Keep the raw response so the comments stay available
$rawResponse = $aiOutput;

// Clean for parsing
$result = validate_ai_json($rawResponse);

// The comments are removed for valid JSON
print_r($result->data);  // ["temperature" => 0.7, "max_tokens" => 100]

// If you need the comments, parse them separately from $rawResponse

Scenario 6: "Different AI models fail in different ways"

The Problem: GPT may use single quotes and unquoted keys, Claude wraps in markdown, Gemini may truncate.

The Solution: One configuration handles all:

// Same code for ALL models
function cleanAnyAiOutput(string $output): array
{
    $result = validate_ai_json($output);  // Default options handle everything

    if ($result->jsonValid) {
        return $result->data;
    } elseif ($result->likelyTruncated) {
        throw new RuntimeException("Output truncated - increase token limit");
    } else {
        $errorMsg = implode(", ", array_map(fn($e) => $e->message, $result->errors));
        throw new RuntimeException("Could not parse: {$errorMsg}");
    }
}

// Works with GPT, Claude, Gemini, Llama, etc.

⚠️ Important: Truncation detection always runs first. If JSON is truncated, repairs are skipped to avoid corrupting partial data.

Troubleshooting Guide

"Why isn't my JSON being repaired?"

Possible causes and solutions:

Truncation detected
- Cleanroom disables repairs for truncated input (safety measure)
- Solution: Get complete output first, then retry

Repair limit reached
- Default limit: 200 changes or 2% of input size
- Solution: Increase limits if needed:

$options = new ValidateOptions();
$options->maxTotalRepairs = 500;  // Raise limit
$options->maxRepairsPercent = 0.05;  // Allow 5% modifications

Specific repair disabled
- Check your options - maybe fixSingleQuotes = false?
- Solution: Enable the specific repair you need

"The parser says JSON is invalid but it looks fine to me"

Common hidden issues:

Invisible Unicode characters (zero-width spaces, etc.)
Smart quotes from copy-paste: “text” vs "text"
Line breaks inside strings without proper escaping

Diagnosis:

$options = new ValidateOptions();
$options->normalizeCurlyQuotes = 'always';  // Fixes smart quotes
$result = validate_ai_json($yourInput, options: $options);
print_r($result->errors);  // See specific character positions

"It works with GPT but fails with Claude"

Issue: Different models have different quirks.

Solution: Check the extraction source:

$result = validate_ai_json($claudeOutput);
echo "Found JSON in: {$result->info['source']}\n";
// 'code_fence' = markdown block
// 'balanced_block' = found in text
// 'raw' = was already clean

"Performance is slow with large outputs"

Solutions:

Disable unnecessary repairs:

$options = new ValidateOptions();
$options->stripJsComments = false;  // If you never have comments
$options->normalizeCurlyQuotes = 'never';  // If you never have smart quotes

Use opcache (PHP's bytecode cache):

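// Check if opcache is enabled (opcache_get_status() returns false when it is not)
$status = function_exists('opcache_get_status') ? opcache_get_status(false) : false;
echo ($status !== false && $status['opcache_enabled']) ? 'Enabled' : 'Disabled';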

"I want to see what was changed"

Solution: Check warnings and info:

$result = validate_ai_json($messyJson);

// See all repairs applied
foreach ($result->warnings as $warning) {
    if ($warning->code === ErrorCode::REPAIRED) {
        echo "Repairs applied: " . implode(", ", $warning->detail['applied']) . "\n";
        echo "Number of changes: ";
        print_r($warning->detail['counts']);
    }
}

// See extraction details
echo "Extraction method: {$result->info['source']}\n";
echo "Parser used: {$result->info['parse_backend']}\n";

"Schema validation is rejecting valid data"

Common issues:

Pattern delimiters: PHP regex patterns need delimiters: '/^\d+$/' not '^\d+$'
Type mismatches: JSON numbers include floats - use "type" => "number" not "integer" unless you're sure
Required fields: Double-check field names are exact matches
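
The first two issues can be reproduced with plain PHP, independent of the library. This is a small demonstration using only preg_match() and json_decode(); the variable names are illustrative.

```php
<?php
// 1) PCRE patterns in PHP need delimiters; without them preg_match() fails.
$withDelimiters    = preg_match('/^\d+$/', '12345');
$withoutDelimiters = @preg_match('^\d+$', '12345'); // warning suppressed for the demo

// 2) json_decode() keeps integers and floats apart, so 30.0 is not an integer.
$decoded = json_decode('{"a": 30, "b": 30.0, "c": 29.5}', true);

var_dump($withDelimiters, $withoutDelimiters);
var_dump(is_int($decoded['a']), is_int($decoded['b']), is_float($decoded['c']));
```

The undelimited pattern returns false rather than 0, so it signals an error instead of a failed match, and a model that writes 30.0 for an age produces a float that a strict integer check would reject.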

Debug approach:

// Start without schema to see actual structure
$result = validate_ai_json($output);
print_r($result->data);

// Then add schema gradually
$schema = ["type" => "object"];  // Start simple
// Add requirements one by one

"mbstring extension not found"

Issue: PHP complains about missing mbstring functions.

Solution:

# Ubuntu/Debian
sudo apt-get install php-mbstring

# macOS with Homebrew
brew install php
# (mbstring is included by default)

# Windows
# Enable in php.ini:
extension=mbstring

# Verify installation
php -m | grep mbstring

Real-World Integrations

With OpenAI API

<?php
require_once 'ai_json_cleanroom.php';

$apiKey = getenv('OPENAI_API_KEY');
$client = new \GuzzleHttp\Client();

$response = $client->post('https://api.openai.com/v1/chat/completions', [
    'headers' => [
        'Authorization' => "Bearer {$apiKey}",
        'Content-Type' => 'application/json',
    ],
    'json' => [
        'model' => 'gpt-5.1',
        'messages' => [
            ['role' => 'system', 'content' => 'You are a helpful assistant that outputs JSON.'],
            ['role' => 'user', 'content' => 'Generate user profile for Alice Johnson, age 30']
        ],
        'response_format' => ['type' => 'json_object']
    ]
]);

$data = json_decode($response->getBody(), true);
$aiOutput = $data['choices'][0]['message']['content'];

// Clean and validate
$result = validate_ai_json(
    $aiOutput,
    schema: [
        'type' => 'object',
        'required' => ['name', 'age'],
        'properties' => [
            'name' => ['type' => 'string'],
            'age' => ['type' => 'integer', 'minimum' => 0]
        ]
    ]
);

if ($result->jsonValid) {
    $userData = $result->data;
    echo "User: {$userData['name']}, Age: {$userData['age']}\n";
} else {
    echo "Validation failed:\n";
    foreach ($result->errors as $error) {
        echo "- {$error->message}\n";
    }
}

With Anthropic Claude

<?php
require_once 'ai_json_cleanroom.php';

$apiKey = getenv('ANTHROPIC_API_KEY');
$client = new \GuzzleHttp\Client();

$response = $client->post('https://api.anthropic.com/v1/messages', [
    'headers' => [
        'x-api-key' => $apiKey,
        'anthropic-version' => '2023-06-01',
        'Content-Type' => 'application/json',
    ],
    'json' => [
        'model' => 'claude-haiku-4-5',
        'max_tokens' => 1024,
        'messages' => [
            [
                'role' => 'user',
                'content' => 'Generate a JSON object with user info for Alice, age 30'
            ]
        ]
    ]
]);

$data = json_decode($response->getBody(), true);
$aiOutput = $data['content'][0]['text'];

// Claude might return:
// "Here's the user data:\n```json\n{\"name\": \"Alice\", \"age\": 30}\n```\nLet me know if you need anything else!"

$result = validate_ai_json($aiOutput);

if ($result->jsonValid) {
    echo "Extracted data:\n";
    print_r($result->data);
    echo "Extraction source: {$result->info['source']}\n";  // 'code_fence'
} else {
    if ($result->likelyTruncated) {
        echo "Response was truncated, increasing max_tokens...\n";
    } else {
        echo "Validation errors:\n";
        print_r($result->errors);
    }
}

Retry Logic with Structured Feedback

<?php
require_once 'ai_json_cleanroom.php';

function generateWithRetry(string $prompt, array $schema, int $maxRetries = 3): ?array
{
    for ($attempt = 0; $attempt < $maxRetries; $attempt++) {
        $aiOutput = callAiApi($prompt);  // Your AI API call

        $result = validate_ai_json($aiOutput, schema: $schema);

        if ($result->jsonValid) {
            return $result->data;
        }

        // Build feedback for retry
        if ($result->likelyTruncated) {
            $prompt .= "\n\nIMPORTANT: Your previous response was truncated. Please ensure the complete JSON is returned.";
        } else {
            $errorMessages = array_map(
                fn($e) => "- {$e->path}: {$e->message}",
                $result->errors
            );
            $feedback = implode("\n", $errorMessages);
            $prompt .= "\n\nYour previous JSON had these issues:\n{$feedback}\n\nPlease fix these and return valid JSON.";
        }
    }

    throw new RuntimeException("Failed to generate valid JSON after {$maxRetries} attempts");
}

// Usage
$schema = [
    'type' => 'object',
    'required' => ['name', 'email', 'age'],
    'properties' => [
        'name' => ['type' => 'string'],
        'email' => ['type' => 'string', 'pattern' => '/^[\w\.-]+@[\w\.-]+\.\w+$/'],
        'age' => ['type' => 'integer', 'minimum' => 0]
    ]
];

$userData = generateWithRetry(
    'Generate a user profile for Alice Johnson',
    $schema
);
print_r($userData);

With Laravel Framework

<?php

namespace App\Services;

use Illuminate\Support\Facades\Http;

class AiJsonService
{
    public function generateUserProfile(string $prompt): array
    {
        // Call AI API using Laravel HTTP client
        $response = Http::withHeaders([
            'Authorization' => 'Bearer ' . config('services.openai.key'),
        ])->post('https://api.openai.com/v1/chat/completions', [
            'model' => 'gpt-5.1',
            'messages' => [
                ['role' => 'user', 'content' => $prompt]
            ]
        ]);

        $aiOutput = $response->json()['choices'][0]['message']['content'];

        // Clean and validate with ai-json-cleanroom
        $result = validate_ai_json(
            $aiOutput,
            schema: [
                'type' => 'object',
                'required' => ['name', 'email'],
                'properties' => [
                    'name' => ['type' => 'string'],
                    'email' => ['type' => 'string', 'pattern' => '/^[\w\.-]+@[\w\.-]+\.\w+$/']
                ]
            ]
        );

        if (!$result->jsonValid) {
            // Log validation errors
            \Log::warning('AI JSON validation failed', [
                'errors' => array_map(fn($e) => $e->message, $result->errors),
                'truncated' => $result->likelyTruncated
            ]);

            throw new \RuntimeException('Invalid AI response');
        }

        return $result->data;
    }
}

Usage in Laravel controller:

<?php

namespace App\Http\Controllers;

use App\Services\AiJsonService;
use Illuminate\Http\JsonResponse;

class UserController extends Controller
{
    public function __construct(private AiJsonService $aiService)
    {
    }

    public function generateProfile(): JsonResponse
    {
        try {
            $userData = $this->aiService->generateUserProfile(
                'Generate a user profile for Alice Johnson'
            );

            return response()->json([
                'success' => true,
                'data' => $userData
            ]);
        } catch (\Exception $e) {
            return response()->json([
                'success' => false,
                'error' => $e->getMessage()
            ], 422);
        }
    }
}

With Symfony Framework

<?php

namespace App\Service;

use Symfony\Contracts\HttpClient\HttpClientInterface;

class AiJsonProcessor
{
    public function __construct(
        private HttpClientInterface $httpClient,
        private string $apiKey
    ) {
    }

    public function processAiResponse(string $prompt): array
    {
        // Make API call using Symfony HTTP client
        $response = $this->httpClient->request('POST',
            'https://api.anthropic.com/v1/messages',
            [
                'headers' => [
                    'x-api-key' => $this->apiKey,
                    'anthropic-version' => '2023-06-01',
                    'Content-Type' => 'application/json',
                ],
                'json' => [
                    'model' => 'claude-haiku-4-5',
                    'max_tokens' => 1024,
                    'messages' => [
                        ['role' => 'user', 'content' => $prompt]
                    ]
                ]
            ]
        );

        $data = $response->toArray();
        $aiOutput = $data['content'][0]['text'];

        // Clean and validate
        $result = validate_ai_json($aiOutput);

        if (!$result->jsonValid) {
            throw new \RuntimeException(
                sprintf('AI JSON validation failed: %s',
                    implode(', ', array_map(fn($e) => $e->message, $result->errors))
                )
            );
        }

        return $result->data;
    }
}

Configuration in services.yaml:

services:
    App\Service\AiJsonProcessor:
        arguments:
            $apiKey: '%env(ANTHROPIC_API_KEY)%'

With Streaming Responses

<?php
require_once 'ai_json_cleanroom.php';

function processStreamingResponse(string $apiUrl, array $headers, array $payload): array
{
    // Buffer for streamed chunks; defined before the callback captures it by reference
    $chunks = [];

    // Initialize streaming request
    $ch = curl_init($apiUrl);
    curl_setopt_array($ch, [
        CURLOPT_POST => true,
        CURLOPT_POSTFIELDS => json_encode($payload),
        CURLOPT_HTTPHEADER => $headers,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_WRITEFUNCTION => function($curl, $data) use (&$chunks) {
            $chunks[] = $data;
            return strlen($data);
        }
    ]);

    // Run the request; the callback collects every chunk
    curl_exec($ch);
    curl_close($ch);

    // Combine chunks
    $fullOutput = implode('', $chunks);

    // Validate complete output
    $result = validate_ai_json($fullOutput);

    if ($result->likelyTruncated) {
        // Stream was truncated - reasons available
        error_log('Stream truncated: ' . json_encode($result->errors[0]->detail['truncation_reasons']));
        throw new RuntimeException('Response was truncated, consider retrying with higher limits');
    }

    if (!$result->jsonValid) {
        throw new RuntimeException('Failed to parse streamed JSON');
    }

    return $result->data;
}
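
If your provider streams server-sent events, the concatenated body is a series of event lines rather than the model's text, so the text pieces need to be joined before validation. The sketch below assumes OpenAI-style chat completion chunks (lines of the form data: {...} carrying choices[0].delta.content, ending with data: [DONE]); sketchJoinSseDeltas() is a hypothetical helper, and other providers use different event shapes.

```php
<?php
/**
 * Illustrative SSE joiner (hypothetical helper). Assumes OpenAI-style chat
 * completion chunks; adjust the array path for other providers.
 */
function sketchJoinSseDeltas(string $rawStream): string
{
    $text = '';
    foreach (preg_split('/\r?\n/', $rawStream) as $line) {
        if (!str_starts_with($line, 'data:')) {
            continue; // skip blank separators, comments, and other SSE fields
        }
        $payload = trim(substr($line, 5));
        if ($payload === '[DONE]') {
            break; // end-of-stream marker
        }
        $event = json_decode($payload, true);
        // Missing keys or undecodable events simply contribute nothing.
        $text .= $event['choices'][0]['delta']['content'] ?? '';
    }

    return $text;
}

// Build a fake stream whose deltas split one JSON object in two pieces.
$chunk = fn(string $piece): string =>
    'data: ' . json_encode(['choices' => [['delta' => ['content' => $piece]]]]) . "\n\n";

$raw = $chunk('{"name": ') . $chunk('"Alice"}') . "data: [DONE]\n\n";

echo sketchJoinSseDeltas($raw), "\n";
```

The joined string is what you would then pass to validate_ai_json(); a stream that stops before the closing brace yields a string that the truncation check can flag.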

With Guzzle Async/Promises

<?php
require_once 'ai_json_cleanroom.php';
use GuzzleHttp\Client;
use GuzzleHttp\Promise;

function processMultipleAiRequests(array $prompts): array
{
    $client = new Client();
    $promises = [];

    // Create async requests
    foreach ($prompts as $key => $prompt) {
        $promises[$key] = $client->postAsync('https://api.openai.com/v1/chat/completions', [
            'headers' => [
                'Authorization' => 'Bearer ' . getenv('OPENAI_API_KEY'),
            ],
            'json' => [
                'model' => 'gpt-5.1',
                'messages' => [['role' => 'user', 'content' => $prompt]]
            ]
        ]);
    }

    // Wait for all responses
    $responses = Promise\Utils::settle($promises)->wait();

    $results = [];
    foreach ($responses as $key => $response) {
        if ($response['state'] === 'fulfilled') {
            $data = json_decode($response['value']->getBody(), true);
            $aiOutput = $data['choices'][0]['message']['content'];

            // Validate each response
            $result = validate_ai_json($aiOutput);

            if ($result->jsonValid) {
                $results[$key] = $result->data;
            } else {
                $results[$key] = [
                    'error' => 'Validation failed',
                    'details' => array_map(fn($e) => $e->message, $result->errors)
                ];
            }
        } else {
            $results[$key] = ['error' => 'Request failed'];
        }
    }

    return $results;
}

// Usage
$prompts = [
    'user1' => 'Generate profile for Alice',
    'user2' => 'Generate profile for Bob',
    'user3' => 'Generate profile for Charlie',
];

$results = processMultipleAiRequests($prompts);
print_r($results);

API Reference

`validate_ai_json()`

Main validation function with comprehensive options.

function validate_ai_json(
    string|array $inputData,
    ?array $schema = null,
    ?array $expectations = null,
    ?ValidateOptions $options = null
): ValidationResult

Parameters:

$inputData: String or already-parsed array
$schema: JSON Schema subset for validation
$expectations: List of path-based validation rules
$options: Configuration for parsing, extraction, and repair

Returns: ValidationResult with jsonValid, errors, warnings, data, and info

`ValidationResult`

Result object returned by validate_ai_json().

class ValidationResult
{
    public bool $jsonValid;              // True if parsing and validation succeeded
    public bool $likelyTruncated;        // True if input appears truncated
    public array $errors;                // ValidationIssue[] - validation errors
    public array $warnings;              // ValidationIssue[] - non-blocking warnings
    public mixed $data;                  // Parsed JSON if valid, else null
    public array $info;                  // Extraction/parsing metadata

    public function toArray(): array;    // Convert result to associative array
}

Metadata in $info:

source: How JSON was found ("raw", "code_fence", "balanced_block", "object")
extraction: Details about extraction process
parse_backend: Parser used ("json")
curly_quotes_normalization_used: Whether typographic quotes were normalized
repair: Details about applied repairs (if any)

`ValidationIssue`

Individual validation error or warning.

class ValidationIssue
{
    public ErrorCode $code;              // Error type (enum)
    public string $path;                 // JSONPath where error occurred
    public string $message;              // Human-readable description
    public ?array $detail;               // Additional context

    public function toArray(): array;    // Convert issue to associative array
}

`ValidateOptions`

Configuration for validation behavior.

class ValidateOptions
{
    // Extraction options
    public bool $strict = false;
    public bool $extractJson = true;
    public bool $allowJsonInCodeFences = true;
    public bool $allowBareTopLevelScalars = false;
    public bool $tolerateTrailingCommas = true;
    public bool $stopOnFirstError = false;

    // Repair options
    public bool $enableSafeRepairs = true;
    public bool $allowJson5Like = true;         // Master toggle for JSON5-like repairs
    public bool $replaceConstants = true;        // True/False/None → true/false/null
    public bool $replaceNansInfinities = true;   // NaN/Infinity → null
    public int $maxTotalRepairs = 200;
    public float $maxRepairsPercent = 0.02;      // 2% of input size

    // Granular repair control
    public string $normalizeCurlyQuotes = "always";  // "always"|"auto"|"never"
    public bool $fixSingleQuotes = true;
    public bool $quoteUnquotedKeys = true;
    public bool $stripJsComments = true;

    // Custom repair hooks
    public ?array $customRepairHooks = null;
}

Curly quotes normalization modes:

"always" (default): Normalize typographic quotes before parsing
"auto": Try parsing first; only normalize if parse fails
"never": Never normalize (preserves typographic quotes as-is)

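To make the difference between the three modes concrete, here is a minimal standalone sketch. It is not the library's implementation: `normalize_curly_quotes()` and `decode_with_mode()` are hypothetical helpers, and only the two typographic double quotes are handled.

```php
<?php
declare(strict_types=1);

/** Replace typographic double quotes with ASCII ones (illustrative subset only). */
function normalize_curly_quotes(string $text): string
{
    return str_replace(["\u{201C}", "\u{201D}"], '"', $text);
}

/** Hypothetical decoder showing when each mode applies normalization. */
function decode_with_mode(string $text, string $mode): mixed
{
    if ($mode === 'always') {
        // Normalize up front, then parse once.
        return json_decode(normalize_curly_quotes($text), true);
    }

    // "auto" and "never" both try the raw text first.
    $data = json_decode($text, true);
    if ($mode === 'auto' && json_last_error() !== JSON_ERROR_NONE) {
        // Only fall back to normalization when the raw parse failed.
        $data = json_decode(normalize_curly_quotes($text), true);
    }

    return $data;
}

// Keys and values delimited by curly quotes, as a word processor might produce.
$input = "{\u{201C}name\u{201D}: \u{201C}Alice\u{201D}}";

var_dump(decode_with_mode($input, 'always')); // array with name => Alice
var_dump(decode_with_mode($input, 'auto'));   // array with name => Alice
var_dump(decode_with_mode($input, 'never'));  // NULL (raw text is not valid JSON)
```

The practical trade-off in this sketch is that "auto" leaves already-valid JSON untouched, at the cost of a second parse when the first one fails.
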
`ErrorCode`

Enumeration of validation error types.

enum ErrorCode: string
{
    case PARSE_ERROR = 'parse_error';
    case TRUNCATED = 'truncated';
    case MISSING_REQUIRED = 'missing_required';
    case TYPE_MISMATCH = 'type_mismatch';
    case ENUM_MISMATCH = 'enum_mismatch';
    case CONST_MISMATCH = 'const_mismatch';
    case NOT_ALLOWED_EMPTY = 'not_allowed_empty';
    case ADDITIONAL_PROPERTY = 'additional_property';
    case PATTERN_MISMATCH = 'pattern_mismatch';
    case MIN_LENGTH = 'min_length';
    case MAX_LENGTH = 'max_length';
    case MIN_ITEMS = 'min_items';
    case MAX_ITEMS = 'max_items';
    case MINIMUM = 'minimum';
    case MAXIMUM = 'maximum';
    case REPAIRED = 'repaired';  // Warning: repair was applied
    // ... and more
}
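
Because ErrorCode is a string-backed enum, each case exposes its string through `->value`. The sketch below shows one way to turn those values into retry instructions. `retry_hint()` is a hypothetical helper, not part of the library, and the wording of the hints is only an example.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: map an ErrorCode backing value to a retry instruction.
 * The strings matched here are the backing values listed in the enum above.
 */
function retry_hint(string $codeValue, string $path): string
{
    return match ($codeValue) {
        'truncated'        => 'Your previous reply was cut off. Return the complete JSON.',
        'parse_error'      => 'Return strictly valid JSON with no surrounding text.',
        'missing_required' => "Add the required field at {$path}.",
        'type_mismatch'    => "Use the expected type at {$path}.",
        default            => "Fix the issue reported at {$path}.",
    };
}

echo retry_hint('missing_required', '$.user.email'), PHP_EOL;
// Add the required field at $.user.email.
```

With a real result you would call it per issue, for example `retry_hint($issue->code->value, $issue->path)` while looping over `$result->errors`.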

PHP-Specific Notes

Differences from Python Version

JSON Engine: PHP uses the native json_decode()/json_encode(). The Python version can optionally use orjson for performance; the PHP port relies on the built-in JSON extension, which is fast and reliable.
Type System: PHP 8.1+ enums and typed properties are used throughout
Arrays: PHP associative arrays instead of Python dicts
Namespace: Functions are global (no module imports needed)
Error Handling: Non-throwing design (no exceptions from validate_ai_json)
Regex Patterns: PHP regex requires delimiters (e.g., '/pattern/' not 'pattern')
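
The delimiter requirement is the difference most likely to trip up anyone porting schemas from Python. The standalone snippet below uses only PHP's built-in preg_match() and an example date pattern chosen for illustration; how the library passes schema patterns to PCRE is not shown here.

```php
<?php
declare(strict_types=1);

// Python: re.search(r"^\d{4}-\d{2}-\d{2}$", value)
// PHP:    the same expression, wrapped in "/" delimiters
$datePattern = '/^\d{4}-\d{2}-\d{2}$/';

var_dump(preg_match($datePattern, '2024-05-01')); // int(1)
var_dump(preg_match($datePattern, 'May 1st'));    // int(0)

// A bare pattern is rejected: preg_match() emits a warning and returns false.
// The @ operator only silences that warning for this demonstration.
var_dump(@preg_match('abc', 'abc'));              // bool(false)
```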

UTF-8 Handling

This library requires ext-mbstring for proper UTF-8 multibyte character handling. All string operations use multibyte-safe functions (mb_strlen(), mb_substr(), mb_str_split()).

Why mbstring is required:

Proper character counting for repair limits
Correct string slicing in multibyte contexts
Safe handling of emojis and international characters
Prevention of string corruption during repairs
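
The following standalone snippet illustrates the problem using only PHP built-ins. Byte-based functions miscount multibyte text and can cut a character in half, which produces a string that json_encode() refuses to encode.

```php
<?php
declare(strict_types=1);

$text = "caf\u{E9} \u{1F680}";   // "café" followed by a space and a rocket emoji

echo strlen($text), PHP_EOL;               // 10 (bytes)
echo mb_strlen($text, 'UTF-8'), PHP_EOL;   // 6  (characters)

$byteCut = substr($text, 0, 4);             // cuts through the 2-byte "é"
$charCut = mb_substr($text, 0, 4, 'UTF-8'); // "café", intact

var_dump(mb_check_encoding($byteCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charCut, 'UTF-8')); // bool(true)
var_dump(json_encode($byteCut));                // bool(false): invalid UTF-8
```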

Performance

PHP's native JSON parser (ext-json) is written in C and highly optimized. The figures below are indicative only and depend on hardware and PHP configuration.

Typical Processing Times

Scenario	Time	Notes
Clean JSON (no repairs)	~0.1-1ms	Direct `json_decode()`
Simple extraction + parse	~1-2ms	From markdown code fence
Multiple repairs + parse	~2-5ms	Fix quotes, constants, comments
Complex schema validation	~5-20ms	Deep nested structure validation
Large payload (>100KB)	~10-50ms	Depends on complexity
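
To check these figures on your own hardware, a small timing harness along the following lines may help. It is a sketch: `average_ms()` is a hypothetical helper, and the example times plain json_decode() on a generated payload so that it runs without the library. Swap the closure for a call to validate_ai_json() to measure the full pipeline.

```php
<?php
declare(strict_types=1);

/** Hypothetical helper: average wall-clock time of $fn in milliseconds. */
function average_ms(callable $fn, int $iterations = 1000): float
{
    $start = hrtime(true);   // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }

    return (hrtime(true) - $start) / $iterations / 1_000_000;
}

// Generated test payload: 100 small user objects.
$payload = json_encode(['users' => array_map(
    fn (int $id): array => ['id' => $id, 'name' => "user{$id}"],
    range(1, 100)
)]);

$ms = average_ms(fn () => json_decode($payload, true));
printf("json_decode: %.4f ms per call\n", $ms);
```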

Performance Optimization Tips

Enable OPcache (PHP's bytecode cache):

; In php.ini
opcache.enable=1
opcache.memory_consumption=128
opcache.interned_strings_buffer=8
opcache.max_accelerated_files=4000

Disable unnecessary repairs:

$options = new ValidateOptions();
$options->stripJsComments = false;  // If you never have comments
$options->normalizeCurlyQuotes = 'never';  // If you never have smart quotes

Use schema validation selectively:
- Schema validation adds overhead proportional to complexity
- For simple checks, use path expectations instead
- Only validate what you actually need

For high-throughput scenarios:

// Cache the ValidateOptions instance
static $options = null;
if ($options === null) {
    $options = new ValidateOptions([
        'maxTotalRepairs' => 100,  // Lower limit for faster processing
        'stopOnFirstError' => true  // Fail fast
    ]);
}

$result = validate_ai_json($input, options: $options);

Memory Usage

Memory consumption is proportional to input size:

Small payloads (<10KB): ~100-500KB peak memory
Medium payloads (10-100KB): ~500KB-2MB peak memory
Large payloads (>100KB): ~2-10MB peak memory

The library processes inputs in a single pass where possible to minimize memory overhead.

Comparison with Python Version

While the Python version can use orjson for ~3-4x faster JSON parsing, PHP's native json_decode() is already quite fast (comparable to Python's stdlib json). The difference is negligible for most use cases (microseconds for typical AI outputs).

Examples

See the examples/ directory for complete, runnable examples:

basic_usage.php - Core features demonstration
openai_integration.php - OpenAI API integration
anthropic_claude.php - Anthropic Claude integration
streaming_responses.php - Handling streaming outputs
retry_logic_advanced.php - Intelligent retry strategies
custom_repair_hooks.php - Domain-specific repairs

Run any example:

php examples/basic_usage.php

Should I Use This Tool?

Quick Decision Guide

Use AI JSON Cleanroom if you:

Work with any AI model (GPT, Claude, Gemini, Llama)
Receive JSON wrapped in explanations or markdown
Face token limit truncations
Need detailed error messages for retries
Want one solution for all AI quirks
Value zero dependencies (stdlib only)
Use Laravel, Symfony, or vanilla PHP

You might not need it if you:

Only work with clean, guaranteed JSON
Control token generation completely
Never hit token limits
Your AI model never adds explanatory text
You have a custom parsing pipeline that already works

Comparison with Common Approaches

Your Current Approach → With Cleanroom

Without Cleanroom	With Cleanroom
`try { json_decode(); }`	Always get a result, never crashes
Regex extraction	Automatic markdown/fence detection
Custom retry logic	Structured errors for targeted retries
"Is it truncated?"	Immediate truncation detection with reasons
Multiple fix attempts	One call handles everything
Scattered error handling	Unified validation pipeline

Real-World Use Cases

Use Case 1: AI-Powered SaaS Application

// Before: Fragile and unreliable
try {
    $data = json_decode($aiOutput, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // Retry? Log? Give up? ¯\_(ツ)_/¯
    }
} catch (Exception $e) {
    // Something went wrong...
}

// After: Robust and informative
$result = validate_ai_json($aiOutput, schema: $userSchema);
if ($result->jsonValid) {
    return $result->data;  // ✅ Clean, validated data
} elseif ($result->likelyTruncated) {
    return retryWithHigherTokens();  // ✅ Know exactly what to do
} else {
    return buildRetryPrompt($result->errors);  // ✅ Targeted fixes
}

Use Case 2: Laravel API Endpoint

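// Clean AI responses reliably in your Laravel services
use Illuminate\Support\Facades\Log;

class AiService {
    public function getStructuredData(string $prompt): array {
        $aiResponse = $this->callAiApi($prompt);  // Your own API client method
        $result = validate_ai_json($aiResponse);

        if (!$result->jsonValid) {
            Log::warning('AI JSON validation failed', [
                // Convert ValidationIssue objects to plain arrays for the log context
                'errors' => array_map(fn ($issue) => $issue->toArray(), $result->errors),
                'truncated' => $result->likelyTruncated
            ]);
            // AiResponseException is an application-defined exception class
            throw new AiResponseException('Invalid response');
        }

        return $result->data;
    }
}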

Use Case 3: Batch Processing

// Process hundreds of AI outputs reliably
$processed = [];
$needsRetry = [];
$failed = [];

foreach ($aiOutputs as $output) {
    $result = validate_ai_json($output);

    if ($result->jsonValid) {
        $processed[] = $result->data;
    } elseif ($result->likelyTruncated) {
        $needsRetry[] = $output;
    } else {
        $failed[] = [
            'output' => $output,
            'errors' => $result->errors
        ];
    }
}

The Bottom Line

If you've ever written code like this:

// This is a common scenario...
try {
    // JSON_THROW_ON_ERROR is required: without it json_decode() returns null instead of throwing
    $data = json_decode($aiOutput, true, 512, JSON_THROW_ON_ERROR);
} catch (JsonException $e) {
    // Try to extract JSON with regex
    if (preg_match('/\{.*\}/s', $aiOutput, $matches)) {
        try {
            // Fix quotes maybe?
            $fixed = str_replace("'", '"', $matches[0]);
            $data = json_decode($fixed, true, 512, JSON_THROW_ON_ERROR);
        } catch (JsonException $e2) {
            // Give up
            throw new RuntimeException("Can't parse AI output");
        }
    }
}

Then yes, you need this tool. It handles all of that (and much more) in one line:

$result = validate_ai_json($aiOutput);  // Done.

Benefits:

✅ No more silent failures
✅ No more guessing why parsing failed
✅ No more wasted API calls on truncated responses
✅ No more fragile regex patterns
✅ No more scattered error handling

Testing

This library includes a comprehensive PHPUnit test suite.

Run Tests

# Install dependencies
composer install

# Run all tests
composer test

# Run with coverage (requires Xdebug)
composer test-coverage

# Run specific test
./vendor/bin/phpunit tests/ExtractionTest.php

See tests/README.md for detailed testing documentation.

License

MIT License

See LICENSE file for details.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

Issues: GitHub Issues
Source: GitHub Repository
Python Version: Original Project

If you find this tool useful, please consider starring the repo! ⭐
