
InterlocutorAwarenessLLM

A comprehensive framework for evaluating Large Language Models' ability to identify other LLMs based on their responses. This project tests "situational awareness" in AI systems by examining whether models can recognize the distinctive patterns and characteristics of different AI families.

🎯 Overview

This framework evaluates how well LLMs can identify:

  • Model Families (GPT, Claude, Gemini, DeepSeek, Qwen, Llama)
  • Exact Models (GPT-4o, Claude-3.5-Sonnet, etc.)
  • Response Patterns across different task types

The evaluation covers multiple domains:

  • Code - Programming tasks and completions
  • Math - Mathematical problem solving
  • Chatbot Arena - Comparative evaluations
  • Jailbreaking - Safety and robustness testing

🚀 Quick Start

Prerequisites

pip install anthropic openai google-generativeai datasets pandas numpy scikit-learn matplotlib seaborn tqdm torch transformers together

API Keys Setup

Create api_keys.json:

{
  "anthropic": "your-anthropic-api-key",
  "openai": "your-openai-api-key", 
  "gemini": "your-google-api-key",
  "deepseek": "your-deepseek-api-key",
  "together": "your-together-api-key"
}
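These credentials are picked up by the framework's API clients (see base_inference.py), so no extra wiring is needed. If you want to sanity-check the file before running anything, a minimal sketch like the following works; it uses only the standard Anthropic and OpenAI client constructors and is not specific to this repository:

import json

# Load the credentials file created above and check that the expected keys exist.
with open("api_keys.json") as f:
    api_keys = json.load(f)

assert {"anthropic", "openai", "gemini", "deepseek", "together"} <= api_keys.keys()

# Standard provider clients (not part of this framework).
from anthropic import Anthropic
from openai import OpenAI

anthropic_client = Anthropic(api_key=api_keys["anthropic"])
openai_client = OpenAI(api_key=api_keys["openai"])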

Basic Usage

1. Generate Model Responses

# Generate responses for CCP dataset
python unified_response_generator.py --dataset_type ccp --target_model claude-3-7-sonnet --num_samples 100

# Generate responses for code tasks
python unified_response_generator.py --dataset_type code --target_model gpt-4o --num_samples 50

2. Run Identity Inference Evaluation

# Evaluate model identification accuracy
python unified_evaluation.py --dataset_type ccp --identifier_model claude-3-7-sonnet --target_model gpt-4o

# Cross-model evaluation
python unified_evaluation.py --dataset_type math --identifier_model deepseek-v3 --target_model claude-3-5-haiku
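To sweep several identifier/target combinations, it is enough to drive the same CLI from a small script. The sketch below assumes only the flags documented above and reuses model names already mentioned in this README:

import itertools
import subprocess

# Model names taken from the examples in this README.
identifier_models = ["claude-3-7-sonnet", "gpt-4o", "deepseek-v3"]
target_models = ["gpt-4o", "claude-3-5-haiku"]

for identifier, target in itertools.product(identifier_models, target_models):
    subprocess.run([
        "python", "unified_evaluation.py",
        "--dataset_type", "math",
        "--identifier_model", identifier,
        "--target_model", target,
    ], check=True)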

📁 Project Structure

Core Framework

├── base_inference.py          # Base classes and API clients
├── evaluation_utils.py        # Utility functions for metrics
├── unified_evaluation.py      # Main evaluation script
├── configs.py                 # Model configurations
├── prompts.py                 # Prompt templates
└── sampled_data_indicies.py   # Data sampling utilities

Configuration Files

  • api_keys.json - API credentials (not in repo)
  • configs.py - Model configurations and endpoints (see the illustrative sketch after this list)
  • prompts.py - Evaluation prompts and templates
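The exact contents of configs.py are defined in the repository; purely for orientation, a model configuration in this kind of setup typically maps a short model name to its provider and a provider-specific model id. The entry below is hypothetical and only illustrates the idea:

# Hypothetical example; the real configs.py may use a different structure.
MODEL_CONFIGS = {
    "claude-3-7-sonnet": {"provider": "anthropic", "model_id": "<provider model id>"},
    "gpt-4o": {"provider": "openai", "model_id": "<provider model id>"},
}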

📈 Usage Examples

Basic Model Identification

import asyncio

from base_inference import BaseLLMIdentityInference

async def main():
    # Initialize the framework
    evaluator = BaseLLMIdentityInference()

    # Generate a response from the target model
    response = await evaluator.generate_response(
        model_name="claude-3-7-sonnet",
        messages=[{"role": "user", "content": "Explain quantum computing"}]
    )

    # Use the identifier model to classify the response
    classification = await evaluator.generate_response(
        model_name="gpt-4o",
        messages=[{"role": "user", "content": f"Identify this response: {response}"}]
    )
    print(classification)

asyncio.run(main())
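Note that the same generate_response call serves both roles: the target model produces a response and the identifier model is asked to classify it, so any configured model can act as either the target or the identifier.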

Batch Evaluation

import asyncio

from unified_evaluation import UnifiedEvaluator

async def main():
    evaluator = UnifiedEvaluator(
        dataset_type="ccp",
        identifier_model="claude-3-7-sonnet",
        target_model="gpt-4o"
    )

    results = await evaluator.run_inference()
    print(f"Accuracy: {results['metrics']['family_accuracy']['accuracy']:.3f}")

asyncio.run(main())

Custom Analysis

from evaluation_utils import (
    compute_accuracy_metrics,
    compute_multilabel_auc,
    plot_confusion_matrix,
)

# Compute detailed metrics; y_true, y_pred, and y_pred_proba are the
# labels and predictions collected from a previous evaluation run.
auc_results = compute_multilabel_auc(y_true, y_pred_proba)
accuracy_results = compute_accuracy_metrics(y_true, y_pred)

# Visualize results
plot_confusion_matrix(
    accuracy_results['confusion_matrix'],
    accuracy_results['labels'],
    title="Model Identification Results"
)
