A comprehensive framework for evaluating Large Language Models' ability to identify other LLMs based on their responses. This project tests "situational awareness" in AI systems by examining whether models can recognize the distinctive patterns and characteristics of different AI families.
This framework evaluates how well LLMs can identify (see the illustrative mapping sketched after this list):
- Model Families (GPT, Claude, Gemini, DeepSeek, Qwen, Llama)
- Exact Models (GPT-4o, Claude-3.5-Sonnet, etc.)
- Response Patterns across different task types
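For concreteness, the sketch below shows the kind of model-to-family mapping these two granularities reduce to. The entries and the helper function are illustrative examples, not the label set shipped in configs.py.

```python
# Illustrative mapping from exact model names to model families.
# These names are examples; the actual label set is defined in configs.py.
MODEL_FAMILIES = {
    "gpt-4o": "GPT",
    "claude-3-5-sonnet": "Claude",
    "claude-3-7-sonnet": "Claude",
    "gemini-1.5-pro": "Gemini",          # example entry
    "deepseek-v3": "DeepSeek",
    "qwen-2.5-72b-instruct": "Qwen",     # example entry
    "llama-3.1-70b-instruct": "Llama",   # example entry
}

def family_of(model_name: str) -> str:
    """Map an exact model name to its family label ('Unknown' if unmapped)."""
    return MODEL_FAMILIES.get(model_name, "Unknown")
```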
The evaluation covers multiple domains:
- Code - Programming tasks and completions
- Math - Mathematical problem solving
- ChatBot Arena - Comparative evaluations
- Jailbreaking - Safety and robustness testing
pip install anthropic openai google-generativeai datasets pandas numpy scikit-learn matplotlib seaborn tqdm torch transformers together

Create `api_keys.json`:
{
"anthropic": "your-anthropic-api-key",
"openai": "your-openai-api-key",
"gemini": "your-google-api-key",
"deepseek": "your-deepseek-api-key",
"together": "your-together-api-key"
}
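The scripts read these credentials before building their API clients. Below is a minimal sketch of loading the file; the load_api_keys helper is illustrative, not a function exported by this repo.

```python
import json

def load_api_keys(path: str = "api_keys.json") -> dict:
    """Load provider API keys from the JSON file shown above."""
    with open(path) as f:
        return json.load(f)

keys = load_api_keys()
print(sorted(keys))  # -> ['anthropic', 'deepseek', 'gemini', 'openai', 'together']
```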
# Generate responses for CCP dataset
python unified_response_generator.py --dataset_type ccp --target_model claude-3-7-sonnet --num_samples 100
# Generate responses for code tasks
python unified_response_generator.py --dataset_type code --target_model gpt-4o --num_samples 50
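To generate responses for several target models in one pass, the CLI can be driven from a small wrapper script. This sketch reuses the flags shown above; the model list and the loop itself are illustrative and not part of the repo.

```python
import subprocess

# Example target models; adjust to whatever is configured in configs.py.
TARGET_MODELS = ["gpt-4o", "claude-3-7-sonnet", "deepseek-v3"]

for model in TARGET_MODELS:
    # Same flags as the unified_response_generator.py examples above.
    subprocess.run(
        [
            "python", "unified_response_generator.py",
            "--dataset_type", "code",
            "--target_model", model,
            "--num_samples", "50",
        ],
        check=True,
    )
```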
# Evaluate model identification accuracy
python unified_evaluation.py --dataset_type ccp --identifier_model claude-3-7-sonnet --target_model gpt-4o
# Cross-model evaluation
python unified_evaluation.py --dataset_type math --identifier_model deepseek-v3 --target_model claude-3-5-haiku

├── base_inference.py         # Base classes and API clients
├── evaluation_utils.py       # Utility functions for metrics
├── unified_evaluation.py     # Main evaluation script
├── configs.py                # Model configurations
├── prompts.py                # Prompt templates
└── sampled_data_indicies.py  # Data sampling utilities
- `api_keys.json` - API credentials (not in repo)
- `configs.py` - Model configurations and endpoints (an illustrative entry is sketched below)
- `prompts.py` - Evaluation prompts and templates
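As a rough illustration of what a configuration entry might contain, here is a hypothetical shape for configs.py entries; the actual fields, endpoints, and model identifiers are defined in the repo itself.

```python
# Hypothetical shape of a model configuration entry; the real fields live in configs.py.
MODEL_CONFIGS = {
    "gpt-4o": {
        "provider": "openai",     # which API client to use (assumed field)
        "family": "GPT",          # family label used when scoring identifications (assumed field)
    },
    "claude-3-7-sonnet": {
        "provider": "anthropic",
        "family": "Claude",
    },
}
```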
from base_inference import BaseLLMIdentityInference
from evaluation_utils import compute_accuracy_metrics
# Initialize framework
evaluator = BaseLLMIdentityInference()
# Generate response from target model
response = await evaluator.generate_response(
model_name="claude-3-7-sonnet",
messages=[{"role": "user", "content": "Explain quantum computing"}]
)
# Use identifier model to classify the response
classification = await evaluator.generate_response(
model_name="gpt-4o",
messages=[{"role": "user", "content": f"Identify this response: {response}"}]
)

from unified_evaluation import UnifiedEvaluator
evaluator = UnifiedEvaluator(
dataset_type="ccp",
identifier_model="claude-3-7-sonnet",
target_model="gpt-4o"
)
results = await evaluator.run_inference()
print(f"Accuracy: {results['metrics']['family_accuracy']['accuracy']:.3f}")from evaluation_utils import compute_multilabel_auc, plot_confusion_matrix
# Compute detailed metrics from the collected ground-truth labels (y_true) and predictions (y_pred, y_pred_proba)
auc_results = compute_multilabel_auc(y_true, y_pred_proba)
accuracy_results = compute_accuracy_metrics(y_true, y_pred)
# Visualize results
plot_confusion_matrix(
accuracy_results['confusion_matrix'],
accuracy_results['labels'],
title="Model Identification Results"
)
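Assuming plot_confusion_matrix draws onto the active matplotlib figure (an assumption about the helper, not documented above), the plot can be saved afterwards:

```python
import matplotlib.pyplot as plt

# Assumes the helper drew onto the current matplotlib figure.
plt.savefig("model_identification_confusion_matrix.png", dpi=200, bbox_inches="tight")
```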