- Introduction
- Setup and Installation
- Overview of the Script
- Detailed Breakdown
- Best Practices
- Conclusion
The ARC AGI Solver is a comprehensive Python script designed to automate the process of solving puzzles from the Abstraction and Reasoning Corpus (ARC). This script leverages OpenAI's GPT models to generate transformation steps from input-output grid pairs, evaluate these steps, and apply them to solve corresponding test cases. The solver is built with scalability, robustness, and maintainability in mind, ensuring it can handle a wide range of puzzles efficiently.
Before running the ARC AGI Solver, ensure that your environment is correctly set up with all necessary dependencies and configurations.
- Clone the Repository and Install Dependencies:

  ```bash
  # Upgrade pip and install the required packages
  pip install --upgrade pip
  pip install openai==0.28
  ```
- Environment Variables:
  - OpenAI API Key: The script requires access to OpenAI's API. Store your API key securely using environment variables.
  - Create a `.env` file in your project directory with the following content:

    ```
    OPENAI_API_KEY=your_openai_api_key_here
    ```

  - Security Note: Ensure that the `.env` file is added to `.gitignore` to prevent accidental commits of sensitive information.
- Directory Structure: The script assumes a specific directory structure, especially for input and output files. Ensure that paths like `/kaggle/input/arc-prize-2024/arc-agi_training_challenges.json` are correctly set or adjusted for your environment.
The ARC AGI Solver is organized into several key sections, each responsible for different aspects of the puzzle-solving process:
- Data Loading and Preparation: Loads and preprocesses the ARC AGI dataset.
- Plot Generation: Visualizes input and output grids for better understanding.
- Prompt Templates: Defines various prompts used to interact with OpenAI's models.
- Response Generation: Generates transformation steps based on grid pairs.
- Scoring Responses: Evaluates the quality of the generated transformation steps.
- Evaluating Transformation Steps: Assesses the effectiveness of transformation steps.
- Solving Test Cases: Applies transformation steps to solve corresponding test puzzles.
- Aggregating and Saving Results: Compiles and stores all results for analysis.
Each section is modular, allowing for easy maintenance and potential future enhancements.
Functionality:
- Load JSON Data: Reads the ARC AGI training challenges from a JSON file.
- Flatten Data: Converts the nested JSON structure into a flat pandas DataFrame for easier processing.
- Grid Validation: Ensures that each grid is a valid 2D list of integers to prevent runtime errors.
Key Functions:
- `load_json_data(filepath: str) -> Dict[str, Any]`: Loads JSON data from the specified file.
- `flatten_data(data: Dict[str, Any]) -> pd.DataFrame`: Flattens the nested JSON data into a pandas DataFrame.
- `validate_grid(grid: List[List[int]]) -> bool`: Validates the structure and content of each grid.
Usage:
```python
data = load_json_data(data_filepath)
df = flatten_data(data)
```
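The script's full implementations are not reproduced here; a minimal sketch of what `load_json_data` and `validate_grid` might look like (function bodies are illustrative, and `flatten_data` additionally walks the nested train/test pairs into DataFrame rows):

```python
import json
from typing import Any, Dict


def load_json_data(filepath: str) -> Dict[str, Any]:
    """Read the ARC challenge file from disk."""
    with open(filepath, "r") as f:
        return json.load(f)


def validate_grid(grid: Any) -> bool:
    """A valid grid is a non-empty 2D list whose cells are all ints."""
    if not isinstance(grid, list) or not grid:
        return False
    return all(
        isinstance(row, list) and row and all(isinstance(c, int) for c in row)
        for row in grid
    )
```

Validating grids up front prevents malformed puzzle entries from causing runtime errors deeper in the pipeline.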
Functionality:
- Visual Representation: Generates visual plots of input and output grids to aid in understanding the transformations.
- Dynamic Text Coloring: Adjusts text color based on cell background for better readability.
Key Functions:
- `plot_grid(grid, ax, title="Grid", color_map='viridis', dpi=300)`: Plots a single grid.
- `save_plot(df_row, folder_path, idx, dpi=300)`: Saves side-by-side plots of the input and output grids for each puzzle.
Usage:
```python
df = save_and_load_plots(df)
```
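Dynamic text coloring usually means flipping the annotation color based on how bright the cell's background is. A minimal illustration of the idea (the brightness threshold here is an assumption, not necessarily the script's exact rule):

```python
def text_color_for_cell(value: int, vmax: int = 9) -> str:
    """Pick a readable annotation color for a grid cell.

    ARC cell values range 0-9; treat the normalized value as a rough
    brightness proxy and use black text on light cells, white on dark.
    """
    brightness = value / vmax
    return "black" if brightness > 0.5 else "white"
```

In the real plotting code this decision would be made per cell before calling something like matplotlib's `ax.text(...)`.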
Functionality:
- Define Interaction Patterns: Specifies how prompts are structured when interacting with OpenAI's models.
- Variety of Approaches: Includes original, few-shot, and detailed prompt versions to experiment with different response qualities.
Templates Included:
- Original: Basic prompt asking for transformation steps.
- Few-Shot: Provides examples to guide the model.
- Detailed: In-depth instructions for generating comprehensive transformation steps.
Usage:
```python
prompt = prompt_templates[prompt_version].format(
    input_grid=row['ascii_input'],
    output_grid=row['ascii_output']
)
```
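The templates themselves are plain format strings keyed by version name. A hypothetical sketch of the structure (the wording below is illustrative, not the script's actual prompts):

```python
# Each template exposes {input_grid} and {output_grid} placeholders.
prompt_templates = {
    "original": (
        "Given the input grid:\n{input_grid}\n"
        "and the output grid:\n{output_grid}\n"
        "describe the transformation steps that map input to output."
    ),
    "few_shot": (
        "Here is a solved example for reference.\n"
        "Now, for input:\n{input_grid}\nand output:\n{output_grid}\n"
        "list the transformation steps."
    ),
}

prompt = prompt_templates["original"].format(
    input_grid="0 1\n1 0",
    output_grid="1 0\n0 1",
)
```

Keeping the templates in one dict makes it easy to sweep over prompt versions and compare response quality.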
Functionality:
- Leverage GPT Models: Uses OpenAI's ChatCompletion API to generate transformation steps based on input-output grid pairs.
- Retry Logic: Implements exponential backoff to handle transient API errors gracefully.
Key Function:
- `generate_response(prompt: str, generation_args: dict, max_retries: int = 5, backoff_factor: float = 0.5) -> str`: Generates a response from the model with retry capabilities.
Usage:
```python
response_text = generate_response(prompt, full_generation_args)
```
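The exponential-backoff retry loop can be sketched as follows. The `_call_api` parameter is a hypothetical injection point added here for illustration; the real script calls `openai.ChatCompletion.create(...)` directly (the openai==0.28 API):

```python
import time


def generate_response(prompt, generation_args, max_retries=5,
                      backoff_factor=0.5, _call_api=None):
    """Call the model, retrying transient failures with exponential backoff."""
    for attempt in range(max_retries):
        try:
            # In the actual script: openai.ChatCompletion.create(...)
            return _call_api(prompt, **generation_args)
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries: surface the error
            # Wait 0.5s, 1s, 2s, ... between attempts
            time.sleep(backoff_factor * (2 ** attempt))
```

Backing off exponentially keeps transient rate-limit or network errors from aborting a long generation run.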
Functionality:
- Evaluate Transformation Steps: Assesses the quality of generated transformation steps based on criteria like correctness, clarity, completeness, and creativity.
- JSON Formatting: Ensures that scores are returned in a structured JSON format for easy analysis.
Key Function:
- `score_response(response_text: str, scoring_args: dict, max_retries: int = 5, backoff_factor: float = 0.5) -> Dict[str, Optional[float]]`: Scores the response using the model.
Usage:
```python
score_json = score_response(response_text, scoring_args)
```
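Because models sometimes wrap JSON in extra prose, the structured scores have to be extracted defensively. A sketch of such a parser (the criterion names match those listed above; the helper itself is illustrative):

```python
import json
import re
from typing import Dict, Optional


def parse_score_json(text: str) -> Dict[str, Optional[float]]:
    """Pull the first JSON object out of a model reply; None scores on failure."""
    keys = ("correctness", "clarity", "completeness", "creativity")
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match:
        try:
            raw = json.loads(match.group(0))
            return {k: float(raw[k]) if k in raw else None for k in keys}
        except (json.JSONDecodeError, TypeError, ValueError):
            pass
    return {k: None for k in keys}
```

Returning `None` instead of raising keeps one malformed reply from halting the whole scoring pass.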
Functionality:
- Detailed Assessment: Provides a comprehensive evaluation of transformation steps, including reflections and suggestions for improvements.
- Aggregate Scoring: Combines individual scores into an aggregated metric for easier comparison.
Key Components:
- Few-Shot Evaluation Examples: Supplies examples to guide the evaluator in scoring.
- Evaluation Prompt Template: Structures the prompt for evaluating transformation steps.
Usage:
```python
final_evaluation_df = evaluate_transformation_rules(
    detailed_results_df=detailed_results_df,
    data_filepath=data_filepath,
    output_filepath=evaluation_results_path,
    few_shot_examples=few_shot_evaluation
)
```
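Aggregate scoring collapses the per-criterion numbers into a single comparable metric. A minimal sketch, assuming an unweighted mean (the script's actual weighting may differ):

```python
from statistics import mean
from typing import Dict, Optional


def aggregate_score(scores: Dict[str, Optional[float]]) -> Optional[float]:
    """Combine per-criterion scores into one number via a simple mean.

    Missing (None) criteria are skipped; an all-None dict yields None.
    """
    values = [v for v in scores.values() if v is not None]
    return mean(values) if values else None
```

A single aggregate per response makes ranking candidate transformation steps straightforward.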
Functionality:
- Apply Transformations: Utilizes the best transformation steps derived from training examples to solve corresponding test puzzles.
- JSON Parsing: Extracts transformation steps and output grids from model responses.
Key Functions:
- `solve_test_case(sample_input, sample_output, transformation_steps, test_input) -> Tuple[str, List[List[int]]]`: Solves a test case using the provided transformation steps.
- `evaluate_transformation_rules(...)`: Assesses and selects the best transformation steps for each test case.
Usage:
```python
generated_steps, generated_output = solve_test_case(
    sample_input=train_row['input'],
    sample_output=train_row['output'],
    transformation_steps=transformation_steps,
    test_input=test_row['input']
)
```
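When a ground-truth output grid is available, checking a generated solution reduces to exact cell-by-cell comparison, which is how ARC itself is scored. A small illustrative helper pair (names are ours, not the script's):

```python
from typing import Iterable, List, Tuple

Grid = List[List[int]]


def grids_match(generated: Grid, expected: Grid) -> bool:
    """ARC scoring is exact-match: shape and every cell must agree.

    Python list equality checks both dimensions and values at once.
    """
    return generated == expected


def accuracy(pairs: Iterable[Tuple[Grid, Grid]]) -> float:
    """Fraction of (generated, expected) pairs solved exactly."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(grids_match(g, e) for g, e in pairs) / len(pairs)
```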
Functionality:
- Compile Results: Gathers all generation, scoring, evaluation, and test case solution data into structured DataFrames.
- Save to CSV: Exports detailed and summarized results for further analysis.
Key Outputs:
- Detailed Results: `arc_generation_scoring_results.csv` contains all generated transformation steps and their scores.
- Average Scores: `arc_average_scores.csv` summarizes average scores across different model and prompt configurations.
- Evaluation Scores: `arc_evaluation_scoring_results.csv` includes evaluations of transformation steps.
- Test Case Solutions: `arc_test_case_solutions.csv` holds solutions to all test cases.
- Best Transformations: `arc_best_transformations.csv` highlights top-performing transformations based on aggregated scores.
Usage:
```python
final_df.to_csv(evaluation_results_path, index=False)
test_case_solutions_df.to_csv(test_case_solutions_path, index=False)
best_df.to_csv("/kaggle/working/arc_best_transformations.csv", index=False)
```
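The per-configuration averages behind `arc_average_scores.csv` amount to a group-by over (model, prompt version). A dependency-free sketch of that aggregation; the column names `model`, `prompt_version`, and `score` are assumptions about the results schema:

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple


def average_scores(rows: Iterable[dict]) -> Dict[Tuple[str, str], float]:
    """Average the aggregated score per (model, prompt_version) group.

    Rows with a missing score are excluded from their group's mean.
    """
    groups = defaultdict(list)
    for row in rows:
        if row.get("score") is not None:
            groups[(row["model"], row["prompt_version"])].append(row["score"])
    return {key: mean(vals) for key, vals in groups.items()}
```

In the script itself this is naturally expressed with a pandas `groupby(...).mean()` before the `to_csv` call.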