This project explores using vision-language models (VLMs) as decision-making agents for playing Super Mario Bros through reinforcement learning. We implement two distinct approaches: a PPO-trained vision-language model and a traditional Double Deep Q-Network (DDQN) for comparison.
Our project implements two complementary approaches. The first is the VLM agent:
- Model: Qwen2.5-VL-3B-Instruct vision-language model
- Fine-tuning: LoRA (Low-Rank Adaptation) with 4-bit quantization
- Training: Proximal Policy Optimization (PPO) with custom policy/value heads
- Input: Game screenshots (base64 encoded) + game state information
- Output: Action selection with value estimation
The VLM agent processes both visual game screenshots and textual game state information to make informed decisions. The model is fine-tuned using LoRA adapters and trained with PPO to learn optimal Mario gameplay policies.
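
To make the "custom policy/value heads" concrete, here is a minimal sketch of how action logits and a value estimate can be produced from the VLM's hidden states. The class name `PolicyValueHead`, the mean-pooling step, and the module sizes are illustrative assumptions, not the project's exact implementation.

```python
import torch
import torch.nn as nn

class PolicyValueHead(nn.Module):
    """Hypothetical policy/value heads over pooled VLM hidden states."""

    def __init__(self, hidden_size: int, num_actions: int):
        super().__init__()
        self.policy_head = nn.Linear(hidden_size, num_actions)  # action logits
        self.value_head = nn.Linear(hidden_size, 1)              # state-value estimate

    def forward(self, hidden_states: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_size) from the VLM's last layer.
        # Mean-pool over the sequence to get one feature vector per observation.
        features = hidden_states.mean(dim=1)
        logits = self.policy_head(features)
        value = self.value_head(features).squeeze(-1)
        return logits, value
```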
The second is a DDQN baseline:
- Architecture: Convolutional neural network with experience replay
- Algorithm: Double Deep Q-Network (DDQN)
- Input: Preprocessed game frames
- Purpose: Baseline comparison for the VLM approach (a minimal sketch follows below)
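
For reference, a minimal PyTorch sketch of the baseline's core pieces. The network name `MarioQNetwork`, the assumed 4x84x84 frame stack, and the layer widths are illustrative; the `ddqn_target` function shows the defining Double DQN idea, where the online network selects the next action and the target network evaluates it.

```python
import torch
import torch.nn as nn

class MarioQNetwork(nn.Module):
    """Small CNN over stacked, preprocessed frames (assumed shape 4 x 84 x 84)."""

    def __init__(self, num_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        return self.net(frames)  # Q-values, shape (batch, num_actions)

def ddqn_target(online_net, target_net, reward, next_state, done, gamma=0.99):
    """Double DQN target: online net picks the action, target net evaluates it."""
    with torch.no_grad():
        best_action = online_net(next_state).argmax(dim=1, keepdim=True)
        next_q = target_net(next_state).gather(1, best_action).squeeze(1)
        return reward + gamma * (1.0 - done.float()) * next_q
```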
To set up the project, install the required dependencies:

```bash
pip install -r requirements.txt
```

Then ensure CUDA is available for GPU acceleration (required for the VLM approach).
To run the VLM agent:

```bash
cd llm_mario
python main.py --train --display --wandb  # Training mode with visualization and logging
python main.py --display                  # Evaluation mode
```

To run the DDQN baseline:

```bash
cd ddqn
python main.py  # Run DDQN training
```

Key features:
- Multi-modal input: Combines visual screenshots with structured game state
- Memory efficient: 4-bit quantization with gradient checkpointing
- Advanced training: PPO with Generalized Advantage Estimation (GAE); a minimal GAE sketch appears after this list
- Monitoring: Weights & Biases integration for experiment tracking
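
The sketch below illustrates GAE over one collected rollout. The function name `compute_gae`, the buffer layout (plain Python lists with one extra bootstrap value), and the gamma/lambda values are assumptions for illustration, not the project's exact hyperparameters.

```python
def compute_gae(rewards, values, dones, gamma=0.99, lam=0.95):
    """Compute GAE advantages and discounted returns for one rollout.

    rewards, dones: lists of length T; values: list of length T + 1
    (the extra entry is the bootstrap value of the final state).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        nonterminal = 1.0 - float(dones[t])
        delta = rewards[t] + gamma * values[t + 1] * nonterminal - values[t]
        gae = delta + gamma * lam * nonterminal * gae
        advantages[t] = gae
    returns = [adv + v for adv, v in zip(advantages, values[:-1])]
    return advantages, returns
```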
Technical details:
- Base Model: Qwen2.5-VL-3B-Instruct (3 billion parameters)
- Adaptation: LoRA fine-tuning (rank 16, alpha 32)
- Policy Network: Custom policy/value heads on top of VLM features
- Environment: Super Mario Bros 1-1 with action space simplification (configuration sketched below)
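
A hedged sketch of how these pieces might be wired together. The quantization settings, LoRA dropout and target modules, and the use of `SIMPLE_MOVEMENT` (a common 7-action simplified joypad set) are assumptions, not the project's exact configuration; the rank/alpha values match the ones listed above.

```python
import torch
from transformers import BitsAndBytesConfig
from peft import LoraConfig
import gym_super_mario_bros
from nes_py.wrappers import JoypadSpace
from gym_super_mario_bros.actions import SIMPLE_MOVEMENT

# 4-bit quantization config used when loading the base VLM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

# LoRA adapter config matching the rank/alpha noted above.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,                    # assumed value
    target_modules=["q_proj", "v_proj"],  # assumed attention projections
    task_type="CAUSAL_LM",
)

# Super Mario Bros 1-1 with a simplified action space.
env = gym_super_mario_bros.make("SuperMarioBros-1-1-v0")
env = JoypadSpace(env, SIMPLE_MOVEMENT)
```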
Team:
- Ishan Khare
- Gabriel Seir
- Anthony Zhan
