JavaMigration - SDFeedback

1. 📖 Overview
- 1.1 MigrationBench: Datasets and Evaluation Framework
- 1.2 JavaMigration (SDFeedback): Migration with LLMs
2. 🤗 MigrationBench Datasets
3. Code Migration with LLMs
- 3.1 Single Job
  - 3.1.1 Basic Setup
  - 3.1.2 Install JavaMigration (SDFeedback)
  - 3.1.3 Local Run
- 3.2 Batch Job
  - 3.2.1 Local Run (not recommended)
  - 3.2.2 EMRS Run
4. 📚 Citation

1. 📖 Overview

JavaMigration (SDFeedback) is a library for code migration with LLMs. It improves migration efficacy by giving the LLM feedback that is as specific as possible, an approach motivated by "Teaching Large Language Models to Self-Debug".

Reference paper: MigrationBench: Repository-Level Code Migration Benchmark from Java 8
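The feedback-driven approach described above can be pictured as a build-and-fix loop. The sketch below is an illustration under assumptions, not the package's actual implementation: `build` and `propose_fix` are hypothetical callables standing in for a Maven build and an LLM edit step, and `Attempt` is an illustrative record type.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    iteration: int
    success: bool
    feedback: str  # build/test log that would be shown to the LLM


def self_debug_loop(
    build: Callable[[], Tuple[bool, str]],
    propose_fix: Callable[[str], None],
    max_iterations: int = 3,
) -> List[Attempt]:
    """Build once, then for up to `max_iterations` rounds feed the failure
    log back to the LLM (`propose_fix`) and rebuild, stopping on success."""
    ok, log = build()
    history = [Attempt(0, ok, log)]
    for i in range(1, max_iterations + 1):
        if ok:
            break
        # Pass the concrete error text, not just "build failed": the more
        # specific the feedback, the more targeted the next edit can be.
        propose_fix(log)
        ok, log = build()
        history.append(Attempt(i, ok, log))
    return history
```

The design choice worth noting is that the raw failure log, rather than a pass/fail bit, is what flows back to the model on every round.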

1.1 MigrationBench: Datasets and Evaluation Framework

🤗 MigrationBench is a large-scale, repository-level code migration benchmark dataset spanning multiple programming languages.
- The current and initial release (as of May 2025) includes Java 8 repositories that use the Maven build system.
- See 2. 🤗 MigrationBench Datasets for more details.
MigrationBench is the evaluation framework used to assess code migration success from Java 8 to 17, or to any other long-term support (LTS) version.
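One signal you can check yourself after a migration is which Java release the compiled classes target. This is an illustration only, not MigrationBench's evaluation code (its success criteria are defined in the paper). Per the JVM specification, a `.class` file starts with the magic number 0xCAFEBABE, followed by a 2-byte minor and a 2-byte major version; the major version is 52 for Java 8 and 61 for Java 17. The helper names are hypothetical.

```python
import struct
from pathlib import Path
from typing import List


def class_file_release(header: bytes) -> int:
    """Return the Java release targeted by a .class file, from its first 8 bytes.

    Layout (JVM spec): u4 magic 0xCAFEBABE, u2 minor_version, u2 major_version.
    For Java 5 and later, release = major_version - 44 (52 -> 8, 61 -> 17).
    """
    if len(header) < 8:
        raise ValueError("too short to be a class file")
    magic, _minor, major = struct.unpack(">IHH", header[:8])
    if magic != 0xCAFEBABE:
        raise ValueError("not a Java class file")
    return major - 44


def classes_below(root: str, release: int = 17) -> List[Path]:
    """List compiled classes under `root` (e.g. target/classes) that still
    target a release older than `release`."""
    stale = []
    for path in sorted(Path(root).rglob("*.class")):
        with path.open("rb") as f:
            if class_file_release(f.read(8)) < release:
                stale.append(path)
    return stale
```

For example, `classes_below("target/classes")` on a fully migrated Maven module would be expected to return an empty list.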

1.2 JavaMigration (SDFeedback): Migration with LLMs

JavaMigration (SDFeedback), the current package, performs code migration with LLMs as a baseline solution, and relies on the MigrationBench package for the final evaluation.

It first builds an ECR (Elastic Container Registry) image, and then runs both code migration and final evaluation at scale with AWS Elastic MapReduce Serverless (EMRS).

2. 🤗 MigrationBench Datasets

There are three datasets in 🤗 MigrationBench. All repositories included in the datasets are available on GitHub under the MIT or Apache-2.0 license:

Index	Dataset	Size (repositories)	Notes
1	🤗 `AmazonScience/migration-bench-java-full`	5,102	Each repository has a test directory or at least one test case
2	🤗 `AmazonScience/migration-bench-java-selected`	300	A subset of 🤗 `migration-bench-java-full`
3	🤗 `AmazonScience/migration-bench-java-utg`	4,814	The unit test generation (utg) dataset, disjoint from 🤗 `migration-bench-java-full`
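For programmatic access, the datasets can be loaded with the Hugging Face `datasets` library. This is a minimal sketch, assuming `datasets` is installed and network access is available; the short names `full`, `selected` and `utg` and the helper `load_migration_bench` are illustrative, not part of either package. No split is requested, so `load_dataset` returns every available split.

```python
# Dataset IDs and sizes (number of repositories) from the table above.
MIGRATION_BENCH = {
    "full": ("AmazonScience/migration-bench-java-full", 5102),
    "selected": ("AmazonScience/migration-bench-java-selected", 300),
    "utg": ("AmazonScience/migration-bench-java-utg", 4814),
}


def load_migration_bench(name: str):
    """Load one dataset by its (hypothetical) short name: full, selected or utg.

    Needs `pip install datasets` and network access. No split is passed, so a
    DatasetDict with every available split is returned.
    """
    repo_id, _num_repos = MIGRATION_BENCH[name]  # KeyError for unknown names
    from datasets import load_dataset  # deferred: optional dependency

    return load_dataset(repo_id)
```

Usage: `print(load_migration_bench("selected"))` shows the splits and columns of the 300-repository subset.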

3. Code Migration with LLMs

We support running code migration for MigrationBench in two modes:

- Single job mode: for a single repository, and
- Batch job mode: for multiple repositories, with EMRS.
  - TL;DR: To run in batch mode, skip directly to 3.2.2 EMRS Run.

3.1 Single Job

To get started with migrating code from Java 8 to 17 with LLMs, under either minimal or maximal migration (see the arXiv paper for the definitions), follow the steps below.

3.1.1 Basic Setup

Verify that Java 17, Maven 3.9.6 and (optionally) conda are installed locally:

# java
~ $ java --version
openjdk 17.0.15 2025-04-15 LTS
OpenJDK Runtime Environment Corretto-17.0.15.6.1 (build 17.0.15+6-LTS)
OpenJDK 64-Bit Server VM Corretto-17.0.15.6.1 (build 17.0.15+6-LTS, mixed mode, sharing)

# maven
~ $ mvn --version
Apache Maven 3.9.6 (bc0240f3c744dd6b6ec2920b3cd08dcc295161ae)
Maven home: /usr/local/bin/apache-maven-3.9.6
Java version: 17.0.15, vendor: Amazon.com Inc., runtime: /usr/lib/jvm/java-17-amazon-corretto.x86_64
Default locale: en_US, platform encoding: UTF-8
OS name: "linux", version: "5.10.236-208.928.amzn2int.x86_64", arch: "amd64", family: "unix"

If you haven't done it yet, follow the instructions in MigrationBench to install Maven.

# conda (Optional)
$ conda --version
conda 25.1.1
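The manual checks above can also be scripted, e.g. in CI. This is a minimal sketch, assuming the banner formats shown above (`openjdk 17...` and `Apache Maven 3.9.6 ...`); the helper names are hypothetical, and other JDK vendors may format their output differently.

```python
import re
import shutil
import subprocess


def java_major_version(output: str) -> int:
    """Parse the major version from `java --version` output."""
    m = re.search(r'(?:openjdk|java)\s+"?(\d+)', output)
    if not m:
        raise ValueError("unrecognized java --version output")
    return int(m.group(1))


def maven_version(output: str) -> str:
    """Parse the version from `mvn --version` output, e.g. "3.9.6"."""
    m = re.search(r"Apache Maven (\d+(?:\.\d+)*)", output)
    if not m:
        raise ValueError("unrecognized mvn --version output")
    return m.group(1)


def run(cmd):
    """Return combined stdout/stderr of `cmd`, or "" if it is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return ""
    proc = subprocess.run(cmd, capture_output=True, text=True)
    return proc.stdout + proc.stderr
```

Usage: `java_major_version(run(["java", "--version"])) == 17` and `maven_version(run(["mvn", "--version"])) == "3.9.6"`.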

3.1.2 Install JavaMigration (SDFeedback)

Option A: Using uv (Recommended)

git clone https://github.com/amazon-science/JavaMigration.git
cd JavaMigration/self_debug

# Create and activate virtual environment with uv
uv venv --python 3.9
source .venv/bin/activate

# Install package (MigrationBench is installed automatically as a dependency)
uv pip install -e .

# Or with dev dependencies
uv pip install -e ".[dev]"

Option B: Using conda

git clone https://github.com/amazon-science/JavaMigration.git
cd JavaMigration/self_debug

# Optional: create a conda env
# conda create -n sd-feedback python=3.9
# conda activate sd-feedback

# Install package (MigrationBench is installed automatically as a dependency)
pip install -r requirements.txt -e .

# conda deactivate

3.1.3 Local Run

To run code migration for a single repository:

# cd .../JavaMigration/self_debug/
cd src/self_debug/

# An explicit `--max_iterations` overrides the value set in `--config_file`
python run_self_debugging.py --config_file configs/java_config.pbtxt  # --max_iterations 3
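To drive single-repository runs from Python (e.g. to sweep iteration budgets), a thin wrapper around the command above might look like this. It is a sketch: the helper names are hypothetical, it only assembles the two flags shown above, and it assumes it is run from `src/self_debug/`; it does not use any internal API of the package.

```python
import subprocess
import sys
from typing import List, Optional


def self_debug_command(config_file: str, max_iterations: Optional[int] = None) -> List[str]:
    """Build the argv for run_self_debugging.py.

    `--max_iterations` is only passed when given explicitly, so the value in
    the config file applies otherwise.
    """
    cmd = [sys.executable, "run_self_debugging.py", "--config_file", config_file]
    if max_iterations is not None:
        cmd += ["--max_iterations", str(max_iterations)]
    return cmd


def run_self_debug(config_file: str, max_iterations: Optional[int] = None) -> int:
    """Run the migration from src/self_debug/ and return its exit code."""
    return subprocess.run(self_debug_command(config_file, max_iterations)).returncode
```

Usage: `run_self_debug("configs/java_config.pbtxt", max_iterations=3)`.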

3.2 Batch Job

To run code migration in batch mode for multiple repositories, use EMRS. Running batch jobs locally is possible but not recommended.

3.2.1 Local Run

TL;DR: Running batch jobs locally is intended mainly for debugging and integration testing, and is NOT recommended.

See the relevant Spark scripts for reference:

src/self_debug/batch/spark_build.py
src/self_debug/batch/spark_debug.py

3.2.2 EMRS Run

Before submitting a job to EMRS, make sure the following are ready:

- IAM roles, network, security groups, etc. are set up correctly
- An ECR repository is set up
- SES (Amazon Simple Email Service) is set up (optional)

Build an ECR image

# cd .../JavaMigration/self_debug/
cd src/self_debug/container

# To build ECR image: 552793110740.dkr.ecr.us-east-1.amazonaws.com/$USER:java
./image.sh java $USER 1 docker/java.Dockerfile  # 999999999999.dkr.ecr.us-west-2.amazonaws.com
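The image in the comment above follows the standard private ECR URI layout, `<account>.dkr.ecr.<region>.amazonaws.com/<repository>:<tag>`; in the example the repository appears to be `$USER` and the tag `java`. When pointing the `emrs.py` config at the right image, a small helper like the one below can compose or sanity-check the URI. Both functions are hypothetical and not part of the package.

```python
import re

_ECR_URI = re.compile(
    r"^(?P<account>\d{12})\.dkr\.ecr\.(?P<region>[a-z0-9-]+)\.amazonaws\.com/"
    r"(?P<repository>[^:]+):(?P<tag>.+)$"
)


def ecr_image_uri(account: str, region: str, repository: str, tag: str) -> str:
    """Compose a private ECR image URI."""
    return f"{account}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def parse_ecr_image_uri(uri: str) -> dict:
    """Split an ECR image URI into account, region, repository and tag."""
    m = _ECR_URI.match(uri)
    if m is None:
        raise ValueError(f"not an ECR image URI: {uri}")
    return m.groupdict()
```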

Submit a Spark job to EMRS

Note that security keys may be subject to a 12-hour timeout.

# cd .../JavaMigration/self_debug/
cd src/self_debug/batch

# Update the config file for `emrs.py` as needed, e.g. set the right ECR image in step `#1`
CONFIG=...
export APPLICATION=emrs-dbg-{user}--{date}--run00
export SCRIPT=debugger

python emrs.py --config_file=$CONFIG --application=$APPLICATION --script=$SCRIPT --user=$USER  # --dry_run=1
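When submitting several runs, the values above can be generated rather than typed by hand. This is a minimal sketch: `{user}`, `{date}` and `run00` are placeholders in the template, so the `YYYYMMDD` date format and two-digit run counter below are assumptions, and the helper names are hypothetical. It only assembles the `emrs.py` flags already shown above.

```python
import datetime as dt
from typing import List, Optional


def application_name(user: str, date: Optional[dt.date] = None, run: int = 0) -> str:
    """Fill the emrs-dbg-{user}--{date}--run00 template.

    The YYYYMMDD date format is an assumption, not a requirement of emrs.py.
    """
    date = date or dt.date.today()
    return f"emrs-dbg-{user}--{date:%Y%m%d}--run{run:02d}"


def emrs_command(config: str, application: str, user: str,
                 script: str = "debugger", dry_run: bool = False) -> List[str]:
    """Build the argv for emrs.py, mirroring the command above."""
    cmd = ["python", "emrs.py", f"--config_file={config}",
           f"--application={application}", f"--script={script}", f"--user={user}"]
    if dry_run:
        cmd.append("--dry_run=1")  # same flag as the commented-out option above
    return cmd
```

Usage: `subprocess.run(emrs_command(config, application_name("alice"), "alice", dry_run=True))` from `src/self_debug/batch`, dropping `dry_run` once the printed plan looks right.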

4. 📚 Citation

@misc{liu2025migrationbenchrepositorylevelcodemigration,
      title={MigrationBench: Repository-Level Code Migration Benchmark from Java 8},
      author={Linbo Liu and Xinle Liu and Qiang Zhou and Lin Chen and Yihan Liu and Hoan Nguyen and Behrooz Omidvar-Tehrani and Xi Shen and Jun Huan and Omer Tripp and Anoop Deoras},
      year={2025},
      eprint={2505.09569},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2505.09569},
}
