Agent Refactoring Analysis

Research infrastructure for studying how AI-assisted commits refactor Java code. The pipeline starts from the AIDev dataset, filters Java pull requests, detects refactorings (pattern-based + RefactoringMiner), and quantifies quality impacts with Designite metrics.

Overview

Dataset ingestion – Scripts 0–2 pull the HuggingFace snapshot, keep Java PRs, and expand them into commit-level tables.
Refactoring detection – Scripts 3a/3b combine heuristic signals with RefactoringMiner to build refactoring_commits.parquet and raw JSON traces.
Quality analysis – Scripts 6a–6b run DesigniteJava to emit commit/entity deltas for structural

Research questions – scripts/10_research_questions.py aggregates the outputs to answer RQ1–RQ5 (counts, Agentic vs human comparisons, smell deltas) with plots and CSVs under outputs/research_questions/.
Auxiliary tooling – Extra scripts (10a, 11–15, calculate_cohen_kappa.py) generate level-based breakdowns, visualisations, and inter-rater metrics for GPT motivation labels.

All derived artefacts live in data/analysis/ (parquet/CSV) and outputs/ (publication-ready plots). Environment variables such as DESIGNITE_JAVA_PATH, REPOS_BASE, READABILITY_TOOL_CMD, and REFMINER_MAX_COMMITS control the tooling without modifying code.

1. Repository Tour

config/                        Dataset paths, caching, heuristics
scripts/                       CLI entry points (0_* → 6_*)
src/                           Core libraries
  data_loader/                 HuggingFace dataset accessors
  phase1_java_extraction/      PR + commit filtering
  phase3_refactoring_analysis/ RefactoringMiner orchestration + parsers
  research_questions/          Analysis + metric aggregation utilities
  utils/                       Shared helpers

data/                          Downloaded inputs and generated outputs
  analysis/                    Phase-specific reports (refactoring, designite, quality)
  filtered/                    Intermediate parquet/csv extracted from dataset
  designite/                   Raw Designite CSV dumps grouped by repo/sha

tools/RefactoringMiner/        Vendored RefactoringMiner source/build

Key output folders:

data/filtered/java_repositories/* – filtered PRs and commit tables.
data/analysis/refactoring_instances/* – refactoring commits, type counts, summaries.
data/analysis/designite/* – commit-level metrics, box-plot stats, entity deltas.

2. Local Setup (recommended)

python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install -r requirements.txt

Python 3.11+ and Java 17+ are required. The repository uses relative paths only; run scripts from the repo root.

Environment configuration

Create a .env (optional) or export variables in your shell:

Variable	Purpose	Example
`HF_TOKEN`	Needed if the HuggingFace dataset requires authentication	`hf_...`
`GITHUB_TOKEN` / `OAuthToken`	Used by RefactoringMiner for GitHub API access	`ghp_...` (write to `github-oauth.properties`)
`DESIGNITE_JAVA_PATH`	Absolute path to `DesigniteJava.jar`	`/path/to/DesigniteJava.jar`
`REPOS_BASE`	Root directory containing `<owner>/<repo>` clones	`/work/.../DesigniteRunner/cloned_repos`
`REFMINER_LOCAL_REPO`	(Optional) analyze a single local repo instead of using `REPOS_BASE`	`/path/to/repo`
`READABILITY_TOOL_CMD`	Shell command template to compute readability (`{input}`, `{output}` placeholders)	`python tools/CoRed/run.py {input} --verbose --output {output}`
`DESIGNITE_AUTO`	Toggle automatic Designite runs for missing snapshots	`0` to disable (default `1`)
`DESIGNITE_AUTO_CLONE`	Automatically clone repos missing under `REPOS_BASE`	`0` to disable (default `1`)
`DESIGNITE_GIT_REMOTE_TMPL`	Template for cloning repos when auto-clone is enabled	`https://github.com/{owner}/{repo}.git`
`DESIGNITE_MAX_RETRIES`	Retry count when DesigniteJava fails transiently	`3`
`DESIGNITE_TIMEOUT`	Seconds before a Designite run aborts	`600`
`DESIGNITE_WORKERS`	Repository workers for Designite delta generation	`4`
`READABILITY_WORKERS`	Repository workers for readability delta generation	`4`
`READABILITY_TIMEOUT`	Seconds before the readability command aborts	`600`

If you plan to re-run Designite or RefactoringMiner, make sure java is available on your PATH.

Docker-based workflow (optional)

The repository ships with a container image that bundles Python 3.11, Java 17, Git LFS, and the Python requirements. To build it:

docker build -t agent-refactoring-analysis .

Run an interactive shell with the data, tools, and outputs mounted so results persist on the host:

docker run --rm -it \
  -v "$(pwd)/data:/app/data" \
  -v "$(pwd)/outputs:/app/outputs" \
  -v "$(pwd)/tools:/app/tools" \
  -v "$(pwd)/github-oauth.properties:/app/github-oauth.properties:ro" \
  agent-refactoring-analysis

On first start the entrypoint clones and builds RefactoringMiner under tools/RefactoringMiner. The container exports the same environment variables as the local setup (HF_HOME, PYTHONPATH, and MPLCONFIGDIR=/tmp/matplotlib), so Matplotlib caches are written to a writable location.

Prefer docker compose if you want reusable service definitions (for example the bundled Jupyter profile):

docker compose up --build agent-refactoring-analysis
# or `docker compose --profile jupyter up` for notebooks

Remember to populate github-oauth.properties (or provide a different path via a bind mount) so RefactoringMiner can authenticate against the GitHub API.

3. Data Pipeline Overview

Each numbered script builds on the previous outputs. Run them from the repository root.

Step	Command	Description	Key Outputs
0	`python scripts/0_download_dataset.py`	Download & cache the source dataset snapshot from HuggingFace	`data/huggingface/...`
1	`python scripts/1_simple_java_extraction.py`	Filter to Java pull requests and persist metadata for downstream phases	`data/filtered/java_repositories/simple_java_prs.*`
2	`python scripts/2_extract_commits.py`	Expand PRs into commit-level rows and drop merge commits	`data/filtered/java_repositories/java_file_commits_for_refactoring.*`
3a	`python scripts/3_detect_refactoring.py`	Heuristic refactoring signals based on PR metadata and file changes	`data/analysis/refactoring_instances/commits_with_refactoring.parquet`
3b	`python scripts/3_apply_refactoringminer.py`	Run RefactoringMiner on candidate commits (saves raw JSON + parquet)	`data/analysis/refactoring_instances/refminer_refactorings.parquet`, `refminer_raw/`
4	`python scripts/4_analyze_refactoring_instance_and_type.py`	Merge heuristics + RefactoringMiner output into commit/type summaries	`data/analysis/refactoring_instances/refactoring_commits.parquet`, `refactoring_analysis.json`
6	`python scripts/6_designite_impact_analysis.py`	Compare Designite metrics for refactoring vs. control commits	`data/analysis/designite/commit_designite_metrics.*`, `designite_impact_summary.json`
6a	`python scripts/6a_prepare_designite_projects.py`	Emit Designite project list for bulk analysis	`tools/DesigniteRunner/projects.txt`
6b	`python scripts/6b_compute_designite_deltas.py [--workers N]`	Fetch parent/child snapshots, run Designite, emit per-entity deltas	`data/analysis/designite/deltas/*.parquet`

| 6d | python scripts/6_quality_analysis.py | Combine Designite + readability metrics into a quality-impact bundle | data/analysis/quality/*.parquet, quality_summary.json | | 7 | python scripts/7b_label_repositories_by_chatgpt.py | Tag repositories (production vs. toy) via GPT prompts using README excerpts | data/filtered/java_repositories/gpt_repository_labels.csv | | 10 | python scripts/10_research_questions.py | Generate the RQ1–RQ5 tables, plots, and CSV artefacts | outputs/research_questions/** |

Tip: Most scripts accept environment variables (e.g., REFMINER_MAX_COMMITS, REFMINER_SAVE_JSON, DESIGNITE_WORKERS) to control sample size and caching. Check the script source or run --help for details.

Additional analysis & visualisation helpers

The pipeline ships with focused scripts that build on the core outputs:

python scripts/7_manual_inspection_by_chatgpt.py – export structured prompts for manual GPT review of specific commits.
python scripts/10_research_questions.py – end-to-end RQ1–RQ5 bundle (tables, stats, violin plots, CSVs under outputs/research_questions/).
python scripts/10a_compute_rq3_totals_summary.py – derive Agentic vs. human refactoring totals for the RQ3 plots and tables.
python scripts/11_analyze_designite_smells.py – deep dive into Designite smell deltas (per-type aggregations and CSV summaries).
python scripts/12_visualize_designite_metrics.py – render per-metric before/after charts from the Designite delta outputs.
python scripts/13_visualize_rq_findings.py – collect the headline RQ plots into a single PDF bundle.
python scripts/14_metric_distribution_analysis.py – histogram/ECDF views for selected Designite metrics.
python scripts/15_visualize_metric_groups.py – grouped radar/heatmap views for metric families.
python scripts/calculate_cohen_kappa.py – compute inter-rater reliability and classification metrics for the GPT motivation labels (--summary and --confusion-dir flags control exports).

Utility scripts such as dataset_summary.py, plot_dataset.py, render_table.py, and sample_refactoring_motivations.py are living notebooks in script form—run them ad-hoc to explore subsets of the data without touching the main pipeline artefacts.

4. RefactoringMiner Integration

Build or drop the RefactoringMiner jar inside tools/RefactoringMiner:

mkdir -p tools
cd tools
git clone https://github.com/tsantalis/RefactoringMiner.git
cd RefactoringMiner
./gradlew shadowJar   # produces build/libs/RM-fat.jar

Provide GitHub credentials by writing github-oauth.properties with an OAuth token (see GitHub instructions below).
Use either scripts/3_detect_refactoring.py (quick validation) or scripts/4_refminer_analysis.py (full batch with raw JSON). Useful environment knobs:
- REFMINER_MAX_COMMITS=<N> – limit the number of commits analysed.
- REFMINER_SAVE_JSON=1 – persist raw responses under data/analysis/refactoring_instances/refminer_raw/.

Outputs include:

refminer_refactorings.parquet/csv – flattened refactoring instances.
refactoring_analysis.json – commit/file-level summary (instances per agent, percentages, etc.).

5. Designite Quality Metrics

Clone DesigniteRunner tools/DesigniteRunner:

cd tools
git clone https://github.com/Mont9165/DesigniteRunner.git
cd DesigniteRunner

Prepare designite projects (come back project directory):

python scripts/6a_prepare_designite_projects.py

5.1 Generating commit-level metrics

Prerequisites:

Designite command-line jar (DESIGNITE_JAVA_PATH).
Designite output directories populated under data/designite/outputs/<owner>/<repo>/<sha>/. You can populate them by running Designite manually or by invoking the helper in scripts/tools/.

Run:

python scripts/6_designite_impact_analysis.py

Outputs:

commit_designite_metrics.parquet|csv – aggregated metrics per commit.
designite_impact_summary.json – mean/median deltas, descriptive stats, agent breakdowns.
designite_boxplot_data.json – five-number summaries for plotting box plots.

5.2 Before/after deltas for refactoring entities

To quantify deltas for specific classes/methods touched by refactorings:

export DESIGNITE_JAVA_PATH=/absolute/path/to/DesigniteJava.jar
export REPOS_BASE=/absolute/path/to/cloned_repos
# optional: export HF_TOKEN, REFMINER_LOCAL_REPO, etc.
python scripts/6b_compute_designite_deltas.py --max-commits 50  # remove flag for full run

# (Optional) Run readability deltas with CoRed
export READABILITY_TOOL_CMD="python tools/CoRed/run.py {input} --verbose --output {output}"
python scripts/6c_readability_impact.py --max-commits 50

> **Parallelism:** Add `--workers N` or export `DESIGNITE_WORKERS` / `READABILITY_WORKERS` to process repositories concurrently. On an M1 Pro (8 cores), `N=4`–`6` keeps the JVM busy without overwhelming the system.
> **Timeouts:** Use `--timeout` (or export `DESIGNITE_TIMEOUT` / `READABILITY_TIMEOUT`) to relax the default 600s limit when running expensive snapshots.

### 5.3 Readability tooling (CoRed)

[CoRed](https://github.com/grosa1/CoRed) implements the readability model from Scalabrino et al. (JSEP 2018). This repository includes a Python wrapper (`tools/CoRed/run.py`) that batches large repositories and accepts an `--output` path so the pipeline can consume its CSV results directly.

python tools/CoRed/run.py path/to/project --verbose --output report.csv


To integrate with the analysis scripts, export `READABILITY_TOOL_CMD` so it points to the wrapper (placeholders `{input}` and `{output}` are substituted at runtime):

export READABILITY_TOOL_CMD="python tools/CoRed/run.py {input} --verbose --output {output}"


The wrapper always writes `file_name,score,level` rows, which `scripts/6c_readability_impact.py` uses to measure before/after changes in files touched by each refactoring.

Behaviour:

Looks up each refactoring commit in commits_with_refactoring.parquet.
Ensures a clone exists under REPOS_BASE (auto-clones from GitHub when DESIGNITE_AUTO_CLONE=1).
Fetches the child commit and its parent and checks them out in temporary worktrees.
Runs DesigniteJava.jar for both snapshots (respects DESIGNITE_MAX_RETRIES).
Matches entity keys (type or method) using file paths + qualified names from RefactoringMiner locations.
Emits per-metric deltas and an aggregated summary (delta_summary.*).

Troubleshooting:

Designite outputs still missing: confirm DESIGNITE_JAVA_PATH points to a licensed jar and that java -jar … succeeds for the target commit. The script logs failures with the offending SHA.
Repo clone fails: set DESIGNITE_AUTO_CLONE=0 to disable auto-clone and provide the repo manually under REPOS_BASE.
Large runs: use --max-commits to limit the batch size while validating your setup.

6. GitHub Token Setup (for RefactoringMiner)

Visit Settings → Developer settings → Personal access tokens (classic).
Generate a token with public_repo scope (or broader if needed).
Store it in github-oauth.properties at the repo root:
```
OAuthToken=ghp_your_token_here
```
The scripts automatically read this file when invoking RefactoringMiner.

7. Research Question Artifacts

RQ	Output	Location
RQ1 / RQ2 (Frequency)	Counts of refactoring commits/instances & self-affirmation	`data/analysis/refactoring_instances/refactoring_analysis.json`, `refactoring_commits.parquet`
RQ3 (Type)	Refactoring type distributions	`data/analysis/refactoring_instances/refactoring_type_counts_*.{csv,json}`
RQ4 (Purpose)	Qualitative summaries / prompts (if generated)	`outputs/` or `notebooks/` (see repo history)
RQ5 (Impact)	Quality metrics before/after refactorings	`data/analysis/designite/`, `data/analysis/quality/`, `outputs/research_questions/rq5_quality/*`

After running the numbered scripts you can generate the RQ summary bundle (including the new RQ5 quality impact report) with:

python scripts/10_research_questions.py

All JSON/CSV artifacts land in outputs/research_questions/. For custom analyses, load the parquet files into notebooks under notebooks/ or create new scripts under scripts/.

8. Troubleshooting Checklist

Missing dataset files: re-run scripts/0_download_dataset.py; ensure HF_TOKEN is set if the dataset is private.
RefactoringMiner fails: double-check the jar path in tools/RefactoringMiner, ensure the GitHub token is valid, and confirm java is Java 17+.
Designite delta script prints "No deltas computed":
- DESIGNITE_JAVA_PATH must point to the jar.
- REPOS_BASE must contain <owner>/<repo> clones with the analysed SHAs (git fetch origin <sha> if missing).
- Existing child Designite outputs live under data/designite/outputs/...; the script creates parent snapshots automatically when possible.
Large parquet reads fail: install optional dependencies like pyarrow==15.x (already listed in requirements.txt).

Name		Name	Last commit message	Last commit date
Latest commit History 34 Commits
config		config
data		data
outputs		outputs
scripts		scripts
src		src
tests		tests
.dockerignore		.dockerignore
.gitattributes		.gitattributes
.gitignore		.gitignore
AGENTS.md		AGENTS.md
Dockerfile		Dockerfile
HISTORY.md		HISTORY.md
README.md		README.md
Refactoring_Level_Classification.md		Refactoring_Level_Classification.md
docker-compose.yml		docker-compose.yml
docker-entrypoint.sh		docker-entrypoint.sh
github-oauth.properties		github-oauth.properties
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

Agent Refactoring Analysis

Overview

1. Repository Tour

2. Local Setup (recommended)

Environment configuration

Docker-based workflow (optional)

3. Data Pipeline Overview

Additional analysis & visualisation helpers

4. RefactoringMiner Integration

5. Designite Quality Metrics

5.1 Generating commit-level metrics

5.2 Before/after deltas for refactoring entities

6. GitHub Token Setup (for RefactoringMiner)

7. Research Question Artifacts

8. Troubleshooting Checklist

About

Uh oh!

Releases

Packages

Languages

Mont9165/Agent_Refactoring_Analysis

Folders and files

Latest commit

History

Repository files navigation

Agent Refactoring Analysis

Overview

1. Repository Tour

2. Local Setup (recommended)

Environment configuration

Docker-based workflow (optional)

3. Data Pipeline Overview

Additional analysis & visualisation helpers

4. RefactoringMiner Integration

5. Designite Quality Metrics

5.1 Generating commit-level metrics

5.2 Before/after deltas for refactoring entities

6. GitHub Token Setup (for RefactoringMiner)

7. Research Question Artifacts

8. Troubleshooting Checklist

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages