Training / proper data collection is pending...
Fine-tunes GLM-4.1V-9B-Thinking (a 9B-param VLM) with LoRA to reverse-engineer UI widgets: screenshot → React+Tailwind TSX component. Pipeline: data collection → SFT → GRPO → inference → evaluation.
WidgetAnchor fine-tunes zai-org/GLM-4.1V-9B-Thinking in two phases:
- Supervised Fine-Tuning (SFT) — cross-entropy on structured chain-of-thought completions (5-section reasoning trace + React component)
- Group Relative Policy Optimization (GRPO) — custom PPO-clip implementation with hybrid reward: programmatic image similarity metrics + VLM pairwise judging via round-robin tournament
The model takes a widget screenshot as input and outputs `<think>CoT reasoning</think><code>React+Tailwind TSX</code>`.
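Downstream code needs to split that completion into its two parts. A minimal sketch of such a parser (the helper name and sample completion are mine, not from the repo):

```python
import re

def parse_completion(text: str) -> tuple[str, str]:
    """Split a completion of the form <think>...</think><code>...</code>
    into (reasoning, tsx). The reasoning block is optional; the code
    block is required."""
    think = re.search(r"<think>(.*?)</think>", text, re.DOTALL)
    code = re.search(r"<code>(.*?)</code>", text, re.DOTALL)
    if code is None:
        raise ValueError("completion has no <code> block")
    return (think.group(1).strip() if think else "", code.group(1).strip())

out = ('<think>A blue button with rounded corners.</think>'
      '<code>export const Btn = () => '
      '<button className="rounded bg-blue-500">OK</button>;</code>')
reasoning, tsx = parse_completion(out)
```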
```
data/
  collection/
    collect_widget_factory.py   # Reverse-engineer 10K widget screenshots via GPT-4o
    generate_synthetic.py       # Generate 10K+ synthetic widgets from random categories
  processing/
    render_verify.py            # Render TSX, filter by SSIM >= 0.85
    dedup.py                    # dHash + LSH deduplication
    tag_difficulty.py           # simple/medium/complex by token count
    split_data.py               # 90/5/5 train/val/test split
  annotation/
    generate_cot.py             # GPT-4o generates structured CoT traces
  run_pipeline.sh               # Master script for full data pipeline
training/
  sft.py                        # SFT loop (defines SYSTEM_PROMPT, shared constants)
  training_loop_grpo.py         # Custom GRPO with reference adapter trick
  pipeline.py                   # Orchestrates SFT → GRPO
  run_grpo_only.py              # Resume GRPO from SFT checkpoint
reward/
  programmatic.py               # 5-metric blend (SSIM, LPIPS, palette, contrast, polarity)
  vlm_reward.py                 # GPT-4o absolute + pairwise scoring
  round_robin.py                # Tournament ranking for GRPO rollouts
  composite.py                  # 40% programmatic + 60% VLM hybrid
inference/
  generate.py                   # Single-shot generation
  best_of_n.py                  # N candidates → highest scoring
  polish.py                     # Iterative multi-round refinement
evaluation/
  run_benchmarks.py             # Generate TSX + PNG for Widget2Code benchmark
  setup_benchmark.sh            # Download benchmark data
  render/                       # esbuild + Playwright TSX → PNG pipeline
```
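The dedup stage pairs dHash fingerprints with LSH bucketing. As an illustration of the hashing half only, a self-contained dHash sketch (an assumption about the implementation: real code would resize with PIL/OpenCV rather than the crude block-mean used here):

```python
import numpy as np

def dhash(gray: np.ndarray, hash_size: int = 8) -> int:
    """Difference hash: block-reduce a 2-D grayscale array to
    (hash_size, hash_size + 1) cells, then set one bit per cell that is
    brighter than its right neighbour. 64-bit fingerprint for hash_size=8."""
    h, w = gray.shape
    gray = gray[: h - h % hash_size, : w - w % (hash_size + 1)]
    cells = gray.reshape(
        hash_size, gray.shape[0] // hash_size,
        hash_size + 1, gray.shape[1] // (hash_size + 1),
    ).mean(axis=(1, 3))
    bits = (cells[:, 1:] > cells[:, :-1]).flatten()
    return int("".join("1" if b else "0" for b in bits), 2)

def hamming(a: int, b: int) -> int:
    """Bit distance between two hashes; a small distance flags a near-duplicate."""
    return bin(a ^ b).count("1")
```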
```bash
pip install torch transformers peft einops flask playwright openai scikit-image opencv-python lpips
playwright install chromium
cd render && npm install
```

Requires a `.env` file with `OPENAI_API_KEY` for GPT-4o calls (data collection, CoT annotation, VLM reward).
Two parallel data collection paths write to `output/raw/` with non-colliding ID prefixes (`widget-*` and `synthetic-*`):
```bash
# Full pipeline (collection + processing + annotation + split)
bash data/run_pipeline.sh

# Or run individually:
python -m data.collection.collect_widget_factory --num_samples 10000 --output_dir ./output/raw --workers 8
python -m data.collection.generate_synthetic --num_samples 10000 --output_dir ./output/raw --workers 8
```

Runs SFT then GRPO; checkpoints are saved to `checkpoints/`:

```bash
python -m training.pipeline
```

Or resume GRPO from an existing SFT checkpoint:

```bash
python -m training.run_grpo_only
```

```bash
# Single-shot
python -m inference.generate --checkpoint /path/to/checkpoint --image widget.png

# Best-of-N + iterative polish
python -m inference.polish --checkpoint /path/to/checkpoint --image widget.png
```

```bash
bash evaluation/setup_benchmark.sh
python -m evaluation.run_benchmarks
```

| Metric | Weight | Description |
|---|---|---|
| SSIM | 0.15 | Structural similarity |
| LPIPS | 0.15 | Perceptual distance (AlexNet) |
| Palette | 0.20 | K-means color clustering + Hungarian matching |
| Contrast | 0.15 | Grayscale std deviation similarity |
| Polarity | 0.10 | Lightness histogram correlation (LAB space) |
| Layout | 0.20 | Intersection over Union |
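Assuming each metric is normalized to a similarity in [0, 1], the programmatic score is a straight weighted sum; a sketch using the table's weights exactly as listed (they sum to 0.95), plus the 40/60 hybrid from `reward/composite.py`:

```python
# Weights as listed in the metric table above (they sum to 0.95).
WEIGHTS = {"ssim": 0.15, "lpips": 0.15, "palette": 0.20,
           "contrast": 0.15, "polarity": 0.10, "layout": 0.20}

def programmatic_reward(scores: dict[str, float]) -> float:
    """Weighted blend of per-metric scores, each assumed normalized to a
    similarity in [0, 1] (e.g. LPIPS converted as 1 - distance)."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

def composite_reward(programmatic: float, vlm: float) -> float:
    """Hybrid reward: 40% programmatic + 60% VLM judging."""
    return 0.4 * programmatic + 0.6 * vlm
```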
- Pre-filter: candidates with a programmatic score below 0.3 are eliminated
- All-pairs VLM pairwise comparison among the survivors
- Win counts become the reward signal
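The three steps above can be sketched as follows; `judge` stands in for the GPT-4o pairwise call, and normalizing win counts by the number of opponents is my assumption:

```python
from itertools import combinations
from typing import Callable, Sequence

def round_robin_rewards(candidates: Sequence[str],
                        judge: Callable[[str, str], int],
                        prescores: Sequence[float],
                        floor: float = 0.3) -> list[float]:
    """Pre-filter by programmatic score, run all-pairs comparisons among
    survivors via judge(a, b) -> 0 or 1 (index of the winner), and return
    per-candidate rewards as normalized win counts (eliminated -> 0.0)."""
    alive = [i for i, s in enumerate(prescores) if s >= floor]
    wins = {i: 0 for i in alive}
    for i, j in combinations(alive, 2):
        winner = i if judge(candidates[i], candidates[j]) == 0 else j
        wins[winner] += 1
    n_opponents = max(len(alive) - 1, 1)
    return [wins[i] / n_opponents if i in wins else 0.0
            for i in range(len(candidates))]
```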
| Parameter | SFT | GRPO |
|---|---|---|
| Learning rate | 1e-4 | 1e-5 |
| LoRA rank / alpha | 16 / 32 | 16 / 32 |
| Clipping (eps) | - | 0.2 |
| KL penalty (beta) | - | 0.05 |
| Generations per prompt (n) | - | 5 |
| Inner epochs | - | 4 |
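A sketch of the GRPO core under the standard formulation, using the table's eps = 0.2 (the repo's `training_loop_grpo.py` may differ; the KL-to-reference-adapter penalty, beta = 0.05, is omitted here):

```python
import numpy as np

EPS_CLIP = 0.2  # clipping eps from the table above

def group_advantages(rewards: np.ndarray) -> np.ndarray:
    """Group-relative advantages: standardize the n rollout rewards of one
    prompt against their own mean/std (n = 5 generations per the table)."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def clipped_surrogate(logp_new: np.ndarray, logp_old: np.ndarray,
                      adv: np.ndarray) -> float:
    """PPO-clip objective: importance ratio exp(logp_new - logp_old),
    clipped to [1 - eps, 1 + eps]; returns the loss (negated objective)."""
    ratio = np.exp(logp_new - logp_old)
    surrogate = np.minimum(ratio * adv,
                           np.clip(ratio, 1 - EPS_CLIP, 1 + EPS_CLIP) * adv)
    return float(-surrogate.mean())
```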
- Base: `zai-org/GLM-4.1V-9B-Thinking` (9B-param VLM)
- Adaptation: LoRA r=16, α=32, dropout=0.05, targets: q/k/v/o projections
- Precision: bfloat16 with gradient checkpointing
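These adaptation settings map directly onto an adapter config; a sketch assuming Hugging Face PEFT (it appears in the install line) and the usual attention-projection module names in the checkpoint:

```python
from peft import LoraConfig

# Matches the settings listed above; the target module names are an
# assumption about the HF checkpoint's attention projection naming.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
```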