swatigoyal911/document_localizer_challenge

Document Localization Studio (Free Stack)

A structure-aware document localization project for the GitHub Copilot CLI Challenge, built only with free libraries.

Supported inputs

  • .txt
  • .docx
  • text-based .pdf
  • screenshots/images: .png, .jpg, .jpeg (OCR pipeline)
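As a rough illustration of how the input types above could be routed, the sketch below maps a file extension to a pipeline name. The `PIPELINES` table and `pick_pipeline` function are hypothetical names for this example, not the project's actual API.

```python
from pathlib import Path

# Hypothetical routing table: extension -> pipeline label.
# Image types share the OCR pipeline, as the list above describes.
PIPELINES = {
    ".txt": "text",
    ".docx": "docx",
    ".pdf": "pdf",
    ".png": "ocr",
    ".jpg": "ocr",
    ".jpeg": "ocr",
}


def pick_pipeline(filename):
    """Return the pipeline label for a supported file, else raise ValueError."""
    suffix = Path(filename).suffix.lower()  # case-insensitive match
    if suffix not in PIPELINES:
        raise ValueError(f"Unsupported input type: {filename}")
    return PIPELINES[suffix]
```

Lower-casing the suffix means `Scan.JPG` and `scan.jpg` take the same route.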

Supported locales (requested set)

  • de_de, es_es, fr_fr, it_it, ja_jp, ko_kr, pt_br, zh_cn, zh_tw

Feature coverage

  • Language/term localization (rule-based)
  • Currency conversion from USD, with an editable default FX rate per locale
  • Date/time + timezone conversion
  • Measurement/unit conversion (mi -> km, lb -> kg, F -> C)
  • Address/phone/postal adaptation by locale
  • Tax/VAT/GST label adaptation + compliance labels
  • Legal clause lock/protect zones ([[LOCK]]...[[/LOCK]])
  • Terminology memory (term_memory.json)
  • Style/tone presets (formal, legal, technical, marketing)
  • Table overflow risk hints (DOCX)
  • Cross-reference/TOC/page-reference QA warnings
  • Font/script fallback QA checks for CJK
  • Approval workflow states (Draft, Legal Review, Final)
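Two of the features above, unit conversion and lock zones, interact: text inside `[[LOCK]]...[[/LOCK]]` should pass through untouched while the rest is converted. The following is a minimal sketch of that idea. The function names, regex rules, and one-decimal rounding are assumptions for illustration, not the project's actual implementation.

```python
import re

# Matches one protected zone, markers included (hypothetical pattern).
LOCK_PATTERN = re.compile(r"\[\[LOCK\]\].*?\[\[/LOCK\]\]", re.DOTALL)

# Each rule: (pattern, converter, target unit). Factors are standard definitions.
UNIT_RULES = [
    (re.compile(r"(\d+(?:\.\d+)?)\s*mi\b"), lambda v: v * 1.609344, "km"),
    (re.compile(r"(\d+(?:\.\d+)?)\s*lb\b"), lambda v: v * 0.45359237, "kg"),
    (re.compile(r"(-?\d+(?:\.\d+)?)\s*°?F\b"), lambda v: (v - 32) * 5 / 9, "°C"),
]


def convert_units(text):
    """Apply every unit rule to a piece of unlocked text."""
    for pattern, convert, unit in UNIT_RULES:
        text = pattern.sub(
            lambda m, c=convert, u=unit: f"{c(float(m.group(1))):.1f} {u}",
            text,
        )
    return text


def localize_outside_locks(text):
    """Convert units everywhere except inside [[LOCK]]...[[/LOCK]] zones."""
    parts = []
    last = 0
    for match in LOCK_PATTERN.finditer(text):
        parts.append(convert_units(text[last:match.start()]))
        parts.append(match.group(0))  # locked span is copied verbatim
        last = match.end()
    parts.append(convert_units(text[last:]))
    return "".join(parts)
```

For example, `localize_outside_locks("Ship 10 mi. [[LOCK]]Limit is 10 mi.[[/LOCK]]")` converts only the first distance and leaves the locked clause as written.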

UI highlights

  • Dashboard UI with a hero header and styled cards
  • Animated Before/After scorecards
  • Visual side-by-side diff
  • Layout risk heatmap

Stack

  • streamlit
  • python-docx
  • pypdf
  • reportlab
  • pymupdf (layout-preserving PDF localization)
  • pillow
  • pytesseract

Setup

cd document_localizer_challenge
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

Run UI

streamlit run app.py

Run CLI

python -m localizer.cli input.docx output.docx --locale de_de
python -m localizer.cli input.pdf output.pdf --locale ja_jp --source-timezone America/Los_Angeles --tone legal
python -m localizer.cli screenshot.png localized.txt --locale fr_fr --workflow "Legal Review"
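To localize one document into every supported locale, the CLI calls above can be scripted. The sketch below only assembles and runs the same `python -m localizer.cli` invocations shown here, using the flags that appear in these examples. The `build_command` and `localize_all` helpers and the output naming scheme are assumptions for illustration.

```python
import subprocess
import sys

LOCALES = ["de_de", "es_es", "fr_fr", "it_it", "ja_jp",
           "ko_kr", "pt_br", "zh_cn", "zh_tw"]


def build_command(src, dst, locale, tone=None, workflow=None,
                  source_timezone=None):
    """Assemble one CLI invocation as an argument list (no shell quoting needed)."""
    cmd = [sys.executable, "-m", "localizer.cli", src, dst, "--locale", locale]
    if source_timezone:
        cmd += ["--source-timezone", source_timezone]
    if tone:
        cmd += ["--tone", tone]
    if workflow:
        cmd += ["--workflow", workflow]
    return cmd


def localize_all(src, stem, suffix, **options):
    """Run the CLI once per locale, writing e.g. report_de_de.docx."""
    for locale in LOCALES:
        dst = f"{stem}_{locale}{suffix}"
        subprocess.run(build_command(src, dst, locale, **options), check=True)
```

Calling `localize_all("input.docx", "output", ".docx", tone="legal")` from the project root would then produce one output file per locale, stopping at the first failing run because of `check=True`.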

OCR note

Screenshot OCR requires a local Tesseract binary in addition to pytesseract.

  • macOS (example): brew install tesseract
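Before sending screenshots through OCR, it can help to confirm that the Tesseract binary is actually installed. This small check uses only the standard library; the helper names are hypothetical and not part of the project.

```python
import shutil


def find_tesseract():
    """Return the full path of the tesseract executable, or None if it is not on PATH."""
    return shutil.which("tesseract")


def ocr_ready():
    """True when a tesseract binary is discoverable on PATH."""
    return find_tesseract() is not None
```

If the binary is installed outside `PATH`, pytesseract can be pointed at it by setting `pytesseract.pytesseract.tesseract_cmd` to the executable's location.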

Demo script for challenge

  1. Upload DOCX/PDF/screenshot.
  2. Change locale and watch default FX auto-update.
  3. Adjust tone + workflow state.
  4. Run localization and show scorecards, diff, heatmap, QA.
  5. Download localized output + QA report.
