10 changes: 9 additions & 1 deletion .gitignore
@@ -1,3 +1,11 @@
RenAIssance_Transformer_OCR_Utsav_Rai/weights
RenAIssance_Transformer_OCR_Utsav_Rai/models
RenAIssance_Transformer_OCR_Utsav_Rai/quantized_model
RenAIssance_Transformer_OCR_Utsav_Rai/quantized_model
RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/models/*.pt
RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/models/*.pth
RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/data/ssl/word_images/*
!RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/data/ssl/word_images/.gitkeep
RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/data/finetuning/*/word_images/*
!RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/data/finetuning/perfecto/word_images/.gitkeep
!RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/data/finetuning/ezcaray/word_images/.gitkeep
!RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/data/finetuning/virtuosa/word_images/.gitkeep
76 changes: 61 additions & 15 deletions RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/README.md
@@ -1,24 +1,70 @@
# Spanish Historical OCR using Self-Supervised Learning

## Overview
This repository implements a word-level OCR model for Renaissance Spanish documents using Self-Supervised Learning. The model was developed with reference to SeqCLR ([Aberdam A., et al., 2021](https://arxiv.org/abs/2012.10873)). According to the paper, SeqCLR employs a Contrastive Learning method, wherein its encoder learns to become robust against certain image transformations. The architecture includes a combination of ResNet50(or ViT tiny) and a 2-layer BiLSTM as the Encoder, and an Attention LSTM Decoder. At this point, the model achieves approximately 4% CER. This model can be tested in `test_model.ipynb`. For further information, please refer to my [blog](https://medium.com/@yamanko1234/historical-ocr-with-self-supervised-learning-c4f00da6637f).
This repository implements a word-level OCR model for Renaissance Spanish documents using self-supervised learning. The model was developed with reference to SeqCLR ([Aberdam A., et al., 2021](https://arxiv.org/abs/2012.10873)). According to the paper, SeqCLR uses contrastive learning so its encoder becomes robust to image transformations. The architecture combines a ResNet50 (or ViT tiny) and a 2-layer BiLSTM encoder with an attention LSTM decoder.

At this point, the model achieves approximately 4% CER. This model can be tested in `test_model.ipynb`. For more background, see the [project blog post](https://medium.com/@yamanko1234/historical-ocr-with-self-supervised-learning-c4f00da6637f).
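
The contrastive objective behind SeqCLR can be sketched with a minimal NT-Xent-style loss over paired frame embeddings. This is an illustrative NumPy sketch of the general technique, not the repository's actual training code; the array shapes, temperature, and pairing scheme are assumptions:

```python
import numpy as np

def ntxent_loss(z1, z2, temperature=0.5):
    """NT-Xent contrastive loss: row i of z1 and row i of z2 are a positive pair."""
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # cosine-similarity space
    sim = z @ z.T / temperature
    n = z1.shape[0]
    sim[np.eye(2 * n, dtype=bool)] = -np.inf           # exclude self-similarity
    # index of each embedding's positive partner in the concatenated batch
    targets = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))
    return (logsumexp - sim[np.arange(2 * n), targets]).mean()

rng = np.random.default_rng(0)
view1 = rng.normal(size=(8, 16))
view2 = view1 + 0.01 * rng.normal(size=(8, 16))  # two "augmented views" of the same frames
print(ntxent_loss(view1, view2) < ntxent_loss(view1, rng.normal(size=(8, 16))))  # -> True
```

Minimizing this loss pulls embeddings of two augmentations of the same image together and pushes all other pairs apart, which is what makes the encoder robust to the image transformations mentioned above.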

## Portable Configuration
The default `config.json` now uses paths relative to this folder instead of machine-specific absolute paths. That makes the project easier to clone and configure on another machine.

Populate the directories below with your local datasets and checkpoints, or update `config.json` to match your own layout:

```text
RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/
├── config.json
├── data/
│ ├── ssl/
│ │ └── word_images/
│ └── finetuning/
│ ├── perfecto/
│ │ ├── word_images/
│ │ └── word_images.csv
│ ├── ezcaray/
│ │ ├── word_images/
│ │ └── word_images.csv
│ └── virtuosa/
│ ├── word_images/
│ └── word_images.csv
├── models/
└── test_images/
```

The bundled `test_images/` folder is used as the default `test dataset` path so contributors can validate notebook setup without first changing that entry.

Before running the notebooks, you can verify the configured paths:

```bash
python check_config_paths.py
```

## File/Folder Descriptions
- **Tokenizer**: A folder containing Tokenizer pickle files for the Decoder training.
- **test_image**: A folder containing images used for testing.
- **Decoder.py**: Implementation of the SeqCLR’s Decoder.
- **ResNet.py**: Implementation of ResNet, a component of the Encoder.
- **config.json**: A JSON file that sets the configuration for training.
- **custom_dataset.py**: Implementation of a custom dataset used in training.
- **decoder_training.ipynb**: A notebook to train the Decoder.
- **encoder.py**: Implementation of the SeqCLR’s Encoder.
- **ViT_encoder.py** Implementation of ViT version Encoder.
- **encoder_training.ipynb**: A notebook to train the Encoder.
- **test_model.ipynb**: A notebook to test a saved model.
- **Tokenizer**: Pickle files used for decoder training and decoding.
- **data**: Local SSL and fine-tuning datasets referenced by `config.json`.
- **models**: Saved encoder and decoder checkpoints.
- **test_images**: Sample images used for testing.
- **Decoder.py**: SeqCLR decoder implementation.
- **ResNet.py**: ResNet implementation used by the encoder.
- **config.json**: Training and inference configuration.
- **check_config_paths.py**: Helper script that verifies configured dataset and model paths exist.
- **custom_dataset.py**: Custom dataset implementations used in training.
- **decoder_training.ipynb**: Notebook for decoder training and evaluation.
- **encoder.py**: SeqCLR encoder implementation.
- **ViT_encoder.py**: ViT-based encoder implementation; an optional alternative encoder selected via `config.json`.
- **encoder_training.ipynb**: Notebook for encoder training.
- **test_model.ipynb**: Notebook for testing a saved model.
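
The `Tokenizer` pickles map characters to integer tokens for the decoder. A hedged round-trip sketch of how such a mapping might be stored and used; the file name, dictionary format, and special tokens are assumptions for illustration, not verified against the repository:

```python
import pickle
from pathlib import Path

# Hypothetical char->token map; the repo's actual pickles may use a
# different vocabulary and special-token convention.
char_to_token = {"<sos>": 0, "<eos>": 1, "a": 2, "b": 3, "c": 4}
token_to_char = {v: k for k, v in char_to_token.items()}

path = Path("char_to_token.pkl")
path.write_bytes(pickle.dumps(char_to_token))
loaded = pickle.loads(path.read_bytes())

def encode(word):
    # Wrap a word with start/end tokens, as attention decoders typically expect
    return [loaded["<sos>"]] + [loaded[ch] for ch in word] + [loaded["<eos>"]]

print(encode("cab"))  # -> [0, 4, 2, 3, 1]
path.unlink()
```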

## Testing the Model
First, you need to install the dependencies:
```
Install the dependencies:

```bash
pip install -r requirements.txt
```
Then, you can test the saved model by executing the cells in `test_model.ipynb` one by one.

Confirm `config.json` points to valid paths for your environment:

```bash
python check_config_paths.py
```

Then run the cells in `test_model.ipynb`.
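
The reported ~4% CER is the character error rate: edit distance between the prediction and the ground truth, normalized by reference length. A minimal reference implementation of the standard metric (the notebook may compute it with a library instead):

```python
def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: Levenshtein distance / len(reference)."""
    m, n = len(reference), len(hypothesis)
    # Classic dynamic-programming edit distance, one row at a time
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(prev[j] + 1,         # deletion
                          curr[j - 1] + 1,     # insertion
                          prev[j - 1] + cost)  # substitution
        prev = curr
    return prev[n] / max(m, 1)

print(cer("quixote", "quixote"))  # -> 0.0
print(cer("quixote", "quizote"))  # -> 0.1428... (1 substitution over 7 chars)
```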
@@ -0,0 +1,85 @@
from __future__ import annotations

import json
from pathlib import Path


PROJECT_ROOT = Path(__file__).resolve().parent
CONFIG_PATH = PROJECT_ROOT / "config.json"


def resolve_path(raw_path: str | None) -> str:
    if raw_path is None:
        return "<not set>"
    return str((PROJECT_ROOT / raw_path).resolve())


def path_exists(raw_path: str | None) -> bool | None:
    if raw_path is None:
        return None
    return (PROJECT_ROOT / raw_path).exists()


def iter_config_paths(config: dict) -> list[tuple[str, str | None, bool]]:
    return [
        ("SSL.dataset 1", config["SSL"].get("dataset 1"), True),
        ("SSL.dataset 2", config["SSL"].get("dataset 2"), False),
        ("SSL.dataset 3", config["SSL"].get("dataset 3"), False),
        ("SSL.saved Encoder path", config["SSL"].get("saved Encoder path"), False),
        ("fine-tuning.dataset 1", config["fine-tuning"].get("dataset 1"), True),
        ("fine-tuning.dataset 1 csv", config["fine-tuning"].get("dataset 1 csv"), True),
        ("fine-tuning.dataset 2", config["fine-tuning"].get("dataset 2"), False),
        ("fine-tuning.dataset 2 csv", config["fine-tuning"].get("dataset 2 csv"), False),
        ("fine-tuning.dataset 3", config["fine-tuning"].get("dataset 3"), False),
        ("fine-tuning.dataset 3 csv", config["fine-tuning"].get("dataset 3 csv"), False),
        ("fine-tuning.test dataset", config["fine-tuning"].get("test dataset"), True),
        (
            "fine-tuning.Encoder path for fine-tuning",
            config["fine-tuning"].get("Encoder path for fine-tuning"),
            False,
        ),
        (
            "fine-tuning.Decoder path for fine-tuning",
            config["fine-tuning"].get("Decoder path for fine-tuning"),
            False,
        ),
        ("fine-tuning.char to token", config["fine-tuning"].get("char to token"), True),
        ("fine-tuning.token to char", config["fine-tuning"].get("token to char"), True),
        ("fine-tuning.saved Encoder path", config["fine-tuning"].get("saved Encoder path"), False),
        ("fine-tuning.saved Decoder path", config["fine-tuning"].get("saved Decoder path"), False),
    ]


def main() -> int:
    with CONFIG_PATH.open("r", encoding="utf-8") as config_file:
        config = json.load(config_file)

    print(f"Checking paths in {CONFIG_PATH}")
    print()

    missing_required = False
    for label, raw_path, must_exist in iter_config_paths(config):
        exists = path_exists(raw_path)
        absolute_path = resolve_path(raw_path)
        if exists is None:
            status = "OPTIONAL"
        elif exists:
            status = "OK"
        elif not must_exist:
            status = "OPTIONAL"
        else:
            status = "MISSING"
            missing_required = True
        print(f"[{status:<8}] {label}: {absolute_path}")

    print()
    if missing_required:
        print("Some configured paths are missing. Update config.json or place your data/models in the expected folders.")
        return 1

    print("All configured paths exist.")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
20 changes: 10 additions & 10 deletions RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/config.json
@@ -3,23 +3,23 @@
        "ViT": false
    },
    "SSL": {
        "dataset 1": "/home/yukinori/Desktop/CRAFT-pytorch/self_supervised_data/word_images",
        "dataset 1": "data/ssl/word_images",
        "dataset 2": null,
        "dataset 3": null,
        "epoch size": 1,
        "Batch size": 32,
        "start lr": 0.001,
        "lr scheduler step size": 2,
        "saved Encoder path": "ViT_encoder.pth"
        "saved Encoder path": "models/ViT_encoder.pth"
    },
    "fine-tuning": {
        "dataset 1": "/home/yukinori/Desktop/CRAFT-pytorch/Perfecto/Perfecto/word_images",
        "dataset 1 csv": "/home/yukinori/Desktop/CRAFT-pytorch/Perfecto/Perfecto/word_images.csv",
        "dataset 2": "/home/yukinori/Desktop/CRAFT-pytorch/Ezcaray/word_images",
        "dataset 2 csv": "/home/yukinori/Desktop/CRAFT-pytorch/Ezcaray/word_images.csv",
        "dataset 3": "/home/yukinori/Desktop/CRAFT-pytorch/Virtuosa/word_images",
        "dataset 3 csv": "/home/yukinori/Desktop/CRAFT-pytorch/Virtuosa/word_images.csv",
        "test dataset": "/home/yukinori/Desktop/CRAFT-pytorch/self_supervised_data/word_images",
        "dataset 1": "data/finetuning/perfecto/word_images",
        "dataset 1 csv": "data/finetuning/perfecto/word_images.csv",
        "dataset 2": "data/finetuning/ezcaray/word_images",
        "dataset 2 csv": "data/finetuning/ezcaray/word_images.csv",
        "dataset 3": "data/finetuning/virtuosa/word_images",
        "dataset 3 csv": "data/finetuning/virtuosa/word_images.csv",
        "test dataset": "test_images",
        "fine-tune on other dataset": true,
        "Encoder path for fine-tuning": "models/trdg_Encoder_9_13.pt",
        "Decoder path for fine-tuning": "models/trdg_Decoder_9_13.pt",
@@ -33,4 +33,4 @@
        "saved Encoder path": "models/trdg_fine_tuned_Encoder_withoutSSL_9_13.pt",
        "saved Decoder path": "models/trdg_fine_tuned_Decoder_withoutSSL_9_13.pt"
    }
}
}
@@ -0,0 +1 @@

@@ -0,0 +1,13 @@
Place local training data under this directory.

Expected layout:

- `data/ssl/word_images/`
- `data/finetuning/perfecto/word_images/`
- `data/finetuning/perfecto/word_images.csv`
- `data/finetuning/ezcaray/word_images/`
- `data/finetuning/ezcaray/word_images.csv`
- `data/finetuning/virtuosa/word_images/`
- `data/finetuning/virtuosa/word_images.csv`

These paths match the defaults in `config.json`.
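
Each `word_images.csv` pairs a label with an image filename under a `label,image` header (matching the stub CSVs committed alongside this README). A sketch of how such a file could be read; the sample rows are invented for illustration, and the actual loader in `custom_dataset.py` may differ:

```python
import csv
import io

# Minimal stand-in for a word_images.csv file; only the header
# ("label,image") matches the committed stubs, the rows are made up.
sample = "label,image\nseñor,word_0001.png\nvirtud,word_0002.png\n"

rows = list(csv.DictReader(io.StringIO(sample)))
labels = [row["label"] for row in rows]
images = [row["image"] for row in rows]

print(labels)  # -> ['señor', 'virtud']
print(images)  # -> ['word_0001.png', 'word_0002.png']
```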
@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@
label,image
@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@
label,image
@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@
label,image
@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@

@@ -0,0 +1 @@

Binary file not shown.
54 changes: 43 additions & 11 deletions RenAIssance_Transformer_OCR_Utsav_Rai/code/app/app_streamlit.py
@@ -1,9 +1,11 @@
import sys
import os
# Add CRAFT directory to sys.path for craft imports
CRAFT_DIR = os.path.abspath(os.path.join(os.path.dirname(__file__), '..', 'CRAFT'))
if CRAFT_DIR not in sys.path:
    sys.path.insert(0, CRAFT_DIR)

APP_DIR = os.path.dirname(os.path.abspath(__file__))
CRAFT_DIR = os.path.abspath(os.path.join(APP_DIR, "..", "CRAFT"))
for path in (APP_DIR, CRAFT_DIR):
    if os.path.isdir(path) and path not in sys.path:
        sys.path.insert(0, path)
import torch
import torch.backends.cudnn as cudnn
from collections import OrderedDict
@@ -17,14 +19,28 @@
from PIL import Image, ImageEnhance
import cv2
import numpy as np
import os
import math
from transformers import TrOCRProcessor, VisionEncoderDecoderModel
import streamlit as st
from deskew import determine_skew

st.set_page_config(layout="wide")


def resolve_existing_path(env_var, *candidates):
    override = os.getenv(env_var)
    if override:
        return override

    for candidate in candidates:
        if os.path.exists(candidate):
            return candidate

    raise FileNotFoundError(
        f"Could not resolve a path for {env_var or 'required asset'}. "
        f"Tried: {', '.join(candidates)}"
    )

def copyStateDict(state_dict):
    if list(state_dict.keys())[0].startswith("module"):
        start_idx = 1
@@ -39,7 +55,11 @@ def copyStateDict(state_dict):
@st.cache_resource
def load_craft_model():
    # Define the path to the pre-trained CRAFT model weights
    trained_model_path = '../../weights/craft_mlt_25k.pth'
    trained_model_path = resolve_existing_path(
        "RENAISSANCE_CRAFT_MODEL_PATH",
        os.path.join(APP_DIR, "weights", "craft_mlt_25k.pth"),
        os.path.abspath(os.path.join(APP_DIR, "..", "..", "weights", "craft_mlt_25k.pth")),
    )

    # Initialize the CRAFT model
    net = CRAFT() # initialize
@@ -57,7 +77,11 @@ def load_craft_model():
    refine = True # Set to True if using refine_net
    if refine:
        from refinenet import RefineNet
        refiner_model_path = '../../weights/craft_refiner_CTW1500.pth' # Update the path
        refiner_model_path = resolve_existing_path(
            "RENAISSANCE_CRAFT_REFINER_PATH",
            os.path.join(APP_DIR, "weights", "craft_refiner_CTW1500.pth"),
            os.path.abspath(os.path.join(APP_DIR, "..", "..", "weights", "craft_refiner_CTW1500.pth")),
        )
        refine_net = RefineNet()
        refine_net.load_state_dict(copyStateDict(torch.load(refiner_model_path, map_location=device)))
        refine_net.to(device)
@@ -109,9 +133,17 @@ def test_net(net, image, text_threshold, link_threshold, low_text, *, cuda, poly
@st.cache_resource
def load_ocr_model():
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Update path to point to the correct location of the OCR weights
    model_path = "../../models"
    processor_path = "../../models"
    model_path = resolve_existing_path(
        "RENAISSANCE_OCR_MODEL_DIR",
        os.path.join(APP_DIR, "models"),
        os.path.abspath(os.path.join(APP_DIR, "..", "..", "models")),
    )
    processor_path = resolve_existing_path(
        "RENAISSANCE_OCR_PROCESSOR_DIR",
        model_path,
        os.path.join(APP_DIR, "models"),
        os.path.abspath(os.path.join(APP_DIR, "..", "..", "models")),
    )
    processor = TrOCRProcessor.from_pretrained(processor_path)
    model = VisionEncoderDecoderModel.from_pretrained(model_path).to(device)
    return processor, model, device
@@ -771,4 +803,4 @@ def get_virtual_page(pdf_document, virtual_index, dpi, **kwargs):
        st.write("No image to display.")

else:
    st.info("Please upload a PDF file from the left panel.")
    st.info("Please upload a PDF file from the left panel.")