humanai-foundation · abhiram123467 · Mar 13, 2026 · Mar 16, 2026 · Mar 17, 2026 · Mar 23, 2026
diff --git a/.github/workflows/test_crnn.yml b/.github/workflows/test_crnn.yml
@@ -0,0 +1,33 @@
+name: CRNN OCR-1 Unit Tests
+
+on:
+  push:
+    branches: [ main ]
+  pull_request:
+    branches: [ main ]
+
+jobs:
+  test:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        python-version: ["3.9", "3.10"]
+
+    steps:
+    - uses: actions/checkout@v3
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v4
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install pytest google-generativeai
+        # --index-url replaces PyPI for that command, so torch is installed on its own.
+        pip install torch --index-url https://download.pytorch.org/whl/cpu
+
+    - name: Run CRNN OCR-1 tests
+      run: |
+        pytest tests/test_crnn_ocr1.py -v
diff --git a/README.md b/README.md
@@ -1,24 +1,5 @@
-![thumbnail](https://github.com/user-attachments/assets/b0aa865c-416c-4a3a-92be-56a1a77c8f4e)
-# RenAIssance
-The analysis of historical documents is a critical yet costly method in the Humanities. To reduce these costs, AI technology, specifically OCR (Optical Character Recognition), has started to be utilized. However, for many years, there was a lack of accurate OCR tools for Spanish documents from the Renaissance period, despite their academic importance. To address this issue, the HumanAI Foundation launched the **RenAIssance** project, where contributors implement accurate OCR models using various approaches.
+# OCR-1: CNN-RNN with LLM Post-Processing for Historical Documents
 
-# Dataset 
-![letters](https://github.com/user-attachments/assets/c10584db-8f68-4897-a6c4-c70411ed9515)
+This subfolder contains the code for the **OCR-1** component of the RenAIssance project: a hybrid CNN-RNN model (ResNet + BiLSTM + CTC) designed to recognise 17th-century Spanish printed text. It also integrates an optional Gemini LLM post-processing step to improve accuracy.
 
-The dataset used to train these models consists of images of printed documents from the target era, collected from diverse sources. A portion of the data has been manually labeled by RenAIssance mentors, who are experts in Spanish historical documents. The following printing irregularities in the data present challenges for creating high-accuracy OCR models:
-
-- **Interchangeable Characters:** Characters such as 'u' and 'v', and 'f' and 's' were often used interchangeably.
-- **Tildes and Diacritical Marks:** Used to save space or due to the reuse of type molds.
-- **Old Spellings and Modern Interpretations:** Variations in character usage between historical and modern Spanish.
-- **Line-End Hyphens:** Words split across lines were not always hyphenated.
-
-Additionally, the deterioration and unique layouts of historical documents further complicate OCR tasks, making content extraction from images difficult.
-
-# Method  
-To address these challenges, contributors have introduced various state-of-the-art (SOTA) methods. These can be broadly classified into the following three approaches:
-
-1. **CRNN Approach**  
-2. **Vision Transformer Approaches**  
-3. **Self-Supervised Learning Approach**  
-
-All models, regardless of the approach used, achieve over 90% accuracy. For more detailed information on each approach, please refer to the contributors' repositories.
+## 📁 Structure
diff --git a/RenAIssance_CRNN_OCR_Shashank_Shekhar_Singh/Readme.md b/RenAIssance_CRNN_OCR_Shashank_Shekhar_Singh/Readme.md
@@ -82,7 +82,25 @@ the RNN (Recurrent neural networks).
 For a detailed walkthrough of the project's development, challenges, and solutions, read the complete blog post [here](https://medium.com/@shashankshekharsingh1205/my-journey-with-humanai-in-the-google-summer-of-code24-program-part-2-bb42abce3495).
 
 ## Datasets and Models
-- The `Padilla - Nobleza virtuosa_testExtract.pdf` can be downloaded from [here](https://github.com/Shashankss1205/RenAIssance/blob/main/RenAIssance_CRNN_OCR_Shashank_Shekhar_Singh/data/Padilla_Nobleza_virtuosa_testExtract.pdf) 
+- The `Padilla - Nobleza virtuosa_testExtract.pdf` can be downloaded from [here](https://github.com/Shashankss1205/RenAIssance/blob/main/RenAIssance_CRNN_OCR_Shashank_Shekhar_Singh/data/Padilla_Nobleza_virtuosa_testExtract.pdf)
 - The `Padilla - 1 Nobleza virtuosa_testTranscription.docx` can be downloaded from [here](https://github.com/Shashankss1205/RenAIssance/blob/main/RenAIssance_CRNN_OCR_Shashank_Shekhar_Singh/data/Padilla_Nobleza_virtuosa_testTranscription.docx) 
 - The ocr model used can be directly generated by running the python notebook or can be downloaded from [here](https://github.com/Shashankss1205/RenAIssance/blob/main/RenAIssance_CRNN_OCR_Shashank_Shekhar_Singh/Model/ocr_model.h5)
 
+## Setup
+
+Install all dependencies before running the notebooks:
+```bash
+pip install -r requirements.txt
+```
+
+## Requirements
+- Python 3.10+
+- PyTorch 2.0+
+- CUDA GPU recommended (Google Colab or Kaggle)
+
+## How to Run
+1. Clone this repository
+2. Install dependencies: `pip install -r requirements.txt`
+3. Open `Model.ipynb` in Jupyter or Google Colab
+4. Run all cells in order
+
@@ -108,4 +126,4 @@ This project is licensed under the MIT License. See the [LICENSE](LICENSE) file
 - [Google Summer of Code 2024 Project](https://summerofcode.withgoogle.com/programs/2024/projects/lg7vQeMM)
 - [HumanAI Foundation](https://humanai.foundation/)
 
-Feel free to fork the repository and submit pull requests. For major changes, please open an issue to discuss your ideas first. Contributions are always welcomed!
+Feel free to fork the repository and submit pull requests. For major changes, please open an issue to discuss your ideas first. Contributions are always welcomed!
diff --git a/RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/ResNet.py b/RenAIssance_SelfSupervisedLearning_OCR_YukinoriYamamoto/ResNet.py
@@ -1,5 +1,6 @@
 from torch import nn
 
+
 class BasicBlock(nn.Module):
     expansion = 1
 
@@ -36,7 +37,7 @@ def forward(self, x):
 
 
 class ResNet18(nn.Module):
-    def __init__(self, num_classes=1000):
+    def __init__(self, num_classes=3):  # default changed from 1000 to 3
         super(ResNet18, self).__init__()
         self.in_channels = 64
 
@@ -50,6 +51,7 @@ def __init__(self, num_classes=1000):
         self.layer4 = self._make_layer(BasicBlock, 512, 2, stride=1)
 
         self.avgpool = nn.AdaptiveAvgPool2d((1, 32))
+        self.fc = nn.Linear(512 * 32, num_classes)  # 512 channels x pooled width 32
 
     def _make_layer(self, block, out_channels, num_blocks, stride):
         strides = [stride] + [1] * (num_blocks - 1)
@@ -76,11 +78,13 @@ def forward(self, x):
             x = layer(x)
 
         x = self.avgpool(x)
+        x = x.view(x.size(0), -1)  # flatten (N, 512, 1, 32) to (N, 512 * 32)
+        x = self.fc(x)  # classification head
         return x
 
 
 class ResNet34(nn.Module):
-    def __init__(self, num_classes=1000):
+    def __init__(self, num_classes=3):  # default changed from 1000 to 3
         super(ResNet34, self).__init__()
         self.in_channels = 64
 
@@ -94,6 +98,7 @@ def __init__(self, num_classes=1000):
         self.layer4 = self._make_layer(BasicBlock, 512, 3, stride=1)
 
         self.avgpool = nn.AdaptiveAvgPool2d((1, 44))
+        self.fc = nn.Linear(512 * 44, num_classes)  # 512 channels x pooled width 44
 
     def _make_layer(self, block, out_channels, num_blocks, stride):
         strides = [stride] + [1] * (num_blocks - 1)
@@ -120,9 +125,12 @@ def forward(self, x):
             x = layer(x)
 
         x = self.avgpool(x)
+        x = x.view(x.size(0), -1)  # flatten (N, 512, 1, 44) to (N, 512 * 44)
+        x = self.fc(x)  # classification head
         return x
 
 
+# Bottleneck block used by ResNet50 below.
 class Bottleneck(nn.Module):
     expansion = 4
     def __init__(self, in_channels, out_channels, stride=(1, 1)):
@@ -137,6 +145,7 @@ def __init__(self, in_channels, out_channels, stride=(1, 1)):
         self.stride = stride
         self.shortcut_conv = nn.Conv2d(in_channels, out_channels * self.expansion, kernel_size=1, stride=stride, bias=False)
         self.shortcut_bn = nn.BatchNorm2d(out_channels * self.expansion)
+
     def forward(self, x):
         identity = x
         out = self.conv1(x)
@@ -154,6 +163,7 @@ def forward(self, x):
         out = self.relu(out)
         return out
 
+
 class ResNet50(nn.Module):
     def __init__(self):
         super(ResNet50, self).__init__()
@@ -168,33 +178,28 @@ def __init__(self):
         self.layer4 = self._make_layer(Bottleneck, 512, 3, stride=(2, 1))
         self.last_conv = nn.Conv2d(2048, 512, kernel_size=1, stride=1, bias=False)
         self.avgpool = nn.AvgPool2d(kernel_size=(2, 1), stride=(2, 1))
-    
+
     def _make_layer(self, block, out_channels, num_blocks, stride):
         strides = [stride] + [1] * (num_blocks - 1)
         layers = nn.ModuleList()
         for stride in strides:
             layers.append(block(self.in_channels, out_channels, stride))
             self.in_channels = out_channels * block.expansion
         return layers
-    
+
     def forward(self, x):
         x = self.conv1(x)
         x = self.bn1(x)
         x = self.relu(x)
         x = self.maxpool(x)
         for layer in self.layer1:
             x = layer(x)
-        # print("layer 1", x.shape)
         for layer in self.layer2:
             x = layer(x)
-        # print("layer 2", x.shape)
         for layer in self.layer3:
             x = layer(x)
-        # print("layer 3", x.shape)
         for layer in self.layer4:
             x = layer(x)
-        # print("layer 4", x.shape)
         x = self.last_conv(x)
         x = self.avgpool(x)
-        # print("avgpool", x.shape)
         return x
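
Review note on `ResNet.py`: the `in_features` values passed to the new `nn.Linear` heads in `ResNet18` and `ResNet34` follow from the pooled feature-map shape. The sketch below reproduces that arithmetic in plain Python so it can be checked without PyTorch. It assumes the final stage emits 512 channels (as `layer4` does in the diff) and that `num_classes=3`. The helper names `flattened_features` and `linear_param_count` are illustrative and are not part of `ResNet.py`.

```python
def flattened_features(channels, pooled_height, pooled_width):
    """Number of values per sample after flattening a (C, H, W) feature map."""
    return channels * pooled_height * pooled_width


def linear_param_count(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# ResNet18 in the diff: AdaptiveAvgPool2d((1, 32)) applied to 512 channels.
resnet18_in = flattened_features(512, 1, 32)
# ResNet34 in the diff: AdaptiveAvgPool2d((1, 44)) applied to 512 channels.
resnet34_in = flattened_features(512, 1, 44)

# Assumed class count, matching the new default num_classes=3.
num_classes = 3

print("ResNet18 fc:", resnet18_in, "inputs,",
      linear_param_count(resnet18_in, num_classes), "parameters")
print("ResNet34 fc:", resnet34_in, "inputs,",
      linear_param_count(resnet34_in, num_classes), "parameters")
```

Because `AdaptiveAvgPool2d` fixes the pooled size, these input sizes match `512 * 32` and `512 * 44` for any input resolution the convolutional stages accept.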