OCR CI build artifact (#153)

derekadombek · web-flow · commit 5aef43910abe · 2024-08-07T14:40:30.000Z
* init for [IDWA-OCR-72] Install OCR into an executable

* edit readme with the build command

* add refs to form_filled to use this without args

* lint

* lint

* lint

* build/upload artifact

* pip install pyinstaller

* pip install pyinstaller

* rm dev

* put build in dependencies

* add requirements.txt and use pyinstaller cli

* rm working directory

* point to main and not dir main

* try dist/ and --onefile

* try -windowed

* dist/main

* upgrade upload action

* macos-latest

* try using assets from tests

* try using assets from tests

* revert back to dup assets

* CLI for better handling of arguments

* lint

* lint

* rm args with pyinstaller because of new cli

* use ^ with version

* install docopt with gh action job

* rm unused assets in the ocr dir

* docs

* docs

* upload bin for each os

* matrix exp

* matrix exp

* matrix exp

* zip

* zip

* check path

* check path

* check path

* whoops

* try gzexe

* &&

* wip

* try building release

* fix the needs:

* try building release

* try building release

* try building release

* try building release

* add checkout

* change from action

* add another checkout

* add paths

* try using workflow_call

* try using workflow_call

* wip

* using download action

* using download action

* try that

* token

* try for loop

* dont use matrix

* token ref

* try with workspace

* try with workspace

* try with workspace

* whoops

* working dir

* working dir

* add --repo

* think i got it

* try encoding with jq

* try encoding with jq

* github.repository

* full url

* matrix again

* see what dir we're in

* path to artifacts

* put everything in first job with create

* put everything in first job with create

* upload all in dir

* write all

* try dif action

* try dif action

* try with content

* upgrade action

* new output

* upgrade upload and download versions

* just path for download

* ls

* ls

* add to upload

* try full workflow

* forgot to switch needs job

* again

* fix file names

* fix file names

* change release title name

* missed diffs

* clean-up

* try changing ref

* try changing ref

* try changing ref

* try changing ref

* try changing ref

* try changing ref

* try changing ref

* create and see if upload asset chooses it

* create and see if upload asset chooses it

* create and see if upload asset chooses it

* create and see if upload asset chooses it

* create and see if upload asset chooses it

* create and see if upload asset chooses it

* create and see if upload asset chooses it

* use ref at checkout

* ls

* use ref at checkout

* try new upload action

* fix script

* create tag again

* full workflow

* full workflow

* full workflow

* cleaned up and made as workflow_dispatch

---------

Co-authored-by: Derek Dombek <derek.a.dombek.com>
diff --git a/.github/workflows/build-ocr.yml b/.github/workflows/build-ocr.yml
@@ -0,0 +1,53 @@
+name: Build & Upload OCR Binaries
+on:
+  workflow_call:
+    outputs:
+      output-file:
+        description: "The first output string"
+        value: ${{ jobs.build.outputs.output_artifacts }}
+  workflow_dispatch:
+      
+jobs:
+  build:
+    strategy:
+      matrix:
+        include:
+          - os: macos-latest
+            name: macos
+            cmd: >
+              pyinstaller -F -w -n main-macos ./OCR/ocr/main.py &&
+              cd dist/ &&
+              zip -r9 main-macos main-macos
+            out_file: main-macos.zip
+          - os: windows-latest
+            name: windows
+            cmd: pyinstaller -F -w -n main-windows ./OCR/ocr/main.py
+            out_file: main-windows.exe
+          - os: ubuntu-latest
+            name: ubuntu
+            cmd: >
+              pyinstaller -F -w -n main-ubuntu ./OCR/ocr/main.py &&
+              cd dist/ &&
+              zip -r9 main-ubuntu main-ubuntu
+            out_file: main-ubuntu.zip
+    runs-on: ${{ matrix.os }}
+    outputs:
+      output_artifacts: ${{ matrix.out_file }}
+    steps:
+      - uses: actions/checkout@v4
+      - uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+      - name: Install dependencies
+        run: |
+          python -m pip install --upgrade pip
+          pip install -r requirements.txt pyinstaller
+          pip install docopt
+      - name: Build binaries for all OS's
+        run: ${{ matrix.cmd }}
+      - name: Upload Artifacts To Workflow
+        uses: actions/upload-artifact@v4
+        id: artifacts
+        with:
+          name: main-${{ matrix.name }}
+          path: ./dist/${{ matrix.out_file }}
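Reviewer note: the macOS and Ubuntu matrix entries shell out to `zip -r9`, while the Windows entry uploads the bare `.exe`. Below is a minimal sketch, assuming one wanted the same archiving step on all three runners, using Python's standard `zipfile` module. The `zip_binary` helper and its parameters are hypothetical and are not part of this commit.

```python
import zipfile
from pathlib import Path


def zip_binary(dist_dir: Path, binary_name: str) -> Path:
    """Compress dist_dir/binary_name into dist_dir/<binary_name>.zip.

    Hypothetical helper: roughly mirrors `cd dist/ && zip -r9 NAME NAME`
    for a single file such as PyInstaller's --onefile output.
    """
    source = dist_dir / binary_name
    archive = dist_dir / f"{binary_name}.zip"
    with zipfile.ZipFile(
        archive, "w", compression=zipfile.ZIP_DEFLATED, compresslevel=9
    ) as zf:
        # arcname keeps the entry relative, as `cd dist/` does in the workflow.
        zf.write(source, arcname=binary_name)
    return archive
```

Called as `zip_binary(Path("dist"), "main-ubuntu")`, this would write `dist/main-ubuntu.zip`, matching the `out_file` naming used in the matrix.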
diff --git a/.github/workflows/release-ocr.yml b/.github/workflows/release-ocr.yml
@@ -1,47 +1,38 @@
 name: Release MDE-OCR artifacts
+run-name: Release MDE-OCR artifacts - by @${{ github.actor }}
 on:
-    # workflow_dispatch:
-    #     inputs:
-    #         tag:
-    #             description: 'target environment'
-    #             required: true
-    push:
-        branches:
-          - idwa-ocr-ci-for-executable
-        paths:
-          - .github/workflows/release-ocr.yml
-          - .github/workflows/build-ocr.yml
-          - OCR/**
-        # tags:
-        #     - 'v*'
+    workflow_dispatch:
+        inputs:
+            tag:
+                description: 'Version tag for new release'
+                required: true
 jobs:
   create-release:
     name: Create Release
-    
     runs-on: [ubuntu-latest]
     permissions:
         contents: write
     steps:
     - uses: actions/checkout@v4
     - name: Create tag
-      uses: actions/github-script@v5
+      uses: actions/github-script@v7
       with:
         script: |
             github.rest.git.createRef({
                 owner: context.repo.owner,
                 repo: context.repo.repo,
-                ref: 'refs/tags/1.0.0',
+                ref: 'refs/tags/${{ github.event.inputs.tag }}',
                 sha: context.sha
             })
     - name: Create release
       id: create_release
       env:
         GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
-        tag: ${{ github.ref_name }}
+        tag: ${{ github.event.inputs.tag }}
       run: |
         gh release create "$tag" \
             --repo="$GITHUB_REPOSITORY" \
-            --title="MDE-OCR ${tag#v}" \
+            --title="MDE-OCR ${tag}" \
             --generate-notes
     - name: Output Release URL File
       run: echo "${{ steps.create_release.outputs.upload_url }}" > release_url.txt
@@ -62,9 +53,8 @@ jobs:
           with:
             path: artifacts
             merge-multiple: true
-        - name: Upload release binaries
-          uses: alexellis/upload-assets@0.4.1
-          env:
-            GITHUB_TOKEN: ${{ github.token }}
+        - name: Release Upload Assets
+          uses: jaywcjlove/github-action-upload-assets@main
           with:
-            asset_paths: '["./artifacts/*"]'
+            tag: ${{ github.event.inputs.tag }}
+            asset-path: '["./artifacts/*"]'
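Reviewer note: the release title changes from `${tag#v}` to `${tag}`, so a leading `v` in the dispatched tag now appears in the title instead of being stripped. Here is a small sketch of the two behaviours in Python. The function names are hypothetical and exist only to illustrate the shell expansion.

```python
def title_before(tag: str) -> str:
    # Equivalent of "MDE-OCR ${tag#v}": strip one leading "v" if present.
    stripped = tag[1:] if tag.startswith("v") else tag
    return f"MDE-OCR {stripped}"


def title_after(tag: str) -> str:
    # Equivalent of "MDE-OCR ${tag}": the tag is used verbatim.
    return f"MDE-OCR {tag}"
```

With a tag of `v1.2.0`, the old title would read `MDE-OCR 1.2.0` and the new one `MDE-OCR v1.2.0`; for a tag without the prefix the two are identical.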
diff --git a/OCR/README.md b/OCR/README.md
@@ -29,6 +29,10 @@ Run main, hoping to convert this to a cli at some point
 poetry run main
 ```
 
+To build the OCR service into an executable artifact
+```shell
+poetry run build
+```
 
 Adding new dependencies
 ```shell
diff --git a/OCR/ocr/pyinstaller.py b/OCR/ocr/pyinstaller.py
@@ -0,0 +1,23 @@
+import PyInstaller.__main__
+from pathlib import Path
+
+HERE = Path(__file__).parent.absolute()
+path_to_main = str(HERE / "main.py")
+
+
+# This function installs/packages the main OCR function as an executable.
+# You could also use the command line: `pyinstaller ./OCR/ocr/main.py -F -w` works the same as the function below.
+# If you need to add asset paths, follow the example below.
+def install():
+    PyInstaller.__main__.run(
+        [
+            path_to_main,
+            "--onefile",
+            "--windowed",
+            # SOURCE:DESTINATION
+            # "--add-data=ocr/assets/form_filled.png:assets/",
+            # "--add-data=ocr/assets/form_segmention_template.png:assets/",
+            # "--add-data=ocr/assets/labels.json:assets/",
+            # other pyinstaller options...
+        ]
+    )
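Reviewer note: the commented-out `--add-data` lines use the `SOURCE:DESTINATION` form. The following is a minimal sketch of building that argument list from a mapping, assuming the asset paths shown in the comments above. `build_args` and `install_with_assets` are hypothetical names, not part of this commit. The separator is left as a parameter because, to my understanding, older PyInstaller releases expected `os.pathsep` (`;` on Windows) rather than `:`; that is an assumption to check against the installed version.

```python
from pathlib import Path
from typing import Dict, List


def build_args(main_path: Path, assets: Dict[str, str], sep: str = ":") -> List[str]:
    """Return a PyInstaller argument list with one --add-data per asset."""
    args = [str(main_path), "--onefile", "--windowed"]
    for source, destination in assets.items():
        # Each entry becomes --add-data=SOURCE<sep>DESTINATION.
        args.append(f"--add-data={source}{sep}{destination}")
    return args


def install_with_assets(main_path: Path, assets: Dict[str, str]) -> None:
    # Imported lazily so build_args can be used without PyInstaller installed.
    import PyInstaller.__main__

    PyInstaller.__main__.run(build_args(main_path, assets))
```

Keeping the argument construction separate from the `run` call makes the list easy to inspect or unit-test without triggering a build.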
diff --git a/OCR/poetry.lock b/OCR/poetry.lock
diff --git a/OCR/pyproject.toml b/OCR/pyproject.toml
@@ -34,6 +34,7 @@ build-backend = "poetry.core.masonry.api"
 
 [tool.poetry.scripts]
 main = "ocr.main:main"
+build = "ocr.pyinstaller:install"
 
 [tool.ruff]
 line-length = 118
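Reviewer note: the new `build` script uses the `module:function` entry-point form, so `poetry run build` ends up calling `install()` in `ocr/pyinstaller.py`. What follows is a rough sketch of how such a string can be resolved by hand, for illustration only. `resolve_entry_point` is a hypothetical helper and not Poetry's actual implementation.

```python
import importlib
from typing import Any


def resolve_entry_point(spec: str) -> Any:
    """Resolve a 'package.module:attr' string to the object it names."""
    module_name, _, attr_path = spec.partition(":")
    obj = importlib.import_module(module_name)
    if attr_path:
        # Support dotted attributes such as "module:Class.method".
        for part in attr_path.split("."):
            obj = getattr(obj, part)
    return obj
```

Inside the OCR project environment, `resolve_entry_point("ocr.pyinstaller:install")()` would be roughly what the `build` script does.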
diff --git a/requirements.txt b/requirements.txt
@@ -0,0 +1,7 @@
+numpy==1.26.4
+opencv-python==4.9.0.80
+python-dotenv==1.0.1
+Pillow>=10.3.0
+torch==1.13.1
+docopt==0.6.2
+git+https://github.com/huggingface/transformers.git