feat: implicitly fallback to azure form recognizer for non-vision LLMs #814
base: master
Conversation
📝 Walkthrough

The conditional logic in document_understanding_step in recipes/VideoBots.py was modified. Previously, the Azure Form Recognizer step ran only when request.input_images were present and request.document_model was provided. Now it runs when input_images are present and either a document_model is specified or the selected model is not a vision model. If the selected model is a vision model and no document_model is provided, the Form Recognizer step is skipped. No other logic paths, error handling, or public interfaces were changed.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
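In short, the gate changed roughly as follows (a paraphrase of the hunk reviewed below, not new code):

```python
# Before: OCR ran only when a document model was explicitly selected.
if request.input_images and request.document_model:
    ...

# After: OCR also runs implicitly when the selected LLM cannot see images.
if request.input_images and (
    request.document_model
    or not LargeLanguageModels[request.selected_model].is_vision_model
):
    ...
```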
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (2)
recipes/VideoBots.py (2)
Lines 410-413: Confirm UX: toggle vs. implicit fallback

With this change, even if users disable "🩻 Photo & Document Intelligence" (which clears document_model), we'll still OCR images for non-vision LLMs. Is this intentional (cost/latency implications)? If not, consider a separate boolean like enable_auto_image_ocr to respect the toggle (see the sketch below).
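A minimal sketch of that option, assuming a Pydantic-style request model as used elsewhere in the codebase (the enable_auto_image_ocr field is hypothetical, not in the current code):

```python
from pydantic import BaseModel

class VideoBotsRequest(BaseModel):
    input_images: list[str] = []
    document_model: str | None = None
    selected_model: str = "gpt_4"  # placeholder default
    # Hypothetical toggle: lets users opt out of the implicit OCR fallback.
    enable_auto_image_ocr: bool = True

# The gate would then honor the toggle for the non-vision fallback only:
# if request.input_images and (
#     request.document_model
#     or (
#         request.enable_auto_image_ocr
#         and not LargeLanguageModels[request.selected_model].is_vision_model
#     )
# ):
#     ...
```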
Lines 415-424: Optional: parallelize image OCR for latency

If users upload multiple images, sequential OCR will add noticeable latency. You already use flatapply_parallel for documents; consider doing the same here with a small pool size (e.g., 4-8) and per-call try/except, then extend ocr_texts with non-empty results (see the sketch below).
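A sketch using the standard library's thread pool (the repo's own parallel helpers could be swapped in; azure_form_recognizer is assumed imported as in the recipe, and the pool size is an illustrative assumption):

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_images_parallel(
    urls: list[str], model_id: str = "prebuilt-read", max_workers: int = 4
) -> list[str]:
    """OCR each image URL concurrently; drop failures and empty results."""

    def ocr_one(url: str) -> str:
        try:
            return (
                azure_form_recognizer(url, model_id=model_id)
                .get("content", "")
                .strip()
            )
        except Exception:
            return ""  # skip this image but keep going

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [text for text in pool.map(ocr_one, urls) if text]

# Usage inside document_understanding_step:
#   ocr_texts.extend(ocr_images_parallel(request.input_images))
```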
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
recipes/VideoBots.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/VideoBots.py (1)
daras_ai_v2/language_model.py (1)
LargeLanguageModels (82-1028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10.12, 1.8.3)
```python
if request.input_images and (
    request.document_model
    or not LargeLanguageModels[request.selected_model].is_vision_model
):
```
🧩 Analysis chain
Guard fallback with Azure key, honor selected document_model, and degrade gracefully
- Runtime risk: This path now executes for non‑vision LLMs even when Azure isn’t configured; azure_form_recognizer will likely raise. Add a key check.
- Behavior gap: If a document_model is explicitly chosen, we should use it instead of hardcoding "prebuilt-read".
- UX: Emit a helpful message when we can’t OCR (non‑vision model + no Azure key) instead of failing.
Proposed patch:
```diff
 def document_understanding_step(self, request):
     ocr_texts = []
-    if request.input_images and (
-        request.document_model
-        or not LargeLanguageModels[request.selected_model].is_vision_model
-    ):
-        yield "Running Azure Form Recognizer..."
-        for url in request.input_images:
-            ocr_text = (
-                azure_form_recognizer(url, model_id="prebuilt-read")
-                .get("content", "")
-                .strip()
-            )
-            if not ocr_text:
-                continue
-            ocr_texts.append(ocr_text)
+    if request.input_images:
+        is_vision = LargeLanguageModels[request.selected_model].is_vision_model
+        azure_available = bool(settings.AZURE_FORM_RECOGNIZER_KEY)
+        if request.document_model or (not is_vision and azure_available):
+            model_id = request.document_model or "prebuilt-read"
+            yield f"Running Azure Document Intelligence ({model_id})..."
+            for url in request.input_images:
+                try:
+                    ocr_text = (
+                        azure_form_recognizer(url, model_id=model_id)
+                        .get("content", "")
+                        .strip()
+                    )
+                except Exception:
+                    # Skip this image but keep going.
+                    continue
+                if ocr_text:
+                    ocr_texts.append(ocr_text)
+        elif not is_vision and not azure_available:
+            # Non-vision LLM + images but Azure OCR not configured: inform and continue.
+            yield "No Azure Document Intelligence key configured; images will be ignored by non-vision models."
```
Also applies to: 415-424
🏁 Script executed:
```bash
#!/bin/bash
grep -R "azure_form_recognizer" -n recipes/VideoBots.py
grep -R "settings" -n recipes/VideoBots.py
sed -n '380,460p' recipes/VideoBots.py
```
Length of output: 5584
Guard Azure OCR fallback and honor custom document_model
- Check settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer to prevent runtime failures when Azure isn’t configured.
- Use request.document_model as model_id if specified; fall back to "prebuilt-read" only when no custom model is provided.
- Yield a clear message (e.g., "No Azure Document Intelligence key configured; images will be ignored by non-vision models.") when running on a non-vision LLM without an Azure key, instead of failing silently.
🤖 Prompt for AI Agents
In recipes/VideoBots.py around lines 410-413, the current condition may call
azure_form_recognizer without Azure configured and ignores
request.document_model; change it to first check
settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer, pass
request.document_model as the model_id when present and only use "prebuilt-read"
as a fallback, and if no Azure key is set while the selected model is
non-vision, yield/emit a clear user-facing message like "No Azure Document
Intelligence key configured; images will be ignored by non-vision models."
instead of proceeding or failing silently.
Q/A checklist
How to check import time?
You can visualize this using tuna:
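For example (a sketch; `main` stands in for whichever module you want to profile, and tuna must be installed via `pip install tuna`):

```bash
python -X importtime -c 'import main' 2> import_times.log
tuna import_times.log
```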
To measure import time for a specific library:
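A minimal way to do this with the interpreter's built-in import profiler (pandas is just an example library):

```bash
# The last line holds the cumulative time (in microseconds) for the top-level import.
python -X importtime -c 'import pandas' 2>&1 | tail -n 1
```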
To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:
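A generic sketch of the pattern (not code from this repo):

```python
def render_chart(df):
    # Deferred import: the cost of loading plotly is paid on first call,
    # not when this module is imported.
    import plotly.express as px

    return px.line(df)
```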
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.