
Conversation

@nikochiko (Member) commented Sep 29, 2025

Q/A checklist

  • I have tested my UI changes on mobile and they look acceptable
  • I have tested changes to the workflows in both the API and the UI
  • I have done a code review of my changes and looked at each line of the diff + the references of each function I have changed
  • My changes have not increased the import time of the server
How to check import time?

time python -c 'import server'

You can visualize this using tuna:

python3 -X importtime -c 'import server' 2> out.log && tuna out.log

To measure import time for a specific library:

$ time python -c 'import pandas'

________________________________________________________
Executed in    1.15 secs    fish           external
   usr time    2.22 secs   86.00 micros    2.22 secs
   sys time    0.72 secs  613.00 micros    0.72 secs
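
If you'd rather not install tuna, a rough coreutils-only sketch can rank the -X importtime output (written to stderr as |-separated columns) by cumulative time:

python -X importtime -c 'import server' 2>&1 | sort -t '|' -k 2 -n | tail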

To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:

def my_function():
    import pandas as pd  # deferred: pandas loads on first call; later calls hit the sys.modules cache
    ...
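
For a module full of heavy imports, another option (a sketch using PEP 562, not something from this repo) is a module-level __getattr__ that defers each import until first attribute access:

# lazy_deps.py (hypothetical module) -- defers heavy imports via PEP 562
import importlib

_LAZY = {"pd": "pandas", "np": "numpy"}  # attribute name -> module to import

def __getattr__(name):
    if name in _LAZY:
        module = importlib.import_module(_LAZY[name])
        globals()[name] = module  # cache so later lookups skip __getattr__
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")

A plain import lazy_deps stays cheap; the first lazy_deps.pd access is what actually imports pandas.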

Legal Boilerplate

Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.


coderabbitai bot commented Sep 29, 2025

📝 Walkthrough

The conditional logic in recipes/VideoBots.py within document_understanding_step was modified. Previously, the Azure Form Recognizer step ran when request.input_images were present and request.document_model was provided. Now, it runs when input_images are present and either a document_model is specified or the selected model is not a vision model. If the selected model is a vision model and no document_model is provided, the Form Recognizer step is skipped. No other logic paths, error handling, or public interfaces were changed.
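
In sketch form (condensed from the hunk reviewed below):

# Before: the OCR step ran only when a document model was explicitly selected.
if request.input_images and request.document_model:
    ...  # Azure Form Recognizer step

# After: it also runs as a fallback when the selected LLM can't see images.
if request.input_images and (
    request.document_model
    or not LargeLanguageModels[request.selected_model].is_vision_model
):
    ...  # Azure Form Recognizer step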

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes


Suggested reviewers

  • devxpy

Pre-merge checks and finishing touches

❌ Failed checks (1 warning)
  • Docstring Coverage ⚠️ Warning: Docstring coverage is 0.00%, which is below the required threshold of 80.00%. Resolution: run @coderabbitai generate docstrings to improve coverage.
✅ Passed checks (2 passed)
  • Title Check ✅ Passed: The title “feat: implicitly fallback to azure form recognizer for non-vision LLMs” directly reflects the primary change in the diff, highlighting the new implicit fallback behavior for non-vision models while staying concise and specific. It clearly communicates the feature addition to any reviewer scanning the pull request history.
  • Description Check ✅ Passed: The pull request description includes all the required sections from the repository’s template: a complete Q/A checklist with the four specified items, the detailed “How to check import time” block, and the mandated legal boilerplate. Each section matches the headings and content structure of the description template without omissions.
✨ Finishing touches
  • 📝 Generate Docstrings

🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch implicit-fallback-to-form-reco-when-non-vision-model

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.


Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (2)
recipes/VideoBots.py (2)

410-413: Confirm UX: toggle vs. implicit fallback

With this change, even if users disable “🩻 Photo & Document Intelligence” (which clears document_model), we’ll still OCR images for non‑vision LLMs. Is this intentional (cost/latency implications)? If not, consider a separate boolean like enable_auto_image_ocr to respect the toggle.
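
A sketch of that opt-out, assuming a hypothetical enable_auto_image_ocr request field defaulting to True:

if request.input_images and (
    request.document_model
    or (
        request.enable_auto_image_ocr  # hypothetical toggle, not in this PR
        and not LargeLanguageModels[request.selected_model].is_vision_model
    )
):
    ...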


415-424: Optional: parallelize image OCR for latency

If users upload multiple images, sequential OCR will add noticeable latency. You already use flatapply_parallel for documents; consider doing the same here with a small pool size (e.g., 4–8) and per‑call try/except, then extend ocr_texts with non‑empty results.
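
Since flatapply_parallel's exact signature isn't shown in this diff, here is a generic standard-library sketch of the same idea (reusing azure_form_recognizer as called above):

from concurrent.futures import ThreadPoolExecutor

def _ocr_one(url: str) -> str:
    # Per-call try/except so one bad image doesn't sink the whole batch.
    try:
        result = azure_form_recognizer(url, model_id="prebuilt-read")
        return result.get("content", "").strip()
    except Exception:
        return ""

with ThreadPoolExecutor(max_workers=4) as pool:  # small pool, per the suggestion
    ocr_texts.extend(text for text in pool.map(_ocr_one, request.input_images) if text)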

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 88edf93 and 07d3127.

📒 Files selected for processing (1)
  • recipes/VideoBots.py (1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/VideoBots.py (1)
daras_ai_v2/language_model.py (1)
  • LargeLanguageModels (82-1028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: test (3.10.12, 1.8.3)

Comment on lines +410 to +413
if request.input_images and (
    request.document_model
    or not LargeLanguageModels[request.selected_model].is_vision_model
):

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Guard fallback with Azure key, honor selected document_model, and degrade gracefully

  • Runtime risk: This path now executes for non‑vision LLMs even when Azure isn’t configured; azure_form_recognizer will likely raise. Add a key check.
  • Behavior gap: If a document_model is explicitly chosen, we should use it instead of hardcoding "prebuilt-read".
  • UX: Emit a helpful message when we can’t OCR (non‑vision model + no Azure key) instead of failing.

Proposed patch:

 def document_understanding_step(self, request):
     ocr_texts = []
-    if request.input_images and (
-        request.document_model
-        or not LargeLanguageModels[request.selected_model].is_vision_model
-    ):
-        yield "Running Azure Form Recognizer..."
-        for url in request.input_images:
-            ocr_text = (
-                azure_form_recognizer(url, model_id="prebuilt-read")
-                .get("content", "")
-                .strip()
-            )
-            if not ocr_text:
-                continue
-            ocr_texts.append(ocr_text)
+    if request.input_images:
+        is_vision = LargeLanguageModels[request.selected_model].is_vision_model
+        azure_available = bool(settings.AZURE_FORM_RECOGNIZER_KEY)
+        if request.document_model or (not is_vision and azure_available):
+            model_id = request.document_model or "prebuilt-read"
+            yield f"Running Azure Document Intelligence ({model_id})..."
+            for url in request.input_images:
+                try:
+                    ocr_text = (
+                        azure_form_recognizer(url, model_id=model_id)
+                        .get("content", "")
+                        .strip()
+                    )
+                except Exception:
+                    # Skip this image but keep going.
+                    continue
+                if ocr_text:
+                    ocr_texts.append(ocr_text)
+        elif not is_vision and not azure_available:
+            # Non‑vision LLM + images but Azure OCR not configured: inform and continue.
+            yield "No Azure Document Intelligence key configured; images will be ignored by non‑vision models."

Also applies to: 415-424


🏁 Script executed:

#!/bin/bash
grep -R "azure_form_recognizer" -n recipes/VideoBots.py
grep -R "settings" -n recipes/VideoBots.py
sed -n '380,460p' recipes/VideoBots.py

Length of output: 5584


Guard Azure OCR fallback and honor custom document_model

  • Check settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer to prevent runtime failures when Azure isn’t configured.
  • Use request.document_model as model_id if specified; fall back to "prebuilt-read" only when no custom model is provided.
  • Yield a clear message (e.g., “No Azure Document Intelligence key configured; images will be ignored by non-vision models.”) when running on a non-vision LLM without an Azure key, instead of failing silently.
🤖 Prompt for AI Agents
In recipes/VideoBots.py around lines 410-413, the current condition may call
azure_form_recognizer without Azure configured and ignores
request.document_model; change it to first check
settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer, pass
request.document_model as the model_id when present and only use "prebuilt-read"
as a fallback, and if no AZURE key is set while the selected model is
non-vision, yield/emit a clear user-facing message like "No Azure Document
Intelligence key configured; images will be ignored by non-vision models."
instead of proceeding or failing silently.
