Merged
5 changes: 4 additions & 1 deletion recipes/VideoBots.py
@@ -407,7 +407,10 @@ def run_v2(

 def document_understanding_step(self, request):
     ocr_texts = []
-    if request.document_model and request.input_images:
+    if request.input_images and (
+        request.document_model
+        or not LargeLanguageModels[request.selected_model].is_vision_model
+    ):
Comment on lines +410 to +413

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Guard fallback with Azure key, honor selected document_model, and degrade gracefully

  • Runtime risk: This path now executes for non‑vision LLMs even when Azure isn’t configured; azure_form_recognizer will likely raise. Add a key check.
  • Behavior gap: If a document_model is explicitly chosen, we should use it instead of hardcoding "prebuilt-read".
  • UX: Emit a helpful message when we can’t OCR (non‑vision model + no Azure key) instead of failing.

Proposed patch:

 def document_understanding_step(self, request):
     ocr_texts = []
-    if request.input_images and (
-        request.document_model
-        or not LargeLanguageModels[request.selected_model].is_vision_model
-    ):
-        yield "Running Azure Form Recognizer..."
-        for url in request.input_images:
-            ocr_text = (
-                azure_form_recognizer(url, model_id="prebuilt-read")
-                .get("content", "")
-                .strip()
-            )
-            if not ocr_text:
-                continue
-            ocr_texts.append(ocr_text)
+    if request.input_images:
+        is_vision = LargeLanguageModels[request.selected_model].is_vision_model
+        azure_available = bool(settings.AZURE_FORM_RECOGNIZER_KEY)
+        if request.document_model or (not is_vision and azure_available):
+            model_id = request.document_model or "prebuilt-read"
+            yield f"Running Azure Document Intelligence ({model_id})..."
+            for url in request.input_images:
+                try:
+                    ocr_text = (
+                        azure_form_recognizer(url, model_id=model_id)
+                        .get("content", "")
+                        .strip()
+                    )
+                except Exception:
+                    # Skip this image but keep going.
+                    continue
+                if ocr_text:
+                    ocr_texts.append(ocr_text)
+        elif not is_vision and not azure_available:
+            # Non‑vision LLM + images but Azure OCR not configured: inform and continue.
+            yield "No Azure Document Intelligence key configured; images will be ignored by non‑vision models."

Also applies to: 415-424


🏁 Script executed:

#!/bin/bash
grep -R "azure_form_recognizer" -n recipes/VideoBots.py
grep -R "settings" -n recipes/VideoBots.py
sed -n '380,460p' recipes/VideoBots.py

Length of output: 5584


Guard Azure OCR fallback and honor custom document_model

  • Check settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer to prevent runtime failures when Azure isn’t configured.
  • Use request.document_model as model_id if specified; fall back to "prebuilt-read" only when no custom model is provided.
  • Yield a clear message (e.g., “No Azure Document Intelligence key configured; images will be ignored by non-vision models.”) when running on a non-vision LLM without an Azure key, instead of failing silently.
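
The three points above reduce to a small decision table, sketched here as a pure function so it can be tested without calling Azure. This is only an illustration: `OcrAction` and `choose_ocr_action` are hypothetical names that do not exist in the repository, and `azure_key` stands in for `settings.AZURE_FORM_RECOGNIZER_KEY`. The sketch is also slightly stricter than the proposed patch, in that it requires the key even when a `document_model` is set explicitly.

```python
from enum import Enum


class OcrAction(Enum):
    """Hypothetical outcomes of the document-understanding step."""

    RUN_OCR = "run_ocr"
    WARN_NO_KEY = "warn_no_key"
    SKIP = "skip"


def choose_ocr_action(has_images, document_model, is_vision_model, azure_key):
    """Return (action, model_id) for a request.

    Assumption: OCR is wanted when a document_model is chosen explicitly,
    or when the selected LLM cannot read images itself.
    """
    wants_ocr = bool(document_model) or not is_vision_model
    if not has_images or not wants_ocr:
        # Nothing to OCR, or a vision model will read the images directly.
        return OcrAction.SKIP, None
    if not azure_key:
        # OCR is wanted but Azure is not configured: tell the user, don't raise.
        return OcrAction.WARN_NO_KEY, None
    # Honor the explicit model; fall back to "prebuilt-read" otherwise.
    return OcrAction.RUN_OCR, document_model or "prebuilt-read"
```

The step itself would then branch on the returned action: call `azure_form_recognizer(url, model_id=model_id)` for `RUN_OCR`, yield the user-facing message for `WARN_NO_KEY`, and do nothing for `SKIP`.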
🤖 Prompt for AI Agents
In recipes/VideoBots.py around lines 410-413, the current condition may call
azure_form_recognizer without Azure configured and ignores
request.document_model; change it to first check
settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer, pass
request.document_model as the model_id when present and only use "prebuilt-read"
as a fallback, and if no AZURE key is set while the selected model is
non-vision, yield/emit a clear user-facing message like "No Azure Document
Intelligence key configured; images will be ignored by non-vision models."
instead of proceeding or failing silently.

         yield "Running Azure Form Recognizer..."
         for url in request.input_images:
             ocr_text = (