feat: implicitly fallback to azure form recognizer for non-vision LLMs #814
base: master
Conversation
📝 Walkthrough

The conditional logic in document_understanding_step in recipes/VideoBots.py was modified. Previously, the Azure Form Recognizer step ran only when request.input_images were present and request.document_model was provided. Now it runs when input_images are present and either a document_model is specified or the selected model is not a vision model. If the selected model is a vision model and no document_model is provided, the Form Recognizer step is skipped. No other logic paths, error handling, or public interfaces were changed.

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
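In short, the gate changed roughly as follows (a paraphrase of the hunk reviewed below, not new code):

```python
# Before: OCR ran only when a document model was explicitly selected.
if request.input_images and request.document_model:
    ...

# After: OCR also runs implicitly when the selected LLM cannot see images.
if request.input_images and (
    request.document_model
    or not LargeLanguageModels[request.selected_model].is_vision_model
):
    ...
```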
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (2)
recipes/VideoBots.py (2)
Lines 410-413: Confirm UX: toggle vs. implicit fallback

With this change, even if users disable "🩻 Photo & Document Intelligence" (which clears document_model), we'll still OCR images for non-vision LLMs. Is this intentional (cost/latency implications)? If not, consider a separate boolean like enable_auto_image_ocr to respect the toggle (see the sketch below).
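A minimal sketch of that option, assuming a Pydantic-style request model as used elsewhere in the codebase (the enable_auto_image_ocr field is hypothetical, not in the current code):

```python
from pydantic import BaseModel

class VideoBotsRequest(BaseModel):
    input_images: list[str] = []
    document_model: str | None = None
    selected_model: str = "gpt_4"  # placeholder default
    # Hypothetical toggle: lets users opt out of the implicit OCR fallback.
    enable_auto_image_ocr: bool = True

# The gate would then honor the toggle for the non-vision fallback only:
# if request.input_images and (
#     request.document_model
#     or (
#         request.enable_auto_image_ocr
#         and not LargeLanguageModels[request.selected_model].is_vision_model
#     )
# ):
#     ...
```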
Lines 415-424: Optional: parallelize image OCR for latency

If users upload multiple images, sequential OCR will add noticeable latency. You already use flatapply_parallel for documents; consider doing the same here with a small pool size (e.g., 4-8) and per-call try/except, then extend ocr_texts with non-empty results (see the sketch below).
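A sketch using the standard library's thread pool (the repo's own parallel helpers could be swapped in; azure_form_recognizer is assumed imported as in the recipe, and the pool size is an illustrative assumption):

```python
from concurrent.futures import ThreadPoolExecutor

def ocr_images_parallel(
    urls: list[str], model_id: str = "prebuilt-read", max_workers: int = 4
) -> list[str]:
    """OCR each image URL concurrently; drop failures and empty results."""

    def ocr_one(url: str) -> str:
        try:
            return (
                azure_form_recognizer(url, model_id=model_id)
                .get("content", "")
                .strip()
            )
        except Exception:
            return ""  # skip this image but keep going

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return [text for text in pool.map(ocr_one, urls) if text]

# Usage inside document_understanding_step:
#   ocr_texts.extend(ocr_images_parallel(request.input_images))
```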
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
recipes/VideoBots.py
(1 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
recipes/VideoBots.py (1)
daras_ai_v2/language_model.py (1)
LargeLanguageModels (82-1028)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: test (3.10.12, 1.8.3)
```python
if request.input_images and (
    request.document_model
    or not LargeLanguageModels[request.selected_model].is_vision_model
):
```
🧩 Analysis chain
Guard fallback with Azure key, honor selected document_model, and degrade gracefully
- Runtime risk: This path now executes for non‑vision LLMs even when Azure isn’t configured; azure_form_recognizer will likely raise. Add a key check.
- Behavior gap: If a document_model is explicitly chosen, we should use it instead of hardcoding "prebuilt-read".
- UX: Emit a helpful message when we can’t OCR (non‑vision model + no Azure key) instead of failing.
Proposed patch:
```diff
 def document_understanding_step(self, request):
     ocr_texts = []
-    if request.input_images and (
-        request.document_model
-        or not LargeLanguageModels[request.selected_model].is_vision_model
-    ):
-        yield "Running Azure Form Recognizer..."
-        for url in request.input_images:
-            ocr_text = (
-                azure_form_recognizer(url, model_id="prebuilt-read")
-                .get("content", "")
-                .strip()
-            )
-            if not ocr_text:
-                continue
-            ocr_texts.append(ocr_text)
+    if request.input_images:
+        is_vision = LargeLanguageModels[request.selected_model].is_vision_model
+        azure_available = bool(settings.AZURE_FORM_RECOGNIZER_KEY)
+        if request.document_model or (not is_vision and azure_available):
+            model_id = request.document_model or "prebuilt-read"
+            yield f"Running Azure Document Intelligence ({model_id})..."
+            for url in request.input_images:
+                try:
+                    ocr_text = (
+                        azure_form_recognizer(url, model_id=model_id)
+                        .get("content", "")
+                        .strip()
+                    )
+                except Exception:
+                    # Skip this image but keep going.
+                    continue
+                if ocr_text:
+                    ocr_texts.append(ocr_text)
+        elif not is_vision and not azure_available:
+            # Non-vision LLM + images but Azure OCR not configured: inform and continue.
+            yield "No Azure Document Intelligence key configured; images will be ignored by non-vision models."
```
Also applies to: 415-424
🏁 Script executed:
```bash
#!/bin/bash
grep -R "azure_form_recognizer" -n recipes/VideoBots.py
grep -R "settings" -n recipes/VideoBots.py
sed -n '380,460p' recipes/VideoBots.py
```
Length of output: 5584
Guard Azure OCR fallback and honor custom document_model
- Check settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer to prevent runtime failures when Azure isn’t configured.
- Use request.document_model as model_id if specified; fall back to "prebuilt-read" only when no custom model is provided.
- Yield a clear message (e.g., "No Azure Document Intelligence key configured; images will be ignored by non-vision models.") when running on a non-vision LLM without an Azure key, instead of failing silently.
🤖 Prompt for AI Agents
In recipes/VideoBots.py around lines 410-413, the current condition may call
azure_form_recognizer without Azure configured and ignores
request.document_model; change it to first check
settings.AZURE_FORM_RECOGNIZER_KEY before invoking azure_form_recognizer, pass
request.document_model as the model_id when present and only use "prebuilt-read"
as a fallback, and if no Azure key is set while the selected model is
non-vision, yield/emit a clear user-facing message like "No Azure Document
Intelligence key configured; images will be ignored by non-vision models."
instead of proceeding or failing silently.
Q/A checklist
How to check import time?
You can visualize this using tuna:
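For example (a sketch; `main` stands in for whichever module you want to profile, and tuna must be installed via `pip install tuna`):

```bash
python -X importtime -c 'import main' 2> import_times.log
tuna import_times.log
```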
To measure import time for a specific library:
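A minimal way to do this with the interpreter's built-in import profiler (pandas is just an example library):

```bash
# The last line holds the cumulative time (in microseconds) for the top-level import.
python -X importtime -c 'import pandas' 2>&1 | tail -n 1
```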
To reduce import times, import libraries that take a long time inside the functions that use them instead of at the top of the file:
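A generic sketch of the pattern (not code from this repo):

```python
def render_chart(df):
    # Deferred import: the cost of loading plotly is paid on first call,
    # not when this module is imported.
    import plotly.express as px

    return px.line(df)
```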
Legal Boilerplate
Look, I get it. The entity doing business as “Gooey.AI” and/or “Dara.network” was incorporated in the State of Delaware in 2020 as Dara Network Inc. and is gonna need some rights from me in order to utilize my contributions in this PR. So here's the deal: I retain all rights, title and interest in and to my contributions, and by keeping this boilerplate intact I confirm that Dara Network Inc can use, modify, copy, and redistribute my contributions, under its choice of terms.