🔍 Have You Searched Existing Issues?
📉 Describe the Performance Issue
The AI triage endpoint (POST /triage/chat) is implemented as an async FastAPI route but executes a synchronous LangGraph workflow through run_triage_flow() -> triage_app.invoke().
The LangGraph workflow performs multiple blocking Gemini API calls using synchronous .invoke() operations. As a result, the FastAPI event loop thread handling the request remains occupied for the duration of the LLM inference, which can take several seconds.
Under concurrent load, this reduces responsiveness and concurrency for other requests handled by the same FastAPI worker. Long-running LLM inference should be offloaded to a worker thread rather than executing directly inside the event loop.
🧪 Environment Details
OS: Windows 11
🔁 Steps to Reproduce
- Start the ML backend service.
- Send a request to POST /triage/chat that triggers a full triage workflow.
- While the triage request is processing, send additional requests to endpoints served by the same FastAPI worker.
- Observe increased latency and reduced responsiveness while the LangGraph workflow is executing.
- Inspect the route implementation and note that the async endpoint directly invokes the synchronous run_triage_flow() function, which ultimately performs multiple blocking Gemini API calls through LangGraph's .invoke() methods.
📋 Logs / Screenshots (Optional)
🙌 Contributor Checklist
🔍 Have You Searched Existing Issues?
📉 Describe the Performance Issue
The AI triage endpoint (POST /triage/chat) is implemented as an async FastAPI route but executes a synchronous LangGraph workflow through run_triage_flow() -> triage_app.invoke().
The LangGraph workflow performs multiple blocking Gemini API calls using synchronous .invoke() operations. As a result, the FastAPI event loop thread handling the request remains occupied for the duration of the LLM inference, which can take several seconds.
Under concurrent load, this reduces responsiveness and concurrency for other requests handled by the same FastAPI worker. Long-running LLM inference should be offloaded to a worker thread rather than executing directly inside the event loop.
🧪 Environment Details
OS: Windows 11
🔁 Steps to Reproduce
📋 Logs / Screenshots (Optional)
🙌 Contributor Checklist