[PERFORMANCE] AI Triage endpoint performs blocking LangGraph/Gemini inference inside async FastAPI route

### 🔍 Have You Searched Existing Issues?

- [x] I have searched the existing issues to avoid duplicates

### 📉 Describe the Performance Issue

The AI triage endpoint (POST /triage/chat) is implemented as an async FastAPI route but executes a synchronous LangGraph workflow through run_triage_flow() -> triage_app.invoke().

The LangGraph workflow performs multiple blocking Gemini API calls using synchronous .invoke() operations. As a result, the FastAPI event loop thread handling the request remains occupied for the duration of the LLM inference, which can take several seconds.

Under concurrent load, this reduces responsiveness and concurrency for other requests handled by the same FastAPI worker. Long-running LLM inference should be offloaded to a worker thread rather than executing directly inside the event loop.

### 🧪 Environment Details

OS: Windows 11

### 🔁 Steps to Reproduce

- Start the ML backend service.
- Send a request to POST /triage/chat that triggers a full triage workflow.
- While the triage request is processing, send additional requests to endpoints served by the same FastAPI worker.
- Observe increased latency and reduced responsiveness while the LangGraph workflow is executing.
- Inspect the route implementation and note that the async endpoint directly invokes the synchronous run_triage_flow() function, which ultimately performs multiple blocking Gemini API calls through LangGraph's .invoke() methods.

### 📋 Logs / Screenshots (Optional)

<img width="910" height="760" alt="Image" src="https://github.com/user-attachments/assets/7dd7ffcd-115b-4e92-ad20-d5f6d7d6abde" />

<img width="1112" height="777" alt="Image" src="https://github.com/user-attachments/assets/20e04986-af59-488f-8c08-cfc938bacee7" />

### 🙌 Contributor Checklist

- [x] I agree to follow this project's Code of Conduct
- [x] I want to work on this issue
- [x] I am a GSSOC'26 contributor

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

[PERFORMANCE] AI Triage endpoint performs blocking LangGraph/Gemini inference inside async FastAPI route #1928

🔍 Have You Searched Existing Issues?

📉 Describe the Performance Issue

🧪 Environment Details

🔁 Steps to Reproduce

📋 Logs / Screenshots (Optional)

🙌 Contributor Checklist

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Uh oh!

[PERFORMANCE] AI Triage endpoint performs blocking LangGraph/Gemini inference inside async FastAPI route #1928

Description

🔍 Have You Searched Existing Issues?

📉 Describe the Performance Issue

🧪 Environment Details

🔁 Steps to Reproduce

📋 Logs / Screenshots (Optional)

🙌 Contributor Checklist

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions