Improve LiteLLM proxy-related error handling (#491)

lickem22 · web-flow · commit 8dad4815a675 · 2025-03-07T09:18:39.000Z
* Adding termcolor to requirements.

* Adding new litellm model for chat.

* Fixing import path for add dummy data script.

* Adding openai/chat litellm config.

* Add utils to generate random int32.

* Adding chat management functionalities.

* Adding prompts for ChatHistory.

* Removed zero-shot cot and updated system message passing.

* Updated pre-commit to include types-requests.

* Added chat fallback model.

* Added REDIS timeout variable to config.

* Updated prompts for ChatHistory.

* Passing in paraphrase argument so that paraphrasing can be skipped for chat histories.

* Updated chat management utilities.

* Updated prompts for ChatHistory.

* Added response generation with RAG and chat history.

* Updated decorators and llm query generation to include chat history. Added /chat endpoint. Temporarily commented out /search endpoint for quick testing.

* Updated chat manager functions. Updated parameter name for _ask_llm_async from json to _json.

* Updated prompts for spacing.

* Updated prompt and chat management.

* Added utils for generating random int32.

* Refactored chat and search endpoints.

* Consolidated chat and search endpoints.

* CCs.

* Removed termcolor package.

* Adding types-requests to requirements-dev.txt for github workflow.

* Passing along session ID for QueryResponse.

* Logic shift to query refined template.

* CCs.

* Removing paraphrase argument.

* Removing paraphrase argument.

* CCs.

* No need to return session ID.

* Added tests for chat.

* Fixing os env issue with github workflow.

* Fixing os env issue with github workflow.

* Fixing os env issue with github workflow.

* Fixing os env issue with github workflow.

* Fixing os env issue with github workflow.

* Test.

* Reverting tests.

* CCs.

* Checking mocked tests for github actions.

* Updated tests.

* Updated tests and fixed issue with truncation.

* CCs.

* CCs.

* CCs.

* CCs.

* Removed WorkspaceRetrieve pydantic model.

* CCs.

* CCs.

* CCs.

* CCs.

* Updated contents and tags packages for workspaces

* Updated question_answer package for workspaces. Modified parts of data_api and llm_call packages. Finished up lagging function calls that were missing workspaces in previous commits.

* Fixed function signatures.

* Finished data_api package.

* Finished admin package.

* Updated urgency_detection and urgency_rules packages.

* CCs.

* Updated user_tools package. CCs.

* CCs to utils and tags packages.

* Updated question_answer and contents packages.

* CCs to urgency_detection and urgency_rules packages.

* Updated data_api package.

* CCs to llm_call/dashboard.py.

* CCs to llm_call/llm_prompts.py.

* Updated add_dummy_data_to_db to use workspace_id.

* Updated add_new_data_to_db to use workspace_id.

* Removed unused import.

* Separated workspace logic into its own package with its own routers, utils, and schemas. Updated auth dependencies and routers to resolve circular import issues.

* Linting.

* Changing default workspace to be Workspace_{user.username}.

* Added delete workspace and get workspace by user ID endpoints.

* Updated table names. Added default_workspace column. Updated auth to pull default workspace. Added login-workspace endpoint. Updating tests...

* CCs.

* Updated workspace endpoints and schemas. Included better checks for quotas.

* Checking for unique workspace name when updating workspace. Added ability to remove users from workspaces.

* Added user removal functionality.

* CCs to remaining modules. Fixed circular import issue and removed user_tools package---consolidated with users package now. Additional updates to users routers.

* CCs.

* Updated tests/rails package.

* CCs. Going through and updating tests/api/conftest.py.

* Updated test_admin.py.

* Fixed alembic migration naming issue. Verified alembic tests pass.

* Verified test_archive_content.py.

* Verified test_chat.py

* Verified test_data_api.py.

* Verified test_import_content.py.

* Verified test_import_content.py test_manage_content.py test_manage_tags.py

* Verified test_manage_ud_rules.py.

* Finished verifying existing tests except for dashboard tests. Added migration for on cascade deletion.

* Finished verifying existing tests with pytest-randomly. Fixed lagging issues.

* Added ability for any user to create a workspace.

* commit message

* commit message

* Adding BDD tests.

* Updating workspace BDD tests.

* Updating workspace BDD tests.

* Merging in frontend changes only for multi-turn conv.

* Updating with multi-turn conv frontend PR and pylint fixes.

* Adding linting make command.

* Merged with topic modeling PR.

* Folding in hotfixes to admin_app.

* CCs.

* CCs.

* CCs.

* CCs.

* Folding in hotfixes to admin_app.

* Updated dashboard package for workspace.

* Verified remaining tests.

* Add workspace bar

* Updated github workflow for tests. Updated test_urgency_detect.py to include proper teardown. Updated dashboard filtering logic to point to UrgencyResponseDB instead of ResponseFeedbackDB. CCs.

* Updated optional_components for linting and updated httpx dependency in order to pass github workflow.

* Testing reverting back to using type.

* Testing reverting back to using isinstance.

* CCs.

* CCs.

* CCs.

* CCs.

* Added accidentally deleted pytest fixture.

* Moved archive content test to its own workspace.

* login endpoints now return workspace_name in AuthenticatedDetails. login-workspace now has dependency injection on get_current_user so that access token is required. workspace default quotas changed to env defaults. UserRetrieve now returns list of dicts instead of two separate lists. retrieve_all_users is now retrieve_all_users_in_current_workspace. added get_current_workspace endpoint.

* Create new workspace component

* Returning WorkspaceRetrieve after creating workspaces instead of WorkspaceCreate so that workspace_id is available.

* Edit workspace button

* Moved login-workspace endpoint to workspace/routers.py and changed to switch-workspace endpoint due to authentication requirement. Disabled updating workspace quotas on backend.

* Update edit users

* Added is_default_workspace in return object when adding existing users to a workspace.

* Switch to different workspaces feature

* Folding in changes for frontend and backend from update dashboard page 2 PR.

* Modularizing BDD test.

* CCs.

* Added user resetting passwords BDD tests.

* Changed endpoint from /workspace/current to /workspace/current-workspace.

* Added retrieving user information BDD tests.

* Added removing user BDD tests.

* Fixed error in removing users from workspaces BDD tests.

* Added updating user information BDD tests. Other CCs.

* Added creating workspaces BDD tests.

* Updating tests to pass in GHA.

* Adding user role to access token and authentication. Removed is_admin attribute.

* Added users endpoint to check if a username exists.

* Added user routers to differentiate between creating new users and adding existing users.

* Added adding users BDD tests. Put endpoint for checking if username exists back in. Separated out logic for creating new users vs. adding existing users to workspaces.

* Added updating workspaces BDD tests.

* Added type check.

* Added retrieving workspaces BDD tests. Updated pyproject.toml and requirements-dev for coverage.

* Router name change to add-existing-user-to-workspace.

* Updated tests.

* Add new changes

* Add new changes

* Updated user head endpoint to check if username exists to return a status code instead of a boolean.

* CCs.

* Add user to workspace

* Removed access token requirement when resetting user password. Updated tests.

* Default workspace implementation

* Merging frontend changes from main.

* Added official docs for multi-turn chat and workspaces. Removed HACK FIX comments.

* Add reset user logic

* Remove quotas from form

* clean up

* Fix read only issue

* Remove section from integration page for read only users

* Few bug fixes

* Handle long workspace names

* Changing default workspace name to {username}'s Workspace.

* Fix user role bug

* Consolidated workspace migration files to a single migration file that also takes care of data migration. Updated tests.

* Fixing github workflow for tests.

* Separated single workspace migration file into 3 stages for production.

* Final fixes

* Final final changes

* Fixing migration script errors.

* Fixing migration script errors.

* Removed typo in front of Make command.

* Fixing migration script errors.

* Fixing migration script errors.

* Updating with main.

* Fix recovery_code being null issue

* Improve error handling for LLM endpoint

* Catch exception when adding/editing rules

---------

Co-authored-by: tonyzhao6 <>
diff --git a/admin_app/src/app/content/components/ChatSideBar.tsx b/admin_app/src/app/content/components/ChatSideBar.tsx
@@ -100,13 +100,10 @@ const ChatSideBar = ({
       : getResponse(question);
     responsePromise
       .then((response) => {
-        const errorMessage = response.error
-          ? response.error.error_message
-          : "LLM Response failed.";
         const responseMessage = {
           dateTime: new Date().toISOString(),
           type: "response",
-          content: response.status == 200 ? response.llm_response : errorMessage,
+          content: response.llm_response,
           json: response,
         } as ResponseMessage;
 
diff --git a/admin_app/src/app/login/page.tsx b/admin_app/src/app/login/page.tsx
@@ -21,6 +21,7 @@ import * as React from "react";
 import { useEffect } from "react";
 import { appColors, sizes } from "@/utils";
 import {
+  checkIfUsernameExists,
   getRegisterOption,
   registerUser,
   resetPassword,
@@ -135,7 +136,6 @@ const Login = () => {
   const handleCloseConfirmationModal = () => {
     setShowConfirmationModal(false);
   };
-
   return isLoading ? (
     <Grid>
       {" "}
diff --git a/core_backend/app/contents/routers.py b/core_backend/app/contents/routers.py
@@ -18,7 +18,7 @@
 from ..tags.schemas import TagCreate, TagRetrieve
 from ..users.models import UserDB, user_has_required_role_in_workspace
 from ..users.schemas import UserRoles
-from ..utils import setup_logger
+from ..utils import EmbeddingCallException, setup_logger
 from ..workspaces.utils import (
     get_content_quota_by_workspace_id,
     get_workspace_by_workspace_name,
@@ -103,6 +103,7 @@ async def create_content(
     HTTPException
         If the user does not have the required role to create content in the workspace.
         If the content tags are invalid or the user would exceed their content quota.
+        If the embedding of the content fails.
     """
 
     workspace_db = await get_workspace_by_workspace_name(
@@ -147,12 +148,18 @@ async def create_content(
             ) from e
 
     # 4.
-    content_db = await save_content_to_db(
-        asession=asession,
-        content=content,
-        exclude_archived=False,  # Don't exclude for newly saved content!
-        workspace_id=workspace_id,
-    )
+    try:
+        content_db = await save_content_to_db(
+            asession=asession,
+            content=content,
+            exclude_archived=False,  # Don't exclude for newly saved content!
+            workspace_id=workspace_id,
+        )
+    except EmbeddingCallException as e:
+        raise HTTPException(
+            status_code=status.HTTP_502_BAD_GATEWAY,
+            detail="Error embedding content. Please check embedding service.",
+        ) from e
     return _convert_record_to_schema(record=content_db)
 
 
@@ -237,12 +244,18 @@ async def edit_content(
 
     content.content_tags = content_tags
     content.is_archived = old_content.is_archived
-    updated_content = await update_content_in_db(
-        asession=asession,
-        content=content,
-        content_id=content_id,
-        workspace_id=workspace_id,
-    )
+    try:
+        updated_content = await update_content_in_db(
+            asession=asession,
+            content=content,
+            content_id=content_id,
+            workspace_id=workspace_id,
+        )
+    except EmbeddingCallException as e:
+        raise HTTPException(
+            status_code=status.HTTP_502_BAD_GATEWAY,
+            detail="Error embedding content. Please check embedding service.",
+        ) from e
 
     return _convert_record_to_schema(record=updated_content)
 
diff --git a/core_backend/app/llm_call/utils.py b/core_backend/app/llm_call/utils.py
@@ -71,33 +71,52 @@ async def _ask_llm_async(
     if not messages:
         assert isinstance(user_message, str) and isinstance(system_message, str)
         messages = [
-            {
-                "content": system_message,
-                "role": "system",
-            },
-            {
-                "content": user_message,
-                "role": "user",
-            },
+            {"content": system_message, "role": "system"},
+            {"content": user_message, "role": "user"},
         ]
+
     llm_generation_params = llm_generation_params or {
         "max_tokens": 1024,
         "temperature": 0,
     }
 
     logger.info(f"LLM input: 'model': {litellm_model}, 'endpoint': {litellm_endpoint}")
 
-    llm_response_raw = await acompletion(
-        model=litellm_model,
-        messages=messages,
-        api_base=litellm_endpoint,
-        api_key=LITELLM_API_KEY,
-        metadata=metadata,
-        **extra_kwargs,
-        **llm_generation_params,
-    )
-    logger.info(f"LLM output: {llm_response_raw.choices[0].message.content}")
-    return llm_response_raw.choices[0].message.content
+    try:
+        llm_response_raw = await acompletion(
+            model=litellm_model,
+            messages=messages,
+            api_base=litellm_endpoint,
+            api_key=LITELLM_API_KEY,
+            metadata=metadata,
+            **extra_kwargs,
+            **llm_generation_params,
+        )
+    except Exception as err:
+        logger.error("Error calling the LLM", exc_info=True)
+        raise LLMCallException(f"Error during LLM call: {err}") from err
+
+    # Optionally check if the returned response contains an error field
+    if hasattr(llm_response_raw, "error") and llm_response_raw.error:
+        error_msg = getattr(llm_response_raw, "error", "Unknown error")
+        logger.error(f"LLM call returned an error: {error_msg}")
+        raise LLMCallException(f"LLM call returned an error: {error_msg}")
+
+    # Ensure that the response has valid content
+    try:
+        content = llm_response_raw.choices[0].message.content
+    except (AttributeError, IndexError) as e:
+        logger.error("LLM response structure is not as expected", exc_info=True)
+        raise LLMCallException("LLM response structure is not as expected") from e
+
+    logger.info(f"LLM output: {content}")
+    return content
+
+
+class LLMCallException(Exception):
+    """Custom exception for LLM call errors."""
+
+    pass
 
 
 def _truncate_chat_history(
diff --git a/core_backend/app/question_answer/routers.py b/core_backend/app/question_answer/routers.py
@@ -39,6 +39,7 @@
     generate_tts__after,
 )
 from ..llm_call.utils import (
+    LLMCallException,
     append_message_content_to_chat_history,
     get_chat_response,
     init_chat_history,
@@ -131,21 +132,32 @@ async def chat(
     QueryResponse | JSONResponse
         The query response object or an appropriate JSON response.
     """
+    try:
+        # 1.
+        user_query = await init_user_query_and_chat_histories(
+            redis_client=request.app.state.redis,
+            reset_chat_history=reset_chat_history,
+            user_query=user_query,
+        )
 
-    # 1.
-    user_query = await init_user_query_and_chat_histories(
-        redis_client=request.app.state.redis,
-        reset_chat_history=reset_chat_history,
-        user_query=user_query,
-    )
+        # 2
 
-    # 2.
-    return await search(
-        user_query=user_query,
-        request=request,
-        asession=asession,
-        workspace_db=workspace_db,
-    )
+        response = await search(
+            user_query=user_query,
+            request=request,
+            asession=asession,
+            workspace_db=workspace_db,
+        )
+        return response
+    except LLMCallException:
+        return JSONResponse(
+            status_code=status.HTTP_502_BAD_GATEWAY,
+            content={
+                "error_message": (
+                    "LLM call returned an error: Please check LLM configuration"
+                )
+            },
+        )
 
 
 @router.post(
@@ -186,63 +198,74 @@ async def search(
     QueryResponse | JSONResponse
         The query response object or an appropriate JSON response.
     """
+    try:
+        workspace_id = workspace_db.workspace_id
+        user_query_db, user_query_refined_template, response_template = (
+            await get_user_query_and_response(
+                asession=asession,
+                generate_tts=False,
+                user_query=user_query,
+                workspace_id=workspace_id,
+            )
+        )
+        assert isinstance(user_query_db, QueryDB)
 
-    workspace_id = workspace_db.workspace_id
-    user_query_db, user_query_refined_template, response_template = (
-        await get_user_query_and_response(
+        response = await get_search_response(
             asession=asession,
-            generate_tts=False,
-            user_query=user_query,
+            exclude_archived=True,
+            n_similar=int(N_TOP_CONTENT),
+            n_to_crossencoder=int(N_TOP_CONTENT_TO_CROSSENCODER),
+            query_refined=user_query_refined_template,
+            request=request,
+            response=response_template,
             workspace_id=workspace_id,
         )
-    )
-    assert isinstance(user_query_db, QueryDB)
 
-    response = await get_search_response(
-        asession=asession,
-        exclude_archived=True,
-        n_similar=int(N_TOP_CONTENT),
-        n_to_crossencoder=int(N_TOP_CONTENT_TO_CROSSENCODER),
-        query_refined=user_query_refined_template,
-        request=request,
-        response=response_template,
-        workspace_id=workspace_id,
-    )
+        if user_query.generate_llm_response:
+            response = await get_generation_response(
+                query_refined=user_query_refined_template, response=response
+            )
 
-    if user_query.generate_llm_response:
-        response = await get_generation_response(
-            query_refined=user_query_refined_template, response=response
+        await save_query_response_to_db(
+            asession=asession,
+            response=response,
+            user_query_db=user_query_db,
+            workspace_id=workspace_id,
+        )
+        await increment_query_count(
+            asession=asession,
+            contents=response.search_results,
+            workspace_id=workspace_id,
+        )
+        await save_content_for_query_to_db(
+            asession=asession,
+            contents=response.search_results,
+            query_id=response.query_id,
+            session_id=user_query.session_id,
+            workspace_id=workspace_id,
         )
 
-    await save_query_response_to_db(
-        asession=asession,
-        response=response,
-        user_query_db=user_query_db,
-        workspace_id=workspace_id,
-    )
-    await increment_query_count(
-        asession=asession, contents=response.search_results, workspace_id=workspace_id
-    )
-    await save_content_for_query_to_db(
-        asession=asession,
-        contents=response.search_results,
-        query_id=response.query_id,
-        session_id=user_query.session_id,
-        workspace_id=workspace_id,
-    )
+        if isinstance(response, QueryResponseError):
+            return JSONResponse(
+                status_code=status.HTTP_400_BAD_REQUEST, content=response.model_dump()
+            )
+
+        if isinstance(response, QueryResponse):
+            return response
 
-    if isinstance(response, QueryResponseError):
         return JSONResponse(
-            status_code=status.HTTP_400_BAD_REQUEST, content=response.model_dump()
+            status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+            content={"error_message": "Internal server error"},
+        )
+    except LLMCallException:
+        return JSONResponse(
+            status_code=status.HTTP_502_BAD_GATEWAY,
+            content={
+                "error_message": (
+                    "LLM call returned an error: Please check LLM configuration"
+                )
+            },
         )
-
-    if isinstance(response, QueryResponse):
-        return response
-
-    return JSONResponse(
-        status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
-        content={"message": "Internal server error"},
-    )
 
 
 @router.post(
diff --git a/core_backend/app/urgency_rules/routers.py b/core_backend/app/urgency_rules/routers.py
 from ..database import get_async_session
 from ..users.models import UserDB, user_has_required_role_in_workspace
 from ..users.schemas import UserRoles
-from ..utils import setup_logger
+from ..utils import EmbeddingCallException, setup_logger
 from ..workspaces.utils import get_workspace_by_workspace_name
 from .models import (
     UrgencyRuleDB,
             detail="User does not have the required role to create urgency rules in "
             "the workspace.",
         )
-    urgency_rule_db = await save_urgency_rule_to_db(
-        asession=asession,
-        urgency_rule=urgency_rule,
-        workspace_id=workspace_db.workspace_id,
-    )
+    try:
+        urgency_rule_db = await save_urgency_rule_to_db(
+            asession=asession,
+            urgency_rule=urgency_rule,
+            workspace_id=workspace_db.workspace_id,
+        )
+    except EmbeddingCallException as e:
+        raise HTTPException(
+            status_code=status.HTTP_502_BAD_GATEWAY,
+            detail="Error embedding rule. Please check embedding service.",
+        ) from e
     return _convert_record_to_schema(urgency_rule_db=urgency_rule_db)
             status_code=status.HTTP_404_NOT_FOUND,
             detail=f"Urgency Rule ID `{urgency_rule_id}` not found",
         )
-
-    urgency_rule_db = await update_urgency_rule_in_db(
-        asession=asession,
-        urgency_rule=urgency_rule,
-        urgency_rule_id=urgency_rule_id,
-        workspace_id=workspace_id,
-    )
+    try:
+        urgency_rule_db = await update_urgency_rule_in_db(
+            asession=asession,
+            urgency_rule=urgency_rule,
+            urgency_rule_id=urgency_rule_id,
+            workspace_id=workspace_id,
+        )
+    except EmbeddingCallException as e:
+        raise HTTPException(
+            status_code=status.HTTP_502_BAD_GATEWAY,
+            detail="Error embedding rule. Please check embedding service.",
+        ) from e
     return _convert_record_to_schema(urgency_rule_db=urgency_rule_db)
diff --git a/core_backend/app/utils.py b/core_backend/app/utils.py
     return metadata
+class EmbeddingCallException(Exception):
+    """Custom exception for embedding call errors."""
+
+    pass
+
+
 async def embedding(
     *, metadata: Optional[dict] = None, text_to_embed: str
 ) -> list[float]:
         The embedding for the given text.
     """
-    metadata = metadata or {}
-
-    content_embedding = await aembedding(
-        api_base=LITELLM_ENDPOINT,
-        api_key=LITELLM_API_KEY,
-        input=text_to_embed,
-        metadata=metadata,
-        model=LITELLM_MODEL_EMBEDDING,
-    )
-
-    return content_embedding.data[0]["embedding"]
+    try:
+        content_embedding = await aembedding(
+            api_base=LITELLM_ENDPOINT,
+            api_key=LITELLM_API_KEY,
+            input=text_to_embed,
+            metadata=metadata,
+            model=LITELLM_MODEL_EMBEDDING,
+        )
+    except Exception as err:
+        raise EmbeddingCallException(f"Error during embedding call: {err}") from err
+
+    # Validate the response structure
+    try:
+        embedding_value = content_embedding.data[0]["embedding"]
+    except (AttributeError, IndexError, KeyError) as err:
+        raise EmbeddingCallException(
+            "Embedding response structure is not as expected"
+        ) from err
+    return embedding_value
 def encode_api_limit(*, api_limit: int | None) -> int | str: