-
Notifications
You must be signed in to change notification settings - Fork 766
Fix LiteLLM infinite loop issue with Ollama/Gemma3 models #166
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
LovepreetSinghVerma
wants to merge
1
commit into
google:main
Choose a base branch
from
LovepreetSinghVerma:fix/litellm-infinite-loop
base: main
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
+640
−10
Open
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,11 @@ | ||
# LiteLLM Integration | ||
|
||
## Loop Prevention | ||
|
||
When using LiteLLM with certain models (particularly Ollama/Gemma3), be aware that the system includes loop detection to prevent infinite function call loops. The loop detection triggers when the same function is called consecutively more than 5 times. | ||
|
||
If your application legitimately needs to call the same function more than 5 times in a row, you can adjust the `_loop_threshold` value in the `LiteLlm` class. However, this is generally not recommended as repeated calls to the same function are often a sign of an issue with the model's understanding or the function's implementation. | ||
|
||
For more details on this feature, see [LiteLLM Loop Fix Documentation](./litellm_loop_fix.md). | ||
|
||
# Additional Topics |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,62 @@ | ||
# LiteLLM Infinite Loop Fix | ||
|
||
## Overview | ||
|
||
This document describes a fix implemented to address an infinite loop issue that occurs when using ADK (Agent Development Kit) with Ollama/Gemma3 models via the LiteLLM integration. | ||
|
||
## Problem Description | ||
|
||
When using certain models like Ollama/Gemma3 through LiteLLM, the system could enter an infinite loop under the following conditions: | ||
|
||
1. The model makes a function call with arguments | ||
2. The function executes and returns a result | ||
3. The model tries to make another function call, but with malformed JSON in the arguments | ||
4. Due to the malformed JSON, the system gets stuck repeating the same function call | ||
|
||
This issue caused the system to become unresponsive and waste resources, as the model would continuously attempt to call the same function without making progress. | ||
|
||
## Solution | ||
|
||
The fix addresses the issue through two main components: | ||
|
||
### 1. Robust JSON Parsing | ||
|
||
The enhanced `_model_response_to_generate_content_response` function now includes: | ||
|
||
- Comprehensive validation for required fields with proper defaults | ||
- Multiple strategies for parsing malformed JSON: | ||
- Standard JSON parsing | ||
- Single quote replacement | ||
- Regex-based fixes for common JSON formatting issues | ||
- Graceful fallback to empty dictionaries when parsing fails | ||
- Improved error handling to prevent crashes | ||
|
||
### 2. Loop Detection Mechanism | ||
|
||
The `generate_content_async` method in the `LiteLlm` class now includes: | ||
|
||
- Tracking of consecutive calls to the same function | ||
- Detection when the same function is called more than a threshold number of times (default: 5) | ||
- Interruption of potential infinite loops when detected | ||
- Generation of helpful user-facing messages that explain the issue | ||
- Inclusion of relevant context from function calls to assist the user | ||
|
||
## Implementation Details | ||
|
||
The implementation preserves compatibility with all existing ADK functionality while adding the new safety mechanisms. The loop detection is efficient and adds minimal overhead to normal operation. | ||
|
||
### Configuration | ||
|
||
The loop detection threshold can be adjusted by modifying the `_loop_threshold` class variable in the `LiteLlm` class. The default value is 5, which strikes a balance between allowing legitimate repeated function calls and identifying problematic loops. | ||
|
||
### Testing | ||
|
||
The fix has been validated through: | ||
|
||
1. Unit tests for robust JSON parsing | ||
2. Integration tests to verify loop detection | ||
3. Manual system testing to ensure compatibility with existing workflows | ||
|
||
## Conclusion | ||
|
||
This fix makes the LiteLLM integration more robust, particularly when using models that may produce malformed JSON or get stuck in repetitive patterns. It improves reliability and user experience by preventing infinite loops and providing helpful context when issues are detected. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -376,22 +376,137 @@ def _model_response_to_chunk( | |
def _model_response_to_generate_content_response( | ||
response: ModelResponse, | ||
) -> LlmResponse: | ||
"""Converts a litellm response to LlmResponse. | ||
"""Converts a litellm response to LlmResponse with robust error handling. | ||
|
||
This enhanced version: | ||
1. Adds validation for required fields with proper defaults | ||
2. Improves JSON parsing to handle both standard JSON and Python-style strings | ||
3. Implements comprehensive error handling to prevent crashes | ||
4. Maintains compatibility with all model formats | ||
|
||
Implementation note: | ||
This function is part of the fix for an infinite loop issue that occurs when using | ||
Ollama/Gemma3 models with LiteLLM. These models sometimes return malformed JSON in | ||
function call arguments, which can cause the system to get stuck in a loop. | ||
The robust parsing ensures that even with malformed JSON, we can still extract | ||
valid arguments and prevent failures. | ||
|
||
Args: | ||
response: The model response. | ||
|
||
Returns: | ||
The LlmResponse. | ||
""" | ||
|
||
message = None | ||
if response.get("choices", None): | ||
message = response["choices"][0].get("message", None) | ||
|
||
if not message: | ||
raise ValueError("No message in response") | ||
return _message_to_generate_content_response(message) | ||
try: | ||
# Validate response structure | ||
if not hasattr(response, "choices") or not response.choices: | ||
logger.warning("ModelResponse missing choices or empty choices list") | ||
return LlmResponse( | ||
content=types.Content( | ||
role="model", | ||
parts=[types.Part(text="No response generated from model.")], | ||
) | ||
) | ||
|
||
# Get first choice safely | ||
choice = response.choices[0] | ||
|
||
# Validate message existence | ||
if not hasattr(choice, "message") or not choice.message: | ||
logger.warning("Choice missing message or empty message") | ||
return LlmResponse( | ||
content=types.Content( | ||
role="model", | ||
parts=[types.Part(text="Empty message from model.")], | ||
) | ||
) | ||
|
||
message = choice.message | ||
parts = [] | ||
|
||
# Handle text content if present | ||
if hasattr(message, "content") and message.content: | ||
parts.append(types.Part(text=message.content)) | ||
|
||
# Handle tool calls with proper validation | ||
if hasattr(message, "tool_calls") and message.tool_calls: | ||
for tool_call in message.tool_calls: | ||
logger.debug(f"Processing tool call: {tool_call}") | ||
|
||
# Validate required fields | ||
if not hasattr(tool_call, "function"): | ||
logger.warning("Tool call missing function field, skipping") | ||
continue | ||
|
||
if not hasattr(tool_call.function, "name"): | ||
logger.warning("Tool call function missing name field, skipping") | ||
continue | ||
|
||
# Safe ID handling | ||
tool_id = getattr(tool_call, "id", f"generated_id_{id(tool_call)}") | ||
|
||
# Safe arguments parsing with error handling | ||
args = {} | ||
if hasattr(tool_call.function, "arguments"): | ||
arguments = tool_call.function.arguments | ||
if arguments: | ||
try: | ||
# Standard JSON parsing | ||
args = json.loads(arguments) | ||
logger.debug(f"Successfully parsed arguments: {args}") | ||
except json.JSONDecodeError: | ||
logger.warning(f"Failed to parse arguments as JSON: {arguments}") | ||
# Attempt to fix common JSON issues | ||
try: | ||
# Replace single quotes with double quotes | ||
fixed_args = arguments.replace("'", '"') | ||
args = json.loads(fixed_args) | ||
logger.info(f"Successfully parsed arguments after fixing quotes: {args}") | ||
except json.JSONDecodeError: | ||
# Try more aggressive fixes for malformed JSON | ||
try: | ||
import re | ||
# Use regex to extract key-value pairs | ||
fixed_args = re.sub(r"'([^']+)':", r'"\1":', arguments) | ||
fixed_args = re.sub(r":'([^']+)'", r':"\1"', fixed_args) | ||
args = json.loads(fixed_args) | ||
logger.info(f"Successfully parsed arguments after regex fixes: {args}") | ||
except (json.JSONDecodeError, Exception) as e: | ||
logger.warning(f"All parsing attempts failed, using empty dict: {e}") | ||
else: | ||
logger.warning(f"Tool call function missing arguments field, using empty dict") | ||
|
||
# Create function call part | ||
parts.append( | ||
types.Part( | ||
function_call=types.FunctionCall( | ||
name=tool_call.function.name, | ||
args=args, | ||
id=tool_id, | ||
) | ||
) | ||
) | ||
|
||
# Ensure at least one part | ||
if not parts: | ||
logger.warning("No parts created from response, adding empty text part") | ||
parts = [types.Part(text="")] | ||
|
||
return LlmResponse( | ||
content=types.Content( | ||
role="model", | ||
parts=parts, | ||
) | ||
) | ||
except Exception as e: | ||
# Global error handler for any unexpected issues | ||
logger.error(f"Error processing model response: {e}", exc_info=True) | ||
return LlmResponse( | ||
content=types.Content( | ||
role="model", | ||
parts=[types.Part(text="Error processing model response. Please try again.")], | ||
) | ||
) | ||
|
||
|
||
def _message_to_generate_content_response( | ||
|
@@ -559,12 +674,20 @@ class LiteLlm(BaseLlm): | |
model: The name of the LiteLlm model. | ||
llm_client: The LLM client to use for the model. | ||
model_config: The model config. | ||
_consecutive_tool_calls: Counter for tracking consecutive calls to the same function. | ||
_last_tool_call_name: Name of the last function called. | ||
_loop_threshold: Maximum number of consecutive calls to the same function before | ||
triggering loop detection (default: 5). | ||
""" | ||
|
||
llm_client: LiteLLMClient = Field(default_factory=LiteLLMClient) | ||
"""The LLM client to use for the model.""" | ||
|
||
_additional_args: Dict[str, Any] = None | ||
# Loop detection state - Prevents infinite loops when models repeatedly call the same function | ||
_consecutive_tool_calls: int = 0 | ||
_last_tool_call_name: Optional[str] = None | ||
_loop_threshold: int = 5 # Maximum number of consecutive calls to the same tool | ||
|
||
def __init__(self, model: str, **kwargs): | ||
"""Initializes the LiteLlm class. | ||
|
@@ -582,11 +705,36 @@ def __init__(self, model: str, **kwargs): | |
self._additional_args.pop("tools", None) | ||
# public api called from runner determines to stream or not | ||
self._additional_args.pop("stream", None) | ||
# Initialize loop detection state | ||
self._consecutive_tool_calls = 0 | ||
self._last_tool_call_name = None | ||
|
||
async def generate_content_async( | ||
self, llm_request: LlmRequest, stream: bool = False | ||
) -> AsyncGenerator[LlmResponse, None]: | ||
"""Generates content asynchronously. | ||
"""Generates content asynchronously with loop detection. | ||
|
||
This enhanced version: | ||
1. Tracks consecutive calls to the same function | ||
2. Breaks potential infinite loops after a threshold | ||
3. Provides a helpful response when a loop is detected | ||
4. Maintains compatibility with the original method | ||
|
||
Implementation details: | ||
The loop detection mechanism addresses an issue that can occur with certain models | ||
(particularly Ollama/Gemma3), where the model gets stuck repeatedly calling the same | ||
function without making progress. This commonly happens when: | ||
|
||
- The model receives malformed JSON responses it cannot parse | ||
- The model gets into a repetitive pattern of behavior | ||
- The model misunderstands function results and keeps trying the same approach | ||
|
||
When the same function is called consecutively more than the threshold number of times | ||
(default: 5), the loop detection mechanism interrupts the loop and provides a helpful | ||
response to the user instead of continuing to call the model. | ||
|
||
This prevents wasted resources and improves user experience by avoiding situations | ||
where the system would otherwise become unresponsive. | ||
Comment on lines
+717
to
+737
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. While looking at the semantics/style of the rest of the codebase and the fact that you have already provided a separate file documenting these changes, I believe this portion of the function docstring is not necessary. |
||
|
||
Args: | ||
llm_request: LlmRequest, the request to send to the LiteLlm model. | ||
|
@@ -595,6 +743,87 @@ async def generate_content_async( | |
Yields: | ||
LlmResponse: The model response. | ||
""" | ||
# Check if this is a function response by examining history | ||
if (llm_request.history and len(llm_request.history) >= 2 and | ||
llm_request.history[-1].role == "user" and | ||
llm_request.history[-2].role == "model"): | ||
|
||
# Find any function calls in the previous model response | ||
function_parts = [ | ||
p for p in llm_request.history[-2].parts | ||
if hasattr(p, "function_call") and p.function_call | ||
] | ||
|
||
if function_parts: | ||
current_function_name = function_parts[0].function_call.name | ||
logger.debug(f"Previous function call was to: {current_function_name}") | ||
|
||
# Check if we're calling the same function again | ||
if current_function_name == self._last_tool_call_name: | ||
self._consecutive_tool_calls += 1 | ||
logger.warning( | ||
f"Detected consecutive call #{self._consecutive_tool_calls} " | ||
f"to function {current_function_name}" | ||
) | ||
else: | ||
# Reset counter for new function | ||
self._consecutive_tool_calls = 1 | ||
self._last_tool_call_name = current_function_name | ||
logger.debug(f"New function call to: {current_function_name}") | ||
|
||
# If we've exceeded the threshold, break the loop | ||
if self._consecutive_tool_calls >= self._loop_threshold: | ||
logger.error( | ||
f"Detected potential infinite loop: {self._consecutive_tool_calls} " | ||
f"consecutive calls to {current_function_name}" | ||
) | ||
|
||
# Get dealer information to provide in the response (if available) | ||
dealer_info = "" | ||
for content in llm_request.history: | ||
if content.role == "user" and hasattr(content, "parts"): | ||
for part in content.parts: | ||
if hasattr(part, "function_response") and part.function_response: | ||
if part.function_response.name == "get_dealers": | ||
dealer_info = str(part.function_response.response) | ||
break | ||
|
||
# Create helpful response | ||
response_text = ( | ||
f"I've detected a potential infinite loop while trying to call the " | ||
f"{current_function_name} function repeatedly. Let me provide a direct " | ||
f"response instead:\n\n" | ||
) | ||
|
||
if dealer_info: | ||
response_text += f"Here are the dealers available: {dealer_info}\n\n" | ||
else: | ||
response_text += ( | ||
"It seems I was trying to get information repeatedly. " | ||
"Please try asking your question differently.\n\n" | ||
) | ||
|
||
response_text += ( | ||
"If you need specific information, please let me know what you're " | ||
"looking for and I'll try to assist you directly." | ||
) | ||
|
||
# Return a direct response instead of calling model again | ||
yield LlmResponse( | ||
content=types.Content( | ||
role="model", | ||
parts=[types.Part(text=response_text)], | ||
) | ||
) | ||
|
||
# Reset the counter | ||
self._consecutive_tool_calls = 0 | ||
self._last_tool_call_name = None | ||
return | ||
else: | ||
# Reset counter for regular messages | ||
self._consecutive_tool_calls = 0 | ||
self._last_tool_call_name = None | ||
|
||
logger.info(_build_request_log(llm_request)) | ||
|
||
|
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While looking at the semantics/style of the rest of the codebase and the fact that you have already provided a separate file documenting these changes, I believe this portion of the function docstring is not necessary.