From 5376b2303a584554e8136a631a38884440ff0be6 Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Tue, 22 Apr 2025 17:00:33 +0100 Subject: [PATCH 1/7] Feature/langraph integration (#215) * chore: Provider Unit Tests (#173) * chore: added unit tests for core provider. small bugfix on calculate_metrics of provider * added unit tests and docstring for join chunks * added unit tests and docstrings for calculate_cost on provider * added unit tests and docstrings for input_to_string on provider * added unit tests and docstrings for chat and achat * added unit tests and docstrings for chat and achat * chore: cleaned provider unit tests * chore: separated provider tests into different files. fixed some of its tests * chore: linted code * chore: deleted some comments * chore: linted * chore: Added Azure Provider Unit Tests (#176) * chore: added unit tests for azure provider * chore: added more unit tests and docstrings on azure, removed redundant comments * chore: added unit tests for generate client on Azure Provider * chore: separated azure unit tests into separate files. fixed some of its tests. * chore: linted code * chore: new line Signed-off-by: Diogo Goncalves --------- Signed-off-by: Diogo Goncalves Co-authored-by: Diogo Goncalves * [fix] bump prerelease version in pyproject.toml * chore: rename action * feat: added action to run tests on PR * chore: comments * fix: fix azure config tests * chore: style format * fix: tests workflow * Feature/prompt management (#200) * [feat] prompt management * [feat] testing * [feat] only one active prompt * [fix] bump prerelease version in pyproject.toml * [bugfix] return empty prompt * [fix] bump prerelease version in pyproject.toml * Update CONTRIBUTING.md Signed-off-by: Diogo Goncalves * Feat/ Use Openai Usage to calculate Cache and Reasoning Costs (#199) * feat: collects usage from stream and non stream openai calls * chore: refactored to provider to have a Metrics obj * feat: calculate_metrics now takes into account cached & reasoning tokens. Prices of openai models updated * fix: added caching tokens to model config obj * chore: added integration test for cache and reasoning * chore: added integration test for usage retrieval when max tokens reached * chore: uncommented runs from examples/core.py * fix: bugfix regarding usage on function calling. added a test for this * chore: merged with develop * chore: extracted provider data structures to another file * chore: renamed to private methods some within provider. splitted integration tests into 2 files * chore: deletion of a todo comment * chore: update poetry.lock * chore: specify python versions * chore: moving langchain integration tests to sdk * chore: format * feat: added support for o3-mini and updated o1-mini prices. 
also updated integration tests to support o3 (#202) * chore: removed duplicated code; removed duplicated integration tests * chore: updated github actions to run integration tests * chore: fixing github actions * chore: fixing github actions again * chore: fixing github actions again-x2 * chore: fixing github actions again-x2 * chore: added cache of dependencies to integration-tests in githubaction * chore: updated integration-tests action to inject github secrets into env * Feat/bedrock support for Nova models through the ConverseAPI (#207) * feat: added support for bedrock nova models * feat: tokens are now read from usage if available to ensure accuracy * chore: removed duplicated integration tests folder in wrong place * feat: refactored bedrock provider into being a single file instead of folder * chore: renamed bedrock to bedrock-converse in examples/core.py * chore: renamed bedrock in config.yaml * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * Update pyproject.toml updated llmstudio-tracker version Signed-off-by: Miguel Neves <61327611+MiNeves00@users.noreply.github.com> * [fix] bump prerelease version in pyproject.toml * chore: updated llmstudio sdk poetry.lock * Feat/converse support images (#211) * feat: added converse-api support for images in input. started making an integration test for this. * chore: added integration test for converse image sending * chore: send images integration test now also tests for openai * chore: integration test of send_imgs added async testing * chore: updated examples core.py to also have send images * feat: bedrock image input is now same contract as openai * chore: ChatCompletionLLMstudio print now hides large image bytes for readability * chore: fixes in the pretty print of ChatCompletionLLMstudio * chore: small fix in examples/core.py * fix: test_send_imgs had bug on reading env * chore: made clean_print optional on chatcompletions; image from url is directly converted to bytes * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * feat: adapt langchain integration * chore: update lock * chore: make format --------- Signed-off-by: Diogo Goncalves Signed-off-by: Miguel Neves <61327611+MiNeves00@users.noreply.github.com> Co-authored-by: Miguel Neves <61327611+MiNeves00@users.noreply.github.com> Co-authored-by: GitHub Actions Co-authored-by: brunoalho99 <132477278+brunoalho99@users.noreply.github.com> Co-authored-by: brunoalho Co-authored-by: Miguel Neves --- examples/core.py | 36 +- libs/llmstudio/llmstudio/langchain.py | 665 ++++++++++++++++-- libs/llmstudio/poetry.lock | 2 +- .../test_cache_and_reasoning_costs.py | 1 - 4 files changed, 635 insertions(+), 69 deletions(-) diff --git a/examples/core.py b/examples/core.py index 71fb081c..cecfdb5e 100644 --- a/examples/core.py +++ b/examples/core.py @@ -8,11 +8,13 @@ from dotenv import load_dotenv load_dotenv() -def run_provider(provider, model, api_key=None, **kwargs): +def run_provider(provider, model, api_key=None=None, **kwargs): + print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") llm = LLMCore(provider=provider, api_key=api_key, **kwargs) - latencies = {} + latencies = {} + print("\nAsync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Jason", is_stream=False) string = """ @@ -49,14 +51,19 @@ def 
run_provider(provider, model, api_key=None, **kwargs): """ #chat_request = build_chat_request(model, chat_input=string, is_stream=False) + response_async = asyncio.run(llm.achat(**chat_request)) pprint(response_async) latencies["async (ms)"]= response_async.metrics["latency_s"]*1000 + print("\nAsync Stream") + + print("\nAsync Stream") async def async_stream(): chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) + chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) response_async = await llm.achat(**chat_request) async for p in response_async: @@ -71,6 +78,8 @@ async def async_stream(): asyncio.run(async_stream()) + print("\nSync Non-Stream") + chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) print("\nSync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) @@ -81,7 +90,6 @@ async def async_stream(): print("\nSync Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Mary", is_stream=True) - response_sync_stream = llm.chat(**chat_request) for p in response_sync_stream: @@ -96,6 +104,7 @@ async def async_stream(): return latencies def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: int=1000): + if model.startswith(('o1', 'o3')): if model.startswith(('o1', 'o3')): chat_request = { "chat_input": chat_input, @@ -116,6 +125,16 @@ def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: "maxTokens": max_tokens } } + elif 'amazon.nova' in model or 'anthropic.claude' in model: + chat_request = { + "chat_input": chat_input, + "model": model, + "is_stream": is_stream, + "retries": 0, + "parameters": { + "maxTokens": max_tokens + } + } else: chat_request = { "chat_input": chat_input, @@ -135,19 +154,13 @@ def multiple_provider_runs(provider:str, model:str, num_runs:int, api_key:str, * for _ in range(num_runs): latencies = run_provider(provider=provider, model=model, api_key=api_key, **kwargs) pprint(latencies) - - + def run_chat_all_providers(): # OpenAI multiple_provider_runs(provider="openai", model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"], num_runs=1) multiple_provider_runs(provider="openai", model="o3-mini", api_key=os.environ["OPENAI_API_KEY"], num_runs=1) #multiple_provider_runs(provider="openai", model="o1-preview", api_key=os.environ["OPENAI_API_KEY"], num_runs=1) - # Azure - multiple_provider_runs(provider="azure", model="gpt-4o-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="azure", model="gpt-4o", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="azure", model="o1-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="azure", model="o1-preview", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) # Azure multiple_provider_runs(provider="azure", model="gpt-4o-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) @@ -156,7 +169,6 @@ def 
run_chat_all_providers(): #multiple_provider_runs(provider="azure", model="o1-preview", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="anthropic", model="claude-3-opus-20240229", num_runs=1, api_key=os.environ["ANTHROPIC_API_KEY"]) #multiple_provider_runs(provider="azure", model="o1-preview", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) @@ -214,4 +226,4 @@ def run_send_imgs(): # if p.metrics: # p.clean_print() -run_send_imgs() +run_send_imgs() \ No newline at end of file diff --git a/libs/llmstudio/llmstudio/langchain.py b/libs/llmstudio/llmstudio/langchain.py index da1820a0..19b99787 100644 --- a/libs/llmstudio/llmstudio/langchain.py +++ b/libs/llmstudio/llmstudio/langchain.py @@ -1,3 +1,10 @@ +from __future__ import annotations + +import json +import logging +import ssl +import warnings +from collections.abc import Iterator, Mapping, Sequence from typing import ( Any, Callable, @@ -5,25 +12,361 @@ List, Literal, Optional, - Sequence, - Tuple, Type, + TypedDict, + TypeVar, Union, + cast, ) -from langchain.schema.messages import BaseMessage -from langchain.schema.output import ChatGeneration, ChatResult -from langchain_community.adapters.openai import ( - convert_dict_to_message, - convert_message_to_dict, +import certifi +from langchain_core.language_models import LanguageModelInput +from langchain_core.language_models.chat_models import ( + BaseChatModel, + generate_from_stream, +) +from langchain_core.messages import ( + AIMessage, + AIMessageChunk, + BaseMessage, + BaseMessageChunk, + ChatMessage, + ChatMessageChunk, + FunctionMessage, + FunctionMessageChunk, + HumanMessage, + HumanMessageChunk, + InvalidToolCall, + SystemMessage, + SystemMessageChunk, + ToolCall, + ToolMessage, + ToolMessageChunk, +) +from langchain_core.messages.ai import ( + InputTokenDetails, + OutputTokenDetails, + UsageMetadata, +) +from langchain_core.messages.tool import tool_call_chunk +from langchain_core.output_parsers.openai_tools import ( + make_invalid_tool_call, + parse_tool_call, ) -from langchain_core.language_models.base import LanguageModelInput -from langchain_core.language_models.chat_models import BaseChatModel +from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult from langchain_core.runnables import Runnable from langchain_core.tools import BaseTool from langchain_core.utils.function_calling import convert_to_openai_tool +from pydantic import BaseModel + +logger = logging.getLogger(__name__) + +# This SSL context is equivelent to the default `verify=True`. +# https://www.python-httpx.org/advanced/ssl/#configuring-client-instances +global_ssl_context = ssl.create_default_context(cafile=certifi.where()) + + +def _convert_dict_to_message(_dict: Mapping[str, Any]) -> BaseMessage: + """Convert a dictionary to a LangChain message. + + Args: + _dict: The dictionary. + + Returns: + The LangChain message. 
+ """ + role = _dict.get("role") + name = _dict.get("name") + id_ = _dict.get("id") + if role == "user": + return HumanMessage(content=_dict.get("content", ""), id=id_, name=name) + elif role == "assistant": + # Fix for azure + # Also OpenAI returns None for tool invocations + content = _dict.get("content", "") or "" + additional_kwargs: dict = {} + if function_call := _dict.get("function_call"): + additional_kwargs["function_call"] = dict(function_call) + tool_calls = [] + invalid_tool_calls = [] + if raw_tool_calls := _dict.get("tool_calls"): + additional_kwargs["tool_calls"] = raw_tool_calls + for raw_tool_call in raw_tool_calls: + try: + tool_calls.append(parse_tool_call(raw_tool_call, return_id=True)) + except Exception as e: + invalid_tool_calls.append( + make_invalid_tool_call(raw_tool_call, str(e)) + ) + if audio := _dict.get("audio"): + additional_kwargs["audio"] = audio + return AIMessage( + content=content, + additional_kwargs=additional_kwargs, + name=name, + id=id_, + tool_calls=tool_calls, + invalid_tool_calls=invalid_tool_calls, + ) + elif role in ("system", "developer"): + if role == "developer": + additional_kwargs = {"__openai_role__": role} + else: + additional_kwargs = {} + return SystemMessage( + content=_dict.get("content", ""), + name=name, + id=id_, + additional_kwargs=additional_kwargs, + ) + elif role == "function": + return FunctionMessage( + content=_dict.get("content", ""), name=cast(str, _dict.get("name")), id=id_ + ) + elif role == "tool": + additional_kwargs = {} + if "name" in _dict: + additional_kwargs["name"] = _dict["name"] + return ToolMessage( + content=_dict.get("content", ""), + tool_call_id=cast(str, _dict.get("tool_call_id")), + additional_kwargs=additional_kwargs, + name=name, + id=id_, + ) + else: + return ChatMessage(content=_dict.get("content", ""), role=role, id=id_) # type: ignore[arg-type] + + +def _format_message_content(content: Any) -> Any: + """Format message content.""" + if content and isinstance(content, list): + formatted_content = [] + for block in content: + # Remove unexpected block types + if ( + isinstance(block, dict) + and "type" in block + and block["type"] in ("tool_use", "thinking") + ): + continue + # Anthropic image blocks + elif ( + isinstance(block, dict) + and block.get("type") == "image" + and (source := block.get("source")) + and isinstance(source, dict) + ): + if source.get("type") == "base64" and ( + (media_type := source.get("media_type")) + and (data := source.get("data")) + ): + formatted_content.append( + { + "type": "image_url", + "image_url": {"url": f"data:{media_type};base64,{data}"}, + } + ) + elif source.get("type") == "url" and (url := source.get("url")): + formatted_content.append( + {"type": "image_url", "image_url": {"url": url}} + ) + else: + continue + else: + formatted_content.append(block) + else: + formatted_content = content + + return formatted_content + + +def _lc_tool_call_to_openai_tool_call(tool_call: ToolCall) -> dict: + return { + "type": "function", + "id": tool_call["id"], + "function": { + "name": tool_call["name"], + "arguments": json.dumps(tool_call["args"]), + }, + } + + +def _lc_invalid_tool_call_to_openai_tool_call( + invalid_tool_call: InvalidToolCall, +) -> dict: + return { + "type": "function", + "id": invalid_tool_call["id"], + "function": { + "name": invalid_tool_call["name"], + "arguments": invalid_tool_call["args"], + }, + } + + +def _convert_message_to_dict(message: BaseMessage) -> dict: + """Convert a LangChain message to a dictionary. 
+ + Args: + message: The LangChain message. + + Returns: + The dictionary. + """ + message_dict: dict[str, Any] = {"content": _format_message_content(message.content)} + if (name := message.name or message.additional_kwargs.get("name")) is not None: + message_dict["name"] = name + + # populate role and additional message data + if isinstance(message, ChatMessage): + message_dict["role"] = message.role + elif isinstance(message, HumanMessage): + message_dict["role"] = "user" + elif isinstance(message, AIMessage): + message_dict["role"] = "assistant" + if "function_call" in message.additional_kwargs: + message_dict["function_call"] = message.additional_kwargs["function_call"] + if message.tool_calls or message.invalid_tool_calls: + message_dict["tool_calls"] = [ + _lc_tool_call_to_openai_tool_call(tc) for tc in message.tool_calls + ] + [ + _lc_invalid_tool_call_to_openai_tool_call(tc) + for tc in message.invalid_tool_calls + ] + elif "tool_calls" in message.additional_kwargs: + message_dict["tool_calls"] = message.additional_kwargs["tool_calls"] + tool_call_supported_props = {"id", "type", "function"} + message_dict["tool_calls"] = [ + {k: v for k, v in tool_call.items() if k in tool_call_supported_props} + for tool_call in message_dict["tool_calls"] + ] + else: + pass + # If tool calls present, content null value should be None not empty string. + if "function_call" in message_dict or "tool_calls" in message_dict: + message_dict["content"] = message_dict["content"] or None + + if "audio" in message.additional_kwargs: + # openai doesn't support passing the data back - only the id + # https://platform.openai.com/docs/guides/audio/multi-turn-conversations + raw_audio = message.additional_kwargs["audio"] + audio = ( + {"id": message.additional_kwargs["audio"]["id"]} + if "id" in raw_audio + else raw_audio + ) + message_dict["audio"] = audio + elif isinstance(message, SystemMessage): + message_dict["role"] = message.additional_kwargs.get( + "__openai_role__", "system" + ) + elif isinstance(message, FunctionMessage): + message_dict["role"] = "function" + elif isinstance(message, ToolMessage): + message_dict["role"] = "tool" + message_dict["tool_call_id"] = message.tool_call_id + + supported_props = {"content", "role", "tool_call_id"} + message_dict = {k: v for k, v in message_dict.items() if k in supported_props} + else: + raise TypeError(f"Got unknown type {message}") + return message_dict + + +def _convert_delta_to_message_chunk( + _dict: Mapping[str, Any], default_class: type[BaseMessageChunk] +) -> BaseMessageChunk: + id_ = _dict.get("id") + role = cast(str, _dict.get("role")) + content = cast(str, _dict.get("content") or "") + additional_kwargs: dict = {} + if _dict.get("function_call"): + function_call = dict(_dict["function_call"]) + if "name" in function_call and function_call["name"] is None: + function_call["name"] = "" + additional_kwargs["function_call"] = function_call + tool_call_chunks = [] + if raw_tool_calls := _dict.get("tool_calls"): + additional_kwargs["tool_calls"] = raw_tool_calls + try: + tool_call_chunks = [ + tool_call_chunk( + name=rtc["function"].get("name"), + args=rtc["function"].get("arguments"), + id=rtc.get("id"), + index=rtc["index"], + ) + for rtc in raw_tool_calls + ] + except KeyError: + pass + + if role == "user" or default_class == HumanMessageChunk: + return HumanMessageChunk(content=content, id=id_) + elif role == "assistant" or default_class == AIMessageChunk: + return AIMessageChunk( + content=content, + additional_kwargs=additional_kwargs, + id=id_, + 
tool_call_chunks=tool_call_chunks, # type: ignore[arg-type] + ) + elif role in ("system", "developer") or default_class == SystemMessageChunk: + if role == "developer": + additional_kwargs = {"__openai_role__": "developer"} + else: + additional_kwargs = {} + return SystemMessageChunk( + content=content, id=id_, additional_kwargs=additional_kwargs + ) + elif role == "function" or default_class == FunctionMessageChunk: + return FunctionMessageChunk(content=content, name=_dict["name"], id=id_) + elif role == "tool" or default_class == ToolMessageChunk: + return ToolMessageChunk( + content=content, tool_call_id=_dict["tool_call_id"], id=id_ + ) + elif role or default_class == ChatMessageChunk: + return ChatMessageChunk(content=content, role=role, id=id_) + else: + return default_class(content=content, id=id_) # type: ignore + + +def _update_token_usage( + overall_token_usage: Union[int, dict], new_usage: Union[int, dict] +) -> Union[int, dict]: + # Token usage is either ints or dictionaries + # `reasoning_tokens` is nested inside `completion_tokens_details` + if isinstance(new_usage, int): + if not isinstance(overall_token_usage, int): + raise ValueError( + f"Got different types for token usage: " + f"{type(new_usage)} and {type(overall_token_usage)}" + ) + return new_usage + overall_token_usage + elif isinstance(new_usage, dict): + if not isinstance(overall_token_usage, dict): + raise ValueError( + f"Got different types for token usage: " + f"{type(new_usage)} and {type(overall_token_usage)}" + ) + return { + k: _update_token_usage(overall_token_usage.get(k, 0), v) + for k, v in new_usage.items() + } + else: + warnings.warn(f"Unexpected type for token usage: {type(new_usage)}") + return new_usage + + +class _FunctionCall(TypedDict): + name: str + + +_BM = TypeVar("_BM", bound=BaseModel) +_DictOrPydanticClass = Union[dict[str, Any], type[_BM], type] +_DictOrPydantic = Union[dict, _BM] + from llmstudio.providers import LLM -from openai import BaseModel class ChatLLMstudio(BaseChatModel): @@ -40,7 +383,7 @@ def _llm_type(self): def _create_message_dicts( self, messages: List[BaseMessage], stop: Optional[List[str]] ) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]: - message_dicts = [convert_message_to_dict(m) for m in messages] + message_dicts = [_convert_message_to_dict(m) for m in messages] return message_dicts def _create_chat_result(self, response: Any) -> ChatResult: @@ -48,7 +391,7 @@ def _create_chat_result(self, response: Any) -> ChatResult: if not isinstance(response, dict): response = response.model_dump() for res in response["choices"]: - message = convert_dict_to_message(res["message"]) + message = _convert_dict_to_message(res["message"]) generation_info = dict(finish_reason=res.get("finish_reason")) if "logprobs" in res: generation_info["logprobs"] = res["logprobs"] @@ -67,64 +410,99 @@ def _create_chat_result(self, response: Any) -> ChatResult: def bind_tools( self, - tools: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]], + tools: Sequence[Union[dict[str, Any], type, Callable, BaseTool]], *, tool_choice: Optional[ - Union[dict, str, Literal["auto", "any", "none"], bool] + Union[dict, str, Literal["auto", "none", "required", "any"], bool] ] = None, + strict: Optional[bool] = None, + parallel_tool_calls: Optional[bool] = None, **kwargs: Any, ) -> Runnable[LanguageModelInput, BaseMessage]: """Bind tool-like objects to this chat model. + Assumes model is compatible with OpenAI tool-calling API. + Args: tools: A list of tool definitions to bind to this chat model. 
- Can be a dictionary, pydantic model, callable, or BaseTool. Pydantic - models, callables, and BaseTools will be automatically converted to - their schema dictionary representation. - tool_choice: Which tool to require the model to call. - Must be the name of the single provided function, - "auto" to automatically determine which function to call - with the option to not call any function, "any" to enforce that some - function is called, or a dict of the form: - {"type": "function", "function": {"name": <>}}. - **kwargs: Any additional parameters to pass to the - :class:`~langchain.runnable.Runnable` constructor. - """ - formatted_tools = [convert_to_openai_tool(tool) for tool in tools] - if tool_choice is not None and tool_choice: - if isinstance(tool_choice, str) and ( - tool_choice not in ("auto", "any", "none") - ): - tool_choice = {"type": "function", "function": {"name": tool_choice}} - if isinstance(tool_choice, dict) and (len(formatted_tools) != 1): - raise ValueError( - "When specifying `tool_choice`, you must provide exactly one " - f"tool. Received {len(formatted_tools)} tools." - ) - if isinstance(tool_choice, dict) and ( - formatted_tools[0]["function"]["name"] - != tool_choice["function"]["name"] - ): + Supports any tool definition handled by + :meth:`langchain_core.utils.function_calling.convert_to_openai_tool`. + tool_choice: Which tool to require the model to call. Options are: + + - str of the form ``"<>"``: calls <> tool. + - ``"auto"``: automatically selects a tool (including no tool). + - ``"none"``: does not call a tool. + - ``"any"`` or ``"required"`` or ``True``: force at least one tool to be called. + - dict of the form ``{"type": "function", "function": {"name": <>}}``: calls <> tool. + - ``False`` or ``None``: no effect, default OpenAI behavior. + strict: If True, model output is guaranteed to exactly match the JSON Schema + provided in the tool definition. If True, the input schema will be + validated according to + https://platform.openai.com/docs/guides/structured-outputs/supported-schemas. + If False, input schema will not be validated and model output will not + be validated. + If None, ``strict`` argument will not be passed to the model. + parallel_tool_calls: Set to ``False`` to disable parallel tool use. + Defaults to ``None`` (no specification, which allows parallel tool use). + kwargs: Any additional parameters are passed directly to + :meth:`~langchain_openai.chat_models.base.ChatOpenAI.bind`. + + .. versionchanged:: 0.1.21 + + Support for ``strict`` argument added. + + """ # noqa: E501 + + if parallel_tool_calls is not None: + kwargs["parallel_tool_calls"] = parallel_tool_calls + formatted_tools = [ + convert_to_openai_tool(tool, strict=strict) for tool in tools + ] + tool_names = [] + for tool in formatted_tools: + if "function" in tool: + tool_names.append(tool["function"]["name"]) + elif "name" in tool: + tool_names.append(tool["name"]) + else: + pass + if tool_choice: + if isinstance(tool_choice, str): + # tool_choice is a tool/function name + if tool_choice in tool_names: + tool_choice = { + "type": "function", + "function": {"name": tool_choice}, + } + elif tool_choice in ( + "file_search", + "web_search_preview", + "computer_use_preview", + ): + tool_choice = {"type": tool_choice} + # 'any' is not natively supported by OpenAI API. + # We support 'any' since other models use this instead of 'required'. 
+ elif tool_choice == "any": + tool_choice = "required" + else: + pass + elif isinstance(tool_choice, bool): + tool_choice = "required" + elif isinstance(tool_choice, dict): + pass + else: raise ValueError( - f"Tool choice {tool_choice} was specified, but the only " - f"provided tool was {formatted_tools[0]['function']['name']}." + f"Unrecognized tool_choice type. Expected str, bool or dict. " + f"Received: {tool_choice}" ) - if isinstance(tool_choice, bool): - if len(tools) > 1: - raise ValueError( - "tool_choice can only be True when there is one tool. Received " - f"{len(tools)} tools." - ) - tool_name = formatted_tools[0]["function"]["name"] - tool_choice = { - "type": "function", - "function": {"name": tool_name}, - } - kwargs["tool_choice"] = tool_choice return super().bind(tools=formatted_tools, **kwargs) def _generate(self, messages: List[BaseMessage], **kwargs) -> ChatResult: + if self.is_stream: + stream_iter = self._stream(messages, **kwargs) + return generate_from_stream(stream_iter) + messages_dicts = self._create_message_dicts(messages, []) response = self.llm.chat( messages_dicts, @@ -135,3 +513,180 @@ def _generate(self, messages: List[BaseMessage], **kwargs) -> ChatResult: **kwargs, ) return self._create_chat_result(response) + + def _stream( + self, messages: List[BaseMessage], **kwargs + ) -> Iterator[ChatGenerationChunk]: + self.is_stream = True + + messages_dicts = self._create_message_dicts(messages, []) + response = self.llm.chat( + messages_dicts, + model=kwargs.get("model", self.model), + is_stream=kwargs.get("is_stream", self.is_stream), + retries=kwargs.get("retries", self.retries), + parameters=kwargs.get("parameters", self.parameters), + **kwargs, + ) + + try: + for chunk in response: + if not isinstance(chunk, dict): + chunk = chunk.model_dump() + generation_chunk = _convert_chunk_to_generation_chunk( + chunk, + AIMessageChunk, + {}, + ) + if generation_chunk is None: + continue + # default_chunk_class = generation_chunk.message.__class__ + # logprobs = (generation_chunk.generation_info or {}).get("logprobs") + + yield generation_chunk + + except Exception as e: + raise Exception(e) + + +def _convert_chunk_to_generation_chunk( + chunk: dict, + default_chunk_class: Type, + base_generation_info: Optional[Dict], +) -> Optional[ChatGenerationChunk]: + if chunk.get("type") == "content.delta": # from beta.chat.completions.stream + return None + token_usage = chunk.get("usage") + choices = ( + chunk.get("choices", []) + # from beta.chat.completions.stream + or chunk.get("chunk", {}).get("choices", []) + ) + + usage_metadata: Optional[UsageMetadata] = ( + _create_usage_metadata(token_usage) if token_usage else None + ) + if len(choices) == 0: + # logprobs is implicitly None + generation_chunk = ChatGenerationChunk( + message=default_chunk_class(content="", usage_metadata=usage_metadata) + ) + return generation_chunk + + choice = choices[0] + if choice["delta"] is None: + return None + + message_chunk = _convert_delta_to_message_chunk( + choice["delta"], default_chunk_class + ) + generation_info = {**base_generation_info} if base_generation_info else {} + + if finish_reason := choice.get("finish_reason"): + generation_info["finish_reason"] = finish_reason + if model_name := chunk.get("model"): + generation_info["model_name"] = model_name + if system_fingerprint := chunk.get("system_fingerprint"): + generation_info["system_fingerprint"] = system_fingerprint + + logprobs = choice.get("logprobs") + if logprobs: + generation_info["logprobs"] = logprobs + + if 
usage_metadata and isinstance(message_chunk, AIMessageChunk): + message_chunk.usage_metadata = usage_metadata + + generation_chunk = ChatGenerationChunk( + message=message_chunk, generation_info=generation_info or None + ) + return generation_chunk + + +def _convert_delta_to_message_chunk( + _dict: Mapping[str, Any], default_class: Type[BaseMessageChunk] +) -> BaseMessageChunk: + id_ = _dict.get("id") + role = cast(str, _dict.get("role")) + content = cast(str, _dict.get("content") or "") + additional_kwargs: Dict = {} + if _dict.get("function_call"): + function_call = dict(_dict["function_call"]) + if "name" in function_call and function_call["name"] is None: + function_call["name"] = "" + additional_kwargs["function_call"] = function_call + tool_call_chunks = [] + if raw_tool_calls := _dict.get("tool_calls"): + additional_kwargs["tool_calls"] = raw_tool_calls + try: + tool_call_chunks = [ + tool_call_chunk( + name=rtc["function"].get("name"), + args=rtc["function"].get("arguments"), + id=rtc.get("id"), + index=rtc["index"], + ) + for rtc in raw_tool_calls + ] + except KeyError: + pass + + if role == "user" or default_class == HumanMessageChunk: + return HumanMessageChunk(content=content, id=id_) + elif role == "assistant" or default_class == AIMessageChunk: + return AIMessageChunk( + content=content, + additional_kwargs=additional_kwargs, + id=id_, + tool_call_chunks=tool_call_chunks, # type: ignore[arg-type] + ) + elif role in ("system", "developer") or default_class == SystemMessageChunk: + if role == "developer": + additional_kwargs = {"__openai_role__": "developer"} + else: + additional_kwargs = {} + return SystemMessageChunk( + content=content, id=id_, additional_kwargs=additional_kwargs + ) + elif role == "function" or default_class == FunctionMessageChunk: + return FunctionMessageChunk(content=content, name=_dict["name"], id=id_) + elif role == "tool" or default_class == ToolMessageChunk: + return ToolMessageChunk( + content=content, tool_call_id=_dict["tool_call_id"], id=id_ + ) + elif role or default_class == ChatMessageChunk: + return ChatMessageChunk(content=content, role=role, id=id_) + else: + return default_class(content=content, id=id_) # type: ignore + + +def _create_usage_metadata(oai_token_usage: dict) -> UsageMetadata: + input_tokens = oai_token_usage.get("prompt_tokens", 0) + output_tokens = oai_token_usage.get("completion_tokens", 0) + total_tokens = oai_token_usage.get("total_tokens", input_tokens + output_tokens) + input_token_details: dict = { + "audio": (oai_token_usage.get("prompt_tokens_details") or {}).get( + "audio_tokens" + ), + "cache_read": (oai_token_usage.get("prompt_tokens_details") or {}).get( + "cached_tokens" + ), + } + output_token_details: dict = { + "audio": (oai_token_usage.get("completion_tokens_details") or {}).get( + "audio_tokens" + ), + "reasoning": (oai_token_usage.get("completion_tokens_details") or {}).get( + "reasoning_tokens" + ), + } + return UsageMetadata( + input_tokens=input_tokens, + output_tokens=output_tokens, + total_tokens=total_tokens, + input_token_details=InputTokenDetails( + **{k: v for k, v in input_token_details.items() if v is not None} + ), + output_token_details=OutputTokenDetails( + **{k: v for k, v in output_token_details.items() if v is not None} + ), + ) diff --git a/libs/llmstudio/poetry.lock b/libs/llmstudio/poetry.lock index b81756b3..6e068fd6 100644 --- a/libs/llmstudio/poetry.lock +++ b/libs/llmstudio/poetry.lock @@ -1888,7 +1888,7 @@ pytest = ["pytest (>=7.0.0)", "rich (>=13.9.4,<14.0.0)"] [[package]] name = 
"llmstudio-core" -version = "1.0.4a1" +version = "1.0.4" description = "LLMStudio core capabilities for routing llm calls for any vendor. No proxy server required. For that use llmstudio[proxy]" optional = false python-versions = "^3.9" diff --git a/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py b/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py index 5b02ec7f..46ff3e99 100644 --- a/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py +++ b/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py @@ -168,7 +168,6 @@ def test_metrics_reasoning(provider_model, metrics): + current_metrics["reasoning_tokens"] == current_metrics["total_tokens"] ), "Total tokens mismatch" - print(f"All Reasoning Tests Passed for {provider} - {model}") From 82c4f9e861c06fe32150ddc3cb913f0d56d271c3 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Tue, 22 Apr 2025 16:04:09 +0000 Subject: [PATCH 2/7] [fix] bump prerelease version in pyproject.toml --- libs/llmstudio/pyproject.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libs/llmstudio/pyproject.toml b/libs/llmstudio/pyproject.toml index b0698373..60afc8b7 100644 --- a/libs/llmstudio/pyproject.toml +++ b/libs/llmstudio/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "llmstudio" -version = "1.0.5" +version = "1.0.6a0" description = "Prompt Perfection at Your Fingertips" authors = ["Cláudio Lemos "] license = "MIT" From 7fd58a3b44b128faa7e8c8be32b04e9be9647c75 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Tue, 22 Apr 2025 16:07:48 +0000 Subject: [PATCH 3/7] [fix] bump prerelease version in pyproject.toml --- libs/llmstudio/pyproject.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libs/llmstudio/pyproject.toml b/libs/llmstudio/pyproject.toml index 60afc8b7..1f1449fe 100644 --- a/libs/llmstudio/pyproject.toml +++ b/libs/llmstudio/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "llmstudio" -version = "1.0.6a0" +version = "1.0.6a1" description = "Prompt Perfection at Your Fingertips" authors = ["Cláudio Lemos "] license = "MIT" From 71c050f2a1366b280ebc254f3621f4cc8897e5fb Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:11:54 +0100 Subject: [PATCH 4/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/examples/core.py b/examples/core.py index cecfdb5e..8ef8adba 100644 --- a/examples/core.py +++ b/examples/core.py @@ -78,8 +78,6 @@ async def async_stream(): asyncio.run(async_stream()) - print("\nSync Non-Stream") - chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) print("\nSync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) @@ -104,7 +102,6 @@ async def async_stream(): return latencies def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: int=1000): - if model.startswith(('o1', 'o3')): if model.startswith(('o1', 'o3')): chat_request = { "chat_input": chat_input, @@ -125,16 +122,6 @@ def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: "maxTokens": max_tokens } } - elif 'amazon.nova' in model or 'anthropic.claude' in model: - chat_request = { - "chat_input": chat_input, - "model": model, - "is_stream": is_stream, - "retries": 0, - "parameters": { - "maxTokens": max_tokens - } - } else: chat_request = { "chat_input": chat_input, @@ -226,4 +213,4 
@@ def run_send_imgs(): # if p.metrics: # p.clean_print() -run_send_imgs() \ No newline at end of file +run_send_imgs() From ce0461ce168ded87a515b7348c1921dfc4fd8947 Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:12:52 +0100 Subject: [PATCH 5/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/examples/core.py b/examples/core.py index 8ef8adba..6becbf84 100644 --- a/examples/core.py +++ b/examples/core.py @@ -9,7 +9,6 @@ load_dotenv() def run_provider(provider, model, api_key=None=None, **kwargs): - print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") llm = LLMCore(provider=provider, api_key=api_key, **kwargs) @@ -54,11 +53,7 @@ def run_provider(provider, model, api_key=None=None, **kwargs): response_async = asyncio.run(llm.achat(**chat_request)) pprint(response_async) - latencies["async (ms)"]= response_async.metrics["latency_s"]*1000 - - - print("\nAsync Stream") - + latencies["async (ms)"]= response_async.metrics["latency_s"]*1000 print("\nAsync Stream") async def async_stream(): From 54c2da4b358dfac553d521c93e698129cd2688ab Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:14:18 +0100 Subject: [PATCH 6/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/examples/core.py b/examples/core.py index 6becbf84..eb9d5176 100644 --- a/examples/core.py +++ b/examples/core.py @@ -8,12 +8,11 @@ from dotenv import load_dotenv load_dotenv() -def run_provider(provider, model, api_key=None=None, **kwargs): +def run_provider(provider, model, api_key=None, **kwargs): print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") llm = LLMCore(provider=provider, api_key=api_key, **kwargs) latencies = {} - print("\nAsync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Jason", is_stream=False) string = """ @@ -58,7 +57,6 @@ def run_provider(provider, model, api_key=None=None, **kwargs): print("\nAsync Stream") async def async_stream(): chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) - chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) response_async = await llm.achat(**chat_request) async for p in response_async: From 02c06f5921cbb3ed7e8384654ddb42681fbc945f Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:16:52 +0100 Subject: [PATCH 7/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 1 - 1 file changed, 1 deletion(-) diff --git a/examples/core.py b/examples/core.py index eb9d5176..a9ce769f 100644 --- a/examples/core.py +++ b/examples/core.py @@ -49,7 +49,6 @@ def run_provider(provider, model, api_key=None, **kwargs): """ #chat_request = build_chat_request(model, chat_input=string, is_stream=False) - response_async = asyncio.run(llm.achat(**chat_request)) pprint(response_async) latencies["async (ms)"]= response_async.metrics["latency_s"]*1000
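
The patches above wire ChatLLMstudio into LangChain's BaseChatModel interface (message conversion, tool binding, streaming). A minimal usage sketch follows; it is illustrative only and not part of the patch series. It assumes ChatLLMstudio exposes llm, model and parameters as constructor fields (they are read as self.llm / self.model / self.parameters in _generate and _stream above) and that LLM from llmstudio.providers accepts provider/api_key the way LLMCore does in examples/core.py.

import os

from langchain_core.tools import tool
from llmstudio.langchain import ChatLLMstudio
from llmstudio.providers import LLM


@tool
def get_weather(city: str) -> str:
    """Return a canned weather report for a city."""
    return f"It is sunny in {city}."


llm = LLM(provider="openai", api_key=os.environ["OPENAI_API_KEY"])
chat = ChatLLMstudio(llm=llm, model="gpt-4o-mini", parameters={"temperature": 0})

# bind_tools converts the tool to the OpenAI schema via convert_to_openai_tool
# and forwards tool_choice; "auto" lets the model decide whether to call it.
chat_with_tools = chat.bind_tools([get_weather], tool_choice="auto")

response = chat_with_tools.invoke("What is the weather like in Lisbon?")
print(response.content)
print(response.tool_calls)  # populated when the model decides to call get_weather

Because ChatLLMstudio subclasses BaseChatModel, the same instance can be passed anywhere LangChain or LangGraph expects a chat model, which is the point of the PR's "langraph integration".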
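
The new _create_usage_metadata helper is what surfaces cached and reasoning tokens (from the earlier "Use Openai Usage to calculate Cache and Reasoning Costs" work) on streamed chunks. A small sketch of the mapping, with made-up numbers, importing the private helper directly purely for illustration:

from llmstudio.langchain import _create_usage_metadata

usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "total_tokens": 1500,
    "prompt_tokens_details": {"cached_tokens": 1024},
    "completion_tokens_details": {"reasoning_tokens": 128},
}

meta = _create_usage_metadata(usage)
# -> input_tokens=1200, output_tokens=300, total_tokens=1500,
#    input_token_details={"cache_read": 1024},
#    output_token_details={"reasoning": 128}
# None-valued detail fields (e.g. audio_tokens) are dropped before the
# InputTokenDetails/OutputTokenDetails dicts are built.
print(meta)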