From 5376b2303a584554e8136a631a38884440ff0be6 Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Tue, 22 Apr 2025 17:00:33 +0100 Subject: [PATCH 1/7] Feature/langraph integration (#215) * chore: Provider Unit Tests (#173) * chore: added unit tests for core provider. small bugfix on calculate_metrics of provider * added unit tests and docstring for join chunks * added unit tests and docstrings for calculate_cost on provider * added unit tests and docstrings for input_to_string on provider * added unit tests and docstrings for chat and achat * added unit tests and docstrings for chat and achat * chore: cleaned provider unit tests * chore: separated provider tests into different files. fixed some of its tests * chore: linted code * chore: deleted some comments * chore: linted * chore: Added Azure Provider Unit Tests (#176) * chore: added unit tests for azure provider * chore: added more unit tests and docstrings on azure, removed redundant comments * chore: added unit tests for generate client on Azure Provider * chore: separated azure unit tests into separate files. fixed some of its tests. * chore: linted code * chore: new line Signed-off-by: Diogo Goncalves --------- Signed-off-by: Diogo Goncalves Co-authored-by: Diogo Goncalves * [fix] bump prerelease version in pyproject.toml * chore: rename action * feat: added action to run tests on PR * chore: comments * fix: fix azure config tests * chore: style format * fix: tests workflow * Feature/prompt management (#200) * [feat] prompt management * [feat] testing * [feat] only one active prompt * [fix] bump prerelease version in pyproject.toml * [bugfix] return empty prompt * [fix] bump prerelease version in pyproject.toml * Update CONTRIBUTING.md Signed-off-by: Diogo Goncalves * Feat/ Use Openai Usage to calculate Cache and Reasoning Costs (#199) * feat: collects usage from stream and non stream openai calls * chore: refactored to provider to have a Metrics obj * feat: calculate_metrics now takes into account cached & reasoning tokens. Prices of openai models updated * fix: added caching tokens to model config obj * chore: added integration test for cache and reasoning * chore: added integration test for usage retrieval when max tokens reached * chore: uncommented runs from examples/core.py * fix: bugfix regarding usage on function calling. added a test for this * chore: merged with develop * chore: extracted provider data structures to another file * chore: renamed to private methods some within provider. splitted integration tests into 2 files * chore: deletion of a todo comment * chore: update poetry.lock * chore: specify python versions * chore: moving langchain integration tests to sdk * chore: format * feat: added support for o3-mini and updated o1-mini prices. 
also updated integration tests to support o3 (#202) * chore: removed duplicated code; removed duplicated integration tests * chore: updated github actions to run integration tests * chore: fixing github actions * chore: fixing github actions again * chore: fixing github actions again-x2 * chore: fixing github actions again-x2 * chore: added cache of dependencies to integration-tests in githubaction * chore: updated integration-tests action to inject github secrets into env * Feat/bedrock support for Nova models through the ConverseAPI (#207) * feat: added support for bedrock nova models * feat: tokens are now read from usage if available to ensure accuracy * chore: removed duplicated integration tests folder in wrong place * feat: refactored bedrock provider into being a single file instead of folder * chore: renamed bedrock to bedrock-converse in examples/core.py * chore: renamed bedrock in config.yaml * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * Update pyproject.toml updated llmstudio-tracker version Signed-off-by: Miguel Neves <61327611+MiNeves00@users.noreply.github.com> * [fix] bump prerelease version in pyproject.toml * chore: updated llmstudio sdk poetry.lock * Feat/converse support images (#211) * feat: added converse-api support for images in input. started making an integration test for this. * chore: added integration test for converse image sending * chore: send images integration test now also tests for openai * chore: integration test of send_imgs added async testing * chore: updated examples core.py to also have send images * feat: bedrock image input is now same contract as openai * chore: ChatCompletionLLMstudio print now hides large image bytes for readability * chore: fixes in the pretty print of ChatCompletionLLMstudio * chore: small fix in examples/core.py * fix: test_send_imgs had bug on reading env * chore: made clean_print optional on chatcompletions; image from url is directly converted to bytes * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * [fix] bump prerelease version in pyproject.toml * feat: adapt langchain integration * chore: update lock * chore: make format --------- Signed-off-by: Diogo Goncalves Signed-off-by: Miguel Neves <61327611+MiNeves00@users.noreply.github.com> Co-authored-by: Miguel Neves <61327611+MiNeves00@users.noreply.github.com> Co-authored-by: GitHub Actions Co-authored-by: brunoalho99 <132477278+brunoalho99@users.noreply.github.com> Co-authored-by: brunoalho Co-authored-by: Miguel Neves --- examples/core.py | 36 +- libs/llmstudio/llmstudio/langchain.py | 665 ++++++++++++++++-- libs/llmstudio/poetry.lock | 2 +- .../test_cache_and_reasoning_costs.py | 1 - 4 files changed, 635 insertions(+), 69 deletions(-) diff --git a/examples/core.py b/examples/core.py index 71fb081c..cecfdb5e 100644 --- a/examples/core.py +++ b/examples/core.py @@ -8,11 +8,13 @@ from dotenv import load_dotenv load_dotenv() -def run_provider(provider, model, api_key=None, **kwargs): +def run_provider(provider, model, api_key=None=None, **kwargs): + print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") llm = LLMCore(provider=provider, api_key=api_key, **kwargs) - latencies = {} + latencies = {} + print("\nAsync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Jason", is_stream=False) string = """ @@ -49,14 +51,19 @@ def 
run_provider(provider, model, api_key=None, **kwargs): """ #chat_request = build_chat_request(model, chat_input=string, is_stream=False) + response_async = asyncio.run(llm.achat(**chat_request)) pprint(response_async) latencies["async (ms)"]= response_async.metrics["latency_s"]*1000 + print("\nAsync Stream") + + print("\nAsync Stream") async def async_stream(): chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) + chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) response_async = await llm.achat(**chat_request) async for p in response_async: @@ -71,6 +78,8 @@ async def async_stream(): asyncio.run(async_stream()) + print("\nSync Non-Stream") + chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) print("\nSync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) @@ -81,7 +90,6 @@ async def async_stream(): print("\nSync Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Mary", is_stream=True) - response_sync_stream = llm.chat(**chat_request) for p in response_sync_stream: @@ -96,6 +104,7 @@ async def async_stream(): return latencies def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: int=1000): + if model.startswith(('o1', 'o3')): if model.startswith(('o1', 'o3')): chat_request = { "chat_input": chat_input, @@ -116,6 +125,16 @@ def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: "maxTokens": max_tokens } } + elif 'amazon.nova' in model or 'anthropic.claude' in model: + chat_request = { + "chat_input": chat_input, + "model": model, + "is_stream": is_stream, + "retries": 0, + "parameters": { + "maxTokens": max_tokens + } + } else: chat_request = { "chat_input": chat_input, @@ -135,19 +154,13 @@ def multiple_provider_runs(provider:str, model:str, num_runs:int, api_key:str, * for _ in range(num_runs): latencies = run_provider(provider=provider, model=model, api_key=api_key, **kwargs) pprint(latencies) - - + def run_chat_all_providers(): # OpenAI multiple_provider_runs(provider="openai", model="gpt-4o-mini", api_key=os.environ["OPENAI_API_KEY"], num_runs=1) multiple_provider_runs(provider="openai", model="o3-mini", api_key=os.environ["OPENAI_API_KEY"], num_runs=1) #multiple_provider_runs(provider="openai", model="o1-preview", api_key=os.environ["OPENAI_API_KEY"], num_runs=1) - # Azure - multiple_provider_runs(provider="azure", model="gpt-4o-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="azure", model="gpt-4o", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="azure", model="o1-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="azure", model="o1-preview", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) # Azure multiple_provider_runs(provider="azure", model="gpt-4o-mini", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) @@ -156,7 +169,6 @@ def 
run_chat_all_providers(): #multiple_provider_runs(provider="azure", model="o1-preview", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) - #multiple_provider_runs(provider="anthropic", model="claude-3-opus-20240229", num_runs=1, api_key=os.environ["ANTHROPIC_API_KEY"]) #multiple_provider_runs(provider="azure", model="o1-preview", num_runs=1, api_key=os.environ["AZURE_API_KEY"], api_version=os.environ["AZURE_API_VERSION"], api_endpoint=os.environ["AZURE_API_ENDPOINT"]) @@ -214,4 +226,4 @@ def run_send_imgs(): # if p.metrics: # p.clean_print() -run_send_imgs() +run_send_imgs() \ No newline at end of file diff --git a/libs/llmstudio/llmstudio/langchain.py b/libs/llmstudio/llmstudio/langchain.py index da1820a0..19b99787 100644 --- a/libs/llmstudio/llmstudio/langchain.py +++ b/libs/llmstudio/llmstudio/langchain.py @@ -1,3 +1,10 @@ +from __future__ import annotations + +import json +import logging +import ssl +import warnings +from collections.abc import Iterator, Mapping, Sequence from typing import ( Any, Callable, @@ -5,25 +12,361 @@ List, Literal, Optional, - Sequence, - Tuple, Type, + TypedDict, + TypeVar, Union, + cast, ) -from langchain.schema.messages import BaseMessage -from langchain.schema.output import ChatGeneration, ChatResult -from langchain_community.adapters.openai import ( - convert_dict_to_message, - convert_message_to_dict, +import certifi +from langchain_core.language_models import LanguageModelInput +from langchain_core.language_models.chat_models import ( + BaseChatModel, + generate_from_stream, +) +from langchain_core.messages import ( + AIMessage, + AIMessageChunk, + BaseMessage, + BaseMessageChunk, + ChatMessage, + ChatMessageChunk, + FunctionMessage, + FunctionMessageChunk, + HumanMessage, + HumanMessageChunk, + InvalidToolCall, + SystemMessage, + SystemMessageChunk, + ToolCall, + ToolMessage, + ToolMessageChunk, +) +from langchain_core.messages.ai import ( + InputTokenDetails, + OutputTokenDetails, + UsageMetadata, +) +from langchain_core.messages.tool import tool_call_chunk +from langchain_core.output_parsers.openai_tools import ( + make_invalid_tool_call, + parse_tool_call, ) -from langchain_core.language_models.base import LanguageModelInput -from langchain_core.language_models.chat_models import BaseChatModel +from langchain_core.outputs import ChatGeneration, ChatGenerationChunk, ChatResult from langchain_core.runnables import Runnable from langchain_core.tools import BaseTool from langchain_core.utils.function_calling import convert_to_openai_tool +from pydantic import BaseModel + +logger = logging.getLogger(__name__) + +# This SSL context is equivelent to the default `verify=True`. +# https://www.python-httpx.org/advanced/ssl/#configuring-client-instances +global_ssl_context = ssl.create_default_context(cafile=certifi.where()) + + +def _convert_dict_to_message(_dict: Mapping[str, Any]) -> BaseMessage: + """Convert a dictionary to a LangChain message. + + Args: + _dict: The dictionary. + + Returns: + The LangChain message. 
+ """ + role = _dict.get("role") + name = _dict.get("name") + id_ = _dict.get("id") + if role == "user": + return HumanMessage(content=_dict.get("content", ""), id=id_, name=name) + elif role == "assistant": + # Fix for azure + # Also OpenAI returns None for tool invocations + content = _dict.get("content", "") or "" + additional_kwargs: dict = {} + if function_call := _dict.get("function_call"): + additional_kwargs["function_call"] = dict(function_call) + tool_calls = [] + invalid_tool_calls = [] + if raw_tool_calls := _dict.get("tool_calls"): + additional_kwargs["tool_calls"] = raw_tool_calls + for raw_tool_call in raw_tool_calls: + try: + tool_calls.append(parse_tool_call(raw_tool_call, return_id=True)) + except Exception as e: + invalid_tool_calls.append( + make_invalid_tool_call(raw_tool_call, str(e)) + ) + if audio := _dict.get("audio"): + additional_kwargs["audio"] = audio + return AIMessage( + content=content, + additional_kwargs=additional_kwargs, + name=name, + id=id_, + tool_calls=tool_calls, + invalid_tool_calls=invalid_tool_calls, + ) + elif role in ("system", "developer"): + if role == "developer": + additional_kwargs = {"__openai_role__": role} + else: + additional_kwargs = {} + return SystemMessage( + content=_dict.get("content", ""), + name=name, + id=id_, + additional_kwargs=additional_kwargs, + ) + elif role == "function": + return FunctionMessage( + content=_dict.get("content", ""), name=cast(str, _dict.get("name")), id=id_ + ) + elif role == "tool": + additional_kwargs = {} + if "name" in _dict: + additional_kwargs["name"] = _dict["name"] + return ToolMessage( + content=_dict.get("content", ""), + tool_call_id=cast(str, _dict.get("tool_call_id")), + additional_kwargs=additional_kwargs, + name=name, + id=id_, + ) + else: + return ChatMessage(content=_dict.get("content", ""), role=role, id=id_) # type: ignore[arg-type] + + +def _format_message_content(content: Any) -> Any: + """Format message content.""" + if content and isinstance(content, list): + formatted_content = [] + for block in content: + # Remove unexpected block types + if ( + isinstance(block, dict) + and "type" in block + and block["type"] in ("tool_use", "thinking") + ): + continue + # Anthropic image blocks + elif ( + isinstance(block, dict) + and block.get("type") == "image" + and (source := block.get("source")) + and isinstance(source, dict) + ): + if source.get("type") == "base64" and ( + (media_type := source.get("media_type")) + and (data := source.get("data")) + ): + formatted_content.append( + { + "type": "image_url", + "image_url": {"url": f"data:{media_type};base64,{data}"}, + } + ) + elif source.get("type") == "url" and (url := source.get("url")): + formatted_content.append( + {"type": "image_url", "image_url": {"url": url}} + ) + else: + continue + else: + formatted_content.append(block) + else: + formatted_content = content + + return formatted_content + + +def _lc_tool_call_to_openai_tool_call(tool_call: ToolCall) -> dict: + return { + "type": "function", + "id": tool_call["id"], + "function": { + "name": tool_call["name"], + "arguments": json.dumps(tool_call["args"]), + }, + } + + +def _lc_invalid_tool_call_to_openai_tool_call( + invalid_tool_call: InvalidToolCall, +) -> dict: + return { + "type": "function", + "id": invalid_tool_call["id"], + "function": { + "name": invalid_tool_call["name"], + "arguments": invalid_tool_call["args"], + }, + } + + +def _convert_message_to_dict(message: BaseMessage) -> dict: + """Convert a LangChain message to a dictionary. 
+ + Args: + message: The LangChain message. + + Returns: + The dictionary. + """ + message_dict: dict[str, Any] = {"content": _format_message_content(message.content)} + if (name := message.name or message.additional_kwargs.get("name")) is not None: + message_dict["name"] = name + + # populate role and additional message data + if isinstance(message, ChatMessage): + message_dict["role"] = message.role + elif isinstance(message, HumanMessage): + message_dict["role"] = "user" + elif isinstance(message, AIMessage): + message_dict["role"] = "assistant" + if "function_call" in message.additional_kwargs: + message_dict["function_call"] = message.additional_kwargs["function_call"] + if message.tool_calls or message.invalid_tool_calls: + message_dict["tool_calls"] = [ + _lc_tool_call_to_openai_tool_call(tc) for tc in message.tool_calls + ] + [ + _lc_invalid_tool_call_to_openai_tool_call(tc) + for tc in message.invalid_tool_calls + ] + elif "tool_calls" in message.additional_kwargs: + message_dict["tool_calls"] = message.additional_kwargs["tool_calls"] + tool_call_supported_props = {"id", "type", "function"} + message_dict["tool_calls"] = [ + {k: v for k, v in tool_call.items() if k in tool_call_supported_props} + for tool_call in message_dict["tool_calls"] + ] + else: + pass + # If tool calls present, content null value should be None not empty string. + if "function_call" in message_dict or "tool_calls" in message_dict: + message_dict["content"] = message_dict["content"] or None + + if "audio" in message.additional_kwargs: + # openai doesn't support passing the data back - only the id + # https://platform.openai.com/docs/guides/audio/multi-turn-conversations + raw_audio = message.additional_kwargs["audio"] + audio = ( + {"id": message.additional_kwargs["audio"]["id"]} + if "id" in raw_audio + else raw_audio + ) + message_dict["audio"] = audio + elif isinstance(message, SystemMessage): + message_dict["role"] = message.additional_kwargs.get( + "__openai_role__", "system" + ) + elif isinstance(message, FunctionMessage): + message_dict["role"] = "function" + elif isinstance(message, ToolMessage): + message_dict["role"] = "tool" + message_dict["tool_call_id"] = message.tool_call_id + + supported_props = {"content", "role", "tool_call_id"} + message_dict = {k: v for k, v in message_dict.items() if k in supported_props} + else: + raise TypeError(f"Got unknown type {message}") + return message_dict + + +def _convert_delta_to_message_chunk( + _dict: Mapping[str, Any], default_class: type[BaseMessageChunk] +) -> BaseMessageChunk: + id_ = _dict.get("id") + role = cast(str, _dict.get("role")) + content = cast(str, _dict.get("content") or "") + additional_kwargs: dict = {} + if _dict.get("function_call"): + function_call = dict(_dict["function_call"]) + if "name" in function_call and function_call["name"] is None: + function_call["name"] = "" + additional_kwargs["function_call"] = function_call + tool_call_chunks = [] + if raw_tool_calls := _dict.get("tool_calls"): + additional_kwargs["tool_calls"] = raw_tool_calls + try: + tool_call_chunks = [ + tool_call_chunk( + name=rtc["function"].get("name"), + args=rtc["function"].get("arguments"), + id=rtc.get("id"), + index=rtc["index"], + ) + for rtc in raw_tool_calls + ] + except KeyError: + pass + + if role == "user" or default_class == HumanMessageChunk: + return HumanMessageChunk(content=content, id=id_) + elif role == "assistant" or default_class == AIMessageChunk: + return AIMessageChunk( + content=content, + additional_kwargs=additional_kwargs, + id=id_, + 
tool_call_chunks=tool_call_chunks, # type: ignore[arg-type] + ) + elif role in ("system", "developer") or default_class == SystemMessageChunk: + if role == "developer": + additional_kwargs = {"__openai_role__": "developer"} + else: + additional_kwargs = {} + return SystemMessageChunk( + content=content, id=id_, additional_kwargs=additional_kwargs + ) + elif role == "function" or default_class == FunctionMessageChunk: + return FunctionMessageChunk(content=content, name=_dict["name"], id=id_) + elif role == "tool" or default_class == ToolMessageChunk: + return ToolMessageChunk( + content=content, tool_call_id=_dict["tool_call_id"], id=id_ + ) + elif role or default_class == ChatMessageChunk: + return ChatMessageChunk(content=content, role=role, id=id_) + else: + return default_class(content=content, id=id_) # type: ignore + + +def _update_token_usage( + overall_token_usage: Union[int, dict], new_usage: Union[int, dict] +) -> Union[int, dict]: + # Token usage is either ints or dictionaries + # `reasoning_tokens` is nested inside `completion_tokens_details` + if isinstance(new_usage, int): + if not isinstance(overall_token_usage, int): + raise ValueError( + f"Got different types for token usage: " + f"{type(new_usage)} and {type(overall_token_usage)}" + ) + return new_usage + overall_token_usage + elif isinstance(new_usage, dict): + if not isinstance(overall_token_usage, dict): + raise ValueError( + f"Got different types for token usage: " + f"{type(new_usage)} and {type(overall_token_usage)}" + ) + return { + k: _update_token_usage(overall_token_usage.get(k, 0), v) + for k, v in new_usage.items() + } + else: + warnings.warn(f"Unexpected type for token usage: {type(new_usage)}") + return new_usage + + +class _FunctionCall(TypedDict): + name: str + + +_BM = TypeVar("_BM", bound=BaseModel) +_DictOrPydanticClass = Union[dict[str, Any], type[_BM], type] +_DictOrPydantic = Union[dict, _BM] + from llmstudio.providers import LLM -from openai import BaseModel class ChatLLMstudio(BaseChatModel): @@ -40,7 +383,7 @@ def _llm_type(self): def _create_message_dicts( self, messages: List[BaseMessage], stop: Optional[List[str]] ) -> Tuple[List[Dict[str, Any]], Dict[str, Any]]: - message_dicts = [convert_message_to_dict(m) for m in messages] + message_dicts = [_convert_message_to_dict(m) for m in messages] return message_dicts def _create_chat_result(self, response: Any) -> ChatResult: @@ -48,7 +391,7 @@ def _create_chat_result(self, response: Any) -> ChatResult: if not isinstance(response, dict): response = response.model_dump() for res in response["choices"]: - message = convert_dict_to_message(res["message"]) + message = _convert_dict_to_message(res["message"]) generation_info = dict(finish_reason=res.get("finish_reason")) if "logprobs" in res: generation_info["logprobs"] = res["logprobs"] @@ -67,64 +410,99 @@ def _create_chat_result(self, response: Any) -> ChatResult: def bind_tools( self, - tools: Sequence[Union[Dict[str, Any], Type[BaseModel], Callable, BaseTool]], + tools: Sequence[Union[dict[str, Any], type, Callable, BaseTool]], *, tool_choice: Optional[ - Union[dict, str, Literal["auto", "any", "none"], bool] + Union[dict, str, Literal["auto", "none", "required", "any"], bool] ] = None, + strict: Optional[bool] = None, + parallel_tool_calls: Optional[bool] = None, **kwargs: Any, ) -> Runnable[LanguageModelInput, BaseMessage]: """Bind tool-like objects to this chat model. + Assumes model is compatible with OpenAI tool-calling API. + Args: tools: A list of tool definitions to bind to this chat model. 
- Can be a dictionary, pydantic model, callable, or BaseTool. Pydantic - models, callables, and BaseTools will be automatically converted to - their schema dictionary representation. - tool_choice: Which tool to require the model to call. - Must be the name of the single provided function, - "auto" to automatically determine which function to call - with the option to not call any function, "any" to enforce that some - function is called, or a dict of the form: - {"type": "function", "function": {"name": <>}}. - **kwargs: Any additional parameters to pass to the - :class:`~langchain.runnable.Runnable` constructor. - """ - formatted_tools = [convert_to_openai_tool(tool) for tool in tools] - if tool_choice is not None and tool_choice: - if isinstance(tool_choice, str) and ( - tool_choice not in ("auto", "any", "none") - ): - tool_choice = {"type": "function", "function": {"name": tool_choice}} - if isinstance(tool_choice, dict) and (len(formatted_tools) != 1): - raise ValueError( - "When specifying `tool_choice`, you must provide exactly one " - f"tool. Received {len(formatted_tools)} tools." - ) - if isinstance(tool_choice, dict) and ( - formatted_tools[0]["function"]["name"] - != tool_choice["function"]["name"] - ): + Supports any tool definition handled by + :meth:`langchain_core.utils.function_calling.convert_to_openai_tool`. + tool_choice: Which tool to require the model to call. Options are: + + - str of the form ``"<>"``: calls <> tool. + - ``"auto"``: automatically selects a tool (including no tool). + - ``"none"``: does not call a tool. + - ``"any"`` or ``"required"`` or ``True``: force at least one tool to be called. + - dict of the form ``{"type": "function", "function": {"name": <>}}``: calls <> tool. + - ``False`` or ``None``: no effect, default OpenAI behavior. + strict: If True, model output is guaranteed to exactly match the JSON Schema + provided in the tool definition. If True, the input schema will be + validated according to + https://platform.openai.com/docs/guides/structured-outputs/supported-schemas. + If False, input schema will not be validated and model output will not + be validated. + If None, ``strict`` argument will not be passed to the model. + parallel_tool_calls: Set to ``False`` to disable parallel tool use. + Defaults to ``None`` (no specification, which allows parallel tool use). + kwargs: Any additional parameters are passed directly to + :meth:`~langchain_openai.chat_models.base.ChatOpenAI.bind`. + + .. versionchanged:: 0.1.21 + + Support for ``strict`` argument added. + + """ # noqa: E501 + + if parallel_tool_calls is not None: + kwargs["parallel_tool_calls"] = parallel_tool_calls + formatted_tools = [ + convert_to_openai_tool(tool, strict=strict) for tool in tools + ] + tool_names = [] + for tool in formatted_tools: + if "function" in tool: + tool_names.append(tool["function"]["name"]) + elif "name" in tool: + tool_names.append(tool["name"]) + else: + pass + if tool_choice: + if isinstance(tool_choice, str): + # tool_choice is a tool/function name + if tool_choice in tool_names: + tool_choice = { + "type": "function", + "function": {"name": tool_choice}, + } + elif tool_choice in ( + "file_search", + "web_search_preview", + "computer_use_preview", + ): + tool_choice = {"type": tool_choice} + # 'any' is not natively supported by OpenAI API. + # We support 'any' since other models use this instead of 'required'. 
+ elif tool_choice == "any": + tool_choice = "required" + else: + pass + elif isinstance(tool_choice, bool): + tool_choice = "required" + elif isinstance(tool_choice, dict): + pass + else: raise ValueError( - f"Tool choice {tool_choice} was specified, but the only " - f"provided tool was {formatted_tools[0]['function']['name']}." + f"Unrecognized tool_choice type. Expected str, bool or dict. " + f"Received: {tool_choice}" ) - if isinstance(tool_choice, bool): - if len(tools) > 1: - raise ValueError( - "tool_choice can only be True when there is one tool. Received " - f"{len(tools)} tools." - ) - tool_name = formatted_tools[0]["function"]["name"] - tool_choice = { - "type": "function", - "function": {"name": tool_name}, - } - kwargs["tool_choice"] = tool_choice return super().bind(tools=formatted_tools, **kwargs) def _generate(self, messages: List[BaseMessage], **kwargs) -> ChatResult: + if self.is_stream: + stream_iter = self._stream(messages, **kwargs) + return generate_from_stream(stream_iter) + messages_dicts = self._create_message_dicts(messages, []) response = self.llm.chat( messages_dicts, @@ -135,3 +513,180 @@ def _generate(self, messages: List[BaseMessage], **kwargs) -> ChatResult: **kwargs, ) return self._create_chat_result(response) + + def _stream( + self, messages: List[BaseMessage], **kwargs + ) -> Iterator[ChatGenerationChunk]: + self.is_stream = True + + messages_dicts = self._create_message_dicts(messages, []) + response = self.llm.chat( + messages_dicts, + model=kwargs.get("model", self.model), + is_stream=kwargs.get("is_stream", self.is_stream), + retries=kwargs.get("retries", self.retries), + parameters=kwargs.get("parameters", self.parameters), + **kwargs, + ) + + try: + for chunk in response: + if not isinstance(chunk, dict): + chunk = chunk.model_dump() + generation_chunk = _convert_chunk_to_generation_chunk( + chunk, + AIMessageChunk, + {}, + ) + if generation_chunk is None: + continue + # default_chunk_class = generation_chunk.message.__class__ + # logprobs = (generation_chunk.generation_info or {}).get("logprobs") + + yield generation_chunk + + except Exception as e: + raise Exception(e) + + +def _convert_chunk_to_generation_chunk( + chunk: dict, + default_chunk_class: Type, + base_generation_info: Optional[Dict], +) -> Optional[ChatGenerationChunk]: + if chunk.get("type") == "content.delta": # from beta.chat.completions.stream + return None + token_usage = chunk.get("usage") + choices = ( + chunk.get("choices", []) + # from beta.chat.completions.stream + or chunk.get("chunk", {}).get("choices", []) + ) + + usage_metadata: Optional[UsageMetadata] = ( + _create_usage_metadata(token_usage) if token_usage else None + ) + if len(choices) == 0: + # logprobs is implicitly None + generation_chunk = ChatGenerationChunk( + message=default_chunk_class(content="", usage_metadata=usage_metadata) + ) + return generation_chunk + + choice = choices[0] + if choice["delta"] is None: + return None + + message_chunk = _convert_delta_to_message_chunk( + choice["delta"], default_chunk_class + ) + generation_info = {**base_generation_info} if base_generation_info else {} + + if finish_reason := choice.get("finish_reason"): + generation_info["finish_reason"] = finish_reason + if model_name := chunk.get("model"): + generation_info["model_name"] = model_name + if system_fingerprint := chunk.get("system_fingerprint"): + generation_info["system_fingerprint"] = system_fingerprint + + logprobs = choice.get("logprobs") + if logprobs: + generation_info["logprobs"] = logprobs + + if 
usage_metadata and isinstance(message_chunk, AIMessageChunk): + message_chunk.usage_metadata = usage_metadata + + generation_chunk = ChatGenerationChunk( + message=message_chunk, generation_info=generation_info or None + ) + return generation_chunk + + +def _convert_delta_to_message_chunk( + _dict: Mapping[str, Any], default_class: Type[BaseMessageChunk] +) -> BaseMessageChunk: + id_ = _dict.get("id") + role = cast(str, _dict.get("role")) + content = cast(str, _dict.get("content") or "") + additional_kwargs: Dict = {} + if _dict.get("function_call"): + function_call = dict(_dict["function_call"]) + if "name" in function_call and function_call["name"] is None: + function_call["name"] = "" + additional_kwargs["function_call"] = function_call + tool_call_chunks = [] + if raw_tool_calls := _dict.get("tool_calls"): + additional_kwargs["tool_calls"] = raw_tool_calls + try: + tool_call_chunks = [ + tool_call_chunk( + name=rtc["function"].get("name"), + args=rtc["function"].get("arguments"), + id=rtc.get("id"), + index=rtc["index"], + ) + for rtc in raw_tool_calls + ] + except KeyError: + pass + + if role == "user" or default_class == HumanMessageChunk: + return HumanMessageChunk(content=content, id=id_) + elif role == "assistant" or default_class == AIMessageChunk: + return AIMessageChunk( + content=content, + additional_kwargs=additional_kwargs, + id=id_, + tool_call_chunks=tool_call_chunks, # type: ignore[arg-type] + ) + elif role in ("system", "developer") or default_class == SystemMessageChunk: + if role == "developer": + additional_kwargs = {"__openai_role__": "developer"} + else: + additional_kwargs = {} + return SystemMessageChunk( + content=content, id=id_, additional_kwargs=additional_kwargs + ) + elif role == "function" or default_class == FunctionMessageChunk: + return FunctionMessageChunk(content=content, name=_dict["name"], id=id_) + elif role == "tool" or default_class == ToolMessageChunk: + return ToolMessageChunk( + content=content, tool_call_id=_dict["tool_call_id"], id=id_ + ) + elif role or default_class == ChatMessageChunk: + return ChatMessageChunk(content=content, role=role, id=id_) + else: + return default_class(content=content, id=id_) # type: ignore + + +def _create_usage_metadata(oai_token_usage: dict) -> UsageMetadata: + input_tokens = oai_token_usage.get("prompt_tokens", 0) + output_tokens = oai_token_usage.get("completion_tokens", 0) + total_tokens = oai_token_usage.get("total_tokens", input_tokens + output_tokens) + input_token_details: dict = { + "audio": (oai_token_usage.get("prompt_tokens_details") or {}).get( + "audio_tokens" + ), + "cache_read": (oai_token_usage.get("prompt_tokens_details") or {}).get( + "cached_tokens" + ), + } + output_token_details: dict = { + "audio": (oai_token_usage.get("completion_tokens_details") or {}).get( + "audio_tokens" + ), + "reasoning": (oai_token_usage.get("completion_tokens_details") or {}).get( + "reasoning_tokens" + ), + } + return UsageMetadata( + input_tokens=input_tokens, + output_tokens=output_tokens, + total_tokens=total_tokens, + input_token_details=InputTokenDetails( + **{k: v for k, v in input_token_details.items() if v is not None} + ), + output_token_details=OutputTokenDetails( + **{k: v for k, v in output_token_details.items() if v is not None} + ), + ) diff --git a/libs/llmstudio/poetry.lock b/libs/llmstudio/poetry.lock index b81756b3..6e068fd6 100644 --- a/libs/llmstudio/poetry.lock +++ b/libs/llmstudio/poetry.lock @@ -1888,7 +1888,7 @@ pytest = ["pytest (>=7.0.0)", "rich (>=13.9.4,<14.0.0)"] [[package]] name = 
"llmstudio-core" -version = "1.0.4a1" +version = "1.0.4" description = "LLMStudio core capabilities for routing llm calls for any vendor. No proxy server required. For that use llmstudio[proxy]" optional = false python-versions = "^3.9" diff --git a/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py b/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py index 5b02ec7f..46ff3e99 100644 --- a/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py +++ b/libs/llmstudio/tests/integration_tests/test_cache_and_reasoning_costs.py @@ -168,7 +168,6 @@ def test_metrics_reasoning(provider_model, metrics): + current_metrics["reasoning_tokens"] == current_metrics["total_tokens"] ), "Total tokens mismatch" - print(f"All Reasoning Tests Passed for {provider} - {model}") From 82c4f9e861c06fe32150ddc3cb913f0d56d271c3 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Tue, 22 Apr 2025 16:04:09 +0000 Subject: [PATCH 2/7] [fix] bump prerelease version in pyproject.toml --- libs/llmstudio/pyproject.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libs/llmstudio/pyproject.toml b/libs/llmstudio/pyproject.toml index b0698373..60afc8b7 100644 --- a/libs/llmstudio/pyproject.toml +++ b/libs/llmstudio/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "llmstudio" -version = "1.0.5" +version = "1.0.6a0" description = "Prompt Perfection at Your Fingertips" authors = ["Cláudio Lemos "] license = "MIT" From 7fd58a3b44b128faa7e8c8be32b04e9be9647c75 Mon Sep 17 00:00:00 2001 From: GitHub Actions Date: Tue, 22 Apr 2025 16:07:48 +0000 Subject: [PATCH 3/7] [fix] bump prerelease version in pyproject.toml --- libs/llmstudio/pyproject.toml | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/libs/llmstudio/pyproject.toml b/libs/llmstudio/pyproject.toml index 60afc8b7..1f1449fe 100644 --- a/libs/llmstudio/pyproject.toml +++ b/libs/llmstudio/pyproject.toml @@ -1,6 +1,6 @@ [tool.poetry] name = "llmstudio" -version = "1.0.6a0" +version = "1.0.6a1" description = "Prompt Perfection at Your Fingertips" authors = ["Cláudio Lemos "] license = "MIT" From 71c050f2a1366b280ebc254f3621f4cc8897e5fb Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:11:54 +0100 Subject: [PATCH 4/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 15 +-------------- 1 file changed, 1 insertion(+), 14 deletions(-) diff --git a/examples/core.py b/examples/core.py index cecfdb5e..8ef8adba 100644 --- a/examples/core.py +++ b/examples/core.py @@ -78,8 +78,6 @@ async def async_stream(): asyncio.run(async_stream()) - print("\nSync Non-Stream") - chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) print("\nSync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Alice", is_stream=False) @@ -104,7 +102,6 @@ async def async_stream(): return latencies def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: int=1000): - if model.startswith(('o1', 'o3')): if model.startswith(('o1', 'o3')): chat_request = { "chat_input": chat_input, @@ -125,16 +122,6 @@ def build_chat_request(model: str, chat_input: str, is_stream: bool, max_tokens: "maxTokens": max_tokens } } - elif 'amazon.nova' in model or 'anthropic.claude' in model: - chat_request = { - "chat_input": chat_input, - "model": model, - "is_stream": is_stream, - "retries": 0, - "parameters": { - "maxTokens": max_tokens - } - } else: chat_request = { "chat_input": chat_input, @@ -226,4 +213,4 
@@ def run_send_imgs(): # if p.metrics: # p.clean_print() -run_send_imgs() \ No newline at end of file +run_send_imgs() From ce0461ce168ded87a515b7348c1921dfc4fd8947 Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:12:52 +0100 Subject: [PATCH 5/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 7 +------ 1 file changed, 1 insertion(+), 6 deletions(-) diff --git a/examples/core.py b/examples/core.py index 8ef8adba..6becbf84 100644 --- a/examples/core.py +++ b/examples/core.py @@ -9,7 +9,6 @@ load_dotenv() def run_provider(provider, model, api_key=None=None, **kwargs): - print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") llm = LLMCore(provider=provider, api_key=api_key, **kwargs) @@ -54,11 +53,7 @@ def run_provider(provider, model, api_key=None=None, **kwargs): response_async = asyncio.run(llm.achat(**chat_request)) pprint(response_async) - latencies["async (ms)"]= response_async.metrics["latency_s"]*1000 - - - print("\nAsync Stream") - + latencies["async (ms)"]= response_async.metrics["latency_s"]*1000 print("\nAsync Stream") async def async_stream(): From 54c2da4b358dfac553d521c93e698129cd2688ab Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:14:18 +0100 Subject: [PATCH 6/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/examples/core.py b/examples/core.py index 6becbf84..eb9d5176 100644 --- a/examples/core.py +++ b/examples/core.py @@ -8,12 +8,11 @@ from dotenv import load_dotenv load_dotenv() -def run_provider(provider, model, api_key=None=None, **kwargs): +def run_provider(provider, model, api_key=None, **kwargs): print(f"\n\n###RUNNING for <{provider}>, <{model}> ###") llm = LLMCore(provider=provider, api_key=api_key, **kwargs) latencies = {} - print("\nAsync Non-Stream") chat_request = build_chat_request(model, chat_input="Hello, my name is Jason", is_stream=False) string = """ @@ -58,7 +57,6 @@ def run_provider(provider, model, api_key=None=None, **kwargs): print("\nAsync Stream") async def async_stream(): chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) - chat_request = build_chat_request(model, chat_input="Hello, my name is Tom", is_stream=True) response_async = await llm.achat(**chat_request) async for p in response_async: From 02c06f5921cbb3ed7e8384654ddb42681fbc945f Mon Sep 17 00:00:00 2001 From: Diogo Goncalves Date: Wed, 23 Apr 2025 18:16:52 +0100 Subject: [PATCH 7/7] Update core.py Signed-off-by: Diogo Goncalves --- examples/core.py | 1 - 1 file changed, 1 deletion(-) diff --git a/examples/core.py b/examples/core.py index eb9d5176..a9ce769f 100644 --- a/examples/core.py +++ b/examples/core.py @@ -49,7 +49,6 @@ def run_provider(provider, model, api_key=None, **kwargs): """ #chat_request = build_chat_request(model, chat_input=string, is_stream=False) - response_async = asyncio.run(llm.achat(**chat_request)) pprint(response_async) latencies["async (ms)"]= response_async.metrics["latency_s"]*1000
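
The patches above wire ChatLLMstudio into LangChain's BaseChatModel interface (message conversion, tool binding, streaming). A minimal usage sketch follows; it is illustrative only and not part of the patch series. It assumes ChatLLMstudio exposes llm, model and parameters as constructor fields (they are read as self.llm / self.model / self.parameters in _generate and _stream above) and that LLM from llmstudio.providers accepts provider/api_key the way LLMCore does in examples/core.py.

import os

from langchain_core.tools import tool
from llmstudio.langchain import ChatLLMstudio
from llmstudio.providers import LLM


@tool
def get_weather(city: str) -> str:
    """Return a canned weather report for a city."""
    return f"It is sunny in {city}."


llm = LLM(provider="openai", api_key=os.environ["OPENAI_API_KEY"])
chat = ChatLLMstudio(llm=llm, model="gpt-4o-mini", parameters={"temperature": 0})

# bind_tools converts the tool to the OpenAI schema via convert_to_openai_tool
# and forwards tool_choice; "auto" lets the model decide whether to call it.
chat_with_tools = chat.bind_tools([get_weather], tool_choice="auto")

response = chat_with_tools.invoke("What is the weather like in Lisbon?")
print(response.content)
print(response.tool_calls)  # populated when the model decides to call get_weather

Because ChatLLMstudio subclasses BaseChatModel, the same instance can be passed anywhere LangChain or LangGraph expects a chat model, which is the point of the PR's "langraph integration".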
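
The new _create_usage_metadata helper is what surfaces cached and reasoning tokens (from the earlier "Use Openai Usage to calculate Cache and Reasoning Costs" work) on streamed chunks. A small sketch of the mapping, with made-up numbers, importing the private helper directly purely for illustration:

from llmstudio.langchain import _create_usage_metadata

usage = {
    "prompt_tokens": 1200,
    "completion_tokens": 300,
    "total_tokens": 1500,
    "prompt_tokens_details": {"cached_tokens": 1024},
    "completion_tokens_details": {"reasoning_tokens": 128},
}

meta = _create_usage_metadata(usage)
# -> input_tokens=1200, output_tokens=300, total_tokens=1500,
#    input_token_details={"cache_read": 1024},
#    output_token_details={"reasoning": 128}
# None-valued detail fields (e.g. audio_tokens) are dropped before the
# InputTokenDetails/OutputTokenDetails dicts are built.
print(meta)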