Repo sync for protected branch #221

Open
wants to merge 79 commits into
base: main
648dd7c
Add python process how-to guides
moonbox3 Feb 13, 2025
2b565e2
Improve Python agent learn site samples.
moonbox3 Feb 13, 2025
f1b1b6c
Fix spurious zone-end tag
gewarren Feb 13, 2025
b5eb15b
Ingestion -> injection
gewarren Feb 13, 2025
277688d
Merge pull request #460 from gewarren/patch-3
sophialagerkranspandey Feb 13, 2025
e397edf
Merge pull request #459 from gewarren/patch-2
sophialagerkranspandey Feb 13, 2025
9cbc014
Merge pull request #458 from gewarren/sync
sophialagerkranspandey Feb 13, 2025
00900dc
Merge pull request #461 from MicrosoftDocs/main
sophialagerkranspandey Feb 13, 2025
3465088
Include links to repo code.
moonbox3 Feb 14, 2025
a2d9cd0
Remove fixed locale from link
moonbox3 Feb 14, 2025
40075fb
Fix python sample resource link
moonbox3 Feb 14, 2025
4bea51e
Use site relative links for learn site links. They don't need to be a…
moonbox3 Feb 14, 2025
d635b45
Fix media link
moonbox3 Feb 14, 2025
78ad2e7
Scope link per language
moonbox3 Feb 14, 2025
fa534a9
More cleanup
moonbox3 Feb 14, 2025
7666b33
Add prompt template config import. Remove view from link in Python co…
moonbox3 Feb 16, 2025
c03c058
Updates to callout reserved param names with Python function calling.
moonbox3 Feb 17, 2025
039d3cd
updated filters page
eavanvalkenburg Feb 17, 2025
6ddea4c
Merge pull request #457 from moonbox3/update-py-sample-code
moonbox3 Feb 17, 2025
39f9534
Python: merge Python docs updates from live to main (#464)
moonbox3 Feb 17, 2025
ae1e457
Merge branch 'main' into py-processes-how-to
moonbox3 Feb 17, 2025
84cf4bd
Add Python processes sample code.
moonbox3 Feb 17, 2025
03244ed
fixed headings
eavanvalkenburg Feb 18, 2025
4f90961
removed heading
eavanvalkenburg Feb 18, 2025
fa77efc
added notes on ordering
eavanvalkenburg Feb 18, 2025
26e2b72
try inline zone
eavanvalkenburg Feb 18, 2025
436e9cc
fix bullet
eavanvalkenburg Feb 18, 2025
9c91c54
single line zone
eavanvalkenburg Feb 18, 2025
9d59fc2
small text updates
eavanvalkenburg Feb 18, 2025
4043083
added new sample links
eavanvalkenburg Feb 19, 2025
c408c4f
fix indentation
eavanvalkenburg Feb 19, 2025
7d9f07a
polish
eavanvalkenburg Feb 19, 2025
e70f690
Merge pull request #462 from eavanvalkenburg/filters
sophialagerkranspandey Feb 19, 2025
32f9d84
Merge pull request #466 from MicrosoftDocs/main
sophialagerkranspandey Feb 19, 2025
f0d5d11
Update semantic-kernel/Frameworks/process/examples/example-cycles.md
alliscode Feb 25, 2025
57b1890
Merge pull request #465 from moonbox3/py-processes-how-to
alliscode Feb 25, 2025
b754b73
Update semantic-kernel/Frameworks/process/examples/example-cycles.md
sophialagerkranspandey Feb 25, 2025
1db60e2
Update semantic-kernel/Frameworks/process/examples/example-first-proc…
sophialagerkranspandey Feb 25, 2025
50864c0
Update semantic-kernel/Frameworks/process/examples/example-cycles.md
sophialagerkranspandey Feb 25, 2025
1ba50f2
Merge pull request #467 from MicrosoftDocs/main
sophialagerkranspandey Feb 25, 2025
a97ebdb
OpenAI not Open AI (#468)
eric-urban Feb 26, 2025
20fdca6
Fix unsupported distance functions in samples
westey-m Feb 26, 2025
d7d91a3
updated table
eavanvalkenburg Feb 17, 2025
e5bc203
adding mssing stores
eavanvalkenburg Feb 17, 2025
65270c5
updated a whole bunch
eavanvalkenburg Feb 26, 2025
ed29edc
fixes
eavanvalkenburg Feb 26, 2025
f37d616
initial version of realtime docs
eavanvalkenburg Feb 26, 2025
2891764
extra info in table
eavanvalkenburg Feb 26, 2025
02978db
added link
eavanvalkenburg Feb 26, 2025
fe3433d
Merge pull request #470 from westey-m/fix-distance-func-in-docs
sophialagerkranspandey Feb 26, 2025
558bcd7
Merge pull request #472 from MicrosoftDocs/main
sophialagerkranspandey Feb 26, 2025
26fa909
Update Agent Framework related doc and code samples. Add migration co…
moonbox3 Feb 28, 2025
24962d0
Update Agent Framework related doc and code samples. Add migration co…
moonbox3 Feb 28, 2025
8d59c5c
Update title (#474)
moonbox3 Feb 28, 2025
575b53e
Proper migration guide title
moonbox3 Feb 28, 2025
88646f4
Merge pull request #476 from MicrosoftDocs/merge-live-into-main
moonbox3 Feb 28, 2025
e96632d
Merge main to live: updating Migration Guide title (#477)
moonbox3 Feb 28, 2025
b6343d8
Merge pull request #463 from eavanvalkenburg/memory_python
sophialagerkranspandey Feb 28, 2025
bdb6118
Update semantic-kernel/concepts/vector-store-connectors/out-of-the-bo…
sophialagerkranspandey Feb 28, 2025
3f2fcdb
Update semantic-kernel/concepts/vector-store-connectors/out-of-the-bo…
sophialagerkranspandey Feb 28, 2025
76657f9
Update semantic-kernel/concepts/vector-store-connectors/out-of-the-bo…
sophialagerkranspandey Feb 28, 2025
24c208a
Update semantic-kernel/concepts/vector-store-connectors/out-of-the-bo…
sophialagerkranspandey Feb 28, 2025
91d12bb
Merge pull request #478 from MicrosoftDocs/main
sophialagerkranspandey Feb 28, 2025
4616882
extended docs
eavanvalkenburg Mar 4, 2025
7f46bae
fixed link
eavanvalkenburg Mar 4, 2025
6941538
fixed header
eavanvalkenburg Mar 4, 2025
095cf4c
Merge pull request #471 from eavanvalkenburg/realtime
sophialagerkranspandey Mar 4, 2025
f3e12a0
Merge branch 'main' into repo_sync_working_branch
crickman Mar 5, 2025
092ed5d
Merge pull request #482 from MicrosoftDocs/repo_sync_working_branch
crickman Mar 5, 2025
fbaaf72
Update semantic-kernel/concepts/ai-services/realtime.md
sophialagerkranspandey Mar 6, 2025
fcdaeea
Merge pull request #479 from MicrosoftDocs/main
sophialagerkranspandey Mar 6, 2025
fa45498
add some agent language
eavanvalkenburg Mar 6, 2025
eb10199
Improve Python plugin docs part 1
TaoChenOSU Mar 6, 2025
588d7a5
Improve Python plugin docs part 2
TaoChenOSU Mar 6, 2025
6dd3777
remove empty line
TaoChenOSU Mar 6, 2025
39ef038
Merge pull request #485 from MicrosoftDocs/taochen/improve-python-plu…
sophialagerkranspandey Mar 6, 2025
129c980
Update semantic-kernel/concepts/ai-services/realtime.md
sophialagerkranspandey Mar 6, 2025
6896824
Merge pull request #483 from eavanvalkenburg/realtime_p2
sophialagerkranspandey Mar 6, 2025
108f1b9
Merge pull request #486 from MicrosoftDocs/main
sophialagerkranspandey Mar 6, 2025
4 changes: 3 additions & 1 deletion semantic-kernel/concepts/ai-services/TOC.yml
@@ -6,4 +6,6 @@
- name: Embedding generation
href: embedding-generation/TOC.yml
- name: AI Integrations
href: integrations.md
href: integrations.md
- name: Realtime
href: realtime.md
22 changes: 12 additions & 10 deletions semantic-kernel/concepts/ai-services/index.md
Expand Up @@ -14,21 +14,23 @@ One of the main features of Semantic Kernel is its ability to add different AI s

Within Semantic Kernel, there are interfaces for the most popular AI tasks. In the table below, you can see the services that are supported by each of the SDKs.

| Services | C# | Python | Java | Notes |
|-----------------------------------|:----:|:------:|:----:|-------|
| [Chat completion](./chat-completion/index.md) | ✅ | ✅ | ✅ |
| Text generation | ✅ | ✅ | ✅ |
| Embedding generation (Experimental) | ✅ | ✅ | ✅ |
| Text-to-image (Experimental) | ✅ | ✅ | ❌ |
| Image-to-text (Experimental) | ✅ | ❌ | ❌ |
| Text-to-audio (Experimental) | ✅ | ✅ | ❌ |
| Audio-to-text (Experimental) | ✅ | ✅ | ❌ |
| Services | C# | Python | Java | Notes |
| --------------------------------------------- | :---: | :----: | :---: | ----- |
| [Chat completion](./chat-completion/index.md) | ✅ | ✅ | ✅ |
| Text generation | ✅ | ✅ | ✅ |
| Embedding generation (Experimental) | ✅ | ✅ | ✅ |
| Text-to-image (Experimental) | ✅ | ✅ | ❌ |
| Image-to-text (Experimental) | ✅ | ❌ | ❌ |
| Text-to-audio (Experimental) | ✅ | ✅ | ❌ |
| Audio-to-text (Experimental) | ✅ | ✅ | ❌ |
| [Realtime](./realtime.md) (Experimental) | ❌ | ✅ | ❌ |

> [!TIP]
> In most scenarios, you will only need to add chat completion to your kernel, but to support multi-modal AI, you can add any of the above services to your kernel.

## Next steps

To learn more about each of the services, please refer to the specific articles for each service type. In each of the articles we provide sample code for adding the service to the kernel across multiple AI service providers.

> [!div class="nextstepaction"]
> [Learn about chat completion](./chat-completion/index.md)
> [Learn about chat completion](./chat-completion/index.md)
27 changes: 14 additions & 13 deletions semantic-kernel/concepts/ai-services/integrations.md
Expand Up @@ -18,21 +18,22 @@ With the available AI connectors, developers can easily build AI agents with swa

### AI Services

| Services | C# | Python | Java | Notes |
|-----------------------------------|:----:|:------:|:----:|-------|
| Text Generation | ✅ | ✅ | ✅ | Example: Text-Davinci-003 |
| Chat Completion | ✅ | ✅ | ✅ | Example: GPT4, Chat-GPT |
| Text Embeddings (Experimental) | ✅ | ✅ | ✅ | Example: Text-Embeddings-Ada-002 |
| Text to Image (Experimental) | ✅ | ✅ | ❌ | Example: Dall-E |
| Image to Text (Experimental) | ✅ | ❌ | ❌ | Example: Pix2Struct |
| Text to Audio (Experimental) | ✅ | ✅ | ❌ | Example: Text-to-speech |
| Audio to Text (Experimental) | ✅ | ✅ | ❌ | Example: Whisper |
| Services | C# | Python | Java | Notes |
| ------------------------------ | :---: | :----: | :---: | -------------------------------- |
| Text Generation | ✅ | ✅ | ✅ | Example: Text-Davinci-003 |
| Chat Completion | ✅ | ✅ | ✅ | Example: GPT4, Chat-GPT |
| Text Embeddings (Experimental) | ✅ | ✅ | ✅ | Example: Text-Embeddings-Ada-002 |
| Text to Image (Experimental) | ✅ | ✅ | ❌ | Example: Dall-E |
| Image to Text (Experimental) | ✅ | ❌ | ❌ | Example: Pix2Struct |
| Text to Audio (Experimental) | ✅ | ✅ | ❌ | Example: Text-to-speech |
| Audio to Text (Experimental) | ✅ | ✅ | ❌ | Example: Whisper |
| Realtime (Experimental) | ❌ | ✅ | ❌ | Example: gpt-4o-realtime-preview |

## Additional plugins

If you want to extend the functionality of your AI agent, you can use plugins to integrate with other Microsoft services. Here are some of the plugins that are available for Semantic Kernel:

| Plugin | C# | Python | Java | Description |
| ---------- | :-: | :----: | :--: | ----------- |
| Logic Apps | ✅ | ✅ | ✅ | Build workflows within Logic Apps using its available connectors and import them as plugins in Semantic Kernel. [Learn more](../plugins/adding-logic-apps-as-plugins.md). |
| Azure Container Apps Dynamic Sessions | ✅ | ✅ | | With dynamic sessions, you can recreate the Code Interpreter experience from the Assistants API by effortlessly spinning up Python containers where AI agents can execute Python code. [Learn more](/azure/container-apps/sessions). |
| Plugin | C# | Python | Java | Description |
| ------------------------------------- | :---: | :----: | :---: | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| Logic Apps | | ✅ | | Build workflows within Logic Apps using its available connectors and import them as plugins in Semantic Kernel. [Learn more](../plugins/adding-logic-apps-as-plugins.md). |
| Azure Container Apps Dynamic Sessions | | ✅ | | With dynamic sessions, you can recreate the Code Interpreter experience from the Assistants API by effortlessly spinning up Python containers where AI agents can execute Python code. [Learn more](/azure/container-apps/sessions). |
189 changes: 189 additions & 0 deletions semantic-kernel/concepts/ai-services/realtime.md
@@ -0,0 +1,189 @@
---
title: Realtime AI Integrations for Semantic Kernel
description: Learn about realtime multi-modal AI integrations available in Semantic Kernel.
author: eavanvalkenburg
ms.topic: conceptual
ms.author: edvan
ms.date: 02/26/2025
ms.service: semantic-kernel
---

# Realtime Multi-modal APIs

The first realtime API integration for Semantic Kernel has been added. It is currently available only in Python and is considered experimental, because the underlying services are still being developed and are subject to change. We might also need to make breaking changes to the API in Semantic Kernel as we learn from customers how they use it and as we add other providers of these kinds of models and APIs.

## Realtime Client abstraction

To support different realtime APIs from different vendors, using different protocols, a new client abstraction has been added to the kernel. This client connects to the realtime service and sends and receives messages. It is also responsible for handling any errors that occur while connecting or while sending and receiving messages.

Given the way these models work, they are better thought of as agents than as regular chat completions: they take instructions rather than a system message, they keep their own internal state, and they can be invoked to do work on our behalf.

### Realtime API

Any realtime client implements the following methods:

| Method | Description |
| ---------------- | ------------------------------------------------------------------------------------------------------------------ |
| `create_session` | Creates a new session |
| `update_session` | Updates an existing session |
| `delete_session` | Deletes an existing session |
| `receive`        | An asynchronous generator method that listens for messages from the service and yields them as they arrive.        |
| `send` | Sends a message to the service |

### Python implementations

The Python version of Semantic Kernel currently supports the following realtime clients:

| Client | Protocol  | Modalities   | Function calling enabled | Description                                                                                                                                                                                      |
| ------ | --------- | ------------ | ------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| OpenAI | Websocket | Text & Audio | Yes                      | The OpenAI Realtime API is a websocket-based API that lets you send and receive messages in realtime. This connector uses the OpenAI Python package to connect and to send and receive messages. |
| OpenAI | WebRTC    | Text & Audio | Yes                      | The OpenAI Realtime API is a WebRTC-based API that lets you send and receive messages in realtime. It needs a WebRTC-compatible audio track at session creation time.                            |
| Azure  | Websocket | Text & Audio | Yes                      | The Azure Realtime API is a websocket-based API that lets you send and receive messages in realtime. This connector uses the same package as the OpenAI websocket connector.                     |

## Getting started

To get started with the Realtime API, you need to install the `semantic-kernel` package with the `realtime` extra.

```bash
pip install semantic-kernel[realtime]
```

Depending on how you want to handle audio, you might need additional packages to interface with speakers and microphones, like `pyaudio` or `sounddevice`.

### Websocket clients

Then you can create the realtime client and start a session. The following example shows how to do that with an `AzureRealtimeWebsocket` connection; you can replace `AzureRealtimeWebsocket` with `OpenAIRealtimeWebsocket` without any further changes.

```python
from semantic_kernel.connectors.ai.open_ai import (
    AzureRealtimeWebsocket,
    AzureRealtimeExecutionSettings,
    ListenEvents,
)
from semantic_kernel.contents import RealtimeAudioEvent, RealtimeTextEvent

# This uses environment variables to get the API key, endpoint, API version and deployment name.
realtime_client = AzureRealtimeWebsocket()
settings = AzureRealtimeExecutionSettings(voice='alloy')
async with realtime_client(settings=settings, create_response=True):
    async for event in realtime_client.receive():
        match event:
            # receiving a piece of audio (and sending it to an audio player that is not defined here)
            case RealtimeAudioEvent():
                await audio_player.add_audio(event.audio)
            # receiving a piece of audio transcript
            case RealtimeTextEvent():
                # Semantic Kernel parses the transcript to a TextContent object captured in a RealtimeTextEvent
                print(event.text.text, end="")
            case _:
                # OpenAI-specific events
                if event.service_type == ListenEvents.SESSION_UPDATED:
                    print("Session updated")
                if event.service_type == ListenEvents.RESPONSE_CREATED:
                    print("\nMosscap (transcript): ", end="")
```

There are two important things to note. The first is that the `realtime_client` is an async context manager, which means that you can use it in an async function and use `async with` to create the session.
The second is that the `receive` method is an async generator, which means that you can use it in an `async for` loop to receive messages as they arrive.
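
To make these two notes concrete without a live service, here is a small stand-alone sketch of the same calling pattern. `SessionClientSketch` is a hypothetical class written for this illustration; it is not how the Semantic Kernel connectors are implemented, and the `__call__` that stores the settings and returns the client itself is only one assumed way to support `async with client(settings=...)`.

```python
import asyncio
from collections.abc import AsyncGenerator
from typing import Any


class SessionClientSketch:
    """Hypothetical client; illustrates the calling pattern only."""

    def __init__(self) -> None:
        self.settings: dict = {}
        self.connected = False
        self.log: list = []

    def __call__(self, **settings: Any) -> "SessionClientSketch":
        # Store per-session options and return self, so the result
        # can be used directly in `async with client(...)`.
        self.settings = settings
        return self

    async def __aenter__(self) -> "SessionClientSketch":
        self.connected = True
        self.log.append("session created")
        return self

    async def __aexit__(self, exc_type, exc, tb) -> None:
        # Runs even if the body raises, so the session is always closed.
        self.connected = False
        self.log.append("session closed")

    async def receive(self) -> AsyncGenerator[str, None]:
        for chunk in ("Hello", ", ", "world"):
            await asyncio.sleep(0)  # stand-in for waiting on the network
            yield chunk


async def demo() -> tuple:
    client = SessionClientSketch()
    parts = []
    async with client(voice="alloy"):
        # `async for` pulls items from the async generator one at a time.
        async for chunk in client.receive():
            parts.append(chunk)
    return "".join(parts), client.log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Because the session is tied to the `async with` block, cleanup happens when the block exits, whether the loop finishes normally or raises.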

### WebRTC client

The setup of a WebRTC connection is a bit more complex, so an extra parameter is needed when creating the client. This parameter, `audio_track`, needs to be an object that implements the `MediaStreamTrack` protocol of the `aiortc` package. This is also demonstrated in the samples that are linked below.

To create a client that uses WebRTC, you would do the following:

```python
from semantic_kernel.connectors.ai.open_ai import (
    ListenEvents,
    OpenAIRealtimeExecutionSettings,
    OpenAIRealtimeWebRTC,
)
from aiortc.mediastreams import MediaStreamTrack

class AudioRecorderWebRTC(MediaStreamTrack):
    # implement the MediaStreamTrack methods.
    ...

realtime_client = OpenAIRealtimeWebRTC(audio_track=AudioRecorderWebRTC())
# Create the settings for the session
settings = OpenAIRealtimeExecutionSettings(
    instructions="""
    You are a chat bot. Your name is Mosscap and
    you have one goal: figure out what people need.
    Your full name, should you need to know it, is
    Splendid Speckled Mosscap. You communicate
    effectively, but you tend to answer with long
    flowery prose.
    """,
    voice="shimmer",
)
# AudioPlayer is not defined in this snippet; supply your own audio player class.
audio_player = AudioPlayer()
async with realtime_client(settings=settings, create_response=True):
    async for event in realtime_client.receive():
        match event.event_type:
            # receiving a piece of audio (and sending it to the audio player)
            case "audio":
                await audio_player.add_audio(event.audio)
            case "text":
                # the model returns both audio and a transcript of the audio, which we print
                print(event.text.text, end="")
            case "service":
                # OpenAI-specific events
                if event.service_type == ListenEvents.SESSION_UPDATED:
                    print("Session updated")
                if event.service_type == ListenEvents.RESPONSE_CREATED:
                    print("\nMosscap (transcript): ", end="")
```

Both of these samples receive the audio as a `RealtimeAudioEvent` and then pass it to an unspecified `audio_player` object.

### Audio output callback

In addition, there is a parameter called `audio_output_callback`, available both on the `receive` method and at class creation. When set, this callback is called first, before any further handling of the audio, and receives a `numpy` array of the audio data. The audio is then not parsed into `AudioContent` and returned as a `RealtimeAudioEvent` for you to handle, which is what happens in the examples above. This has been shown to give smoother audio output, because there is less overhead between the audio data coming in and it being given to the player.

This example shows how to define and use the `audio_output_callback`:

```python
import numpy as np
from semantic_kernel.connectors.ai.open_ai import (
    ListenEvents,
    OpenAIRealtimeExecutionSettings,
    OpenAIRealtimeWebRTC,
)
from aiortc.mediastreams import MediaStreamTrack

class AudioRecorderWebRTC(MediaStreamTrack):
    # implement the MediaStreamTrack methods.
    ...

class AudioPlayer:
    async def play_audio(self, content: np.ndarray):
        # implement the audio player
        ...

realtime_client = OpenAIRealtimeWebRTC(audio_track=AudioRecorderWebRTC())
# Create the settings for the session
settings = OpenAIRealtimeExecutionSettings(
    instructions="""
    You are a chat bot. Your name is Mosscap and
    you have one goal: figure out what people need.
    Your full name, should you need to know it, is
    Splendid Speckled Mosscap. You communicate
    effectively, but you tend to answer with long
    flowery prose.
    """,
    voice="shimmer",
)
audio_player = AudioPlayer()
async with realtime_client(settings=settings, create_response=True):
    async for event in realtime_client.receive(audio_output_callback=audio_player.play_audio):
        match event.event_type:
            # no need to handle case: "audio"
            case "text":
                # the model returns both audio and a transcript of the audio, which we print
                print(event.text.text, end="")
            case "service":
                # OpenAI-specific events
                if event.service_type == ListenEvents.SESSION_UPDATED:
                    print("Session updated")
                if event.service_type == ListenEvents.RESPONSE_CREATED:
                    print("\nMosscap (transcript): ", end="")
```
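
The difference between the two audio paths can be illustrated with a toy dispatcher. This is a sketch of the behaviour as described in this section, not the connector's actual code: `receive_sketch` and `AudioEventSketch` are hypothetical names, and plain lists of integers stand in for the `numpy` arrays that the real callback receives.

```python
import asyncio
from collections.abc import AsyncGenerator, Awaitable, Callable
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioEventSketch:
    """Hypothetical stand-in for an audio event wrapping raw samples."""

    audio: list


async def receive_sketch(
    chunks: list,
    audio_output_callback: Optional[Callable[[list], Awaitable[None]]] = None,
) -> AsyncGenerator[AudioEventSketch, None]:
    for samples in chunks:
        if audio_output_callback is not None:
            # Fast path: hand the raw samples straight to the player
            # and skip building an event object.
            await audio_output_callback(samples)
            continue
        # Default path: wrap the samples and let the caller's loop handle them.
        yield AudioEventSketch(audio=samples)


async def demo() -> tuple:
    chunks = [[1, 2], [3, 4]]
    played = []

    async def play_audio(samples: list) -> None:
        played.append(samples)

    with_callback = [e async for e in receive_sketch(chunks, play_audio)]
    without_callback = [e async for e in receive_sketch(chunks)]
    return played, len(with_callback), len(without_callback)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the callback set, the consumer loop in this sketch never sees an audio event, which matches the "no need to handle case: "audio"" comment in the example above.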

### Samples

There are four samples in [our repo](https://github.com/microsoft/semantic-kernel/tree/main/python/samples/concepts/realtime). They cover the basics using both websockets and WebRTC, as well as a more complex setup that includes function calling. Finally, there is a more [complex demo](https://github.com/microsoft/semantic-kernel/tree/main/python/samples/demos/call_automation) that uses [Azure Communication Services](/azure/communication-services/) to let you call your Semantic Kernel enhanced realtime API.