Phi Family in GitHub Models

Welcome to GitHub Models ! We've got everything fired up and ready for you to explore AI Models hosted on Azure AI.

For more information about the Models available on GitHub Models, check out the GitHub Model Marketplace

Models Available

Each model has a dedicated playground and sample code

Phi Family in GitHub Model Catalog

Phi-4
Phi-3.5-MoE instruct (128k)
Phi-3.5-vision instruct (128k)
Phi-3.5-mini instruct (128k)
Phi-3-Medium-128k-Instruct
Phi-3-medium-4k-instruct
Phi-3-mini-128k-instruct
Phi-3-mini-4k-instruct
Phi-3-small-128k-instruct
Phi-3-small-8k-instruct

Getting Started

There are a few basic examples that are ready for you to run. You can find them in the samples directory. If you want to jump straight to your favorite language, you can find the examples in the following Languages:

Python
JavaScript
C#
Java
cURL

There is also a dedicated Codespaces Environment for running the samples and models.

Sample Code

Below are example code snippets for a few use cases. For additional information about Azure AI Inference SDK, see full documentation and samples.

Setup

Create a personal access token You do not need to give any permissions to the token. Note that the token will be sent to a Microsoft service.

To use the code snippets below, create an environment variable to set your token as the key for the client code.

If you're using bash:

export GITHUB_TOKEN="<your-github-token-goes-here>"

If you're in powershell:

$Env:GITHUB_TOKEN="<your-github-token-goes-here>"

If you're using Windows command prompt:

set GITHUB_TOKEN=<your-github-token-goes-here>

Python Sample

Install dependencies

Install the Azure AI Inference SDK using pip (Requires: Python >=3.8):

pip install azure-ai-inference

Run a basic code sample

This sample demonstrates a basic call to the chat completion API. It is leveraging the GitHub AI model inference endpoint and your GitHub token. The call is synchronous.

import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

endpoint = "https://models.inference.ai.azure.com"
model_name = "Phi-4"
token = os.environ["GITHUB_TOKEN"]

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    messages=[
        UserMessage(content="I have $20,000 in my savings account, where I receive a 4% profit per year and payments twice a year. Can you please tell me how long it will take for me to become a millionaire? Also, can you please explain the math step by step as if you were explaining it to an uneducated person?"),
    ],
    temperature=0.4,
    top_p=1.0,
    max_tokens=2048,
    model=model_name
)

print(response.choices[0].message.content)

Run a multi-turn conversation

This sample demonstrates a multi-turn conversation with the chat completion API. When using the model for a chat application, you'll need to manage the history of that conversation and send the latest messages to the model.

import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import AssistantMessage, SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
# Replace Model_Name
model_name = "Phi-4"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

messages = [
    SystemMessage(content="You are a helpful assistant."),
    UserMessage(content="What is the capital of France?"),
    AssistantMessage(content="The capital of France is Paris."),
    UserMessage(content="What about Spain?"),
]

response = client.complete(messages=messages, model=model_name)

print(response.choices[0].message.content)

Stream the output

For a better user experience, you will want to stream the response of the model so that the first token shows up early and you avoid waiting for long responses.

import os
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

token = os.environ["GITHUB_TOKEN"]
endpoint = "https://models.inference.ai.azure.com"
# Replace Model_Name
model_name = "Phi-4"

client = ChatCompletionsClient(
    endpoint=endpoint,
    credential=AzureKeyCredential(token),
)

response = client.complete(
    stream=True,
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Give me 5 good reasons why I should exercise every day."),
    ],
    model=model_name,
)

for update in response:
    if update.choices:
        print(update.choices[0].delta.content or "", end="")

client.close()

FREE Usage and Rate limits for GitHub Models

The rate limits for the playground and free API usage are intended to help you experiment with models and prototype your AI application. For use beyond those limits, and to bring your application to scale, you must provision resources from an Azure account, and authenticate from there instead of your GitHub personal access token. You don't need to change anything else in your code. Use this link to discover how to go beyond the free tier limits in Azure AI.

Disclosures

Remember when interacting with a model you are experimenting with AI, so content mistakes are possible.

The feature is subject to various limits (including requests per minute, requests per day, tokens per request, and concurrent requests) and is not designed for production use cases.

GitHub Models uses Azure AI Content Safety. These filters cannot be turned off as part of the GitHub Models experience. If you decide to employ models through a paid service, please configure your content filters to meet your requirements.

This service is under GitHub’s Pre-release Terms.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!