
Releases: llamastack/llama-stack-client-kotlin

v0.2.14

22 Jul 04:47

Llama Stack SDK 0.2.14 Update

  • Update SDK to support Llama Stack server v0.2.14
  • Update local module to reflect the latest API specs
  • Update demo app to work with the latest SDK
  • Class/method refactor to simplify implementation (for the full list, refer to 6df1e89)
    • Message.ofCompletion → Message.ofAssistant
  • Stainless commit 7ed6a0b5d7f54c7d0a79de35230f0046b9c72833

Contributors

@cmodi-meta, @Riandy, @seyeong-han, @WuhanMonkey

v0.2.2

14 Apr 23:05

Llama Stack SDK 0.2.2 Update

Update SDK to support Llama Stack v0.2.2, which includes multi-image inference.

Local RAG Support

The major update is to enable local RAG. The local RAG implementation runs entirely on-device and is 100% offline.

The local module SDK supports the end-to-end solution of (see the sketch below):

  1. Creating a vector DB instance
  2. Creating text chunks
  3. Receiving embeddings from the Android app
  4. Storing embeddings in the vector DB
  5. Managing the agent turn with a RAG tool call to receive a relevant response from the LLM

On-device Vector DB solution: ObjectBox
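
Below is a minimal sketch of that end-to-end flow. The client interface comes from the SDK README; the vector DB registration, chunk insertion, and agent-turn calls are assumptions modeled on the Llama Stack server API (vector_dbs/register, tool_runtime/rag_tool/insert), not the verbatim Kotlin surface:

```kotlin
import com.llama.llamastack.client.LlamaStackClientClient

// Hypothetical end-to-end local RAG sketch. Anything marked "assumed"
// is illustrative, not the shipped API.
fun localRagTurn(client: LlamaStackClientClient, chunks: List<String>, question: String) {
    // 1. Create (register) a vector DB instance, backed on-device by ObjectBox.
    client.vectorDbs().register(/* assumed: vectorDbId = "docs", embeddingModel = ... */)

    // 2-4. Chunk the source text, embed each chunk, and store it in the vector DB.
    chunks.forEach { chunk ->
        client.toolRuntime().ragTool().insert(/* assumed: vectorDbId = "docs", content = chunk */)
    }

    // 5. Run an agent turn with the RAG tool attached so retrieval grounds the answer.
    val turn = client.agents().turn().create(/* assumed: agent session + question */)
    println(turn)
}
```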

Android Demo App

RAG

We've added a RAG feature to the demo app to showcase how to use the remote and local RAG SDKs. The remote RAG feature contains all the RAG-specific logic: creating a document object, registering a vector DB, and using RagTool from Llama Stack.

  • Improved User Experience: The remote RAG feature provides a seamless experience for users, allowing them to ask questions and receive accurate answers quickly.
  • Increased Efficiency: With the ability to process large documents and retrieve relevant information, the remote RAG feature saves time and increases productivity.

In this example, a PDF or text file (e.g., a car manual) can easily be processed for question-answer inference scenarios with the user.

Also, with just a few lines of code changed, you can switch to local RAG. That's the advantage of the Llama Stack mobile SDKs: you can move between remote and local without major code changes, as the sketch below shows.
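
To illustrate the swap, here is a minimal sketch. Both builders follow the SDK README; the local client's import path, and that it implements the same client interface, are our assumptions:

```kotlin
import com.llama.llamastack.client.LlamaStackClientClient
import com.llama.llamastack.client.okhttp.LlamaStackClientOkHttpClient
// Local client import path is an assumption:
import com.llama.llamastack.client.local.LlamaStackClientLocalClient

// Remote: RAG and inference calls go to a Llama Stack server.
val remoteClient: LlamaStackClientClient = LlamaStackClientOkHttpClient.builder()
    .baseUrl("http://localhost:5050") // illustrative server URL
    .build()

// Local: the same call sites, now running fully on-device.
val localClient: LlamaStackClientClient = LlamaStackClientLocalClient.builder()
    .modelPath("/data/local/tmp/llama/llama3_2_3b.pte")     // illustrative paths
    .tokenizerPath("/data/local/tmp/llama/tokenizer.model")
    .build()

// Downstream RAG code takes a LlamaStackClientClient and doesn't care which one it gets.
```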

Multi-image Inference

We've built in sample support for selecting multiple images and running inference with Llama 4.
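
As a rough sketch, a multi-image turn bundles several image content items and a text item into a single user message. Only the shape is the point here: the helpers and builder calls are assumptions, not the verbatim SDK surface (the image nesting follows the InterleavedContent.ImageContentItem class referenced in the v0.1.2 notes below):

```kotlin
// Hypothetical multi-image inference sketch; userMessage/imageItem/textItem
// are assumed helpers, not real SDK functions.
val photo1 = "content://media/external/images/1" // illustrative content URIs
val photo2 = "content://media/external/images/2"

val params = InferenceChatCompletionParams.builder()
    .modelId("meta-llama/Llama-4-Scout-17B-16E-Instruct") // illustrative Llama 4 model
    .messages(
        listOf(
            userMessage(           // assumed: wraps Message.ofUser(...)
                imageItem(photo1), // assumed: builds InterleavedContent.ImageContentItem
                imageItem(photo2),
                textItem("What changed between these two photos?")
            )
        )
    )
    .build()

val response = client.inference().chatCompletion(params)
```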

Contributors

@ashwinb, @cmodi-meta, @dltn, @Riandy, @seyeong-han, @WuhanMonkey, @yanxi0830

v0.1.7

20 Mar 17:26

0.1.7 Release Note

  • Support for Llama Stack Server 0.1.7
  • Update local module to support the latest spec changes

Contributors

@ashwinb, @cmodi-meta, @Riandy, @WuhanMonkey, @yanxi0830

What's Changed

Full Changelog: v0.1.4.2...v0.1.7

v0.1.4.2

15 Mar 00:11

Local Inference Support

The major update is support for the ExecuTorch v0.5.0 framework for on-device inferencing. Some of the notable improvements are:

  • Include support for KleidiAI blockwise kernels in XNNPACK, giving a 20%+ gain in Llama prefill
  • Support models quantized via torchao's quantize_ API
  • Enable stable lowering into XNNPACK
  • Features and fixes on the Qualcomm and MediaTek backends (support to come in the future)
  • Bug fixes

It's compatible with models (.pte files) that were exported with the previous 0.4 version of ExecuTorch.

Demo App Location

To help consolidate reference material, we’ve moved the example demo apps from llama-stack-apps to llama-stack-client-kotlin.

Contributors

@ashwinb, @cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830

v0.1.4.1

25 Feb 20:57
038528b

Release

Llama Stack Kotlin SDK v0.1.4.1

  • Bugfixes for #23
  • Developers should use Llama Stack server 0.1.4 with Kotlin SDK 0.1.4.1

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.1.4

25 Feb 01:29
833ad3e

Release

Llama Stack Kotlin SDK v0.1.4

  • API Support for v0.1.4 Llama Stack Server
  • Bugfixes

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.1.2

13 Feb 19:41
def17d0

Release

Llama Stack Kotlin SDK v0.1.2

  • API Support for v0.1.2 Llama Stack Server
  • ToolCall class refactor to simplify implementation
  • ResponseStreamChunk refactor
  • Url now lives inside InterleavedContent.ImageContentItem.Image
  • Bugfixes

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.1.0

28 Jan 00:37
045845f

Release

Llama Stack launched a stable release (v0.1.0). We have updated the Kotlin client SDK to support local and remote inference on Android apps, enabling developers to build RAG applications and agents that use tools and safety shields, reason over images, monitor those agents with telemetry, and evaluate them with scoring functions.

Key Features of this release

  • API Support for v0.1.0 Llama Stack Server

  • Remote Inference

    • Agentic model inference
    • Agentic tool calling
    • Image reasoning with Vision Models
  • Local Inference

    • Local inference with streaming capabilities
  • Sample Llama Stack applications

    • Android
  • Bugfixes

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.0.58

12 Dec 20:02
5ccf74e

In this release, we build upon our major update to the Llama Stack Kotlin library, which brought local and remote inference to Android apps (v0.0.54.1). This update introduces several key features: support for Llama Stack server v0.0.58, tool calling, and response streaming for remote inference.

Local Inference Support

  • Single and multiple custom tool calling

Remote Inference Support

  • Enabled remote support with Llama Stack server v0.0.58
  • Fixed a type-referencing error in the SDK
  • Response streaming (see the sketch after this list)
  • Custom tool calling is supported in non-streaming cases but not yet for streaming
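
Here is a short sketch of response streaming against a v0.0.58 server. The client builder and InferenceChatCompletionParams follow the SDK README; treat the streaming method name, its return type, and the chunk handling as assumptions:

```kotlin
import com.llama.llamastack.client.okhttp.LlamaStackClientOkHttpClient
import com.llama.llamastack.models.InferenceChatCompletionParams

// Minimal remote streaming sketch.
val client = LlamaStackClientOkHttpClient.builder()
    .baseUrl("http://localhost:5050") // illustrative Llama Stack server URL
    .build()

val params = InferenceChatCompletionParams.builder()
    .modelId("meta-llama/Llama-3.2-3B-Instruct")
    .messages(listOf(/* user message built with Message.ofUser(...) */))
    .build()

// Assumed streaming variant of chatCompletion(); consume chunks as they arrive.
val stream = client.inference().chatCompletionStreaming(params)
stream.asSequence().forEach { chunk ->
    print(chunk) // append each delta to the UI as it streams in
}
```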

Build Support

  • Modified build-libs.sh to clean old JARs and other artifacts before a build to avoid stale outputs.

Supported Models

The models supported in the app vary based on whether you are doing remote or local inferencing.

Remote Model Support
For remote usage, the app supports all models that the Llama Stack server backend supports, ranging from the lightweight 1B Llama models to the large 405B models.

Local Model Support
For on-device usage, the following models are supported:

  • Llama 3.2 Quantized 1B/3B
  • Llama 3.2 1B/3B in BF16
  • Llama 3.1 8B Quantized
  • Llama 3 8B Quantized

Framework: ExecuTorch (commit: 0a12e33)

Getting Started

  • Pointer to an Android demo app that developers can use to get started (link with tag)
  • Quick start instructions on how to add the Kotlin SDK to an Android app (see the snippet after this list)
  • Instructions on how a power developer can contribute to the Kotlin SDK, debug it, or just play with it to learn more
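
For reference, adding the SDK is a one-line Gradle dependency. The Maven coordinates below reflect our understanding of the published artifact; confirm them against the repository README:

```kotlin
// app/build.gradle.kts -- version pinned to this release.
dependencies {
    implementation("com.llama.llamastack:llama-stack-client-kotlin:0.0.58")
}
```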

If you have any questions, feel free to raise an issue and we'd be happy to help!

What’s Next?

This is only the beginning of enabling Llama Stack features on Android devices. We will continue to expand Llama Stack's capabilities, use cases, and applications! Specifically, we look to focus on:

  • Agentic workflow with streaming
  • Image and Speech reasoning
  • Local/on-device agentic components like memory banks
  • Examples with RAG usage

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830, and big thank you to the ExecuTorch team.

v0.0.54.1

06 Dec 22:33
0a68271

We are excited to announce a major update to the Llama Stack Kotlin library, which now supports both local and remote inferencing in Android apps. Building on the existing remote inferencing capabilities, this release introduces significant changes that enable seamless local inferencing integration and give developers more flexibility in their AI workflows.

Release v0.0.54.1 includes the local modules as part of the Kotlin library dependency in Maven.

Local Inference Support

  • Leverage the ExecuTorch on-device framework (commit: 0a12e33) for on-device inferencing.
  • Script for downloading the ExecuTorch AAR file.
  • Allow passing various configurations from the Android app: the .pte model file, tokenizer file, sequence length, and temperature (see the sketch after this list).
  • Send stats metrics from ExecuTorch (tok/sec).
  • Handle prompt formatting based on the model.
  • Support conversational history.
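
As a sketch, local client construction looks roughly like this. The builder methods for model path, tokenizer path, and temperature follow the SDK README; the import path and the sequence-length setter name are assumptions:

```kotlin
import com.llama.llamastack.client.local.LlamaStackClientLocalClient // assumed package path

// Local, on-device client backed by ExecuTorch; file paths are illustrative.
val client = LlamaStackClientLocalClient.builder()
    .modelPath("/data/local/tmp/llama/llama3_2_1b.pte")     // exported ExecuTorch model
    .tokenizerPath("/data/local/tmp/llama/tokenizer.model")
    .temperature(0.7f)
    // .maxSeqLen(2048) // assumed name for the sequence-length setter
    .build()
```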

Remote Inference Support

  • Enabled remote support with Llama Stack server v0.0.54
  • Fixed lib compile issues caused by invalid Stainless autogenerated types (link and link)

Supported Models

The models supported in the app vary based on whether you are doing remote or local inferencing.

Remote Model Support

For remote usage, the app supports all models that the Llama Stack server backend supports, ranging from the lightweight 1B Llama models to the large 405B models.

Local Model Support

For on-device usage, the following models are supported:

  • Llama 3.2 Quantized 1B/3B
  • Llama 3.2 1B/3B in BF16
  • Llama 3.1 8B Quantized
  • Llama 3 8B Quantized

Getting Started

  • Pointer to an Android demo app. (Note tag: android-0.0.54.1)
  • Quick start instructions on how to add the Kotlin SDK to an Android app
  • Instructions on how a power developer can contribute to the Kotlin SDK, debug it, or just play with it to learn more

If you have any questions, feel free to raise an issue and we'd be happy to help!

What’s Next?

This is only the beginning of enabling Llama Stack features on Android devices. We will continue to expand Llama Stack's capabilities, use cases, and applications! Specifically, we look to focus on:

  • Agentic workflow with streaming
  • Image and Speech reasoning
  • Local/on-device agentic components like memory banks
  • Examples with RAG usage

Stay tuned for future releases and updates!

Contributors

@cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830, and big thank you to the ExecuTorch team.