
Releases: llamastack/llama-stack-client-kotlin

v0.2.14

22 Jul 04:47

Llama Stack SDK 0.2.14 Update

  • Update SDK to support Llama Stack server v0.2.14
  • Update local module to reflect the latest API specs
  • Update demo app to work with the latest SDK
  • Class/method refactor to simplify implementation (for the full list, refer to 6df1e89)
    • Message.ofCompletion → Message.ofAssistant
  • Stainless commit 7ed6a0b5d7f54c7d0a79de35230f0046b9c72833

Contributors

@cmodi-meta, @Riandy, @seyeong-han, @WuhanMonkey

v0.2.2

14 Apr 23:05

Llama Stack SDK 0.2.2 Update

Update SDK to support Llama Stack v0.2.2, which includes multi-image inference.

Local RAG Support

The major update is to enable local RAG. The local RAG implementation runs entirely on-device and is 100% offline.

The local module SDK supports the end-to-end solution of (see the sketch below):

  1. Creating a vector DB instance
  2. Creating text chunks
  3. Receiving embeddings from the Android app
  4. Storing embeddings in the vector DB
  5. Managing the agent turn with a RAG tool call to receive a relevant response from the LLM

On-device Vector DB solution: ObjectBox
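
Below is a minimal sketch of that end-to-end flow. The client interface comes from the SDK README; the vector DB registration, chunk insertion, and agent-turn calls are assumptions modeled on the Llama Stack server API (vector_dbs/register, tool_runtime/rag_tool/insert), not the verbatim Kotlin surface:

```kotlin
import com.llama.llamastack.client.LlamaStackClientClient

// Hypothetical end-to-end local RAG sketch. Anything marked "assumed"
// is illustrative, not the shipped API.
fun localRagTurn(client: LlamaStackClientClient, chunks: List<String>, question: String) {
    // 1. Create (register) a vector DB instance, backed on-device by ObjectBox.
    client.vectorDbs().register(/* assumed: vectorDbId = "docs", embeddingModel = ... */)

    // 2-4. Chunk the source text, embed each chunk, and store it in the vector DB.
    chunks.forEach { chunk ->
        client.toolRuntime().ragTool().insert(/* assumed: vectorDbId = "docs", content = chunk */)
    }

    // 5. Run an agent turn with the RAG tool attached so retrieval grounds the answer.
    val turn = client.agents().turn().create(/* assumed: agent session + question */)
    println(turn)
}
```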

Android Demo App

RAG

We've added a RAG feature to the demo app to showcase how to use the remote and local RAG SDKs. The remote RAG feature contains all the RAG-specific logic: creating a document object, registering a vector DB, and using RagTool from Llama Stack.

  • Improved User Experience: The remote RAG feature provides a seamless experience for users, allowing them to ask questions and receive accurate answers quickly.
  • Increased Efficiency: With the ability to process large documents and retrieve relevant information, the remote RAG feature saves time and increases productivity.

In this example, a PDF or text file (e.g., a car manual) can easily be processed for question-answer inference scenarios with the user.

Also, with just a few lines of code changed, you can switch to local RAG. That's the advantage of the Llama Stack mobile SDKs: you can move between remote and local without major code changes, as the sketch below shows.
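
To illustrate the swap, here is a minimal sketch. Both builders follow the SDK README; the local client's import path, and that it implements the same client interface, are our assumptions:

```kotlin
import com.llama.llamastack.client.LlamaStackClientClient
import com.llama.llamastack.client.okhttp.LlamaStackClientOkHttpClient
// Local client import path is an assumption:
import com.llama.llamastack.client.local.LlamaStackClientLocalClient

// Remote: RAG and inference calls go to a Llama Stack server.
val remoteClient: LlamaStackClientClient = LlamaStackClientOkHttpClient.builder()
    .baseUrl("http://localhost:5050") // illustrative server URL
    .build()

// Local: the same call sites, now running fully on-device.
val localClient: LlamaStackClientClient = LlamaStackClientLocalClient.builder()
    .modelPath("/data/local/tmp/llama/llama3_2_3b.pte")     // illustrative paths
    .tokenizerPath("/data/local/tmp/llama/tokenizer.model")
    .build()

// Downstream RAG code takes a LlamaStackClientClient and doesn't care which one it gets.
```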

Multi-image Inference

We've built in sample support for selecting multiple images and running inference with Llama 4.
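
As a rough sketch, a multi-image turn bundles several image content items and a text item into a single user message. Only the shape is the point here: the helpers and builder calls are assumptions, not the verbatim SDK surface (the image nesting follows the InterleavedContent.ImageContentItem class referenced in the v0.1.2 notes below):

```kotlin
// Hypothetical multi-image inference sketch; userMessage/imageItem/textItem
// are assumed helpers, not real SDK functions.
val photo1 = "content://media/external/images/1" // illustrative content URIs
val photo2 = "content://media/external/images/2"

val params = InferenceChatCompletionParams.builder()
    .modelId("meta-llama/Llama-4-Scout-17B-16E-Instruct") // illustrative Llama 4 model
    .messages(
        listOf(
            userMessage(           // assumed: wraps Message.ofUser(...)
                imageItem(photo1), // assumed: builds InterleavedContent.ImageContentItem
                imageItem(photo2),
                textItem("What changed between these two photos?")
            )
        )
    )
    .build()

val response = client.inference().chatCompletion(params)
```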

Contributors

@ashwinb, @cmodi-meta, @dltn, @Riandy, @seyeong-han, @WuhanMonkey, @yanxi0830

v0.1.7

20 Mar 17:26

0.1.7 Release Note

  • Support for Llama Stack Server 0.1.7
  • Update local module to support the latest spec changes

Contributors

@ashwinb, @cmodi-meta, @Riandy, @WuhanMonkey, @yanxi0830

What's Changed

Full Changelog: v0.1.4.2...v0.1.7

v0.1.4.2

15 Mar 00:11

Local Inference Support

The major update is support for the ExecuTorch v0.5.0 framework for on-device inferencing. Some of the notable improvements are:

  • Include support for KleidiAI blockwise kernels in XNNPACK, giving a 20%+ gain in Llama prefill
  • Support models quantized via torchao's quantize_ API
  • Enable stable lowering into XNNPACK
  • Features and fixes on the Qualcomm and MediaTek backends (support to come in the future)
  • Bug fixes

It's compatible with models (.pte files) that were exported with the previous 0.4 version of ExecuTorch.

Demo App Location

To help consolidate reference material, we’ve moved the example demo apps from llama-stack-apps to llama-stack-client-kotlin.

Contributors

@ashwinb, @cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830

v0.1.4.1

25 Feb 20:57
038528b

Release

Llama Stack Kotlin SDK v0.1.4.1

  • Bugfixes for #23
  • Developers should use Llama Stack server 0.1.4 with Kotlin SDK 0.1.4.1

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.1.4

25 Feb 01:29
833ad3e

Release

Llama Stack Kotlin SDK v0.1.4

  • API Support for v0.1.4 Llama Stack Server
  • Bugfixes

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.1.2

13 Feb 19:41
def17d0

Release

Llama Stack Kotlin SDK v0.1.2

  • API Support for v0.1.2 Llama Stack Server
  • ToolCall class refactor to simplify implementation
  • ResponseStreamChunk refactor
  • Url now lives inside InterleavedContent.ImageContentItem.Image
  • Bugfixes

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.1.0

28 Jan 00:37
045845f

Release

Llama Stack launched a stable release (v0.1.0). We have updated the Kotlin client SDK to support local and remote inference on Android apps, enabling developers to build RAG applications and agents that use tools and safety shields, reason over images, monitor those agents with telemetry, and evaluate them with scoring functions.

Key Features of this release

  • API Support for v0.1.0 Llama Stack Server

  • Remote Inference

    • Agentic model inference
    • Agentic tool calling
    • Image reasoning with Vision Models
  • Local Inference

    • Local inference with streaming capabilities
  • Sample Llama Stack applications

    • Android
  • Bugfixes

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @ashwinb, @cmodi-meta, @dineshyv, @Riandy, @WuhanMonkey, @yanxi0830.

v0.0.58

12 Dec 20:02
5ccf74e

In this release, we build upon our major update to the Llama Stack Kotlin library, which brought local and remote inference to Android apps (v0.0.54.1). This update introduces several key features: support for Llama Stack server v0.0.58, tool calling, and response streaming for remote inference.

Local Inference Support

  • Single and multiple custom tool calling

Remote Inference Support

  • Enabled remote support with Llama Stack server v0.0.58
  • Fixed a type-referencing error in the SDK
  • Response streaming (see the sketch after this list)
  • Custom tool calling is supported in non-streaming cases but not yet for streaming
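
Here is a short sketch of response streaming against a v0.0.58 server. The client builder and InferenceChatCompletionParams follow the SDK README; treat the streaming method name, its return type, and the chunk handling as assumptions:

```kotlin
import com.llama.llamastack.client.okhttp.LlamaStackClientOkHttpClient
import com.llama.llamastack.models.InferenceChatCompletionParams

// Minimal remote streaming sketch.
val client = LlamaStackClientOkHttpClient.builder()
    .baseUrl("http://localhost:5050") // illustrative Llama Stack server URL
    .build()

val params = InferenceChatCompletionParams.builder()
    .modelId("meta-llama/Llama-3.2-3B-Instruct")
    .messages(listOf(/* user message built with Message.ofUser(...) */))
    .build()

// Assumed streaming variant of chatCompletion(); consume chunks as they arrive.
val stream = client.inference().chatCompletionStreaming(params)
stream.asSequence().forEach { chunk ->
    print(chunk) // append each delta to the UI as it streams in
}
```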

Build Support

  • Modified build-libs.sh to clean old JARs and other artifacts before a build to avoid stale outputs.

Supported Models

The models supported in the app vary based on whether you are doing remote or local inferencing.

Remote Model Support
For remote usage, the app supports all models that the Llama Stack server backend supports, ranging from the lightweight 1B Llama models to the large 405B models.

Local Model Support
For on-device usage, the following models are supported:

  • Llama 3.2 Quantized 1B/3B
  • Llama 3.2 1B/3B in BF16
  • Llama 3.1 8B Quantized
  • Llama 3 8B Quantized

Framework: ExecuTorch (commit: 0a12e33)

Getting Started

  • Pointer to an Android demo app that developers can use to get started (link with tag)
  • Quick start instructions on how to add the Kotlin SDK to an Android app (see the snippet after this list)
  • Instructions on how a power developer can contribute to the Kotlin SDK, debug it, or just play with it to learn more
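
For reference, adding the SDK is a one-line Gradle dependency. The Maven coordinates below reflect our understanding of the published artifact; confirm them against the repository README:

```kotlin
// app/build.gradle.kts -- version pinned to this release.
dependencies {
    implementation("com.llama.llamastack:llama-stack-client-kotlin:0.0.58")
}
```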

If you have any questions, feel free to raise an issue and we'd be happy to help!

What’s Next?

This is only the beginning of enabling Llama Stack features on Android devices. We will continue to expand Llama Stack's capabilities, use cases, and applications! Specifically, we look to focus on:

  • Agentic workflow with streaming
  • Image and Speech reasoning
  • Local/on-device agentic components like memory banks
  • Examples with RAG usage

Stay tuned for future releases and updates!

Contributors

In alphabetical order: @cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830, and big thank you to the ExecuTorch team.

v0.0.54.1

06 Dec 22:33
0a68271

We are excited to announce a major update to the Llama Stack Kotlin library, which now supports both local and remote inferencing in Android apps. Building on the existing remote inferencing capabilities, this release introduces significant changes that enable seamless local inferencing integration and give developers more flexibility in their AI workflows.

Release v0.0.54.1 includes the local modules as part of the Kotlin library dependency in Maven.

Local Inference Support

  • Leverage the ExecuTorch on-device framework (commit: 0a12e33) for on-device inferencing.
  • Script for downloading the ExecuTorch AAR file.
  • Allow passing various configurations from the Android app: the .pte model file, tokenizer file, sequence length, and temperature (see the sketch after this list).
  • Send stats metrics from ExecuTorch (tok/sec).
  • Handle prompt formatting based on the model.
  • Support conversational history.
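
As a sketch, local client construction looks roughly like this. The builder methods for model path, tokenizer path, and temperature follow the SDK README; the import path and the sequence-length setter name are assumptions:

```kotlin
import com.llama.llamastack.client.local.LlamaStackClientLocalClient // assumed package path

// Local, on-device client backed by ExecuTorch; file paths are illustrative.
val client = LlamaStackClientLocalClient.builder()
    .modelPath("/data/local/tmp/llama/llama3_2_1b.pte")     // exported ExecuTorch model
    .tokenizerPath("/data/local/tmp/llama/tokenizer.model")
    .temperature(0.7f)
    // .maxSeqLen(2048) // assumed name for the sequence-length setter
    .build()
```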

Remote Inference Support

  • Enabled remote support with Llama Stack server v0.0.54
  • Fixed lib compile issues caused by invalid Stainless autogenerated types (link and link)

Supported Models

The models supported in the app vary based on whether you are doing remote or local inferencing.

Remote Model Support

For remote usage, the app supports all models that the Llama Stack server backend supports, ranging from the lightweight 1B Llama models to the large 405B models.

Local Model Support

For on-device usage, the following models are supported:

  • Llama 3.2 Quantized 1B/3B
  • Llama 3.2 1B/3B in BF16
  • Llama 3.1 8B Quantized
  • Llama 3 8B Quantized

Getting Started

  • Pointer to an Android demo app. (Note tag: android-0.0.54.1)
  • Quick start instructions on how to add the Kotlin SDK to an Android app
  • Instructions on how a power developer can contribute to the Kotlin SDK, debug it, or just play with it to learn more

If you have any questions, feel free to raise an issue and we'd be happy to help!

What’s Next?

This is only the beginning of enabling Llama Stack features on Android devices. We will continue to expand Llama Stack's capabilities, use cases, and applications! Specifically, we look to focus on:

  • Agentic workflow with streaming
  • Image and Speech reasoning
  • Local/on-device agentic components like memory banks
  • Examples with RAG usage

Stay tuned for future releases and updates!

Contributors

@cmodi-meta, @dltn, @Riandy, @WuhanMonkey, @yanxi0830, and big thank you to the ExecuTorch team.