
Kotlin-LlamaCpp

Local GGUF inference on Android devices, made easy

License: MIT

Native AI inference for Android devices

Run GGUF models directly on your Android device with optimized performance and zero cloud dependency!

This library provides Kotlin bindings for llama.cpp, designed specifically for native Android applications. It leverages modern hardware capabilities to bring efficient large language model inference and multimodal support to mobile devices.


Changelog

v0.4.0 (Latest)

  • Modernized Core: Native codebase synchronized with the latest llama.cpp upstream (via cui-llama.rn).
  • Multimodal Support: Full support for vision models (e.g., LLaVA) using mmproj files.
  • Improved File Handling: Migrated to a robust File Descriptor (FD) passing mechanism, bypassing Android's scoped storage restrictions.
  • Architecture Support: Optimized for 64-bit platforms (arm64-v8a and x86_64).
  • Real-time Streaming: Enhanced JNI logic with robust UTF-8 buffering to prevent crashes during token generation.
  • UI State Feedback: Improved LlamaHelper to provide immediate feedback during image analysis phases.
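The UTF-8 buffering mentioned in the changelog can be sketched in plain Kotlin: a token boundary may split a multi-byte UTF-8 character, so the emitter holds back incomplete trailing bytes until the rest arrives. This is an illustrative re-implementation of the idea, not the library's actual JNI code; `Utf8StreamBuffer` is a hypothetical name.

```kotlin
// Illustrative sketch (not the library's JNI code): buffer bytes so that a
// multi-byte UTF-8 character split across two tokens is never decoded halfway.
class Utf8StreamBuffer {
    private var pending = ByteArray(0)

    // Returns the longest complete UTF-8 prefix; incomplete trailing bytes
    // are kept until the next call supplies the rest of the character.
    fun feed(bytes: ByteArray): String {
        val buf = pending + bytes
        var end = buf.size
        // Walk back over trailing continuation bytes (10xxxxxx).
        var i = buf.size - 1
        var trailing = 0
        while (i >= 0 && (buf[i].toInt() and 0xC0) == 0x80) { i--; trailing++ }
        if (i < 0) {
            end = 0 // nothing but continuation bytes: hold everything
        } else {
            val lead = buf[i].toInt() and 0xFF
            val needed = when {
                lead < 0x80 -> 1  // ASCII
                lead >= 0xF0 -> 4 // lead byte of a 4-byte sequence
                lead >= 0xE0 -> 3 // lead byte of a 3-byte sequence
                lead >= 0xC0 -> 2 // lead byte of a 2-byte sequence
                else -> 1         // invalid lead byte; pass through
            }
            if (trailing + 1 < needed) end = i // last sequence incomplete
        }
        pending = buf.copyOfRange(end, buf.size)
        return String(buf, 0, end, Charsets.UTF_8)
    }
}
```

Without such buffering, emitting each token's bytes directly would produce replacement characters (or crash the JNI string conversion) whenever a character straddles two tokens.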

Why On-Device AI?

Modern Android devices possess the power to run sophisticated AI models locally. Kotlin-LlamaCpp enables:

  • True On-Device AI: Complete privacy, no internet required.
  • Hardware Acceleration: Automatic use of CPU features such as i8mm and dotprod on ARM, with optimized builds for x86_64.
  • Multimodal Capabilities: Analyze images locally using multimodal projectors (mmproj).

Getting Started

1. Installation

Add the dependency to your project's build.gradle:

dependencies {
    implementation 'io.github.ljcamargo:llamacpp-kotlin:0.4.0'
}

2. Architecture: The LlamaHelper Pattern

For most use cases, it is recommended to manage the library within an Android ViewModel. This ensures that the engine's lifecycle is correctly tied to your UI while keeping heavy computations off the main thread.

The LlamaHelper class requires three main components to initialize:

  1. ContentResolver: Required to open local files via File Descriptors.
  2. CoroutineScope: The scope in which inference tasks will run.
  3. MutableSharedFlow<LLMEvent>: A reactive stream that emits status updates and generated tokens.

Recommended Setup in MainViewModel.kt

class MainViewModel(val contentResolver: ContentResolver) : ViewModel() {
    private val scope = CoroutineScope(Dispatchers.IO + SupervisorJob())

    // 1. Flow to collect events from the engine
    private val _llmFlow = MutableSharedFlow<LlamaHelper.LLMEvent>(
        extraBufferCapacity = 64,
        onBufferOverflow = BufferOverflow.DROP_OLDEST
    )

    // 2. StateFlow to hold the accumulated text for the UI
    private val _generatedText = MutableStateFlow("")
    val generatedText = _generatedText.asStateFlow()

    private val llamaHelper by lazy {
        LlamaHelper(contentResolver, scope, _llmFlow)
    }

    init {
        // 3. Collect events once for the ViewModel's lifetime. A SharedFlow
        // collector never completes, so launching one per generate() call
        // would leak collectors and could miss early events.
        scope.launch {
            _llmFlow.collect { event ->
                when (event) {
                    is LlamaHelper.LLMEvent.Ongoing -> {
                        _generatedText.value += event.word
                    }
                    is LlamaHelper.LLMEvent.Done -> { /* Stop loading indicators */ }
                    is LlamaHelper.LLMEvent.Error -> { /* Handle error */ }
                    else -> {}
                }
            }
        }
    }

    fun generate(prompt: String) {
        _generatedText.value = "" // Reset accumulated text
        llamaHelper.predict(prompt)
    }

    override fun onCleared() {
        super.onCleared()
        scope.cancel() // Stop inference work when the ViewModel is destroyed
    }
}

Handling Local Files (The URI Caveat)

On modern Android (11+), you cannot pass traditional file paths to native libraries due to Scoped Storage. You must use content:// URIs and ensure you have persistent read access.
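As a rough illustration of the File Descriptor passing mechanism described in the changelog: the library resolves the `content://` URI through `ContentResolver` and hands native code a descriptor rather than a raw path. A minimal sketch of that idea, assuming a placeholder native entry point (`loadNative` is hypothetical, not a real API of this library):

```kotlin
import android.content.ContentResolver
import android.net.Uri

// Hypothetical placeholder for a native load call; not part of the library API.
private fun loadNative(path: String) { /* ... */ }

// Sketch: open a content:// URI and expose it to native code via procfs.
fun openModelForNative(contentResolver: ContentResolver, uriString: String) {
    val uri = Uri.parse(uriString)
    contentResolver.openFileDescriptor(uri, "r")?.use { pfd ->
        // This procfs path is readable by the current process even when the
        // file's real path is blocked by Scoped Storage.
        val fdPath = "/proc/self/fd/${pfd.fd}"
        loadNative(fdPath)
    }
}
```

In practice you only pass the URI string to `LlamaHelper.load()`; the helper performs this resolution internally.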

Scenario A: User selects a model via File Picker

Use registerForActivityResult and explicitly request persistable permissions. Without this, the native engine will lose access to the file once the app restarts or the URI context changes.

In your Activity/Fragment:

private val modelPickerLauncher = registerForActivityResult(
    ActivityResultContracts.OpenDocument()
) { uri ->
    uri?.let {
        // CRITICAL: Gain long-term access to the file
        contentResolver.takePersistableUriPermission(
            it, Intent.FLAG_GRANT_READ_URI_PERMISSION
        )
        // Now you can pass uri.toString() to LlamaHelper.load()
        viewModel.loadModel(it.toString())
    }
}

Scenario B: Loading a fixed model from App Storage

If your model is stored in your app's internal files directory, you can resolve its URI directly:

val file = File(context.filesDir, "my_model.gguf")
val modelUri = Uri.fromFile(file).toString()
llamaHelper.load(path = modelUri, contextLength = 2048) { id -> /* Loaded */ }

Usage Examples

Basic Text Completion

// In your ViewModel
fun generateResponse(userPrompt: String) {
    llamaHelper.predict(userPrompt)
}

// In your UI (Jetpack Compose)
@Composable
fun SimpleChat(viewModel: MainViewModel) {
    // 4. Listen to the StateFlow (lifecycle-aware)
    val text by viewModel.generatedText.collectAsStateWithLifecycle()

    Column {
        Text(text = text) // Automatically updates as tokens arrive!
        Button(onClick = { viewModel.generate("Hello!") }) {
            Text("Generate")
        }
    }
}

Multimodal (Image Analysis)

To analyze images, you must load a base model AND an mmproj projector file.

// 1. Initialization
llamaHelper.load(
    path = baseModelUri,
    contextLength = 4096,
    mmprojPath = mmprojUri // Provide the projector file here
) { id -> /* Multimodal Ready */ }

// 2. Inference with image
// Note: Per-prompt image injection. The helper automatically 
// handles File Descriptors for the image.
llamaHelper.predict(
    prompt = "What objects are in this photo?",
    imagePath = selectedImageUri 
)

Deep Dive into the Demo App

For a complete working implementation, explore the following files in the Demo App:

  1. MainActivity.kt: Shows how to implement the file pickers and handle persistable permissions.
  2. MainViewModel.kt: Demonstrates the clean integration of LlamaHelper with a reactive StateFlow UI.
  3. ChatScreen.kt: A full Jetpack Compose UI showing how to display inference progress, block inputs during analysis, and handle multimodal results.

Native Code Maintenance

The native C++ core is synchronized with modernized llama.cpp forks. For instructions on how to update or rebuild the native components, see the llamaCpp Library README.

Contributing & License

Contributions are welcome! Please feel free to submit a Pull Request. This project is licensed under the MIT License.

Acknowledgments

Built upon the excellence of llama.cpp and the modernized fork cui-llama.rn.
