Native AI inference for Android devices
Run GGUF models directly on your Android device with optimized performance and zero cloud dependency!
This library provides Kotlin bindings for llama.cpp, designed specifically for native Android applications. It leverages modern hardware capabilities to bring efficient large language model inference and multimodal support to mobile devices.
- Modernized Core: Native codebase synchronized with the latest `llama.cpp` upstream (via `cui-llama.rn`).
- Multimodal Support: Full support for vision models (e.g., LLaVA) using `mmproj` files.
- Improved File Handling: Migrated to a robust File Descriptor (FD) passing mechanism, bypassing Android's scoped storage restrictions.
- Architecture Support: Optimized for 64-bit platforms (`arm64-v8a` and `x86_64`).
- Real-time Streaming: Enhanced JNI logic with robust UTF-8 buffering to prevent crashes during token generation (see the sketch after this list).
- UI State Feedback: Improved `LlamaHelper` to provide immediate feedback during image analysis phases.
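To illustrate the streaming point above: a generated token's bytes can end in the middle of a multi-byte UTF-8 character, so a streaming decoder has to hold back the incomplete tail until the next chunk arrives. The sketch below is illustrative only and is not the library's actual JNI code:

```kotlin
// Illustrative sketch (not the library's JNI implementation): decode streamed
// token bytes while holding back an incomplete trailing UTF-8 sequence.
private var pendingBytes = ByteArray(0)

fun decodeStreamedChunk(chunk: ByteArray): String {
    val bytes = pendingBytes + chunk
    var cut = bytes.size
    // Walk back over trailing continuation bytes (10xxxxxx), at most 3 of them.
    var back = 0
    while (back < 3 && cut - 1 - back >= 0 &&
        (bytes[cut - 1 - back].toInt() and 0xC0) == 0x80
    ) back++
    val leadIndex = cut - 1 - back
    if (leadIndex >= 0) {
        val lead = bytes[leadIndex].toInt() and 0xFF
        val needed = when {
            lead >= 0xF0 -> 4  // 4-byte sequence
            lead >= 0xE0 -> 3  // 3-byte sequence
            lead >= 0xC0 -> 2  // 2-byte sequence
            else -> 1          // ASCII or stray byte: emit as-is
        }
        if (back + 1 < needed) cut = leadIndex  // hold back the incomplete tail
    }
    pendingBytes = bytes.copyOfRange(cut, bytes.size)
    return String(bytes, 0, cut, Charsets.UTF_8)
}
```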
Modern Android devices possess the power to run sophisticated AI models locally. Kotlin-LlamaCpp enables:
- True On-Device AI: Complete privacy, no internet required.
- Hardware Acceleration: Automatic utilization of CPU features (i8mm, dotprod) on ARM and x86.
- Multimodal Capabilities: Analyze images locally using multimodal projectors (`mmproj`).
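Related to the 64-bit focus above: if your app only ships the ABIs the native core targets, you can restrict packaging accordingly. A minimal sketch, assuming the Gradle Kotlin DSL (`build.gradle.kts`); adapt it to Groovy if that is what your module uses:

```kotlin
// Module-level build.gradle.kts (sketch): package only the 64-bit ABIs
// that the native llama.cpp core is built for.
android {
    defaultConfig {
        ndk {
            abiFilters += listOf("arm64-v8a", "x86_64")
        }
    }
}
```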
Add the dependency to your project's `build.gradle`:

```groovy
dependencies {
    implementation 'io.github.ljcamargo:llamacpp-kotlin:0.4.0'
}
```

For most use cases, it is recommended to manage the library within an Android ViewModel. This ensures that the engine's lifecycle is correctly tied to your UI while keeping heavy computation off the main thread.
The `LlamaHelper` class requires three main components to initialize:

- `ContentResolver`: Required to open local files via File Descriptors.
- `CoroutineScope`: The scope in which inference tasks will run.
- `MutableSharedFlow<LLMEvent>`: A reactive stream that emits status updates and generated tokens.
```kotlin
import android.content.ContentResolver
import androidx.lifecycle.ViewModel
import kotlinx.coroutines.CoroutineScope
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.SupervisorJob
import kotlinx.coroutines.channels.BufferOverflow
import kotlinx.coroutines.flow.MutableSharedFlow
import kotlinx.coroutines.flow.MutableStateFlow
import kotlinx.coroutines.flow.asStateFlow
import kotlinx.coroutines.launch

class MainViewModel(val contentResolver: ContentResolver) : ViewModel() {

    private val scope = CoroutineScope(Dispatchers.IO + SupervisorJob())

    // 1. Flow to collect events from the engine
    private val _llmFlow = MutableSharedFlow<LlamaHelper.LLMEvent>(
        extraBufferCapacity = 64,
        onBufferOverflow = BufferOverflow.DROP_OLDEST
    )

    // 2. StateFlow to hold the accumulated text for the UI
    private val _generatedText = MutableStateFlow("")
    val generatedText = _generatedText.asStateFlow()

    private val llamaHelper by lazy {
        LlamaHelper(contentResolver, scope, _llmFlow)
    }

    init {
        // 3. Collect events once and accumulate text.
        // Collecting here avoids adding a new collector on every generate() call.
        scope.launch {
            _llmFlow.collect { event ->
                when (event) {
                    is LlamaHelper.LLMEvent.Ongoing -> {
                        _generatedText.value += event.word
                    }
                    is LlamaHelper.LLMEvent.Done -> { /* Stop loading indicators */ }
                    is LlamaHelper.LLMEvent.Error -> { /* Handle error */ }
                    else -> {}
                }
            }
        }
    }

    fun generate(prompt: String) {
        scope.launch {
            _generatedText.value = "" // Reset text
            llamaHelper.predict(prompt)
        }
    }
}
```

On modern Android (11+), you cannot pass traditional file paths to native libraries due to Scoped Storage. You must use `content://` URIs and ensure you have persistent read access.
Use `registerForActivityResult` and explicitly request persistable permissions. Without this, the native engine will lose access to the file once the app restarts or the URI context changes.

In your Activity/Fragment:

```kotlin
private val modelPickerLauncher = registerForActivityResult(
    ActivityResultContracts.OpenDocument()
) { uri ->
    uri?.let {
        // CRITICAL: Gain long-term access to the file
        contentResolver.takePersistableUriPermission(
            it, Intent.FLAG_GRANT_READ_URI_PERMISSION
        )
        // Now you can pass uri.toString() to LlamaHelper.load()
        viewModel.loadModel(it.toString())
    }
}
```
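When the app is relaunched, you can verify that read access to a previously picked model is still held before re-prompting the user. A minimal sketch, assuming you saved the picked URI string yourself (the storage choice and the helper name below are illustrative, not part of the library):

```kotlin
import android.content.ContentResolver
import android.net.Uri

// Sketch: check whether a persistable read grant from a previous session
// is still valid before passing the URI back to LlamaHelper.load().
fun hasPersistedReadAccess(contentResolver: ContentResolver, savedUri: String?): Boolean {
    if (savedUri.isNullOrEmpty()) return false
    val uri = Uri.parse(savedUri)
    return contentResolver.persistedUriPermissions.any { grant ->
        grant.uri == uri && grant.isReadPermission
    }
}
```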
If your model is stored in your app's internal files directory, you can resolve its URI directly:

```kotlin
val file = File(context.filesDir, "my_model.gguf")
val modelUri = Uri.fromFile(file).toString()

llamaHelper.load(path = modelUri, contextLength = 2048) { id -> /* Loaded */ }
```

Once a model is loaded, trigger generation from your ViewModel and render the streamed text in your UI:

```kotlin
// In your ViewModel
fun generateResponse(userPrompt: String) {
    llamaHelper.predict(userPrompt)
}

// In your UI (Jetpack Compose)
@Composable
fun SimpleChat(viewModel: MainViewModel) {
    // 4. Listen to the StateFlow (lifecycle-aware)
    val text by viewModel.generatedText.collectAsStateWithLifecycle()

    Column {
        Text(text = text) // Automatically updates as tokens arrive!
        Button(onClick = { viewModel.generate("Hello!") }) {
            Text("Generate")
        }
    }
}
```

To analyze images, you must load a base model AND an `mmproj` projector file.
```kotlin
// 1. Initialization
llamaHelper.load(
    path = baseModelUri,
    contextLength = 4096,
    mmprojPath = mmprojUri // Provide the projector file here
) { id -> /* Multimodal Ready */ }

// 2. Inference with an image
// Note: Per-prompt image injection. The helper automatically
// handles File Descriptors for the image.
llamaHelper.predict(
    prompt = "What objects are in this photo?",
    imagePath = selectedImageUri
)
```

For a complete working implementation, explore the following files in the Demo App:
- `MainActivity.kt`: Shows how to implement the file pickers and handle persistable permissions.
- `MainViewModel.kt`: Demonstrates the clean integration of `LlamaHelper` with a reactive `StateFlow` UI.
- `ChatScreen.kt`: A full Jetpack Compose UI showing how to display inference progress, block inputs during analysis, and handle multimodal results.
The native C++ core is synchronized with modernized llama.cpp forks. For instructions on how to update or rebuild the native components, see the llamaCpp Library README.
Contributions are welcome! Please feel free to submit a Pull Request.

This project is licensed under the MIT License.
Built upon the excellence of:
- llama.cpp (Georgi Gerganov)
- cui-llama.rn (Vali-98)
- llama.rn