Lumina offers a cutting-edge toolkit 🛠️ for Delphi developers to integrate generative AI capabilities into their applications 📱. Built on llama.cpp 🐪, Lumina prioritizes data privacy 🔒, performance ⚡, and a user-friendly API 📚, making it a practical tool for local AI inference 🤖.
- Localized Processing 🏠: Operates entirely offline, ensuring sensitive data remains confidential 🛡️ while offering complete computational control 🧠.
- Broad Model Compatibility 🌐: Supports GGUF models compliant with llama.cpp standards, granting access to diverse AI architectures 🧩.
- Intuitive Development Interface 🎛️: A concise, flexible API simplifies model management 🗂️, inference execution 🧮, and callback customization 🎚️, minimizing implementation complexity.
- Future-Ready Scalability 🚀: This release emphasizes stability 🏗️ and foundational features, with plans for multi-turn conversation 💬 and retrieval-augmented generation (RAG) 🔍 in future updates.
Lumina expands your development toolkit 🎒 with capabilities such as:
- Dynamic chatbot creation 💬.
- Automated text generation 📝 and summarization 📰.
- Context-sensitive content generation ✍️.
- Real-time inference for adaptive processes ⚡.
- Operates independently of external networks 🛡️, guaranteeing data security.
- Uses Vulkan 🖥️ for optional GPU acceleration to enhance performance.
- Configurable GPU utilization through the AGPULayers parameter 🧩.
- Dynamic thread allocation based on hardware capabilities 🖥️ via AMaxThreads.
- Comprehensive performance metrics 📊, offering insights into throughput 📈 and efficiency.
- Embedded dependencies eliminate the need for external libraries 📦.
- Lightweight architecture (~2.5MB overhead) ensures broad deployment compatibility 🌍.
- Download the Repository 📦
  - Download here and extract the files to your preferred directory 📂.
- Acquire a GGUF Model 🧠
  - Obtain a model from Hugging Face, such as Gemma 2 2B GGUF (Q8_0). Save it to a directory accessible to your application (e.g., C:/LLM/GGUF) 💾.
- Ensure GPU Compatibility 🎮
  - Verify Vulkan compatibility for enhanced performance ⚡. Adjust AGPULayers as needed to accommodate VRAM limitations 📉.
- ✨ TLumina Class
  - 📜 Add Lumina to your uses section.
  - 🛠️ Create an instance of TLumina.
  - 🚀 All functionality will then be at your disposal. That simple! 🎉
- Explore Examples 🔍
  - Check the examples directory for detailed usage demonstrations 📚.
Integrate Lumina into your Delphi project 🖥️:
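var
  Lumina: TLumina;
begin
  Lumina := TLumina.Create;
  try
    // Arguments: model path, template ('' = use the model's own),
    // context size, GPU layers (-1 = all), CPU threads.
    if Lumina.LoadModel('C:\LLM\GGUF\gemma-2-2b-it-abliterated-Q8_0.gguf',
      '', 8192, -1, 8) then
    begin
      if Lumina.SimpleInference('What is the capital of Italy?') then
        WriteLn('Inference completed successfully.')
      else
        WriteLn('Error: ', Lumina.GetError);
    end
    else
      // Report a failed load instead of exiting silently
      // (assumes GetError also describes load failures).
      WriteLn('Error: ', Lumina.GetError);
  finally
    Lumina.Free;
  end;
end;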
Define custom behavior using Lumina’s callback functions 🛠️:
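// Called for each generated token; prints tokens as they arrive.
procedure NextTokenCallback(const AToken: string; const AUserData: Pointer);
begin
  Write(AToken);
end;

// Register the callback before running inference; nil means no user data.
Lumina.SetNextTokenCallback(NextTokenCallback, nil);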
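The AUserData parameter can carry state into the callback. The following is a minimal sketch, not taken from the Lumina examples, that collects the generated tokens into a buffer instead of printing them one by one. It assumes that the second argument of SetNextTokenCallback is handed back unchanged as AUserData; the CollectTokenCallback name and the use of TStringBuilder are illustrative choices.

```pascal
program CollectTokens;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Lumina;

// Appends each token to the TStringBuilder passed in through AUserData.
procedure CollectTokenCallback(const AToken: string; const AUserData: Pointer);
begin
  TStringBuilder(AUserData).Append(AToken);
end;

var
  LLumina: TLumina;
  LBuffer: TStringBuilder;
begin
  LBuffer := TStringBuilder.Create;
  try
    LLumina := TLumina.Create;
    try
      // Assumption: this pointer is passed back unchanged as AUserData.
      LLumina.SetNextTokenCallback(CollectTokenCallback, Pointer(LBuffer));
      if LLumina.LoadModel('C:\LLM\GGUF\gemma-2-2b-it-abliterated-Q8_0.gguf',
        '', 8192, -1, 8) and
        LLumina.SimpleInference('What is the capital of Italy?') then
        WriteLn(LBuffer.ToString)
      else
        WriteLn('Error: ', LLumina.GetError);
    finally
      // Free Lumina first so the callback can never see a freed buffer.
      LLumina.Free;
    end;
  finally
    LBuffer.Free;
  end;
end.
```

Keeping the buffer alive for longer than the TLumina instance avoids a dangling pointer in the callback.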
- LoadModel 📂
  - Parameters:
    - AModelFilename: Path to the GGUF model file 📄.
    - ATemplate: Optional inference template 📝.
    - AMaxContext: Maximum context size (default: 512) 🧠.
    - AGPULayers: GPU layer configuration (-1 for maximum) 🎮.
    - AMaxThreads: Number of CPU threads allocated 🖥️.
  - Returns a boolean indicating success ✅.
- SimpleInference 🧠
  - Accepts a single query for immediate processing 📝.
  - Returns a boolean indicating success ✅.
- SetNextTokenCallback 💬
  - Assigns a handler to process tokens during inference 🧩.
- UnloadModel ❌
  - Frees resources allocated during model loading 🗑️.
- GetPerformanceResult 📊
  - Provides metrics, including token generation rates 📈.
Lumina uses the template defined in the model's metadata by default, but you can also define custom templates to match your model’s requirements or change its behavior. These are some common model templates ✍️:
const
  CHATML_TEMPLATE = '<|im_start|>{role} {content}<|im_end|><|im_start|>assistant';
  GEMMA_TEMPLATE = '<start_of_turn>{role} {content}<end_of_turn>';
  PHI_TEMPLATE = '<|{role}|> {content}<|end|><|assistant|>';
- {role} - will be replaced with the role (user, assistant, etc.)
- {content} - will be replaced with the content sent to the model
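To make the placeholder substitution concrete, here is a small self-contained sketch that expands a template by hand. ExpandTemplate is a hypothetical helper written for this illustration; it is not part of Lumina's API, and Lumina's own substitution may differ in detail.

```pascal
program TemplateDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

const
  // Copied from the template list above.
  GEMMA_TEMPLATE = '<start_of_turn>{role} {content}<end_of_turn>';

// Hypothetical helper (not part of Lumina): replaces the {role} and
// {content} placeholders with the given values.
function ExpandTemplate(const ATemplate, ARole, AContent: string): string;
begin
  Result := StringReplace(ATemplate, '{role}', ARole, [rfReplaceAll]);
  Result := StringReplace(Result, '{content}', AContent, [rfReplaceAll]);
end;

begin
  // Prints: <start_of_turn>user What is the capital of Italy?<end_of_turn>
  WriteLn(ExpandTemplate(GEMMA_TEMPLATE, 'user', 'What is the capital of Italy?'));
end.
```

To use a custom template with Lumina itself, pass the template string as the ATemplate argument of LoadModel instead of the empty string.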
AGPULayers values:
- -1: Utilize all available layers (default) 🖥️.
- 0: CPU-only processing 🖥️.
- Custom values for partial GPU utilization 🎛️.
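When tuning for limited VRAM, it can help to try different layer counts without recompiling. The sketch below is an illustration rather than code from the Lumina examples. It reads the AGPULayers value from the first command-line argument and passes it to LoadModel with the same argument order as the earlier example. The program name and the choice of a command-line argument are assumptions.

```pascal
program GpuLayersDemo;

{$APPTYPE CONSOLE}

uses
  System.SysUtils,
  Lumina;

var
  LLumina: TLumina;
  LGpuLayers: Integer;
begin
  // Read the layer count from the first command-line argument;
  // fall back to -1 (all layers) when it is missing or not a number.
  LGpuLayers := StrToIntDef(ParamStr(1), -1);

  LLumina := TLumina.Create;
  try
    if LLumina.LoadModel('C:\LLM\GGUF\gemma-2-2b-it-abliterated-Q8_0.gguf',
      '', 8192, LGpuLayers, 8) then
      WriteLn('Model loaded with AGPULayers = ', LGpuLayers)
    else
      WriteLn('Error: ', LLumina.GetError);
  finally
    LLumina.Free;
  end;
end.
```

Running the program with 0 forces CPU-only processing, and running it with no argument uses all available layers.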
Retrieve detailed operational metrics 📈:
var
  Perf: TLumina.PerformanceResult;
begin
  Perf := Lumina.GetPerformanceResult;
  WriteLn('Tokens/Sec: ', Perf.TokensPerSecond);
  WriteLn('Input Tokens: ', Perf.TotalInputTokens);
  WriteLn('Output Tokens: ', Perf.TotalOutputTokens);
end;
Discover in-depth discussions and insights about Lumina and its innovative features. 🚀✨
Video: Lumina.Deep.Dive.mp4
- Report issues via the Issue Tracker 🐞.
- Engage in discussions on the Forum and Discord 💬.
- Learn more at Learn Delphi 📚.
Contributions to ✨ Lumina are highly encouraged! 🌟
- 🐛 Report Issues: Submit issues if you encounter bugs or need help.
- 💡 Suggest Features: Share your ideas to make Lumina even better.
- 🔧 Create Pull Requests: Help expand the capabilities and robustness of the library.
Your contributions make a difference! 🙌✨
Lumina is distributed under the 🆓 BSD-3-Clause License, allowing for redistribution and use in both source and binary forms, with or without modification, under specific conditions. See the LICENSE file for more details.
Advance your Delphi applications with Lumina 🌟 – a sophisticated solution for integrating local generative AI 🤖.