AppleNeuralEngine-Kit is a comprehensive toolkit for running Large Language Models directly on Apple Silicon using the Neural Engine. It provides optimized conversion, efficient inference, and user-friendly interfaces for working with LLMs on macOS and iOS.
- Architecture-Aware Optimization: Automatically detects and optimizes models based on architecture (Llama, Qwen, Mistral, etc.)
- Interactive UI: Elegant SwiftUI chat interface with conversation history
- Visual Model Conversion: Convert models with a user-friendly macOS interface
- Real-Time Progress Tracking: Detailed conversion progress with ETA estimates
- Advanced Memory Management: Optimized multi-function chunks reduce memory usage by ~50%
- KV Cache Optimization: Specialized prefill models for fast token generation
- Python & Swift Integration: Seamless integration between conversion and inference
- Performance Analytics: Real-time metrics for token generation
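The prefill/decode split behind the KV cache optimization can be illustrated with a toy sketch: the prefill model populates the KV cache for the whole prompt in one pass, and the decode model then extends it one token at a time. This is a pure-Python stand-in for intuition only (the class and function names are hypothetical), not the actual Core ML models shipped by the toolkit.

```python
# Toy illustration of prefill vs. per-token decode against a KV cache.
class KVCache:
    def __init__(self):
        self.keys, self.values = [], []

    def append(self, k, v):
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)

def prefill(cache, prompt_tokens):
    # One batched pass over the prompt populates the cache up front.
    for tok in prompt_tokens:
        cache.append(("k", tok), ("v", tok))

def decode_step(cache, last_token):
    # Each generated token attends over everything cached so far,
    # then appends its own key/value pair.
    context_len = len(cache)
    cache.append(("k", last_token), ("v", last_token))
    return context_len

cache = KVCache()
prefill(cache, [1, 2, 3, 4])
print(len(cache))             # 4 entries after prefill
print(decode_step(cache, 5))  # decode attends over 4 cached tokens
print(len(cache))             # 5
```

Doing the prompt in a single prefill pass is what keeps time-to-first-token low; per-token decode then only pays for one position per step.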
```bash
# Clone the repository
git clone https://github.com/antmikinka/AppleNeuralEngine-Kit.git
cd AppleNeuralEngine-Kit

# Build the project
swift build

# Install Python dependencies for model conversion (optional)
cd scripts
pip install -r requirements.txt
```

Launch the chat UI:

```bash
swift run ANEChat
```

Run inference from the command line:

```bash
swift run ANEToolCLI --repo-id meta-llama/Llama-3.2-1B --input-text "Tell me about neural networks"
```

Convert a model:

```bash
# Using the Swift CLI
swift run ANEModelConverter convert-hf --model-id meta-llama/Llama-3.2-1B --output-dir ./models

# Using the Python script with detailed progress
python scripts/convert_hf_to_coreml.py --model_path meta-llama/Llama-3.2-1B --output_path ./models --verbose
```

AppleNeuralEngine-Kit uses a model conversion process that:
- Analyzes Model Architecture: Detects model type and optimizes accordingly
- Splits into Specialized Components: Separates embeddings, FFN, and LM head
- Optimizes for ANE: Applies architecture-specific optimizations for Apple Neural Engine
- Creates Multi-Function Chunks: Combines components to minimize memory usage
- Applies Quantization: Uses 4-6 bit LUT quantization for optimal size/quality balance
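The five steps above can be sketched end to end. This is an illustrative outline only, with hypothetical names; the real pipeline lives in `scripts/convert_hf_to_coreml.py` and uses coremltools for the actual Core ML export.

```python
# Sketch of the conversion plan: detect architecture, split components,
# group layers into multi-function chunks, pick a quantization scheme.
ARCH_KEYWORDS = {"llama": "llama", "qwen": "qwen", "mistral": "mistral",
                 "phi": "phi", "gemma": "gemma"}

def detect_architecture(model_id: str) -> str:
    """Step 1: guess the architecture family from the model id."""
    lowered = model_id.lower()
    for key, arch in ARCH_KEYWORDS.items():
        if key in lowered:
            return arch
    return "generic"

def plan_conversion(model_id: str, num_layers: int, chunk_size: int = 8) -> dict:
    """Steps 2-5: embeddings / FFN chunks / LM head, where each FFN chunk
    is a multi-function model (prefill + decode sharing one set of weights)."""
    ffn_chunks = [list(range(i, min(i + chunk_size, num_layers)))
                  for i in range(0, num_layers, chunk_size)]
    return {
        "architecture": detect_architecture(model_id),
        "components": (["embeddings"]
                       + [f"ffn_chunk_{i}" for i in range(len(ffn_chunks))]
                       + ["lm_head"]),
        "functions_per_chunk": ["prefill", "decode"],  # shared weights -> ~50% less memory
        "quantization": "lut4",  # 4-6 bit LUT quantization
    }

plan = plan_conversion("meta-llama/Llama-3.2-1B", num_layers=16)
print(plan["architecture"])  # llama
print(plan["components"])    # ['embeddings', 'ffn_chunk_0', 'ffn_chunk_1', 'lm_head']
```

Packing prefill and decode into one multi-function chunk is what avoids loading two copies of the weights, which is where the memory savings come from.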
| Model | Tokens/Sec | Memory Usage | Size |
|---|---|---|---|
| Llama-3.2-1B (M1) | 7.0 | ~1.2 GB | 600MB |
| Llama-3.2-1B (M3) | 13.9 | ~1.2 GB | 600MB |
| Llama-3.2-3B (M3) | 5.2 | ~3.5 GB | 1.8GB |
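A quick sanity check on the table: dividing on-disk size by parameter count gives the effective bits per weight, which should land in the 4-6 bit LUT range quoted above. The parameter counts below (1.24B and 3.21B) are Meta's published figures, not taken from this repo.

```python
# Effective bits-per-weight from on-disk size and parameter count.
def bits_per_weight(size_mb: float, params_billion: float) -> float:
    bits = size_mb * 1024 * 1024 * 8          # size in bits
    return bits / (params_billion * 1e9)       # bits per parameter

print(round(bits_per_weight(600, 1.24), 1))    # Llama-3.2-1B at 600MB -> ~4.1
print(round(bits_per_weight(1800, 3.21), 1))   # Llama-3.2-3B at 1.8GB -> ~4.7
```

Both figures fall inside the 4-6 bit window, consistent with the quantization scheme described in the conversion steps.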
- Usage Guide - Detailed usage instructions
- Architecture - System design overview
- Model Conversion - Converting models for ANE
- ANE Model Architecture - Technical details
- iOS Implementation - iOS deployment
- Contributing - Guidelines for contributors
- Changelog - Version history
- macOS 14 (Sonoma) or newer
- Apple Silicon Mac (M1/M2/M3 series)
- Swift 5.9 or newer
- Python 3.8+ with transformers and coremltools (for model conversion)
- Xcode Command Line Tools (for CoreML compilation)
- Llama Models (Llama 2, Llama 3, Llama 3.1, Llama 3.2)
- Mistral Models (Mistral 7B, Mixtral 8x7B)
- Qwen Models (Qwen 1.5, Qwen 2)
- QwQ Models (Qwen reasoning models)
- Phi Models (Phi-2, Phi-3)
- Gemma Models (Gemma 2B, 7B)
Contributions are welcome! Please read our Contributing Guidelines before submitting a pull request.
This project is licensed under the MIT License - see the LICENSE file for details.
This project builds upon:
- CoreML LLM CLI by Stephen Panaro
- ANEMLL for ANE-optimized conversion techniques
- LitGPT for model optimization techniques
- Apple Silicon 4-bit quantization for efficient model sizing
