Native Swift wrapper for OpenAI's tiktoken library, providing fast BPE tokenization for OpenAI models.
TiktokenSwift brings the official tiktoken tokenizer to Swift applications through a lightweight FFI bridge, maintaining the same performance and accuracy as the original Python implementation. It supports all standard OpenAI encodings, including cl100k_base (used by GPT-3.5-turbo and GPT-4), r50k_base, p50k_base, o200k_base (used by GPT-4o), and o200k_harmony (used by gpt-oss models).
📱 Check out the example SwiftUI app to see TiktokenSwift in action!
Add TiktokenSwift to your project:
dependencies: [
    .package(url: "https://github.com/narner/TiktokenSwift.git", from: "0.1.0")
]
import TiktokenSwift
// Load OpenAI's cl100k_base encoding
let encoder = try await CoreBpe.cl100kBase()
// Encode text
let text = "Hello, world!"
let tokens = encoder.encode(text: text, allowedSpecial: [])
print("Tokens: \(tokens)")
// Decode tokens (returns String? directly)
if let decoded = try encoder.decode(tokens: tokens) {
    print("Decoded: \(decoded)")
}
// cl100k_base - Used by GPT-3.5-turbo and GPT-4
let cl100k = try await CoreBpe.cl100kBase()
// o200k_base - Used by GPT-4o and o3-mini
let o200k = try await CoreBpe.o200kBase()
// o200k_harmony - Used by gpt-oss models (structured output support)
let o200kHarmony = try await CoreBpe.o200kHarmony()
// Other encodings
let r50k = try await CoreBpe.r50kBase() // GPT-2 and older models
let p50k = try await CoreBpe.p50kBase() // Codex models
// Load by name
let encoder = try await CoreBpe.loadEncoding(named: "cl100k_base")
let textWithSpecial = "Hello <|endoftext|> World"
let tokensWithSpecial = encoder.encode(
    text: textWithSpecial,
    allowedSpecial: ["<|endoftext|>"]
)
// Or encode ordinary text (without special tokens)
let tokensOrdinary = encoder.encodeOrdinary(text: "Hello <|endoftext|> World")
// o200k_harmony has special tokens for structured output
let harmony = try await CoreBpe.o200kHarmony()
let structuredText = "Analyze <|constrain|> only positive <|return|> result"
let structuredTokens = harmony.encode(
    text: structuredText,
    allowedSpecial: ["<|constrain|>", "<|return|>"]
)
// Get token count for text
let text = "The quick brown fox jumps over the lazy dog"
let tokens = encoder.encode(text: text, allowedSpecial: [])
print("Token count: \(tokens.count)")
// Useful for API rate limiting
let maxTokens = 4096
if tokens.count > maxTokens {
    print("Text exceeds token limit")
}
Common token limits for OpenAI models:
- GPT-4: 8,192 tokens (standard), 32,768 tokens (32k), 128,000 tokens (turbo)
- GPT-3.5-turbo: 4,096 tokens (standard), 16,385 tokens (16k)
- GPT-4o: 128,000 tokens
- o3-mini: 128,000 tokens
- gpt-oss models: 128,000 tokens
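When a prompt risks exceeding one of these limits, trimming by token count is more reliable than trimming by characters. A minimal sketch building on the encode/decode calls shown above; `truncate` is a hypothetical helper, not part of the library's API:

```swift
import TiktokenSwift

// Hypothetical helper: trim text to a token budget by encoding,
// keeping only the first maxTokens tokens, and decoding that prefix.
func truncate(_ text: String, toTokens maxTokens: Int, using encoder: CoreBpe) throws -> String {
    let tokens = encoder.encode(text: text, allowedSpecial: [])
    guard tokens.count > maxTokens else { return text }
    let prefix = Array(tokens.prefix(maxTokens))
    // decode returns String?; fall back to the original text if decoding fails
    return try encoder.decode(tokens: prefix) ?? text
}

// Usage, e.g. for the 4,096-token GPT-3.5-turbo limit:
// let fitted = try truncate(longPrompt, toTokens: 4096, using: encoder)
```

Note that cutting at an arbitrary token boundary can land mid-character for multi-byte text, so the decoded string may end slightly before the budget.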
- iOS 13.0+ / macOS 10.15+ / tvOS 13.0+ / watchOS 6.0+
- Xcode 14.0+
- Swift 5.9+
- iOS: arm64
- iOS Simulator: arm64, x86_64
- macOS: arm64, x86_64
MIT License - See LICENSE file for details.
TiktokenSwift uses the same Rust-based core as the official Python tiktoken library, providing:
- Fast BPE tokenization optimized in Rust
- Thread-safe encoding/decoding operations
- Efficient memory usage with lazy vocabulary loading
The first time you use an encoding, it will download the vocabulary file (~1-2 MB) from OpenAI's servers. Downloaded files are cached in ~/Library/Caches/tiktoken/ for subsequent use.
If you encounter download issues:
- Check your internet connection
- Verify the cache directory has write permissions
- Try clearing the cache and re-downloading
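The cache can also be cleared from Swift using FileManager. A sketch assuming the cache location mentioned above (the path comes from this README, not from a documented API, so adjust it if your app's cache directory differs):

```swift
import Foundation

// Remove the cached tiktoken vocabulary files so the next
// encoding load re-downloads them from OpenAI's servers.
let cachesDir = FileManager.default
    .urls(for: .cachesDirectory, in: .userDomainMask)[0]
let tiktokenCache = cachesDir.appendingPathComponent("tiktoken")
if FileManager.default.fileExists(atPath: tiktokenCache.path) {
    try FileManager.default.removeItem(at: tiktokenCache)
}
```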
This project provides Swift bindings for tiktoken, originally developed by OpenAI.