Swift bindings for OpenAI's tiktoken tokenizer using UniFFI. Count tokens, estimate costs, and manage context windows in your iOS and macOS apps

TiktokenSwift

Native Swift wrapper for OpenAI's tiktoken library, providing fast BPE tokenization for OpenAI models.

TiktokenSwift brings the official tiktoken tokenizer to Swift applications through a lightweight FFI bridge, maintaining the same performance and accuracy as the original Python implementation. It supports all standard OpenAI encodings including cl100k_base (used by GPT-3.5-turbo and GPT-4), r50k_base, p50k_base, o200k_base (used by GPT-4o), and o200k_harmony (used by gpt-oss models).

📱 Check out the example SwiftUI app to see TiktokenSwift in action!

Installation

Swift Package Manager

Add TiktokenSwift to your project:

dependencies: [
    .package(url: "https://github.com/narner/TiktokenSwift.git", from: "0.1.0")
]
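If your package has multiple targets, also declare the product on the target that imports it. A sketch, assuming the product name matches the package name `TiktokenSwift` (check the package manifest):

```swift
.target(
    name: "MyApp",  // your target name
    dependencies: [
        .product(name: "TiktokenSwift", package: "TiktokenSwift")
    ]
)
```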

Quick Start

import TiktokenSwift

// Load OpenAI's cl100k_base encoding
let encoder = try await CoreBpe.cl100kBase()

// Encode text
let text = "Hello, world!"
let tokens = encoder.encode(text: text, allowedSpecial: [])
print("Tokens: \(tokens)")

// Decode tokens back to a String (decode is throwing and returns an optional)
if let decoded = try encoder.decode(tokens: tokens) {
    print("Decoded: \(decoded)")
}

Available Encodings

// cl100k_base - Used by GPT-3.5-turbo and GPT-4
let cl100k = try await CoreBpe.cl100kBase()

// o200k_base - Used by GPT-4o and o3-mini
let o200k = try await CoreBpe.o200kBase()

// o200k_harmony - Used by gpt-oss models (structured output support)
let o200kHarmony = try await CoreBpe.o200kHarmony()

// Other encodings
let r50k = try await CoreBpe.r50kBase()    // GPT-2 and older models
let p50k = try await CoreBpe.p50kBase()    // Codex models

// Load by name
let encoder = try await CoreBpe.loadEncoding(named: "cl100k_base")
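When the model name is only known at runtime, a small lookup can pick the encoding name to pass to `loadEncoding`. The helper below is not part of the TiktokenSwift API; it is a sketch that mirrors the model-to-encoding pairings listed above (prefix order matters so that `gpt-4o` is matched before `gpt-4`):

```swift
// Sketch: map an OpenAI model name to its tiktoken encoding name.
// Not part of TiktokenSwift; mirrors the pairings in the comments above.
func encodingName(forModel model: String) -> String? {
    // Ordered by prefix specificity: "gpt-4o" must precede "gpt-4".
    let prefixes: [(prefix: String, encoding: String)] = [
        ("gpt-4o", "o200k_base"),
        ("o3", "o200k_base"),
        ("gpt-oss", "o200k_harmony"),
        ("gpt-4", "cl100k_base"),
        ("gpt-3.5-turbo", "cl100k_base"),
    ]
    return prefixes.first { model.hasPrefix($0.prefix) }?.encoding
}

// Then, for example:
// let encoder = try await CoreBpe.loadEncoding(
//     named: encodingName(forModel: "gpt-4o") ?? "cl100k_base")
```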

Advanced Usage

Encoding with Special Tokens

let textWithSpecial = "Hello <|endoftext|> World"
let tokensWithSpecial = encoder.encode(
    text: textWithSpecial, 
    allowedSpecial: ["<|endoftext|>"]
)

// Or encode ordinary text (without special tokens)
let tokensOrdinary = encoder.encodeOrdinary(text: "Hello <|endoftext|> World")

// o200k_harmony has special tokens for structured output
let harmony = try await CoreBpe.o200kHarmony()
let structuredText = "Analyze <|constrain|> only positive <|return|> result"
let structuredTokens = harmony.encode(
    text: structuredText,
    allowedSpecial: ["<|constrain|>", "<|return|>"]
)

Working with Token Counts

// Get token count for text
let text = "The quick brown fox jumps over the lazy dog"
let tokens = encoder.encode(text: text, allowedSpecial: [])
print("Token count: \(tokens.count)")

// Useful for staying within a model's context limit
let maxTokens = 4096
if tokens.count > maxTokens {
    print("Text exceeds token limit")
}
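Token counts also drive cost estimates. The function below is a sketch; the rate in the usage comment is a placeholder, not a real price — substitute your model's actual rate from OpenAI's pricing page:

```swift
// Sketch: estimate request cost from a token count and a per-million-token
// price. Pass your model's actual rate; nothing here fetches real pricing.
func estimatedCost(tokenCount: Int, pricePerMillionTokens: Double) -> Double {
    Double(tokenCount) / 1_000_000 * pricePerMillionTokens
}

// Example with a placeholder rate of $2.00 per 1M input tokens:
// estimatedCost(tokenCount: tokens.count, pricePerMillionTokens: 2.0)
```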

Model Token Limits

Common token limits for OpenAI models:

  • GPT-4: 8,192 tokens (standard), 32,768 tokens (32k), 128,000 tokens (turbo)
  • GPT-3.5-turbo: 4,096 tokens (standard), 16,385 tokens (16k)
  • GPT-4o: 128,000 tokens
  • o3-mini: 128,000 tokens
  • gpt-oss models: 128,000 tokens
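A context window must hold both the prompt and the completion, so a fit check should reserve room for the reply. A minimal sketch using the limits above:

```swift
// Sketch: does a prompt fit the model's context window once space is
// reserved for the completion?
func promptFits(tokenCount: Int, contextLimit: Int,
                reservedForCompletion: Int) -> Bool {
    tokenCount + reservedForCompletion <= contextLimit
}

// e.g. a 7,000-token prompt against GPT-4's 8,192-token window,
// reserving 1,000 tokens for the reply:
// promptFits(tokenCount: 7000, contextLimit: 8192, reservedForCompletion: 1000)
```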

Requirements

  • iOS 13.0+ / macOS 10.15+ / tvOS 13.0+ / watchOS 6.0+
  • Xcode 15.0+ (Swift 5.9 ships with Xcode 15)
  • Swift 5.9+

Architecture Support

  • iOS: arm64
  • iOS Simulator: arm64, x86_64
  • macOS: arm64, x86_64

License

MIT License - See LICENSE file for details.

Performance

TiktokenSwift uses the same Rust-based core as the official Python tiktoken library, providing:

  • Fast BPE tokenization optimized in Rust
  • Thread-safe encoding/decoding operations
  • Efficient memory usage with lazy vocabulary loading

Troubleshooting

Vocabulary Download Issues

The first time you use an encoding, TiktokenSwift downloads its vocabulary file (~1-2 MB) from OpenAI's servers. Downloaded files are cached in ~/Library/Caches/tiktoken/ for subsequent use.

If you encounter download issues:

  1. Check your internet connection
  2. Verify the cache directory has write permissions
  3. Try clearing the cache and re-downloading
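Step 3 can be done in code. A sketch using Foundation's FileManager, assuming the cache lives at the path given above (verify the location on your platform before deleting anything):

```swift
import Foundation

// Sketch: delete a cache directory so vocabularies are re-downloaded on
// the next load. Safe to call when the directory does not exist.
func clearCache(at directory: URL) throws {
    let fm = FileManager.default
    if fm.fileExists(atPath: directory.path) {
        try fm.removeItem(at: directory)
    }
}

// Assumed cache location from this README:
// let tiktokenCache = FileManager.default
//     .urls(for: .cachesDirectory, in: .userDomainMask)[0]
//     .appendingPathComponent("tiktoken")
// try clearCache(at: tiktokenCache)
```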

Acknowledgments

This project provides Swift bindings for tiktoken, originally developed by OpenAI.
