mozilla-ai/encoderfile

🚀 Overview

Encoderfile packages transformer encoders (optionally with classification heads) into a single, self-contained executable. No Python runtime, no dependencies, no network calls. Just a fast, portable binary that runs anywhere.

While Llamafile focuses on generative models, Encoderfile is purpose-built for encoder architectures with optional classification heads. It supports embedding, sequence classification, and token classification models, covering most encoder-based NLP tasks from text similarity to classification and tagging, all within one compact binary.

Under the hood, Encoderfile uses ONNX Runtime for inference, ensuring compatibility with a wide range of transformer architectures.

Why?

  • Smaller footprint: a single binary measured in tens-to-hundreds of megabytes, not gigabytes of runtime and packages
  • Compliance-friendly: deterministic, offline, security-boundary-safe
  • Integration-ready: drop into existing systems as a CLI, microservice, or API without refactoring your stack

Encoderfiles can run as:

  • REST API
  • gRPC microservice
  • CLI for batch processing
  • MCP server (Model Context Protocol)

Build Diagram

Supported Architectures

Encoderfile supports the following Hugging Face model classes (and their ONNX-exported equivalents):

| Task | Supported classes | Example models |
| --- | --- | --- |
| Embeddings / Feature Extraction | AutoModel, AutoModelForMaskedLM | bert-base-uncased, distilbert-base-uncased |
| Sequence Classification | AutoModelForSequenceClassification | distilbert-base-uncased-finetuned-sst-2-english, roberta-large-mnli |
| Token Classification | AutoModelForTokenClassification | dslim/bert-base-NER, bert-base-cased-finetuned-conll03-english |
  • βœ… All architectures must be encoder-only transformers β€” no decoders, no encoder–decoder hybrids (so no T5, no BART).
  • βš™οΈ Models must have ONNX-exported weights (path/to/your/model/model.onnx).
  • 🧠 The ONNX graph input must include input_ids and optionally attention_mask.
  • 🚫 Models relying on generation heads (AutoModelForSeq2SeqLM, AutoModelForCausalLM, etc.) are not supported.
  • XLNet, Transfomer XL, and derivative architectures are not yet supported.

📦 Installation

Option 1: Download Pre-built CLI Tool (Recommended)

Download the encoderfile CLI tool to build your own model binaries:

curl -fsSL https://raw.githubusercontent.com/mozilla-ai/encoderfile/main/install.sh | sh
chmod +x encoderfile

Note for Windows users: Pre-built binaries are not available for Windows. See our guide on building from source for instructions.

Move the binary to a location in your PATH:

# Linux/macOS
sudo mv encoderfile /usr/local/bin/

# Or add to your user bin
mkdir -p ~/.local/bin
mv encoderfile ~/.local/bin/

Option 2: Build CLI Tool from Source

See our guide on building from source for detailed instructions on building the CLI tool.

Quick build:

cargo build --bin encoderfile --release
./target/release/encoderfile --help

🚀 Quick Start

Step 1: Prepare Your Model

First, you need an ONNX-exported model. Export any Hugging Face model:

# Install optimum for ONNX export
pip install optimum[exporters]

# Export a sentiment analysis model
optimum-cli export onnx \
  --model distilbert-base-uncased-finetuned-sst-2-english \
  --task text-classification \
  ./sentiment-model

Step 2: Create Configuration File

Create sentiment-config.yml:

encoderfile:
  name: sentiment-analyzer
  path: ./sentiment-model
  model_type: sequence_classification
  output_path: ./build/sentiment-analyzer.encoderfile

Step 3: Build Your Encoderfile

Use the downloaded encoderfile CLI tool:

encoderfile build -f sentiment-config.yml

This creates a self-contained binary at ./build/sentiment-analyzer.encoderfile.

Step 4: Run Your Model

Start the server:

./build/sentiment-analyzer.encoderfile serve

The server will start on http://localhost:8080 by default.
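Once the server is up, any HTTP client can call it. As a sketch, here is a minimal Python client using only the standard library; it assumes the default address above and the /predict endpoint shown in the curl examples below (the helper names are illustrative, not part of Encoderfile).

```python
import json
import urllib.request

def build_predict_request(inputs, base_url="http://localhost:8080"):
    """Build a POST request for the /predict endpoint.

    base_url is the default address the server prints at startup; adjust it
    if you started the server with --http-port / --http-hostname.
    """
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def predict(inputs, base_url="http://localhost:8080"):
    """Send the request and decode the JSON response."""
    req = build_predict_request(inputs, base_url)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example (requires a running encoderfile server):
# print(predict(["This is the cutest cat ever!"]))
```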

Making Predictions

Sentiment Analysis:

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": [
      "This is the cutest cat ever!",
      "Boring video, waste of time",
      "These cats are so funny!"
    ]
  }'

Response:

{
  "results": [
    {
      "logits": [0.00021549065, 0.9997845],
      "scores": [0.00021549074, 0.9997845],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    },
    {
      "logits": [0.9998148, 0.00018516644],
      "scores": [0.9998148, 0.0001851664],
      "predicted_index": 0,
      "predicted_label": "NEGATIVE"
    },
    {
      "logits": [0.00014975034, 0.9998503],
      "scores": [0.00014975043, 0.9998503],
      "predicted_index": 1,
      "predicted_label": "POSITIVE"
    }
  ],
  "model_id": "sentiment-analyzer"
}
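A response in this shape is easy to post-process. The sketch below pairs each predicted label with its confidence (the score at predicted_index), assuming the sequence-classification response format shown above; the sample JSON is abbreviated for illustration.

```python
import json

# Abbreviated sample response in the shape shown above.
response_text = """
{
  "results": [
    {"scores": [0.0002, 0.9998], "predicted_index": 1, "predicted_label": "POSITIVE"},
    {"scores": [0.9998, 0.0002], "predicted_index": 0, "predicted_label": "NEGATIVE"}
  ],
  "model_id": "sentiment-analyzer"
}
"""

def extract_labels(response):
    """Pair each predicted label with its confidence score."""
    return [
        (r["predicted_label"], r["scores"][r["predicted_index"]])
        for r in response["results"]
    ]

labels = extract_labels(json.loads(response_text))
# [('POSITIVE', 0.9998), ('NEGATIVE', 0.9998)]
```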

Embeddings:

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Hello world"],
    "normalize": true
  }'
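With "normalize": true, the returned embedding vectors are unit-length, so cosine similarity reduces to a plain dot product. A stdlib-only sketch, using toy 3-d vectors in place of real model embeddings:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity; for unit-length vectors this equals the dot product."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for embeddings returned by the server:
v1 = [1.0, 0.0, 0.0]
v2 = [0.0, 1.0, 0.0]
print(cosine_similarity(v1, v1))  # 1.0 (identical vectors)
print(cosine_similarity(v1, v2))  # 0.0 (orthogonal vectors)
```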

Token Classification (NER):

curl -X POST http://localhost:8080/predict \
  -H "Content-Type: application/json" \
  -d '{
    "inputs": ["Apple Inc. is located in Cupertino, California"]
  }'
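Token classification returns one label per token. A common post-processing step is grouping BIO tags (B-ORG, I-ORG, O, ...) into entity spans; the sketch below uses hypothetical token/tag lists for the example sentence and is not tied to Encoderfile's exact response schema.

```python
def group_bio_entities(tokens, tags):
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    entities, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:  # "O" or an inconsistent I- tag ends the current entity
            if current_tokens:
                entities.append((current_type, " ".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_tokens:
        entities.append((current_type, " ".join(current_tokens)))
    return entities

tokens = ["Apple", "Inc.", "is", "located", "in", "Cupertino", ",", "California"]
tags = ["B-ORG", "I-ORG", "O", "O", "O", "B-LOC", "O", "B-LOC"]
print(group_bio_entities(tokens, tags))
# [('ORG', 'Apple Inc.'), ('LOC', 'Cupertino'), ('LOC', 'California')]
```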

🎯 Usage Modes

1. REST API Server

Start an HTTP server (default port 8080):

./my-model.encoderfile serve

Custom configuration:

./my-model.encoderfile serve \
  --http-port 3000 \
  --http-hostname 0.0.0.0

Disable gRPC (HTTP only):

./my-model.encoderfile serve --disable-grpc

2. gRPC Server

Start with default gRPC server (port 50051):

./my-model.encoderfile serve

gRPC only (no HTTP):

./my-model.encoderfile serve --disable-http

Custom gRPC configuration:

./my-model.encoderfile serve \
  --grpc-port 50052 \
  --grpc-hostname localhost

3. CLI Inference

Run one-off inference without starting a server:

# Single input
./my-model.encoderfile infer "This is a test sentence"

# Multiple inputs
./my-model.encoderfile infer "First text" "Second text" "Third text"

# Save output to file
./my-model.encoderfile infer "Test input" -o results.json

4. MCP Server

Run as a Model Context Protocol server:

./my-model.encoderfile mcp --hostname 0.0.0.0 --port 9100

🔧 Server Configuration

Port Configuration

# Custom HTTP port
./my-model.encoderfile serve --http-port 3000

# Custom gRPC port
./my-model.encoderfile serve --grpc-port 50052

# Both
./my-model.encoderfile serve --http-port 3000 --grpc-port 50052

Hostname Configuration

./my-model.encoderfile serve \
  --http-hostname 127.0.0.1 \
  --grpc-hostname localhost

Service Selection

# HTTP only
./my-model.encoderfile serve --disable-grpc

# gRPC only
./my-model.encoderfile serve --disable-http

📚 Documentation

πŸ› οΈ Building Custom Encoderfiles

Once you have the encoderfile CLI tool installed, you can build binaries from any compatible Hugging Face model.

See our guide on building from source for detailed instructions including:

  • How to export models to ONNX format
  • Configuration file options
  • Advanced features (Lua transforms, custom paths, etc.)
  • Troubleshooting tips

Quick workflow:

  1. Export your model to ONNX: optimum-cli export onnx ...
  2. Create a config file: config.yml
  3. Build the binary: encoderfile build -f config.yml
  4. Deploy anywhere: ./build/my-model.encoderfile serve

🤝 Contributing

We welcome contributions! See CONTRIBUTING.md for guidelines.

Development Setup

# Clone the repository
git clone https://github.com/mozilla-ai/encoderfile.git
cd encoderfile

# Set up development environment
make setup

# Run tests
make test

# Build and serve documentation
make docs-serve

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.

πŸ™ Acknowledgments

πŸ’¬ Community
