63 changes: 40 additions & 23 deletions .github/workflows/pr.yml
@@ -21,19 +21,28 @@ jobs:
- name: Detect changed codecs
id: changed
run: |
# Get list of changed .rs files in src/ (excluding main.rs, codec.rs, lib.rs)
CHANGED_FILES=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- 'src/*.rs' | grep -vE '(main|codec|lib)\.rs$' || true)

if [ -z "$CHANGED_FILES" ]; then
echo "No codec files changed"
echo "codecs=" >> $GITHUB_OUTPUT
else
# Extract codec names (filename without .rs extension)
CODECS=$(echo "$CHANGED_FILES" | xargs -n1 basename | sed 's/\.rs$//' | tr '\n' ' ')
echo "Changed codecs: $CODECS"
echo "codecs=$CODECS" >> $GITHUB_OUTPUT
# Get list of changed Rust codec files (excluding main.rs, codec.rs, lib.rs).
# ':(glob)' keeps '*' from matching '/', so files inside codec directories are not picked up here.
RUST_CHANGES=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- ':(glob)src/*.rs' | grep -vE '/(main|codec|lib)\.rs$' || true)

# Get list of changed Docker codec directories (any file under src/<dir>/ counts,
# and the second path component is always the codec directory, even for nested files)
DOCKER_CHANGES=$(git diff --name-only origin/${{ github.base_ref }}...HEAD -- ':(glob)src/*/**' | cut -d/ -f2 | sort -u || true)

RUST_CODECS=""
DOCKER_CODECS=""

if [ -n "$RUST_CHANGES" ]; then
RUST_CODECS=$(echo "$RUST_CHANGES" | xargs -n1 basename | sed 's/\.rs$//' | tr '\n' ' ')
fi

if [ -n "$DOCKER_CHANGES" ]; then
DOCKER_CODECS=$(echo "$DOCKER_CHANGES" | tr '\n' ' ')
fi

echo "Rust codecs: $RUST_CODECS"
echo "Docker codecs: $DOCKER_CODECS"
echo "rust_codecs=$RUST_CODECS" >> $GITHUB_OUTPUT
echo "docker_codecs=$DOCKER_CODECS" >> $GITHUB_OUTPUT

- name: Setup Rust
uses: dtolnay/rust-toolchain@stable

@@ -54,19 +63,27 @@ jobs:
- name: Build
run: cargo build --release

- name: Run codec tests
id: test
- name: Run Rust codec tests
if: steps.changed.outputs.rust_codecs != ''
run: |
CODECS="${{ steps.changed.outputs.codecs }}"
if [ -z "$CODECS" ]; then
echo "No codec changes detected, running all codecs"
cargo run --release 2>&1 | tee results.txt
else
for codec in $CODECS; do
echo "Testing codec: $codec"
cargo run --release -- --codec "$codec" 2>&1 | tee -a results.txt
done
fi
for codec in ${{ steps.changed.outputs.rust_codecs }}; do
echo "Testing Rust codec: $codec"
cargo run --release -- --codec "$codec" 2>&1 | tee -a results.txt
done

- name: Run Docker codec tests
if: steps.changed.outputs.docker_codecs != ''
run: |
for codec in ${{ steps.changed.outputs.docker_codecs }}; do
echo "Testing Docker codec: $codec"
cargo run --release -- --docker --codec "$codec" 2>&1 | tee -a results.txt
done

- name: Run all codecs (no changes detected)
if: steps.changed.outputs.rust_codecs == '' && steps.changed.outputs.docker_codecs == ''
run: |
echo "No codec changes detected, running all codecs"
cargo run --release -- --docker 2>&1 | tee results.txt

- name: Check formatting
run: cargo fmt --check
3 changes: 3 additions & 0 deletions .gitignore
@@ -15,6 +15,9 @@ Thumbs.db
*.swo
*~

# Worktrees
.worktrees/

# Data (only data.json.gz is tracked)
*.json
*.json.gz
43 changes: 43 additions & 0 deletions README.md
@@ -165,6 +165,49 @@ let codecs: Vec<(Box<dyn EventCodec>, &[(EventKey, EventValue)])> = vec![
- PRs must add a single file: `src/<your-github-username>.rs`
- **Submission deadline: March 1st, 2025** — evaluation dataset revealed and winners announced

## External Codecs (Non-Rust)

You can submit codecs in any language by creating a Docker container that implements the encode/decode ABI.

### Directory Structure

```
src/<github-username>-<lang>/
├── Dockerfile
└── <your implementation files>
```

### ABI Requirements

Your container must accept `encode` or `decode` as the first argument:

```bash
# Encode: JSON events in via stdin, compressed bytes out via stdout
docker run <image> encode < events.json > compressed.bin

# Decode: Compressed bytes in via stdin, JSON events out via stdout
docker run <image> decode < compressed.bin > events.json
```

### Example Dockerfile (Python)

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY codec.py .
ENTRYPOINT ["python", "codec.py"]
```

### Testing Locally

```bash
# Run with Docker support
cargo run --release -- --docker

# Test a specific external codec
cargo run --release -- --codec <name>-<lang>
```

## Generating Your Own Evaluation Dataset

Want to test your codec against different data? You can generate your own dataset
245 changes: 245 additions & 0 deletions docs/plans/2026-01-30-multi-language-support-design.md
@@ -0,0 +1,245 @@
# Multi-Language Codec Support

## Goal

Maximize participation in the compression-golf competition by allowing submissions in languages other than Rust (Python, Java, C, Go, etc.).

## Overview

External codecs are Docker containers that implement a standard ABI. The existing Rust harness orchestrates building and running these containers, then verifies correctness using the same round-trip validation as Rust codecs.

## ABI Specification

External codecs are Docker containers with an entrypoint that accepts two subcommands:

```bash
# Encode: JSON events in via stdin, compressed bytes out via stdout
docker run <image> encode < events.json > compressed.bin

# Decode: Compressed bytes in via stdin, JSON events out via stdout
docker run <image> decode < compressed.bin > events.json
```

### Input/Output Format

- **Encode input**: Line-delimited JSON (same format as `data.json`)
- **Encode output**: Raw bytes (the codec's proprietary compressed format)
- **Decode input**: The exact bytes produced by encode
- **Decode output**: Line-delimited JSON (must match original input exactly)

### Exit Codes

- `0` = success
- Non-zero = error (harness captures stderr for diagnostics)
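
As a rough illustration of the ABI, the entrypoint of a codec could be shaped like the sketch below. The `encode` and `decode` bodies are placeholders that copy their input, and `run_cli` and the exit codes `1` and `2` are names and values chosen for this example rather than anything the harness prescribes (the spec only requires non-zero on error).

```rust
use std::io::{Read, Write};

/// Placeholder "compression": a real codec would do actual work here.
fn encode(events: &[u8]) -> Vec<u8> {
    events.to_vec()
}

/// Inverse of `encode`.
fn decode(compressed: &[u8]) -> Vec<u8> {
    compressed.to_vec()
}

/// Dispatches on the first argument and returns the process exit code.
/// Diagnostics go to stderr so that stdout carries only the payload.
fn run_cli(args: &[String], input: &mut impl Read, output: &mut impl Write) -> i32 {
    let transform: fn(&[u8]) -> Vec<u8> = match args.first().map(String::as_str) {
        Some("encode") => encode,
        Some("decode") => decode,
        other => {
            eprintln!("usage: codec <encode|decode> (got {other:?})");
            return 2;
        }
    };

    let mut buf = Vec::new();
    if let Err(err) = input.read_to_end(&mut buf) {
        eprintln!("failed to read stdin: {err}");
        return 1;
    }

    let result = transform(&buf);
    match output.write_all(&result).and_then(|_| output.flush()) {
        Ok(()) => 0,
        Err(err) => {
            eprintln!("failed to write stdout: {err}");
            1
        }
    }
}
```

A real `main` would collect `std::env::args().skip(1)`, pass locked stdin and stdout handles to `run_cli`, and hand the returned code to `std::process::exit`.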

### Verification

The harness calls encode, then decode, then compares the decoded output to the original input. If they match byte-for-byte, the submission is valid. This is the same integrity guarantee as Rust submissions.
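
The round-trip check described here might look roughly like the following. This is a sketch under assumptions: `verify_round_trip` and `RoundTripError` are hypothetical names, and the codec is modelled as a pair of closures so the logic can be exercised without Docker.

```rust
/// Outcome of a failed round trip.
#[derive(Debug, PartialEq)]
enum RoundTripError {
    /// Decoded output differs from the input; `offset` is the first differing byte.
    Mismatch { offset: usize },
    /// The codec itself reported an error.
    Codec(String),
}

/// Encodes, decodes, and compares byte-for-byte.
/// Returns the compressed size on success.
fn verify_round_trip<E, D>(original: &[u8], encode: E, decode: D) -> Result<usize, RoundTripError>
where
    E: Fn(&[u8]) -> Result<Vec<u8>, String>,
    D: Fn(&[u8]) -> Result<Vec<u8>, String>,
{
    let compressed = encode(original).map_err(RoundTripError::Codec)?;
    let decoded = decode(&compressed).map_err(RoundTripError::Codec)?;
    if decoded.as_slice() == original {
        return Ok(compressed.len());
    }
    // First index where the two differ, or the shorter length if one is a prefix.
    let offset = decoded
        .iter()
        .zip(original)
        .position(|(a, b)| a != b)
        .unwrap_or_else(|| decoded.len().min(original.len()));
    Err(RoundTripError::Mismatch { offset })
}
```

Reporting the first mismatching offset gives the author a starting point for debugging, which is cheaper than dumping both buffers.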

## Directory Structure

External codecs use a convention-based directory structure:

```
src/
├── alice.rs # Rust codec (existing)
├── bob.rs # Rust codec (existing)
├── alice-python/ # External codec: name=alice, lang=python
│ ├── Dockerfile
│ ├── codec.py
│ └── requirements.txt
├── carol-java/ # External codec: name=carol, lang=java
│ ├── Dockerfile
│ └── Codec.java
└── dave-c/ # External codec: name=dave, lang=c
├── Dockerfile
└── codec.c
```

### Naming Convention

- Directory name format: `{name}-{lang}`
- `name` can match an existing Rust codec (e.g., `alice.rs` and `alice-python/` can coexist)
- `lang` should be a common language identifier: `python`, `java`, `c`, `cpp`, `go`, `rust`, etc.
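
GitHub usernames may themselves contain hyphens, so a parser for this convention has to decide where to split. The sketch below assumes that `lang` never contains a hyphen and therefore splits at the last one; `parse_codec_dir` is a hypothetical helper, not part of the harness.

```rust
/// Splits a codec directory name into `(name, lang)` at the last hyphen.
/// Returns `None` when there is no hyphen or either side is empty.
fn parse_codec_dir(dir: &str) -> Option<(&str, &str)> {
    let (name, lang) = dir.rsplit_once('-')?;
    if name.is_empty() || lang.is_empty() {
        return None;
    }
    Some((name, lang))
}
```

With this rule, `mary-ann-go` parses as name `mary-ann` and language `go`, while a bare `alice` is rejected.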

### Auto-Discovery

The harness discovers external codecs by scanning for `src/*/Dockerfile`. No configuration file needed.
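
One way such a scan could be written with only the standard library is shown below. `discover_external_codecs` is an assumed name, and the function takes the `src` directory as a parameter so it can be pointed at a temporary tree in tests.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Returns the sorted names of entries directly under `src_dir`
/// that contain a `Dockerfile`.
fn discover_external_codecs(src_dir: &Path) -> io::Result<Vec<String>> {
    let mut codecs = Vec::new();
    for entry in fs::read_dir(src_dir)? {
        let path = entry?.path();
        // Plain files such as `alice.rs` fail this check and are skipped.
        if path.join("Dockerfile").is_file() {
            if let Some(name) = path.file_name().and_then(|n| n.to_str()) {
                codecs.push(name.to_string());
            }
        }
    }
    // `read_dir` order is platform-dependent; sort for stable output.
    codecs.sort();
    Ok(codecs)
}
```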

## Dockerfile Requirements

1. Must accept `encode` or `decode` as the command argument
2. Must read from stdin, write to stdout
3. Must exit 0 on success, non-zero on failure

### Example: Python

```dockerfile
FROM python:3.11-slim
WORKDIR /app
COPY . .
RUN pip install -r requirements.txt
ENTRYPOINT ["python", "codec.py"]
```

### Example: Java

```dockerfile
FROM eclipse-temurin:21
WORKDIR /app
COPY . .
RUN javac Codec.java
ENTRYPOINT ["java", "Codec"]
```

### Example: C

```dockerfile
FROM gcc:13
WORKDIR /app
COPY . .
RUN gcc -O3 -o codec codec.c
ENTRYPOINT ["./codec"]
```

## CLI Interface

### Flags

```bash
# Default: Rust codecs only (current behavior)
cargo run --release

# Include external Docker codecs
cargo run --release -- --docker

# Target specific codec - harness auto-detects type
cargo run --release -- --codec alice # finds alice.rs (Rust)
cargo run --release -- --codec alice-python # finds src/alice-python/Dockerfile (Docker)
```

### Codec Resolution

When `--codec <name>` is specified:

1. Check if `name` is a registered Rust codec → run as Rust
2. Check if `src/{name}/Dockerfile` exists → run as Docker
3. Neither → error: "Unknown codec: {name}"

Targeting a Docker codec with `--codec` implicitly enables Docker mode.
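
The resolution order above could be expressed as a small function like the one below. It is only a sketch: `CodecKind` and `resolve_codec` are hypothetical names, and the list of registered Rust codecs is passed in as a slice because the real registry's shape is not shown here.

```rust
use std::path::Path;

#[derive(Debug, PartialEq)]
enum CodecKind {
    Rust,
    Docker,
}

/// Resolves `name` in the documented order:
/// registered Rust codec first, then `src/{name}/Dockerfile`.
fn resolve_codec(name: &str, rust_codecs: &[&str], src_dir: &Path) -> Result<CodecKind, String> {
    if rust_codecs.contains(&name) {
        return Ok(CodecKind::Rust);
    }
    if src_dir.join(name).join("Dockerfile").is_file() {
        return Ok(CodecKind::Docker);
    }
    Err(format!("Unknown codec: {name}"))
}
```

Because the Rust registry is checked first, `alice` and `alice-python` never collide: the directory name always carries the language suffix.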

## Harness Implementation

The Rust harness handles the full lifecycle:

```rust
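use std::io::Write;
use std::process::{Command, Stdio};
use std::thread;

// Assumed alias; the real harness may use its own error type.
type Result<T> = std::result::Result<T, Box<dyn std::error::Error>>;

/// Builds the image from `src/{name}` and returns its tag.
fn build_external_codec(name: &str) -> Result<String> {
    let path = format!("src/{}", name);
    // Docker image names must be lowercase, but directory names may not be.
    let tag = name.to_lowercase();

    let status = Command::new("docker")
        .args(["build", "-t", tag.as_str(), path.as_str()])
        .status()?;
    if !status.success() {
        return Err(format!("docker build failed for {name}: {status}").into());
    }

    Ok(tag)
}

/// Runs `docker run <image> <mode>`, piping `input` to stdin and returning stdout.
fn run_codec(image: &str, mode: &str, input: &[u8]) -> Result<Vec<u8>> {
    let mut child = Command::new("docker")
        .args(["run", "-i", "--rm", image, mode])
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .stderr(Stdio::piped())
        .spawn()?;

    // Feed stdin from a separate thread: a codec that writes output while still
    // reading input would otherwise deadlock once the pipe buffers fill up.
    let mut stdin = child.stdin.take().expect("stdin was piped");
    let (written, output) = thread::scope(|s| {
        let writer = s.spawn(move || stdin.write_all(input));
        let output = child.wait_with_output();
        (writer.join().expect("stdin writer panicked"), output)
    });
    let output = output?;

    if !output.status.success() {
        let stderr = String::from_utf8_lossy(&output.stderr);
        return Err(format!("{mode} exited with {}: {stderr}", output.status).into());
    }
    written?;

    Ok(output.stdout)
}

fn run_encode(image: &str, input: &[u8]) -> Result<Vec<u8>> {
    run_codec(image, "encode", input)
}

fn run_decode(image: &str, input: &[u8]) -> Result<Vec<u8>> {
    run_codec(image, "decode", input)
}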
```

## CI/CD Integration

### PR Workflow Updates

The workflow detects both Rust and Docker codec changes:

```bash
# Detect Rust codec changes (existing); ':(glob)' stops '*' from matching '/'
rust_changes=$(git diff --name-only origin/main...HEAD -- ':(glob)src/*.rs')

# Detect Docker codec changes (new): any file inside a codec directory
docker_changes=$(git diff --name-only origin/main...HEAD -- ':(glob)src/*/**')
```

### Workflow Behavior

| Change detected | Action |
|-----------------|--------|
| `src/alice.rs` | `cargo run --release -- --codec alice` |
| `src/alice-python/*` | `cargo run --release -- --codec alice-python` |
| Both | Run both targeted tests |
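
The mapping in this table, from changed paths to codec names, can be sketched as a pure function. The CI workflow does this in shell; the Rust version below is only an illustration, and `codecs_from_changed_paths` is a hypothetical name.

```rust
use std::collections::BTreeSet;

/// Splits changed paths into (Rust codec names, Docker codec names),
/// each sorted and deduplicated.
fn codecs_from_changed_paths(paths: &[&str]) -> (Vec<String>, Vec<String>) {
    let mut rust = BTreeSet::new();
    let mut docker = BTreeSet::new();
    for path in paths {
        let Some(rest) = path.strip_prefix("src/") else { continue };
        match rest.split_once('/') {
            // `src/<dir>/...`: any file inside a codec directory counts.
            Some((dir, _)) => {
                docker.insert(dir.to_string());
            }
            // `src/<name>.rs`: a top-level Rust codec, unless it is a harness file.
            None => {
                if let Some(stem) = rest.strip_suffix(".rs") {
                    if !matches!(stem, "main" | "codec" | "lib") {
                        rust.insert(stem.to_string());
                    }
                }
            }
        }
    }
    (rust.into_iter().collect(), docker.into_iter().collect())
}
```

Taking the first path component under `src/` as the codec name means nested files such as `src/alice-python/lib/util.py` still map to `alice-python`.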

GitHub-hosted Linux runners have Docker pre-installed, so no special setup is needed.

### Constraints

- Timeout: Relies on the existing CI job timeout (the harness inherits the runner's limits)
- Builds: Images are built on every PR (caching can be added later)

## Error Handling

| Error | Harness response |
|-------|------------------|
| `docker build` fails | Report build error, show stderr, skip codec |
| `encode` times out | Report timeout, skip codec |
| `encode` exits non-zero | Report error, show stderr, skip codec |
| `decode` fails | Report decode error, skip codec |
| Round-trip mismatch | Report verification failure, show sample mismatches |
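
One possible way to model these failure modes, so that a failing codec is reported and skipped instead of aborting the run, is an error enum plus a collector. Everything named here (`CodecFailure`, `run_all`, the message wording) is an assumption for illustration; the per-codec work is passed in as a closure to keep the sketch independent of Docker.

```rust
use std::fmt;

/// Failure modes from the table; payloads carry captured diagnostics.
#[derive(Debug)]
enum CodecFailure {
    BuildFailed { stderr: String },
    Timeout { stage: &'static str },
    NonZeroExit { stage: &'static str, stderr: String },
    Mismatch { first_offset: usize },
}

impl fmt::Display for CodecFailure {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::BuildFailed { stderr } => write!(f, "docker build failed: {stderr}"),
            Self::Timeout { stage } => write!(f, "{stage} timed out"),
            Self::NonZeroExit { stage, stderr } => {
                write!(f, "{stage} exited non-zero: {stderr}")
            }
            Self::Mismatch { first_offset } => {
                write!(f, "round-trip mismatch at byte {first_offset}")
            }
        }
    }
}

/// Runs every codec and keeps each result, so one failure does not block the rest.
/// On success the value is the compressed size in bytes.
fn run_all<F>(codecs: &[&str], run: F) -> Vec<(String, Result<usize, CodecFailure>)>
where
    F: Fn(&str) -> Result<usize, CodecFailure>,
{
    codecs
        .iter()
        .map(|&name| (name.to_string(), run(name)))
        .collect()
}
```

The results vector can then be rendered row by row, with failed entries shown as a marker in place of a size.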

### Output Format

External codecs appear in the same results table as Rust codecs:

```
┌────────────────────────┬────────────────┬────────────┐
│ Codec │ Size │ vs Naive │
├────────────────────────┼────────────────┼────────────┤
│ Naive │ 210,727,389 │ baseline │
│ XiangpengHao │ 6,847,283 │ -96.7% │
│ alice-python │ 7,102,445 │ -96.6% │
│ bob-java │ 8,234,112 │ -96.1% │
│ carol-c [BUILD FAILED] │ - │ - │
└────────────────────────┴────────────────┴────────────┘
```

### Debugging

- `--verbose` flag shows full Docker build output and stderr
- Failed codecs don't block other codecs from running
- Temporary files preserved on failure for inspection

## Submission Process

To submit an external codec:

1. Create `src/<github-username>-<lang>/` directory
2. Add a `Dockerfile` implementing the ABI
3. Add your codec implementation
4. Test locally: `cargo run --release -- --codec <github-username>-<lang>`
5. Submit PR with the new directory

## Future Optimizations

- Docker layer caching in CI
- Pre-built base images for common languages
- Optional memory limits via `docker run --memory`
- Image registry for faster CI (authors push pre-built images)