
First party data pipeline: Analytics.js > YOUR Cloudflare data pipeline > R2 (Iceberg) > DuckDB + R2 SQL πŸš€


icelight

First-party product analytics platform that streams Analytics.js events to Apache Iceberg tables on Cloudflare R2: a very cost-effective replacement for Google Analytics, Mixpanel, and similar tools.

Overview

icelight provides a complete solution for collecting analytics events and storing them in queryable Iceberg tables using Cloudflare's infrastructure:

  • Event Ingestion: RudderStack/Segment-compatible HTTP endpoints
  • Data Storage: Apache Iceberg tables on R2 with automatic compaction
  • Query API: SQL queries via R2 SQL or DuckDB, plus a semantic layer
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Your App /    │────▢│  Event Ingest   │────▢│   Cloudflare    β”‚
β”‚  Analytics SDK  β”‚     β”‚    Worker       β”‚     β”‚    Pipeline     β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                                                         β”‚
                        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”     β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β–Όβ”€β”€β”€β”€β”€β”€β”€β”€β”
                        β”‚   Query API     │◀────│  R2 + Iceberg   β”‚
                        β”‚    Worker       β”‚     β”‚   Data Catalog  β”‚
                        β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜     β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Live Demo: https://try.icelight.dev

Prerequisites

  • A Cloudflare account with R2 and Workers enabled
  • Node.js and pnpm
  • The Wrangler CLI (run via npx, as below)

Quick Start

1. Clone & Install

git clone https://github.com/cliftonc/icelight.git
cd icelight
pnpm install

2. Login to Cloudflare

npx wrangler login

3. Launch Everything

pnpm launch

Enter a project name when prompted. The script will:

  • Create an R2 bucket with Data Catalog enabled
  • Create and configure the Pipeline (stream, sink, pipeline)
  • Deploy the Event Ingest worker
  • Deploy the Query API worker (this is the same code as https://try.icelight.dev)
  • Deploy the DuckDB container and API

Once complete, you'll see your worker URLs. If anything goes wrong, it is safe to run the script again: it inspects your Cloudflare environment and will attempt to resolve any differences.

4. Open the Web UI

Visit your Query API URL in a browser:

https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev

The Web UI includes:

  • Analysis Builder: Visual query builder with charts
  • R2 SQL: Direct SQL queries against your Iceberg tables
  • DuckDB: Full SQL support (JOINs, aggregations, window functions)
  • Event Simulator: Send test events using the RudderStack SDK

5. Finished?

If you found this interesting but not that useful (I'd love to know why via an issue), you can clean up:

pnpm teardown

This command will remove everything created in your Cloudflare account, including any data loaded into the bucket.

Client SDK Integration

RudderStack / Segment

Icelight simply uses the open-source Analytics.js library (from Segment and RudderStack).

import { Analytics } from '@rudderstack/analytics-js';

const analytics = new Analytics({
  writeKey: 'any-value',
  dataPlaneUrl: 'https://icelight-event-ingest.YOUR-SUBDOMAIN.workers.dev'
});

analytics.track('Purchase Completed', { orderId: '12345', revenue: 99.99 });
analytics.identify('user-123', { email: '[email protected]', plan: 'premium' });

Direct HTTP

You can also send messages directly, e.g. from your backend:

# Track event
curl -X POST https://YOUR-WORKER.workers.dev/v1/track \
  -H "Content-Type: application/json" \
  -d '{"userId":"user-123","event":"Button Clicked","properties":{"button":"signup"}}'

# Batch events
curl -X POST https://YOUR-WORKER.workers.dev/v1/batch \
  -H "Content-Type: application/json" \
  -d '{"batch":[
    {"type":"track","userId":"u1","event":"Page View"},
    {"type":"identify","userId":"u1","traits":{"name":"John"}}
  ]}'
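The same batch envelope can be built programmatically on the server. A minimal Node.js sketch, based on the curl examples above (the `buildBatch`/`sendBatch` helpers are illustrative names, and the `sentAt` field is an assumption from the standard Analytics.js payload shape):

```javascript
// Build a Segment/RudderStack-style batch payload. Each message needs a
// `type` plus a `userId` or `anonymousId`; a timestamp is added if missing.
function buildBatch(messages) {
  return {
    batch: messages.map((m) => ({
      ...m,
      timestamp: m.timestamp ?? new Date().toISOString(),
    })),
    sentAt: new Date().toISOString(),
  };
}

// POST the payload to the ingest worker's /v1/batch endpoint.
async function sendBatch(ingestUrl, messages) {
  const res = await fetch(`${ingestUrl}/v1/batch`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildBatch(messages)),
  });
  if (!res.ok) throw new Error(`Ingest failed: ${res.status}`);
}

// Example (replace the URL with your deployed worker):
// await sendBatch('https://icelight-event-ingest.YOUR-SUBDOMAIN.workers.dev', [
//   { type: 'track', userId: 'u1', event: 'Page View' },
//   { type: 'identify', userId: 'u1', traits: { name: 'John' } },
// ]);
```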

Querying Data

Via Web UI

The Query API includes a web-based explorer at your worker URL with R2 SQL, DuckDB, and a visual Analysis Builder.

Via API

# R2 SQL query
curl -X POST https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM analytics.events LIMIT 10"}'

# DuckDB query (full SQL support)
curl -X POST https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev/duckdb \
  -H "Content-Type: application/json" \
  -d '{"query": "SELECT type, COUNT(*) FROM r2_datalake.analytics.events GROUP BY type"}'

# Semantic API query
curl -X POST https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev/cubejs-api/v1/load \
  -H "Content-Type: application/json" \
  -d '{"query": {"dimensions": ["Events.type"], "measures": ["Events.count"], "limit": 100}}'

# Get CSV output
curl -X POST https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev/query \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT * FROM analytics.events LIMIT 10", "format": "csv"}'

# List tables
curl https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev/tables/analytics

# Describe table schema
curl https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev/tables/analytics/events
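A small JavaScript client can wrap the /query endpoint the same way, sketched from the curl examples above (the `sql` and optional `format` fields follow those examples; `buildQueryRequest` and `runQuery` are illustrative names, not part of Icelight):

```javascript
// Build the JSON body for the Query API's /query endpoint.
// `format` is optional; the examples above show 'csv' as one value.
function buildQueryRequest(sql, format) {
  const body = { sql };
  if (format) body.format = format;
  return body;
}

// Execute an R2 SQL query against the deployed Query API worker.
async function runQuery(queryApiUrl, sql, format) {
  const res = await fetch(`${queryApiUrl}/query`, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify(buildQueryRequest(sql, format)),
  });
  if (!res.ok) throw new Error(`Query failed: ${res.status}`);
  return format === 'csv' ? res.text() : res.json();
}

// Example (replace with your worker URL):
// const rows = await runQuery(
//   'https://icelight-query-api.YOUR-SUBDOMAIN.workers.dev',
//   'SELECT * FROM analytics.events LIMIT 10'
// );
```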

Via External Tools

Connect PyIceberg, DuckDB, or Spark to your R2 Data Catalog. See Cloudflare R2 SQL docs for connection details.

API Endpoints

Ingestion Worker

These are the standard Analytics.js endpoints:

Endpoint       Method  Description
/v1/batch      POST    Batch events (primary)
/v1/track      POST    Single track event
/v1/identify   POST    Single identify event
/v1/page       POST    Single page event
/v1/screen     POST    Single screen event
/v1/group      POST    Single group event
/v1/alias      POST    Single alias event
/health        GET     Health check

Query API Worker

Endpoint                   Method  Description
/query                     POST    Execute R2 SQL query
/duckdb                    POST    Execute DuckDB query (full SQL)
/tables/:namespace         GET     List tables in namespace
/tables/:namespace/:table  GET     Describe table schema
/cubejs-api/v1/meta        GET     Get Cube.js-compatible semantic layer metadata
/cubejs-api/v1/load        POST    Execute Cube.js-compatible semantic query
/health                    GET     Health check

Development

# Run setup
pnpm install
pnpm launch

# Run ingest worker locally
pnpm dev:ingest

# Run query worker locally
pnpm dev:query

# Build all packages
pnpm build

# Type check
pnpm typecheck

The DuckDB container does not work locally, as there is currently no local development solution for Cloudflare Containers. Note that the Ingest and Query workers connect to remote infrastructure in Cloudflare even when run locally.

Cleanup

pnpm teardown

Troubleshooting

"send is not a function" error

Ensure compatibility_date in wrangler.local.jsonc is "2025-01-01" or later. The Pipelines send() method requires this.

"Not logged in" error

Run npx wrangler login and complete the browser authorization flow.

Pipeline binding not working

  1. Check that wrangler.local.jsonc exists in workers/event-ingest/ - if not, run pnpm launch
  2. Verify the pipeline binding in wrangler.local.jsonc has the correct stream ID
  3. Run npx wrangler pipelines streams list to see your streams
  4. Redeploy after any config changes: pnpm deploy:ingest

Query API returns empty data

  1. Check that data has been flushed to R2 (pipelines have a 5-minute flush interval by default)
  2. Verify the WAREHOUSE_NAME in workers/query-api/wrangler.local.jsonc matches your bucket name
  3. Check that CF_ACCOUNT_ID and CF_API_TOKEN secrets are set correctly

"wrangler.local.jsonc not found" error

Run pnpm launch to create the local configuration files with your pipeline bindings.

Limitations

  • Cloudflare Pipelines: Currently in open beta - API may change
  • R2 SQL: Read-only, limited query support (improving in 2026)
  • Local Development: Pipelines require --remote flag for full testing


License

MIT
