Stripe Data Architecture

A production-grade data platform for a Stripe-style fintech, combining transactional storage, real-time change data capture, analytical warehousing, and batch orchestration. The project demonstrates an end-to-end flow from OLTP writes to analytics-ready marts, with infrastructure described as code.

Architecture

┌─────────────┐     ┌──────────┐     ┌─────────────┐     ┌──────────────┐
│  PostgreSQL  │────▶│ Debezium │────▶│ Apache Kafka │────▶│  Consumers   │
│    (OLTP)    │     │   (CDC)  │     │  (Streaming) │     │              │
└─────────────┘     └──────────┘     └──────┬───────┘     └──────┬───────┘
                                            │                     │
                                            ▼                     ▼
                                    ┌──────────────┐     ┌──────────────┐
                                    │   MongoDB     │     │  Snowflake   │
                                    │   (NoSQL)     │     │   (OLAP)     │
                                    └──────────────┘     └──────────────┘
                                            │                     │
                                            ▼                     ▼
                                    ┌──────────────┐     ┌──────────────┐
                                    │  ML Models   │     │  dbt Models  │
                                    │  (FastAPI)   │     │  (Transform) │
                                    └──────────────┘     └──────────────┘

                    ┌─────────────────────────────────────────────┐
                    │         Apache Airflow (Orchestration)       │
                    │    DAGs: ETL batch + refresh + monitoring    │
                    └─────────────────────────────────────────────┘

PostgreSQL is the system of record. Debezium reads the write-ahead log and streams row-level changes into Kafka, so ongoing replication does not depend on polling the OLTP tables. Consumers fan the stream out to MongoDB (semi-structured data and ML features) and to the analytical warehouse. dbt transforms raw tables into a star schema, and Airflow orchestrates the batch ETL, the materialized-view refreshes, and the cross-system data-quality checks.
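As an illustration of the fan-out step, here is a minimal sketch of how a consumer might turn one Debezium change event into a MongoDB-style write. It relies on the standard Debezium envelope (`op`, `before`, `after`), but the function name `to_mongo_op`, the returned dictionary shape, and the primary-key column `id` are assumptions made for this example; the code in scripts/kafka_consumer.py may be organized differently.

```python
from typing import Optional


def to_mongo_op(event: Optional[dict], key_field: str = "id") -> Optional[dict]:
    """Describe the MongoDB write implied by one Debezium change event.

    `key_field` is an assumed primary-key column name, not taken from the repo.
    """
    if event is None:
        # Kafka tombstone (null value) that follows a delete: nothing to write.
        return None

    # With the JSON converter and schemas enabled, the envelope sits under "payload".
    payload = event.get("payload", event)
    op = payload.get("op")

    if op in ("c", "r", "u"):
        # create, snapshot read, update: the new row state is in "after".
        row = payload["after"]
        return {"action": "upsert", "filter": {"_id": row[key_field]}, "document": row}

    if op == "d":
        # delete: only "before" carries the row.
        row = payload["before"]
        return {"action": "delete", "filter": {"_id": row[key_field]}}

    # Other operations (for example truncate) are ignored in this sketch.
    return None
```

Treating creates, snapshot reads, and updates as the same upsert keeps the consumer idempotent, so replaying a Kafka partition after a restart does not duplicate documents.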

Tech Stack

OLTP: PostgreSQL 16 (range partitioning, logical replication, UUID keys)
OLAP: Snowflake (star schema)
NoSQL staging: MongoDB 7 (feature store, TTL and text indexes)
Streaming / CDC: Apache Kafka + Debezium PostgreSQL connector
Orchestration: Apache Airflow 2.8
Transformation: dbt (staging and marts models)
Serving: FastAPI fraud-scoring API
Infrastructure as Code: Terraform (AWS)

Project Structure

stripe-data-architecture/
├── docker/
│   └── docker-compose.yml          # Full local infrastructure
├── sql/
│   ├── 01_oltp_schema.sql          # Normalized OLTP schema (3NF)
│   ├── 02_oltp_indexes.sql         # Indexes and partitioning
│   ├── 03_oltp_seed_data.sql       # Synthetic seed data
│   └── 04_olap_schema.sql          # OLAP star schema
├── mongodb/
│   └── init_collections.js         # MongoDB collections and indexes
├── debezium/
│   └── connector_postgres.json     # CDC connector configuration
├── airflow/
│   └── dags/
│       ├── dag_oltp_to_olap.py     # ETL PostgreSQL to OLAP
│       ├── dag_oltp_to_mongodb.py  # Sync to MongoDB
│       └── dag_data_quality.py     # Data-quality checks
├── dbt/
│   ├── dbt_project.yml
│   └── models/
│       ├── staging/                # stg_transactions, stg_merchants
│       └── marts/                  # fact_transactions, dimensions, daily revenue
├── scripts/
│   ├── start_pipeline.sh           # One-command startup
│   ├── kafka_consumer.py           # Kafka to MongoDB consumer
│   └── fraud_scoring_api.py        # FastAPI fraud-detection service
├── terraform/
│   └── main.tf                     # Cloud infrastructure (AWS)
├── docs/
│   ├── architecture_decisions.md   # Architecture Decision Records
│   └── mcd_stripe.mermaid          # Conceptual data model
├── .env.example
└── .gitignore

Getting Started

Prerequisites

Docker and Docker Compose
Python 3.11+
gettext (provides envsubst, used to inject secrets into the Debezium connector)
Terraform (optional, only for cloud deployment)
dbt (optional, to run transformations against the warehouse)

Configuration

Copy the example environment file and adjust the values before starting anything:

cp .env.example .env

No real secrets are committed: .env is gitignored, and POSTGRES_PASSWORD and the other credentials in .env.example are demo defaults that you should override in your local .env. The startup script loads .env and substitutes POSTGRES_PASSWORD into the Debezium connector config at registration time, so the password is never hard-coded in connector_postgres.json.
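The substitution step can be pictured with a short Python sketch. The startup script itself uses envsubst; this is only an equivalent illustration, and the connector name `postgres-connector` and the one-line template are hypothetical stand-ins for the real connector_postgres.json. `database.password` is the Debezium PostgreSQL connector property that holds the password.

```python
import json
from string import Template

# Hypothetical, shortened stand-in for debezium/connector_postgres.json.
CONNECTOR_TEMPLATE = """
{
  "name": "postgres-connector",
  "config": {
    "database.password": "${POSTGRES_PASSWORD}"
  }
}
"""


def render_connector_config(template_text: str, env: dict) -> dict:
    """Fill ${VAR} placeholders from `env` and parse the result as JSON.

    Unlike envsubst, Template.substitute raises KeyError when a variable
    is missing instead of silently inserting an empty string.
    """
    rendered = Template(template_text).substitute(env)
    return json.loads(rendered)
```

Calling `render_connector_config(CONNECTOR_TEMPLATE, os.environ)` after loading .env would produce the payload to register with the connector API. Plain text substitution assumes the password contains no characters that need JSON escaping, such as double quotes or backslashes.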

Running

chmod +x scripts/start_pipeline.sh
./scripts/start_pipeline.sh

# Check the running services
docker-compose -f docker/docker-compose.yml ps

Service endpoints

| Service    | URL                     | Credentials      |
|------------|-------------------------|------------------|
| PostgreSQL | `localhost:5432`        | from your `.env` |
| MongoDB    | `localhost:27017`       | from your `.env` |
| Kafka UI   | `http://localhost:8080` | none             |
| Airflow    | `http://localhost:8081` | from your `.env` |
| Debezium   | `http://localhost:8083` | none             |
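Once the stack is up, the Debezium endpoint can be used to confirm that the connector is actually running. The sketch below queries the Kafka Connect REST API (`GET /connectors/<name>/status`) and checks the reported states. The helper names are made up for this example, and the connector name you pass must match the one defined in debezium/connector_postgres.json.

```python
import json
from urllib.request import urlopen


def connector_is_healthy(status: dict) -> bool:
    """Return True when the connector and every one of its tasks report RUNNING."""
    connector_ok = status.get("connector", {}).get("state") == "RUNNING"
    tasks = status.get("tasks", [])
    # A connector with no tasks is treated as unhealthy in this sketch.
    return connector_ok and bool(tasks) and all(t.get("state") == "RUNNING" for t in tasks)


def fetch_connector_status(name: str, base_url: str = "http://localhost:8083") -> dict:
    """Fetch the status document for one connector from Kafka Connect."""
    with urlopen(f"{base_url}/connectors/{name}/status", timeout=5) as resp:
        return json.load(resp)
```

For example, `connector_is_healthy(fetch_connector_status("<your-connector-name>"))` returns False if the connector or any task is in a state such as FAILED or PAUSED.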

Design Notes

Key decisions are documented as Architecture Decision Records in docs/architecture_decisions.md, covering the choice of PostgreSQL for OLTP, MongoDB as a feature store, Kafka and Debezium for CDC, the OLAP star schema, Airflow for orchestration, and the security framework (PCI-DSS tokenization, AES-256 at rest, TLS 1.3, RBAC, GDPR soft-delete).

License

MIT
