Backend and data engineer based in the UK, focused on Python, ETL pipelines, and web scraping.
A multi-source ETL pipeline for electronic music event discovery. Scrapes events from listing platforms and venue websites, enriches artist data from SoundCloud, Discogs, and Bandcamp, then scores and ranks events using a configurable algorithm.
Built with: Python 3.12, FastAPI, httpx (async), SQLite, Docker, pandas, BeautifulSoup
Highlights:
- 587 tests across 40 files (unit, integration, security, e2e)
- mypy --strict across the entire codebase
- CI pipeline: lint, typecheck, test, security audit
- Async enrichment pipeline with retry/backoff/jitter
- Thread-safe concurrency (SQLite WAL, semaphores, ThreadPoolExecutor)
- Incremental processing via SHA-256 lineup hashing
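As an illustration of the incremental-processing idea, here is a minimal sketch of lineup hashing. It is not the project's actual code: the function names (`lineup_hash`, `needs_processing`), the in-memory `seen` dict, and the normalisation rules (case-insensitive, order-independent) are all assumptions made for the example.

```python
import hashlib


def lineup_hash(artists: list[str]) -> str:
    """Return a SHA-256 hex digest of a lineup.

    Assumption: names are compared case-insensitively and billing order
    does not matter, so they are normalised, de-duplicated, and sorted.
    """
    normalised = sorted({name.strip().casefold() for name in artists if name.strip()})
    payload = "\n".join(normalised).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def needs_processing(event_id: str, artists: list[str], seen: dict[str, str]) -> bool:
    """Return True (and record the new hash) if the lineup changed since the last run."""
    digest = lineup_hash(artists)
    if seen.get(event_id) == digest:
        return False  # lineup unchanged, skip enrichment and scoring
    seen[event_id] = digest
    return True
```

In a real pipeline the `seen` mapping would presumably live in a SQLite table rather than a dict, so that an unchanged lineup is skipped across runs.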

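The retry/backoff/jitter behaviour of the enrichment pipeline could look roughly like the sketch below. This is an assumption-laden example rather than the project's implementation: `retry_async`, `backoff_delay`, the default parameters, and the choice of "full jitter" (a uniform random delay up to an exponentially growing cap) are all hypothetical.

```python
import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt)] seconds."""
    return random.uniform(0.0, min(cap, base * (2 ** attempt)))


async def retry_async(
    call: Callable[[], Awaitable[T]],
    *,
    attempts: int = 4,
    base: float = 0.5,
    cap: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Await call(), retrying on the given exceptions with jittered backoff."""
    for attempt in range(attempts):
        try:
            return await call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts, surface the last error
            await asyncio.sleep(backoff_delay(attempt, base, cap))
    raise RuntimeError("attempts must be >= 1")
```

With httpx, one option would be to pass `retry_on=(httpx.TransportError,)` and wrap the request in a zero-argument coroutine function, for example `lambda: client.get(url)`.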