The Open Data Platform for your community's Open Data
Datadex is a fully open-source, serverless, and local-first Data Platform to improve how communities collaborate on Open Data.
Why?
- Increase your community's coordination and shared understanding.
- Make it easy to publish data products built by your community, for your community.
Note: The previous version of Datadex, which used Dagster and DuckDB, can be found at this commit.
See other real-world, production Open Data Portals built on the Datadex pattern in the following repositories:
- LUNG-SARG. The Open Data Platform for Sustainable, Accessible Lung Radiogenomics.
- Datania. An Open Data Platform at national level that unifies and harmonizes information from different sources.
- Gitcoin Grants Data Portal. A Data hub for Gitcoin Grants data and related models.
- Filecoin Data Portal. A data portal for data related to the Filecoin network and ecosystem.
- Open: Code, standards, infrastructure, and data, all public and open source. Rely on open source tools, standards, public infrastructure, and accessible data formats.
- Modular and Interoperable: Easy to replace, extend, or remove components of the pattern. Flexible environments both when running (your laptop, a cluster, or the browser) and when deploying (S3 + GH Pages, IPFS, Hugging Face).
- Permissionless: Any improvement is one Pull Request away. Update pipelines, add datasets, or improve documentation. When consuming, there are no API limits, just plain files.
- Data as Code: Reproducible datasets with declarative stateless transformations tracked in git. Data is versioned alongside the code.
- Glue: Be a bridge between tools and approaches. E.g., use software engineering good practices like types, tests, materialized views, and more.
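To make the "Data as Code" and "Glue" principles above more concrete, here is a minimal sketch of a typed, stateless transformation over plain CSV text. It is not taken from the Datadex codebase: the `Measurement` type, the `parse_rows` and `mean_by_station` functions, and the sample data are all hypothetical and only illustrate the idea.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One typed row of a hypothetical raw dataset (illustrative only)."""
    station: str
    value: float


def parse_rows(raw_csv: str) -> list[Measurement]:
    """Parse CSV text into typed rows; a non-numeric value raises ValueError."""
    reader = csv.DictReader(io.StringIO(raw_csv))
    return [Measurement(row["station"], float(row["value"])) for row in reader]


def mean_by_station(rows: list[Measurement]) -> dict[str, float]:
    """Stateless transformation: the output depends only on the input rows."""
    grouped: dict[str, list[float]] = {}
    for row in rows:
        grouped.setdefault(row.station, []).append(row.value)
    return {station: sum(vals) / len(vals) for station, vals in sorted(grouped.items())}


# Hypothetical sample data standing in for a plain file tracked in git.
RAW = "station,value\na,1.0\na,3.0\nb,10.0\n"
result = mean_by_station(parse_rows(RAW))
```

Because the function keeps no state, running it twice on the same input gives the same output, which is what makes a dataset derived this way reproducible from the repository alone.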
Datadex is a Python project. The easiest way to get started is using a Python virtual environment.
If you hit any issue, please open an issue!
Install uv and let it manage the Python environment. The following command will install the dependencies.
make setup
Alternatively, you can rely on your system's Python installation to create a virtual environment and install the dependencies.
# Create a virtual environment
python3 -m venv .venv
source .venv/bin/activate
# Install the package and dependencies
pip install -e .
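If you are unsure whether the virtual environment is active before installing, a quick check like the following may help. This is a generic Python sketch, not part of Datadex; the `in_virtualenv` helper name is hypothetical.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a venv.

    Inside a venv, sys.prefix points at the environment directory,
    while sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```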
You can use VSCode Remote Containers to get started with Datadex too. If you have Docker installed and running, open the project in VSCode and click on the bottom right corner to open the project in a container.
The development environment can also run in your browser thanks to GitHub Codespaces!
Datadex is licensed under the MIT License. See the LICENSE file for details.