Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
Updated Dec 21, 2024 · Rust
The simplest way to run Python on lots of computers.
Data pipelines from reusable components.
The open-source Useful SDK. A single Python decorator from the Useful library provides full observability of Python functions within an ETL pipeline.
A project structure for doing and sharing data engineering work.
Clean API primitives for data cleaning in PySpark. Inspired by PyJanitor, Dataprep.AI and Shadcn.
Application link
Build ETL pipelines on Airflow to load data from BigQuery and store it in MySQL.
DataSift automatically applies a data preprocessing pipeline to data science projects.
Big Data ETL pipeline for Brazilian e-commerce data. Implements data ingestion, transformation, and storage using Apache Spark, Hadoop, and SQL. Designed for scalable data processing and analytics.
e-Portfolio showcasing my personal projects.
A deployed machine learning model that automatically classifies incoming disaster messages into 36 related categories. Developed as part of Udacity's Data Science Nanodegree program.
An extension that registers all pharmacies in Argentina.
JSON-driven ETL pipeline framework prototype
FleetFluid is a Python library that simplifies data transformation by letting you use AI-powered functions without writing (and hosting) them from scratch.
Weaving together different threads (services such as image/audio conversion, ETL services, etc.) to enable the World Wide Flow.
A Python- and Spark-based ETL framework. It operates within the speed limits set by its framework and standards, yet offers boundless possibilities.
End-to-end MLOps project with ETL pipelines: building a network security system.
End-to-end real-time data engineering pipeline that ingests YouTube API metrics, streams them through Kafka on Google Cloud, processes data with ksqlDB, and delivers analytics to a Telegram bot using Dockerized microservices.
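Several of the projects above (the JSON-driven ETL framework prototype in particular) share the same extract, transform, load shape. The following is a minimal sketch of a config-driven pipeline using only the Python standard library, not the code of any listed repository: the JSON keys (`source`, `transforms`, `sink`), the transform names (`rename`, `cast`, `filter_gt`) and the sample data are all hypothetical, and SQLite stands in for whatever storage a real project would use.

```python
import csv
import io
import json
import sqlite3

# Hypothetical pipeline description; keys and op names are illustrative only.
CONFIG = json.loads("""
{
  "source": {"type": "csv"},
  "transforms": [
    {"op": "rename", "from": "qty", "to": "quantity"},
    {"op": "cast", "column": "quantity", "type": "int"},
    {"op": "filter_gt", "column": "quantity", "value": 0}
  ],
  "sink": {"table": "orders"}
}
""")

RAW_CSV = "order_id,qty\nA1,3\nA2,0\nA3,5\n"

def extract_csv(text):
    """Extract: parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))

EXTRACTORS = {"csv": extract_csv}
CASTS = {"int": int, "float": float, "str": str}

def apply_transform(rows, step):
    """Transform: apply one config step to every row."""
    op = step["op"]
    if op == "rename":
        return [{(step["to"] if k == step["from"] else k): v for k, v in r.items()}
                for r in rows]
    if op == "cast":
        fn = CASTS[step["type"]]
        return [{**r, step["column"]: fn(r[step["column"]])} for r in rows]
    if op == "filter_gt":
        return [r for r in rows if r[step["column"]] > step["value"]]
    raise ValueError(f"unknown transform: {op}")

def load(rows, conn, table):
    """Load: write rows into a SQLite table (table/column names are trusted here)."""
    if not rows:
        return
    cols = list(rows[0])
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({', '.join(cols)})")
    placeholders = ", ".join("?" for _ in cols)
    conn.executemany(
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
        [tuple(r[c] for c in cols) for r in rows],
    )
    conn.commit()

def run_pipeline(config, raw, conn):
    rows = EXTRACTORS[config["source"]["type"]](raw)
    for step in config["transforms"]:
        rows = apply_transform(rows, step)
    load(rows, conn, config["sink"]["table"])
    return rows

conn = sqlite3.connect(":memory:")
result = run_pipeline(CONFIG, RAW_CSV, conn)
print(conn.execute("SELECT order_id, quantity FROM orders ORDER BY order_id").fetchall())
# → [('A1', 3), ('A3', 5)]
```

Keeping the steps in data rather than code is what lets a pipeline like this be edited without redeploying; the trade-off is that every new operation still needs a matching branch in `apply_transform`.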
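The Useful SDK description above mentions adding observability to ETL functions with a single decorator. The sketch below shows the general decorator technique in plain Python; it is not the Useful library's API. The names `observe` and `CALL_LOG`, and the fields recorded (status, duration, output length), are hypothetical choices for illustration.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(levelname)s %(message)s")
logger = logging.getLogger("etl")

# Hypothetical in-memory sink for call records; a real tool would export them.
CALL_LOG = []

def observe(func):
    """Record name, status, duration and output length for every call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        record = {"function": func.__name__}
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            record.update(status="error", error=repr(exc))
            raise  # re-raise so the pipeline still fails visibly
        else:
            record["status"] = "ok"
            if hasattr(result, "__len__"):
                record["output_len"] = len(result)
            return result
        finally:
            # Runs on success and failure alike.
            record["seconds"] = time.perf_counter() - start
            CALL_LOG.append(record)
            logger.info("%s", record)
    return wrapper

@observe
def extract():
    return [{"id": 1, "amount": "10.5"}, {"id": 2, "amount": "bad"}]

@observe
def transform(rows):
    return [{**r, "amount": float(r["amount"])} for r in rows]

rows = extract()
try:
    transform(rows)  # fails on "bad", which the decorator records
except ValueError:
    pass
print([(r["function"], r["status"]) for r in CALL_LOG])
# → [('extract', 'ok'), ('transform', 'error')]
```

Because `functools.wraps` preserves the wrapped function's name and docstring, decorated steps still look like the original functions to callers and tooling.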