
🔮 Features

🏔️ Core design principles

💻 Easy developer experience

Mage is an open-source engine that comes with a custom notebook UI for building data pipelines.

  • Mage comes with a specialized notebook UI for building data pipelines.
  • Use Python and SQL (more languages coming soon) together in the same pipeline for ultimate flexibility.
  • Set up locally and get started developing with a single command.
  • Deploying to production is fast using native integrations with major cloud providers.
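The Python-plus-SQL idea above can be sketched outside of Mage using only the standard library: a Python step loads rows, and a SQL step transforms them. This is a minimal illustration with stdlib `sqlite3`; the function names are hypothetical and not Mage's API.

```python
import sqlite3

def load_data():
    """Python step: load rows from a source (hardcoded here for illustration)."""
    return [("alice", 34), ("bob", 28), ("carol", 41)]

def transform_with_sql(rows):
    """SQL step: run a SQL transformation over the loaded rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, age INTEGER)")
    conn.executemany("INSERT INTO users VALUES (?, ?)", rows)
    result = conn.execute(
        "SELECT name FROM users WHERE age > 30 ORDER BY name"
    ).fetchall()
    conn.close()
    return [name for (name,) in result]

print(transform_with_sql(load_data()))  # ['alice', 'carol']
```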

🚢 Engineering best practices built-in

Build and deploy data pipelines using modular code. No more writing throwaway code or trying to turn notebooks into scripts.

  • Writing reusable code is easy because every block in your data pipeline is a standalone file.
  • Data validation is written into each block and tested every time a block is run.
  • Operationalizing your data pipelines is easy with built-in observability, data quality monitoring, and lineage.
  • Each block of code has a single responsibility: load data from a source, transform data, or export data anywhere.

💳 Data is a first class citizen

Designed from the ground up specifically for running data-intensive workflows.

  • Every block run produces a data product (e.g. a dataset or unstructured data).
  • Every data product can be automatically partitioned.
  • Each pipeline and data product can be versioned.
  • Backfilling data products is a core function and operation.
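Partitioning and backfilling can be made concrete with a small sketch: if each data product is partitioned by execution date, a backfill simply enumerates and rebuilds the partitions in a window. The date-based partition scheme here is a hypothetical example, not Mage's internal layout.

```python
from datetime import date, timedelta

def partition_key(day):
    """Hypothetical partition scheme: one partition per execution date."""
    return day.strftime("%Y%m%d")

def backfill_partitions(start, end):
    """Enumerate the partitions a backfill over [start, end] would rebuild."""
    days = (end - start).days + 1
    return [partition_key(start + timedelta(days=i)) for i in range(days)]

print(backfill_partitions(date(2023, 1, 30), date(2023, 2, 1)))
# ['20230130', '20230131', '20230201']
```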

🪐 Scaling made simple

Analyze and process large datasets quickly for rapid iteration.

  • Transform very large datasets through a native integration with Spark.
  • Handle data intensive transformations with built-in distributed computing (e.g. Dask, Ray).
  • Run thousands of pipelines simultaneously and manage them transparently through a collaborative UI.
  • Execute SQL queries in your data warehouse to process heavy workloads.
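Running many pipelines simultaneously boils down to fanning out independent runs and collecting their statuses. A minimal sketch with stdlib `concurrent.futures` (the `run_pipeline` stub is hypothetical; in practice it would trigger a real pipeline run):

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(name):
    """Stand-in for triggering one pipeline run; returns its final status."""
    return name, "completed"

def run_many(pipeline_names, max_workers=8):
    """Run many pipelines concurrently and collect name -> status."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(run_pipeline, pipeline_names))

statuses = run_many([f"pipeline_{i}" for i in range(100)])
print(len(statuses), statuses["pipeline_0"])  # 100 completed
```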

More features

  1. Data centric editor
  2. Production ready code
  3. Extensible

1. Data centric editor

An interactive coding experience designed for preparing data to train ML models.

Visualize the impact of your code every time you load, clean, and transform data.
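The per-step feedback loop can be sketched as printing a small summary of the data after each operation, which is the essence of seeing a transformation's impact (a toy stand-in for the editor's visualizations, not its implementation):

```python
def summarize(rows):
    """Tiny stand-in for a per-step preview: row count and column names."""
    cols = sorted(rows[0]) if rows else []
    return {"rows": len(rows), "columns": cols}

data = [{"age": 34, "name": "alice"}, {"age": None, "name": "bob"}]
print(summarize(data))                    # before cleaning
cleaned = [r for r in data if r["age"] is not None]
print(summarize(cleaned))                 # after cleaning: one row dropped
```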


2. Production ready code

No more writing throwaway code or trying to turn notebooks into scripts.

Each block (aka cell) in this editor is a modular file that can be tested, reused, and chained together to create an executable data pipeline locally or in any environment.

Read more about blocks and how they work.
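Chaining blocks into an executable pipeline can be sketched as feeding each block's output into the next. Each function below stands in for a standalone block file; the `run_pipeline` helper is an illustrative simplification, not Mage's runtime.

```python
def load():
    """Loader block: produce data from a source (hardcoded for illustration)."""
    return [3, 1, 2]

def transform(data):
    """Transformer block: sort the data."""
    return sorted(data)

def export(data):
    """Exporter block: serialize the data for a destination."""
    return ",".join(map(str, data))

def run_pipeline(blocks):
    """Feed each block's output into the next, like chained pipeline blocks."""
    result = None
    for i, block in enumerate(blocks):
        result = block(result) if i else block()
    return result

print(run_pipeline([load, transform, export]))  # 1,2,3
```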


Run your data pipeline end-to-end from the command line: `mage run [project] [pipeline]`

You can also run your pipeline in production environments with the built-in orchestration tools.

3. Extensible

Easily add new functionality directly in the source code or through plug-ins (coming soon).

Adding new API endpoints (Tornado), transformations (Python, PySpark, SQL), and charts (using React) is easy to do (tutorial coming soon).
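One common way to make transformations pluggable is a registry: new transformations are registered by name and looked up at runtime. This is a generic sketch of the extensibility idea, not Mage's actual plug-in mechanism (which the text notes is coming soon).

```python
# Hypothetical plug-in registry: maps a transformation name to its function.
TRANSFORMATIONS = {}

def register(name):
    """Decorator that adds a transformation to the registry under `name`."""
    def wrap(fn):
        TRANSFORMATIONS[name] = fn
        return fn
    return wrap

@register("uppercase")
def uppercase(values):
    return [v.upper() for v in values]

@register("reverse")
def reverse(values):
    return list(reversed(values))

print(TRANSFORMATIONS["uppercase"](["a", "b"]))  # ['A', 'B']
```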


New features and changelog

Check out what’s new here.