[DOCS] Redoing the hudi stack page and overview page #12373

Status: Merged (2 commits, merged Nov 29, 2024)

1,761 changes: 1,761 additions & 0 deletions site-image-source/hudi-stack-1-x.excalidraw

1,255 changes: 1,255 additions & 0 deletions site-image-source/hudi-stack-indexes.excalidraw

959 changes: 959 additions & 0 deletions site-image-source/hudi-timeline-actions.excalidraw

2,083 changes: 2,083 additions & 0 deletions site-image-source/hudi-timeline-truetime.excalidraw

141 changes: 108 additions & 33 deletions website/docs/hudi_stack.md

29 changes: 14 additions & 15 deletions website/docs/overview.mdx
@@ -8,27 +8,28 @@ last_modified_at: 2019-12-30T15:59:57-04:00

import SlackCommunity from '@site/src/components/SlackCommunity';

Welcome to Apache Hudi! This overview will provide a high level summary of what Apache Hudi is and will orient you on
Hello there! This overview will provide a high level summary of what Apache Hudi is and will orient you on
how to learn more to get started.

## What is Apache Hudi
Apache Hudi (pronounced “hoodie”) is the next generation [streaming data lake platform](/blog/2021/07/21/streaming-data-lake-platform).
Hudi brings core warehouse and database functionality directly to a data lake. Hudi provides [tables](/docs/next/sql_ddl),

Apache Hudi (pronounced "hoodie") pioneered the concept of "[transactional data lakes](https://www.uber.com/blog/hoodie/)", which is more popularly known today as
the data lakehouse architecture. Today, Hudi has grown into an [open data lakehouse platform](/blog/2021/07/21/streaming-data-lake-platform), with a open table format purpose-built for high performance writes on
incremental data pipelines and fast query performance due to comprehensive table optimizations.

Hudi brings core database functionality directly to a data lake - [tables](/docs/next/sql_ddl),
[transactions](/docs/next/timeline), [efficient upserts/deletes](/docs/next/write_operations), [advanced indexes](/docs/next/indexing),
[ingestion services](/docs/hoodie_streaming_ingestion), data [clustering](/docs/next/clustering)/[compaction](/docs/next/compaction) optimizations,
and [concurrency](/docs/next/concurrency_control) all while keeping your data in open source file formats.
and [concurrency control](/docs/next/concurrency_control) all while keeping your data in open file formats. Not only is Apache Hudi great for streaming workloads,
but it also allows you to create efficient incremental batch pipelines. Apache Hudi can easily be used on any [cloud storage platform](/docs/cloud).
Hudi’s advanced performance optimizations, make analytical queries/pipelines faster with any of the popular query engines including, Apache Spark, Flink, Presto, Trino, Hive, etc.

Not only is Apache Hudi great for streaming workloads, but it also allows you to create efficient incremental batch pipelines.
Read the docs for more [use case descriptions](/docs/use_cases) and check out [who's using Hudi](/powered-by), to see how some of the
largest data lakes in the world including [Uber](https://eng.uber.com/uber-big-data-platform/), [Amazon](https://aws.amazon.com/blogs/big-data/how-amazon-transportation-service-enabled-near-real-time-event-analytics-at-petabyte-scale-using-aws-glue-with-apache-hudi/),
[ByteDance](http://hudi.apache.org/blog/2021/09/01/building-eb-level-data-lake-using-hudi-at-bytedance),
[Robinhood](https://s.apache.org/hudi-robinhood-talk) and more are transforming their production data lakes with Hudi.

Apache Hudi can easily be used on any [cloud storage platform](/docs/cloud).
Hudi’s advanced performance optimizations, make analytical workloads faster with any of
the popular query engines including, Apache Spark, Flink, Presto, Trino, Hive, etc.

[Hudi-rs](https://github.com/apache/hudi-rs) is the native Rust implementation for Apache Hudi, which also provides bindings to Python. It
[Hudi-rs](https://github.com/apache/hudi-rs) is the native Rust implementation for Apache Hudi, which also provides bindings to Python. It
expands the use of Apache Hudi for a diverse range of use cases in the non-JVM ecosystems.

## Core Concepts to Learn
@@ -39,7 +40,7 @@ If you are relatively new to Apache Hudi, it is important to be familiar with a
- [Hudi Table Types](/docs/next/table_types) – `COPY_ON_WRITE` and `MERGE_ON_READ`
- [Hudi Query Types](/docs/next/table_types#query-types) – Snapshot Queries, Incremental Queries, Read-Optimized Queries

See more in the "Concepts" section of the docs.
See more in the "Design & Concepts" section of the docs.

Take a look at recent [blog posts](/blog) that go in depth on certain topics or use cases.

@@ -51,12 +52,10 @@ Sometimes the fastest way to learn is by doing. Try out these Quick Start resour
- [Flink Quick Start Guide](/docs/flink-quick-start-guide) – if you primarily use Apache Flink
- [Python/Rust Quick Start Guide (Hudi-rs)](/docs/python-rust-quick-start-guide) - if you primarily use Python or Rust

If you want to experience Apache Hudi integrated into an end to end demo with Kafka, Spark, Hive, Presto, etc, try out the Docker Demo:

- [Docker Demo](/docs/docker_demo)
If you want to experience Apache Hudi integrated into an end to end demo with Kafka, Spark, Hive, Presto, etc, try out the [Docker Demo](/docs/docker_demo)

## Connect With The Community
Apache Hudi is community focused and community led and welcomes new-comers with open arms. Leverage the following
Apache Hudi is community-focused and community-led and welcomes new-comers with open arms. Leverage the following
resources to learn more, engage, and get help as you get started.

### Join in on discussions
2 changes: 1 addition & 1 deletion website/docs/python-rust-quick-start-guide.md
@@ -1,5 +1,5 @@
---
title: "Python/Rust Quick Start (Hudi-rs)"
title: "Python/Rust Quick Start"
toc: true
last_modified_at: 2024-11-28T12:53:57+08:00
---