[DOCS] Added reference blogs to hudi docs (#12505)
* Added reference blogs to hudi docs

* Uniformed formatting

* Fixed a duplicate entry under reference
ad1happy2go authored Jan 3, 2025
1 parent 935b764 commit 93a4d2a
Showing 16 changed files with 74 additions and 3 deletions.
5 changes: 5 additions & 0 deletions website/docs/azure_hoodie.md
@@ -48,3 +48,8 @@ This combination works out of the box. No extra config needed.
.format("org.apache.hudi")
.load("/mountpoint/hudi-tables/customer")
```

## Related Resources

<h3>Blogs</h3>
* [How to use Apache Hudi with Databricks](https://www.onehouse.ai/blog/how-to-use-apache-hudi-with-databricks)
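
Pieced together, the read in the snippet above looks like the following PySpark sketch; the mount point and table path come from the doc, while the session setup is an assumption:

```python
from pyspark.sql import SparkSession

# Assumes a Spark/Databricks session with the Hudi bundle on the classpath.
spark = SparkSession.builder.appName("read-hudi-on-adls").getOrCreate()

# Path matches the mounted location shown in the snippet above.
df = (
    spark.read
    .format("org.apache.hudi")
    .load("/mountpoint/hudi-tables/customer")
)
df.show()
```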
4 changes: 4 additions & 0 deletions website/docs/cleaning.md
@@ -148,6 +148,10 @@ cleans run --sparkMaster local --hoodieConfigs hoodie.cleaner.policy=KEEP_LATEST
You can find more details and the relevant code for these commands in [`org.apache.hudi.cli.commands.CleansCommand`](https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java) class.

## Related Resources

<h3>Blogs</h3>
* [Cleaner and Archival in Apache Hudi](https://medium.com/@simpsons/cleaner-and-archival-in-apache-hudi-9e15b08b2933)

<h3>Videos</h3>

* [Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs](https://youtu.be/mUvRhJDoO3w)
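
For readers who prefer configuring cleaning from the writer rather than the CLI, a minimal PySpark sketch follows; the table name, base path, and `df` are placeholders, while the config keys are standard Hudi cleaning options:

```python
# Hedged sketch: cleaner configs set on the write path.
hudi_options = {
    "hoodie.table.name": "customer",                  # placeholder table name
    "hoodie.clean.automatic": "true",                 # run cleaning as part of commits
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",   # same policy family as the CLI example
    "hoodie.cleaner.commits.retained": "10",          # keep file slices for the last 10 commits
}

df.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/hudi/customer")
```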
5 changes: 5 additions & 0 deletions website/docs/cli.md
@@ -753,3 +753,8 @@ table change-table-type COW
║ hoodie.timeline.layout.version │ 1 │ 1 ║
╚════════════════════════════════════════════════╧══════════════════════════════════════╧══════════════════════════════════════╝
```
## Related Resources
<h3>Blogs</h3>
* [Getting Started: Manage your Hudi tables with the admin Hudi-CLI tool](https://www.onehouse.ai/blog/getting-started-manage-your-hudi-tables-with-the-admin-hudi-cli-tool)
5 changes: 5 additions & 0 deletions website/docs/clustering.md
@@ -341,6 +341,11 @@ and execution strategy `org.apache.hudi.client.clustering.run.strategy.JavaSortA
out-of-the-box. Note that, as of now, only linear sort is supported in the Java execution strategy.

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Z-Order and Hilbert Space Filling Curves](https://www.onehouse.ai/blog/apachehudi-z-order-and-hilbert-space-filling-curves)
* [Hudi Z-Order and Hilbert Space-filling Curves](https://medium.com/apache-hudi-blogs/hudi-z-order-and-hilbert-space-filling-curves-68fa28bffaf0)

<h3>Videos</h3>

* [Understanding Clustering in Apache Hudi and the Benefits of Asynchronous Clustering](https://www.youtube.com/watch?v=R_sm4wlGXuE)
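
A minimal sketch of enabling inline clustering with a sort-based plan, to complement the strategies discussed above; the table name, path, sort columns, and `df` are placeholders:

```python
# Hedged sketch: inline clustering triggered every few commits.
clustering_options = {
    "hoodie.table.name": "events",
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",               # cluster every 4 commits
    "hoodie.clustering.plan.strategy.sort.columns": "city,ts", # columns to sort/space-fill on
}

df.write.format("hudi").options(**clustering_options).mode("append").save("/tmp/hudi/events")
```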
6 changes: 6 additions & 0 deletions website/docs/compaction.md
@@ -226,3 +226,9 @@ Offline compaction needs to submit the Flink task on the command line. The progr
| `--seq` | `LIFO` (Optional) | The order in which compaction tasks are executed. Executing from the latest compaction plan by default. `LIFO`: executing from the latest plan. `FIFO`: executing from the oldest plan. |
| `--service` | `false` (Optional) | Whether to start a monitoring service that checks and schedules new compaction task in configured interval. |
| `--min-compaction-interval-seconds` | `600(s)` (optional) | The checking interval for service mode, by default 10 minutes. |

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Compaction](https://medium.com/@simpsons/apache-hudi-compaction-6e6383790234)
* [Standalone HoodieCompactor Utility](https://medium.com/@simpsons/standalone-hoodiecompactor-utility-890198e4c539)
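
As a counterpart to the offline Flink compactor above, inline compaction can be configured from a Spark writer; a hedged sketch, with the table name, path, and `df` as placeholders:

```python
# Hedged sketch: inline compaction on a MERGE_ON_READ table.
compaction_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.compact.inline": "true",                 # compact as part of the write
    "hoodie.compact.inline.max.delta.commits": "5",  # trigger after 5 delta commits
}

df.write.format("hudi").options(**compaction_options).mode("append").save("/tmp/hudi/events")
```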
5 changes: 5 additions & 0 deletions website/docs/concepts.md
@@ -169,4 +169,9 @@ The intention of merge on read table is to enable near real-time processing dire
data out to specialized systems, which may not be able to handle the data volume. There are also a few secondary side benefits to
this table such as reduced write amplification by avoiding synchronous merge of data, i.e., the amount of data written per byte of data in a batch.

## Related Resources

<h3>Blogs</h3>
* [Comparing Apache Hudi's MOR and COW Tables: Use Cases from Uber and Shopee](https://www.onehouse.ai/blog/comparing-apache-hudis-mor-and-cow-tables-use-cases-from-uber-and-shopee)
* [Hudi Metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
* [File Naming conventions in Apache Hudi](https://medium.com/@simpsons/file-naming-conventions-in-apache-hudi-cd1cdd95f5e7)
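
The copy-on-write versus merge-on-read choice described above reduces to a single write option; a hedged sketch with placeholder names:

```python
(
    df.write.format("hudi")
    .option("hoodie.table.name", "trips")  # placeholder table name
    # MERGE_ON_READ favors write latency; COPY_ON_WRITE favors read simplicity.
    .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
    .mode("append")
    .save("/tmp/hudi/trips")
)
```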
5 changes: 5 additions & 0 deletions website/docs/concurrency_control.md
@@ -333,6 +333,11 @@ If you are using the `WriteClient` API, please note that multiple writes to the
It is **NOT** recommended to use the same instance of the write client to perform concurrent writes.

## Related Resources

<h3>Blogs</h3>
* [Data Lakehouse Concurrency Control](https://www.onehouse.ai/blog/lakehouse-concurrency-control-are-we-too-optimistic)
* [Multi-writer support with Apache Hudi](https://medium.com/@simpsons/multi-writer-support-with-apache-hudi-e1b75dca29e6)

<h3>Videos</h3>

* [Hands on Lab with using DynamoDB as lock table for Apache Hudi Data Lakes](https://youtu.be/JP0orl9_0yQ)
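
A minimal multi-writer configuration sketch using optimistic concurrency control with the ZooKeeper lock provider; the hosts, lock paths, and `df` are placeholders:

```python
# Hedged sketch: OCC with a ZooKeeper-based lock provider.
occ_options = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider": "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    "hoodie.write.lock.zookeeper.url": "zk-host",       # placeholder host
    "hoodie.write.lock.zookeeper.port": "2181",
    "hoodie.write.lock.zookeeper.lock_key": "customer",
    "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",
}

df.write.format("hudi").options(**occ_options).mode("append").save("/tmp/hudi/customer")
```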
5 changes: 5 additions & 0 deletions website/docs/indexes.md
@@ -219,6 +219,11 @@ partition path value could change due to an update, e.g., users table partitioned by


## Related Resources

<h3>Blogs</h3>

* [Global vs Non-global index in Apache Hudi](https://medium.com/@simpsons/global-vs-non-global-index-in-apache-hudi-ac880b031cbc)

<h3>Videos</h3>

* [Global Bloom Index: Remove duplicates & guarantee uniqueness - Hudi Labs](https://youtu.be/XlRvMFJ7g9c)
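
A hedged sketch of a global index configured to handle the changing-partition-path case above; the table name, path, and `df` are placeholders:

```python
# Hedged sketch: global bloom index with partition-path updates.
index_options = {
    "hoodie.table.name": "users",
    "hoodie.index.type": "GLOBAL_BLOOM",
    # Move the record to its new partition instead of updating in place.
    "hoodie.bloom.index.update.partition.path": "true",
}

df.write.format("hudi").options(**index_options).mode("append").save("/tmp/hudi/users")
```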
4 changes: 3 additions & 1 deletion website/docs/key_generation.md
@@ -212,4 +212,6 @@ Partition path generated from key generator: "04/01/2020"

## Related Resources

* [Hudi metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
<h3>Blogs</h3>
* [Hudi metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
* [Primary key and Partition Generators with Apache Hudi](https://medium.com/@simpsons/primary-key-and-partition-generators-with-apache-hudi-f0e4d71d9d26)
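
A minimal sketch of wiring a key generator on the write path; the field names and path are hypothetical, and `ComplexKeyGenerator` is just one of the generators this page covers:

```python
# Hedged sketch: composite record key plus a partition-path field.
keygen_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id,customer_id",   # placeholder fields
    "hoodie.datasource.write.partitionpath.field": "created_date",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
}

df.write.format("hudi").options(**keygen_options).mode("append").save("/tmp/hudi/orders")
```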
5 changes: 5 additions & 0 deletions website/docs/markers.md
@@ -89,3 +89,8 @@ with direct markers because the file system metadata is efficiently cached in me
| `hoodie.markers.timeline_server_based.batch.num_threads` | 20 | Number of threads to use for batch processing marker creation requests at the timeline server. |
| `hoodie.markers.timeline_server_based.batch.interval_ms` | 50 | The batch interval in milliseconds for marker creation batch processing. |


## Related Resources

<h3>Blogs</h3>
* [Timeline Server in Apache Hudi](https://medium.com/@simpsons/timeline-server-in-apache-hudi-b5be25f85e47)
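
A hedged sketch of opting into timeline-server-based markers; the batch settings echo the defaults in the table above, and the table name, path, and `df` are placeholders:

```python
# Hedged sketch: timeline-server-based marker mechanism on the write path.
marker_options = {
    "hoodie.table.name": "events",
    "hoodie.write.markers.type": "TIMELINE_SERVER_BASED",
    "hoodie.markers.timeline_server_based.batch.num_threads": "20",  # defaults from the table above
    "hoodie.markers.timeline_server_based.batch.interval_ms": "50",
}

df.write.format("hudi").options(**marker_options).mode("append").save("/tmp/hudi/events")
```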
2 changes: 1 addition & 1 deletion website/docs/metadata.md
@@ -129,6 +129,6 @@ metadata table across all writers.

## Related Resources
<h3>Blogs</h3>

* [Table service deployment models in Apache Hudi](https://medium.com/@simpsons/table-service-deployment-models-in-apache-hudi-9cfa5a44addf)
* [Multi Modal Indexing for the Data Lakehouse](https://www.onehouse.ai/blog/introducing-multi-modal-index-for-the-lakehouse-in-apache-hudi)
* [How to Optimize Performance for Your Open Data Lakehouse](https://www.onehouse.ai/blog/how-to-optimize-performance-for-your-open-data-lakehouse)
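
Since the metadata table must be toggled consistently across all writers, the enabling config is worth showing; a minimal sketch with placeholder names:

```python
# Hedged sketch: enable the metadata table; set the same value on every writer.
metadata_options = {
    "hoodie.table.name": "events",
    "hoodie.metadata.enable": "true",
}

df.write.format("hudi").options(**metadata_options).mode("append").save("/tmp/hudi/events")
```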
6 changes: 6 additions & 0 deletions website/docs/performance.md
@@ -131,3 +131,9 @@ To enable Data Skipping in your queries, make sure to set the following properties to
- `hoodie.enable.data.skipping` (to control data skipping, enabled by default)
- `hoodie.metadata.enable` (to enable metadata table use on the read path, enabled by default)
- `hoodie.metadata.index.column.stats.enable` (to enable column stats index use on the read path)

## Related Resources

<h3>Blogs</h3>
* [Hudi’s Column Stats Index and Data Skipping feature help speed up queries by an orders of magnitude!](https://www.onehouse.ai/blog/hudis-column-stats-index-and-data-skipping-feature-help-speed-up-queries-by-an-orders-of-magnitude)
* [Top 3 Things You Can Do to Get Fast Upsert Performance in Apache Hudi](https://www.onehouse.ai/blog/top-3-things-you-can-do-to-get-fast-upsert-performance-in-apache-hudi)
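
A read-path sketch using exactly the three properties above, assuming an active SparkSession `spark`; only the table path and filter column are placeholders:

```python
df = (
    spark.read.format("hudi")
    .option("hoodie.enable.data.skipping", "true")
    .option("hoodie.metadata.enable", "true")
    .option("hoodie.metadata.index.column.stats.enable", "true")
    .load("/tmp/hudi/events")  # placeholder table path
)
df.filter("city = 'sf'").show()  # predicate pruning uses column stats to skip files
```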
5 changes: 4 additions & 1 deletion website/docs/precommit_validator.md
@@ -96,6 +96,9 @@ Hudi offers a [commit notification service](platform_services_post_commit_callba
The commit notification service can be combined with pre-commit validators to send a notification when a commit fails a validation. This is possible by passing details about the validation as a custom value to the HTTP endpoint.

## Related Resources
<h3>Videos</h3>

<h3>Blogs</h3>
* [Apply Pre-Commit Validation for Data Quality in Apache Hudi](https://www.onehouse.ai/blog/apply-pre-commit-validation-for-data-quality-in-apache-hudi)

<h3>Videos</h3>
* [Learn About Apache Hudi Pre Commit Validator with Hands on Lab](https://www.youtube.com/watch?v=KNzs9dj_Btc)
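
A hedged sketch of attaching a SQL-based pre-commit validator; the validator class and config keys follow the Hudi validator docs, while the query, expected value, table name, and path are hypothetical:

```python
# Hedged sketch: fail the commit if any record has a null key.
validator_options = {
    "hoodie.table.name": "orders",
    "hoodie.precommit.validators":
        "org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator",
    # Expected result is appended after '#': here, zero rows with a null key.
    "hoodie.precommit.validators.single.value.sql.queries":
        "select count(*) from <TABLE_NAME> where order_id is null#0",
}

df.write.format("hudi").options(**validator_options).mode("append").save("/tmp/hudi/orders")
```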
4 changes: 4 additions & 0 deletions website/docs/record_merger.md
@@ -251,3 +251,7 @@ example, [`MySqlDebeziumAvroPayload`](https://github.com/apache/hudi/blob/e76dd1
captured via Debezium for MySQL and PostgreSQL. [`AWSDmsAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/AWSDmsAvroPayload.java) provides support for applying changes captured via Amazon Database Migration Service onto S3.
For full configurations, go [here](/docs/configurations#RECORD_PAYLOAD) and please check out [this FAQ](faq_writing_tables/#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage) if you want to implement your own custom payloads.

## Related Resources

<h3>Blogs</h3>
* [How to define your own merge logic with Apache Hudi](https://medium.com/@simpsons/how-to-define-your-own-merge-logic-with-apache-hudi-622ee5ccab1e)
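
Selecting one of the payloads above is a single write option; a minimal sketch using the `AWSDmsAvroPayload` class referenced in the text, with placeholder names:

```python
# Hedged sketch: plug in a record payload class on the write path.
payload_options = {
    "hoodie.table.name": "customers",
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.AWSDmsAvroPayload",
}

df.write.format("hudi").options(**payload_options).mode("append").save("/tmp/hudi/customers")
```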
5 changes: 5 additions & 0 deletions website/docs/timeline.md
@@ -151,3 +151,8 @@ Flink jobs using the SQL can be configured through the options in the WITH clause. T

Refer [here](https://hudi.apache.org/docs/next/configurations#Flink-Options) for more details.

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Timeline: Foundational pillar for ACID transactions](https://medium.com/@simpsons/hoodie-timeline-foundational-pillar-for-acid-transactions-be871399cbae)
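
A hedged PyFlink sketch of the WITH-clause configuration mentioned above; the schema, path, and option values are hypothetical:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Hudi Flink options are passed in the WITH clause of the table DDL.
t_env.execute_sql("""
    CREATE TABLE hudi_orders (
        order_id STRING,
        amount DOUBLE,
        ts TIMESTAMP(3),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'hudi',
        'path' = 'file:///tmp/hudi/hudi_orders',
        'table.type' = 'MERGE_ON_READ'
    )
""")
```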

6 changes: 6 additions & 0 deletions website/docs/writing_tables_streaming_writes.md
@@ -93,3 +93,9 @@ df.writeStream.format("hudi")
</Tabs
>
## Related Resources

<h3>Blogs</h3>
* [An Introduction to the Hudi and Flink Integration](https://www.onehouse.ai/blog/intro-to-hudi-and-flink)
* [Bulk Insert Sort Modes with Apache Hudi](https://medium.com/@simpsons/bulk-insert-sort-modes-with-apache-hudi-c781e77841bc)
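
A hedged completion of the `df.writeStream.format("hudi")` fragment above; the key and precombine fields, checkpoint location, and base path are placeholders:

```python
# Hedged sketch: a Structured Streaming write into a Hudi table.
hudi_streaming_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",  # placeholder fields
    "hoodie.datasource.write.precombine.field": "ts",
}

query = (
    df.writeStream.format("hudi")
    .options(**hudi_streaming_options)
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .start("/tmp/hudi/orders")
)
query.awaitTermination()
```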
