[DOCS] Added reference blogs to hudi docs (#12505)
* Added reference blogs to hudi docs

* Uniformed formatting

* Fixed a duplicate entry under reference
ad1happy2go authored Jan 3, 2025
1 parent 935b764 commit 93a4d2a
Showing 16 changed files with 74 additions and 3 deletions.
5 changes: 5 additions & 0 deletions website/docs/azure_hoodie.md
@@ -48,3 +48,8 @@ This combination works out of the box. No extra config needed.
.format("org.apache.hudi")
.load("/mountpoint/hudi-tables/customer")
```

## Related Resources

<h3>Blogs</h3>
* [How to use Apache Hudi with Databricks](https://www.onehouse.ai/blog/how-to-use-apache-hudi-with-databricks)
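
Pieced together, the read in the snippet above looks like the following PySpark sketch; the mount point and table path come from the doc, while the session setup is an assumption:

```python
from pyspark.sql import SparkSession

# Assumes a Spark/Databricks session with the Hudi bundle on the classpath.
spark = SparkSession.builder.appName("read-hudi-on-adls").getOrCreate()

# Path matches the mounted location shown in the snippet above.
df = (
    spark.read
    .format("org.apache.hudi")
    .load("/mountpoint/hudi-tables/customer")
)
df.show()
```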
4 changes: 4 additions & 0 deletions website/docs/cleaning.md
@@ -148,6 +148,10 @@ cleans run --sparkMaster local --hoodieConfigs hoodie.cleaner.policy=KEEP_LATEST
You can find more details and the relevant code for these commands in [`org.apache.hudi.cli.commands.CleansCommand`](https://github.com/apache/hudi/blob/master/hudi-cli/src/main/java/org/apache/hudi/cli/commands/CleansCommand.java) class.

## Related Resources

<h3>Blogs</h3>
* [Cleaner and Archival in Apache Hudi](https://medium.com/@simpsons/cleaner-and-archival-in-apache-hudi-9e15b08b2933)

<h3>Videos</h3>

* [Cleaner Service: Save up to 40% on data lake storage costs | Hudi Labs](https://youtu.be/mUvRhJDoO3w)
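
For readers who prefer configuring cleaning from the writer rather than the CLI, a minimal PySpark sketch follows; the table name, base path, and `df` are placeholders, while the config keys are standard Hudi cleaning options:

```python
# Hedged sketch: cleaner configs set on the write path.
hudi_options = {
    "hoodie.table.name": "customer",                  # placeholder table name
    "hoodie.clean.automatic": "true",                 # run cleaning as part of commits
    "hoodie.cleaner.policy": "KEEP_LATEST_COMMITS",   # same policy family as the CLI example
    "hoodie.cleaner.commits.retained": "10",          # keep file slices for the last 10 commits
}

df.write.format("hudi").options(**hudi_options).mode("append").save("/tmp/hudi/customer")
```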
5 changes: 5 additions & 0 deletions website/docs/cli.md
@@ -753,3 +753,8 @@ table change-table-type COW
║ hoodie.timeline.layout.version │ 1 │ 1 ║
╚════════════════════════════════════════════════╧══════════════════════════════════════╧══════════════════════════════════════╝
```
## Related Resources
<h3>Blogs</h3>
* [Getting Started: Manage your Hudi tables with the admin Hudi-CLI tool](https://www.onehouse.ai/blog/getting-started-manage-your-hudi-tables-with-the-admin-hudi-cli-tool)
5 changes: 5 additions & 0 deletions website/docs/clustering.md
@@ -341,6 +341,11 @@ and execution strategy `org.apache.hudi.client.clustering.run.strategy.JavaSortA
out-of-the-box. Note that, as of now, only linear sort is supported in the Java execution strategy.

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Z-Order and Hilbert Space Filling Curves](https://www.onehouse.ai/blog/apachehudi-z-order-and-hilbert-space-filling-curves)
* [Hudi Z-Order and Hilbert Space-filling Curves](https://medium.com/apache-hudi-blogs/hudi-z-order-and-hilbert-space-filling-curves-68fa28bffaf0)

<h3>Videos</h3>

* [Understanding Clustering in Apache Hudi and the Benefits of Asynchronous Clustering](https://www.youtube.com/watch?v=R_sm4wlGXuE)
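
A minimal sketch of enabling inline clustering with a sort-based plan, to complement the strategies discussed above; the table name, path, sort columns, and `df` are placeholders:

```python
# Hedged sketch: inline clustering triggered every few commits.
clustering_options = {
    "hoodie.table.name": "events",
    "hoodie.clustering.inline": "true",
    "hoodie.clustering.inline.max.commits": "4",               # cluster every 4 commits
    "hoodie.clustering.plan.strategy.sort.columns": "city,ts", # columns to sort/space-fill on
}

df.write.format("hudi").options(**clustering_options).mode("append").save("/tmp/hudi/events")
```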
6 changes: 6 additions & 0 deletions website/docs/compaction.md
@@ -226,3 +226,9 @@ Offline compaction needs to submit the Flink task on the command line. The progr
| `--seq` | `LIFO` (Optional) | The order in which compaction tasks are executed. Executing from the latest compaction plan by default. `LIFO`: executing from the latest plan. `FIFO`: executing from the oldest plan. |
| `--service` | `false` (Optional) | Whether to start a monitoring service that checks and schedules new compaction task in configured interval. |
| `--min-compaction-interval-seconds` | `600(s)` (optional) | The checking interval for service mode, by default 10 minutes. |

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Compaction](https://medium.com/@simpsons/apache-hudi-compaction-6e6383790234)
* [Standalone HoodieCompactor Utility](https://medium.com/@simpsons/standalone-hoodiecompactor-utility-890198e4c539)
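
As a counterpart to the offline Flink compactor above, inline compaction can be configured from a Spark writer; a hedged sketch, with the table name, path, and `df` as placeholders:

```python
# Hedged sketch: inline compaction on a MERGE_ON_READ table.
compaction_options = {
    "hoodie.table.name": "events",
    "hoodie.datasource.write.table.type": "MERGE_ON_READ",
    "hoodie.compact.inline": "true",                 # compact as part of the write
    "hoodie.compact.inline.max.delta.commits": "5",  # trigger after 5 delta commits
}

df.write.format("hudi").options(**compaction_options).mode("append").save("/tmp/hudi/events")
```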
5 changes: 5 additions & 0 deletions website/docs/concepts.md
@@ -169,4 +169,9 @@ The intention of merge on read table is to enable near real-time processing dire
data out to specialized systems, which may not be able to handle the data volume. There are also a few secondary side benefits to
this table such as reduced write amplification by avoiding synchronous merge of data, i.e., the amount of data written per byte of data in a batch.

## Related Resources

<h3>Blogs</h3>
* [Comparing Apache Hudi's MOR and COW Tables: Use Cases from Uber and Shopee](https://www.onehouse.ai/blog/comparing-apache-hudis-mor-and-cow-tables-use-cases-from-uber-and-shopee)
* [Hudi Metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
* [File Naming conventions in Apache Hudi](https://medium.com/@simpsons/file-naming-conventions-in-apache-hudi-cd1cdd95f5e7)
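
The copy-on-write versus merge-on-read choice described above reduces to a single write option; a hedged sketch with placeholder names:

```python
(
    df.write.format("hudi")
    .option("hoodie.table.name", "trips")  # placeholder table name
    # MERGE_ON_READ favors write latency; COPY_ON_WRITE favors read simplicity.
    .option("hoodie.datasource.write.table.type", "MERGE_ON_READ")
    .mode("append")
    .save("/tmp/hudi/trips")
)
```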
5 changes: 5 additions & 0 deletions website/docs/concurrency_control.md
@@ -333,6 +333,11 @@ If you are using the `WriteClient` API, please note that multiple writes to the
It is **NOT** recommended to use the same instance of the write client to perform concurrent writes.

## Related Resources

<h3>Blogs</h3>
* [Data Lakehouse Concurrency Control](https://www.onehouse.ai/blog/lakehouse-concurrency-control-are-we-too-optimistic)
* [Multi-writer support with Apache Hudi](https://medium.com/@simpsons/multi-writer-support-with-apache-hudi-e1b75dca29e6)

<h3>Videos</h3>

* [Hands on Lab with using DynamoDB as lock table for Apache Hudi Data Lakes](https://youtu.be/JP0orl9_0yQ)
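
A minimal multi-writer configuration sketch using optimistic concurrency control with the ZooKeeper lock provider; the hosts, lock paths, and `df` are placeholders:

```python
# Hedged sketch: OCC with a ZooKeeper-based lock provider.
occ_options = {
    "hoodie.write.concurrency.mode": "optimistic_concurrency_control",
    "hoodie.cleaner.policy.failed.writes": "LAZY",
    "hoodie.write.lock.provider": "org.apache.hudi.client.transaction.lock.ZookeeperBasedLockProvider",
    "hoodie.write.lock.zookeeper.url": "zk-host",       # placeholder host
    "hoodie.write.lock.zookeeper.port": "2181",
    "hoodie.write.lock.zookeeper.lock_key": "customer",
    "hoodie.write.lock.zookeeper.base_path": "/hudi/locks",
}

df.write.format("hudi").options(**occ_options).mode("append").save("/tmp/hudi/customer")
```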
5 changes: 5 additions & 0 deletions website/docs/indexes.md
@@ -219,6 +219,11 @@ partition path value could change due to an update, e.g., users table partitioned by


## Related Resources

<h3>Blogs</h3>

* [Global vs Non-global index in Apache Hudi](https://medium.com/@simpsons/global-vs-non-global-index-in-apache-hudi-ac880b031cbc)

<h3>Videos</h3>

* [Global Bloom Index: Remove duplicates & guarantee uniqueness - Hudi Labs](https://youtu.be/XlRvMFJ7g9c)
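
A hedged sketch of a global index configured to handle the changing-partition-path case above; the table name, path, and `df` are placeholders:

```python
# Hedged sketch: global bloom index with partition-path updates.
index_options = {
    "hoodie.table.name": "users",
    "hoodie.index.type": "GLOBAL_BLOOM",
    # Move the record to its new partition instead of updating in place.
    "hoodie.bloom.index.update.partition.path": "true",
}

df.write.format("hudi").options(**index_options).mode("append").save("/tmp/hudi/users")
```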
4 changes: 3 additions & 1 deletion website/docs/key_generation.md
@@ -212,4 +212,6 @@ Partition path generated from key generator: "04/01/2020"

## Related Resources

* [Hudi metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
<h3>Blogs</h3>
* [Hudi metafields demystified](https://www.onehouse.ai/blog/hudi-metafields-demystified)
* [Primary key and Partition Generators with Apache Hudi](https://medium.com/@simpsons/primary-key-and-partition-generators-with-apache-hudi-f0e4d71d9d26)
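
A minimal sketch of wiring a key generator on the write path; the field names and path are hypothetical, and `ComplexKeyGenerator` is just one of the generators this page covers:

```python
# Hedged sketch: composite record key plus a partition-path field.
keygen_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id,customer_id",   # placeholder fields
    "hoodie.datasource.write.partitionpath.field": "created_date",
    "hoodie.datasource.write.keygenerator.class": "org.apache.hudi.keygen.ComplexKeyGenerator",
}

df.write.format("hudi").options(**keygen_options).mode("append").save("/tmp/hudi/orders")
```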
5 changes: 5 additions & 0 deletions website/docs/markers.md
@@ -89,3 +89,8 @@ with direct markers because the file system metadata is efficiently cached in me
| `hoodie.markers.timeline_server_based.batch.num_threads` | 20 | Number of threads to use for batch processing marker creation requests at the timeline server. |
| `hoodie.markers.timeline_server_based.batch.interval_ms` | 50 | The batch interval in milliseconds for marker creation batch processing. |


## Related Resources

<h3>Blogs</h3>
* [Timeline Server in Apache Hudi](https://medium.com/@simpsons/timeline-server-in-apache-hudi-b5be25f85e47)
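
A hedged sketch of opting into timeline-server-based markers; the batch settings echo the defaults in the table above, and the table name, path, and `df` are placeholders:

```python
# Hedged sketch: timeline-server-based marker mechanism on the write path.
marker_options = {
    "hoodie.table.name": "events",
    "hoodie.write.markers.type": "TIMELINE_SERVER_BASED",
    "hoodie.markers.timeline_server_based.batch.num_threads": "20",  # defaults from the table above
    "hoodie.markers.timeline_server_based.batch.interval_ms": "50",
}

df.write.format("hudi").options(**marker_options).mode("append").save("/tmp/hudi/events")
```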
2 changes: 1 addition & 1 deletion website/docs/metadata.md
@@ -129,6 +129,6 @@ metadata table across all writers.

## Related Resources
<h3>Blogs</h3>

* [Table service deployment models in Apache Hudi](https://medium.com/@simpsons/table-service-deployment-models-in-apache-hudi-9cfa5a44addf)
* [Multi Modal Indexing for the Data Lakehouse](https://www.onehouse.ai/blog/introducing-multi-modal-index-for-the-lakehouse-in-apache-hudi)
* [How to Optimize Performance for Your Open Data Lakehouse](https://www.onehouse.ai/blog/how-to-optimize-performance-for-your-open-data-lakehouse)
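
Since the metadata table must be toggled consistently across all writers, the enabling config is worth showing; a minimal sketch with placeholder names:

```python
# Hedged sketch: enable the metadata table; set the same value on every writer.
metadata_options = {
    "hoodie.table.name": "events",
    "hoodie.metadata.enable": "true",
}

df.write.format("hudi").options(**metadata_options).mode("append").save("/tmp/hudi/events")
```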
6 changes: 6 additions & 0 deletions website/docs/performance.md
@@ -131,3 +131,9 @@ To enable Data Skipping in your queries, make sure to set the following properties to
- `hoodie.enable.data.skipping` (to control data skipping, enabled by default)
- `hoodie.metadata.enable` (to enable metadata table use on the read path, enabled by default)
- `hoodie.metadata.index.column.stats.enable` (to enable column stats index use on the read path)

## Related Resources

<h3>Blogs</h3>
* [Hudi’s Column Stats Index and Data Skipping feature help speed up queries by an orders of magnitude!](https://www.onehouse.ai/blog/hudis-column-stats-index-and-data-skipping-feature-help-speed-up-queries-by-an-orders-of-magnitude)
* [Top 3 Things You Can Do to Get Fast Upsert Performance in Apache Hudi](https://www.onehouse.ai/blog/top-3-things-you-can-do-to-get-fast-upsert-performance-in-apache-hudi)
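
A read-path sketch using exactly the three properties above, assuming an active SparkSession `spark`; only the table path and filter column are placeholders:

```python
df = (
    spark.read.format("hudi")
    .option("hoodie.enable.data.skipping", "true")
    .option("hoodie.metadata.enable", "true")
    .option("hoodie.metadata.index.column.stats.enable", "true")
    .load("/tmp/hudi/events")  # placeholder table path
)
df.filter("city = 'sf'").show()  # predicate pruning uses column stats to skip files
```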
5 changes: 4 additions & 1 deletion website/docs/precommit_validator.md
@@ -96,6 +96,9 @@ Hudi offers a [commit notification service](platform_services_post_commit_callba
The commit notification service can be combined with pre-commit validators to send a notification when a commit fails a validation. This is possible by passing details about the validation as a custom value to the HTTP endpoint.

## Related Resources
<h3>Videos</h3>

<h3>Blogs</h3>
* [Apply Pre-Commit Validation for Data Quality in Apache Hudi](https://www.onehouse.ai/blog/apply-pre-commit-validation-for-data-quality-in-apache-hudi)

<h3>Videos</h3>
* [Learn About Apache Hudi Pre Commit Validator with Hands on Lab](https://www.youtube.com/watch?v=KNzs9dj_Btc)
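
A hedged sketch of attaching a SQL-based pre-commit validator; the validator class and config keys follow the Hudi validator docs, while the query, expected value, table name, and path are hypothetical:

```python
# Hedged sketch: fail the commit if any record has a null key.
validator_options = {
    "hoodie.table.name": "orders",
    "hoodie.precommit.validators":
        "org.apache.hudi.client.validator.SqlQuerySingleResultPreCommitValidator",
    # Expected result is appended after '#': here, zero rows with a null key.
    "hoodie.precommit.validators.single.value.sql.queries":
        "select count(*) from <TABLE_NAME> where order_id is null#0",
}

df.write.format("hudi").options(**validator_options).mode("append").save("/tmp/hudi/orders")
```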
4 changes: 4 additions & 0 deletions website/docs/record_merger.md
@@ -251,3 +251,7 @@ example, [`MySqlDebeziumAvroPayload`](https://github.com/apache/hudi/blob/e76dd1
captured via Debezium for MySQL and PostgreSQL. [`AWSDmsAvroPayload`](https://github.com/apache/hudi/blob/e76dd102bcaf8aec5a932e7277ccdbfd73ce1a32/hudi-common/src/main/java/org/apache/hudi/common/model/AWSDmsAvroPayload.java) provides support for applying changes captured via Amazon Database Migration Service onto S3.
For full configurations, go [here](/docs/configurations#RECORD_PAYLOAD) and please check out [this FAQ](faq_writing_tables/#can-i-implement-my-own-logic-for-how-input-records-are-merged-with-record-on-storage) if you want to implement your own custom payloads.

## Related Resources

<h3>Blogs</h3>
* [How to define your own merge logic with Apache Hudi](https://medium.com/@simpsons/how-to-define-your-own-merge-logic-with-apache-hudi-622ee5ccab1e)
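
Selecting one of the payloads above is a single write option; a minimal sketch using the `AWSDmsAvroPayload` class referenced in the text, with placeholder names:

```python
# Hedged sketch: plug in a record payload class on the write path.
payload_options = {
    "hoodie.table.name": "customers",
    "hoodie.datasource.write.payload.class": "org.apache.hudi.common.model.AWSDmsAvroPayload",
}

df.write.format("hudi").options(**payload_options).mode("append").save("/tmp/hudi/customers")
```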
5 changes: 5 additions & 0 deletions website/docs/timeline.md
@@ -151,3 +151,8 @@ Flink jobs using the SQL can be configured through the options in the WITH clause. T

Refer [here](https://hudi.apache.org/docs/next/configurations#Flink-Options) for more details.

## Related Resources

<h3>Blogs</h3>
* [Apache Hudi Timeline: Foundational pillar for ACID transactions](https://medium.com/@simpsons/hoodie-timeline-foundational-pillar-for-acid-transactions-be871399cbae)
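
A hedged PyFlink sketch of the WITH-clause configuration mentioned above; the schema, path, and option values are hypothetical:

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
# Hudi Flink options are passed in the WITH clause of the table DDL.
t_env.execute_sql("""
    CREATE TABLE hudi_orders (
        order_id STRING,
        amount DOUBLE,
        ts TIMESTAMP(3),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'connector' = 'hudi',
        'path' = 'file:///tmp/hudi/hudi_orders',
        'table.type' = 'MERGE_ON_READ'
    )
""")
```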

6 changes: 6 additions & 0 deletions website/docs/writing_tables_streaming_writes.md
@@ -93,3 +93,9 @@ df.writeStream.format("hudi")
</Tabs
>
## Related Resources

<h3>Blogs</h3>
* [An Introduction to the Hudi and Flink Integration](https://www.onehouse.ai/blog/intro-to-hudi-and-flink)
* [Bulk Insert Sort Modes with Apache Hudi](https://medium.com/@simpsons/bulk-insert-sort-modes-with-apache-hudi-c781e77841bc)
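
A hedged completion of the `df.writeStream.format("hudi")` fragment above; the key and precombine fields, checkpoint location, and base path are placeholders:

```python
# Hedged sketch: a Structured Streaming write into a Hudi table.
hudi_streaming_options = {
    "hoodie.table.name": "orders",
    "hoodie.datasource.write.recordkey.field": "order_id",  # placeholder fields
    "hoodie.datasource.write.precombine.field": "ts",
}

query = (
    df.writeStream.format("hudi")
    .options(**hudi_streaming_options)
    .outputMode("append")
    .option("checkpointLocation", "/tmp/checkpoints/orders")
    .start("/tmp/hudi/orders")
)
query.awaitTermination()
```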
