Update partitioned dataset lazy saving docs (#4402)

* Updated Partitioned dataset lazy saving docs
  Signed-off-by: Elena Khaustova <[email protected]>
* Updated release notes
  Signed-off-by: Elena Khaustova <[email protected]>
* Fixed typo
  Signed-off-by: Elena Khaustova <[email protected]>
* Updated docs based on new solution
  Signed-off-by: Elena Khaustova <[email protected]>
* Applied review comments
  Signed-off-by: Elena Khaustova <[email protected]>

---------

Signed-off-by: Elena Khaustova <[email protected]>
kedro-org · Jan 22, 2025 · 9ee181f
1 parent fba7c53
commit 9ee181f

Showing 2 changed files with 20 additions and 0 deletions.
diff --git a/RELEASE.md b/RELEASE.md
@@ -14,6 +14,7 @@
 * Safeguard hooks when user incorrectly registers a hook class in settings.py.
 * Fixed parsing paths with query and fragment.
 * Remove lowercase transformation in regex validation.
+* Updated `Partitioned dataset lazy saving` docs page.
 
 ## Breaking changes to the API
 ## Documentation changes

diff --git a/docs/source/data/partitioned_and_incremental_datasets.md b/docs/source/data/partitioned_and_incremental_datasets.md
@@ -175,6 +175,7 @@ new_partitioned_dataset:
   path: s3://my-bucket-name
   dataset: pandas.CSVDataset
   filename_suffix: ".csv"
+  save_lazily: True
 ```
 
 Here is the node definition:
@@ -238,6 +239,24 @@ def create_partitions() -> Dict[str, Callable[[], Any]]:
 When using lazy saving, the dataset will be written _after_ the `after_node_run` [hook](../hooks/introduction).
 ```
 
+```{note}
+Lazy saving is the default behaviour, meaning that if a `Callable` type is provided, the dataset will be written _after_ the `after_node_run` hook is executed.
+```
+
+In certain cases, it might be useful to disable lazy saving, such as when your object is already a `Callable` (e.g., a TensorFlow model) and you do not intend to save it lazily.
+To disable lazy saving, set the `save_lazily` parameter to `False`:
+
+```yaml
+# conf/base/catalog.yml
+
+new_partitioned_dataset:
+  type: partitions.PartitionedDataset
+  path: s3://my-bucket-name
+  dataset: pandas.CSVDataset
+  filename_suffix: ".csv"
+  save_lazily: False
+```
+
 ## Incremental datasets
 
 {class}`IncrementalDataset<kedro-datasets:kedro_datasets.partitions.IncrementalDataset>` is a subclass of `PartitionedDataset`, which stores the information about the last processed partition in the so-called `checkpoint`. `IncrementalDataset` addresses the use case when partitions have to be processed incrementally, that is, each subsequent pipeline run should process just the partitions which were not processed by the previous runs.
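
As an illustration of the lazy saving behaviour documented in this change, the sketch below contrasts the two settings of `save_lazily`. The `resolve_partitions` helper is a hypothetical stand-in written for this example, not code from kedro-datasets. It only mimics the documented rule: with lazy saving enabled, a `Callable` partition is invoked at save time, and with it disabled, the object is passed through unchanged.

```python
from typing import Any, Callable, Dict


def create_partitions() -> Dict[str, Callable[[], Any]]:
    """Node body: return callables so each partition is built only when saved."""
    return {
        "part/foo": lambda: [1, 2, 3],
        "part/bar": lambda: [4, 5],
    }


def resolve_partitions(data: Dict[str, Any], save_lazily: bool = True) -> Dict[str, Any]:
    """Hypothetical stand-in for the save step (an assumption, not the library's code).

    Returns what would be handed to the underlying dataset for each partition.
    """
    resolved = {}
    for partition_id, partition_data in sorted(data.items()):
        # Lazy saving: call the partition to materialise its data at save time.
        if save_lazily and callable(partition_data):
            partition_data = partition_data()
        resolved[partition_id] = partition_data
    return resolved


# With lazy saving, the callables are invoked and their results are saved.
lazy = resolve_partitions(create_partitions(), save_lazily=True)

# Without lazy saving, the callables themselves are treated as the data.
eager = resolve_partitions(create_partitions(), save_lazily=False)
```

The second case is the one the docs motivate with the TensorFlow model example: when the object to save is itself callable, `save_lazily: False` keeps it from being invoked before it is written.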