From eea10e448c2afc9a7aad42a8561e8744aacd6120 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Fri, 14 Feb 2025 09:22:34 +0000 Subject: [PATCH 1/7] Added new doc on troubleshooting --- .../website/docs/reference/troubleshooting.md | 417 ++++++++++++++++++ docs/website/sidebars.js | 1 + 2 files changed, 418 insertions(+) create mode 100644 docs/website/docs/reference/troubleshooting.md diff --git a/docs/website/docs/reference/troubleshooting.md b/docs/website/docs/reference/troubleshooting.md new file mode 100644 index 0000000000..c70964df22 --- /dev/null +++ b/docs/website/docs/reference/troubleshooting.md @@ -0,0 +1,417 @@ +--- +title: Pipeline common failure scenarios and mitigation measures +description: Doc explaining the common failure scenarios in extract, transform and load stage and their mitigation measures +keywords: [faq, usage information, technical help] +--- + +This guide outlines common failure scenarios during the Extract, Normalize, and Load stages of a data pipeline. + +## Extract stage + +Failures during the **Extract** stage often stem from source errors, memory limitations, or timestamp-related issues. Below are common scenarios and possible solutions. + +### Source errors + +Source errors typically result from rate limits, invalid credentials, or misconfigured settings. + +### Common scenarios and possible solutions + +1. **Rate limits (Error 429):** + + - **Scenario:** + - Exceeding API rate limits triggers `Error 429`. + + - **Possible solution:** + - Verify that authentication is functioning as expected. + - Increase the API rate limit if permissible. + - Review the API documentation to understand rate limits and examine headers such as `Retry-After`. + - Implement request delays using functions like `time.sleep()` or libraries such as `ratelimiter` to ensure compliance with rate limits. + - Handle "Too Many Requests" (`429`) responses by implementing retry logic with exponential backoff strategies. + - Optimize API usage by batching requests when possible and caching results to reduce the number of calls. + +2. **Invalid credentials (Error 401, 403, or `ConfigFieldMissingException`):** + + - **Scenario:** + - Missing or invalid credentials cause these errors. + + - **Possible solution:** + - Verify credentials and ensure proper scopes/permissions are enabled. For more on how to set up credentials: [Read our docs](../general-usage/credentials/setup). + - If dlt expects a configuration of secrets value but cannot find it, it will output the `ConfigFieldMissingException`. [Read more about the exceptions here.](../general-usage/credentials/setup#understanding-the-exceptions) + +3. **Source configuration errors (`DictValidationException`):** + + - **Scenario:** + - Incorrect field placement (e.g., `params` outside the `endpoint` field). + - Unexpected fields in the configuration. 
+ - For example, this is the incorrect configuration: + ```py + # ERROR 2: Method outside endpoint + source = rest_api_source( + config={ + "client": { + "base_url": "https://jsonplaceholder.typicode.com/" + }, + "resources": [ + { + "name": "posts", + # Wrong: method should be inside endpoint + "method": "GET", + "endpoint": { + "path": "posts", + "params": { + "_limit": 5 + } + } + } + ] + } + ) + ``` + - Correct configuration: + + ```py + # Create the source first + source = rest_api_source( + config={ + "client": { + "base_url": "https://jsonplaceholder.typicode.com/" + }, + "resources": [ # Add this line + { + "name": "posts", + "endpoint": { + "path": "posts", + "method": "GET", + "params": { + "_limit": 5 + } + } + } + ] + } + ) + ``` + - **Possible solution:** + - Review and validate the code configuration structure against the source documentation. + + Read [REST API’s source here.](../dlt-ecosystem/verified-sources/rest_api/) + +## Memory errors + +Memory issues can disrupt extraction processes. + +### Common scenarios and possible solutions + + 1. **RAM exhaustion:** + + - **Scenario:** + - Available RAM is insufficient for in-memory operations. + + - **Possible solution:** + + 1. **Buffer Size Management:** + - Adjust `max_buffer_items` to limit buffer size. [Learn about buffer configuration.](./performance#controlling-in-memory-buffers) + 2. Streaming Processing + - Big data should be processed in chunks for efficient handling. + + 2. **Storage memory shortages:** + + - **Scenario:** + + - Intermediate files exceed available storage space. + + - **Possible solution:** + + - If your storage reaches its limit, you can mount an external cloud storage location and set the `DLT_DATA_DIR` environment variable to point to it. This ensures that dlt uses the mounted storage as its data directory instead of local disk space. [Read more here.](./performance) + +## Unsupported timestamps + +Timestamp issues occur when formats are incompatible with the destination or inconsistent across pipeline runs. + +### Common scenarios and possible solutions + +1. **Unsupported formats or features:** + + - **Scenario:** + + - Combining `precision` and `timezone` in timestamps causes errors in specific destinations (e.g., DuckDB). + + - **Possible solution:** + + - Simplify the timestamp format to exclude unsupported features. Example: + + ```py + import dlt + + @dlt.resource( + columns={"event_tstamp": {"data_type": "timestamp", "precision": 3, "timezone": False}}, + primary_key="event_id", + ) + def events(): + yield [{"event_id": 1, "event_tstamp": "2024-07-30T10:00:00.123+00:00"}] + + pipeline = dlt.pipeline(destination="duckdb") + pipeline.run(events()) + ``` + +2. **Inconsistent formats across runs:** + + - **Scenario:** + + - Different pipeline runs use varying timestamp formats, affecting column datatype inference at the destination. + + - **Impact:** + + - For instance: + - **1st pipeline run:** `{"id": 1, "end_date": "2024-02-28 00:00:00"}` + - **2nd pipeline run:** `{"id": 2, "end_date": "2024/02/28"}` + - **3rd pipeline run:** `{"id": 3, "end_date": "2024-07-30T10:00:00.123456789"}` + + - If the first run uses a timestamp-compatible format (e.g., `YYYY-MM-DD HH:MM:SS`), the destination (BigQuery) infers the column as a `TIMESTAMP`. Subsequent runs using compatible formats are automatically converted to this type. 
+ + - However, introducing incompatible formats later, such as: + - **4th pipeline run:** `{"id": 4, "end_date": "20-08-2024"}` (DD-MM-YYYY) + - **5th pipeline run:** `{"id": 5, "end_date": "04th of January 2024"}` + + - BigQuery will interpret these as text and create a new variant column (`end_date__v_text`) to store the incompatible values. This preserves the schema consistency while accommodating all data. + + - **Possible solution:** + + - Standardize timestamp formats across all runs to maintain consistent schema inference and avoid the creation of variant columns. + +3. Inconsistent formats for incremental loading + + - **Scenario:** + + - Data source returns string timestamps but incremental loading is configured with an integer timestamp value. + - Example: + ```py + # API response + data = [ + {"id": 1, "name": "Item 1", "created_at": "2024-01-01 00:00:00"}, + ] + + # Incorrect configuration (type mismatch) + @dlt.resource(primary_key="id") + def my_data( + created_at=dlt.sources.incremental( + "created_at", + initial_value= 9999 + ) + ): + yield data + ``` + + - **Impact:** + + - Pipeline fails with `IncrementalCursorInvalidCoercion` error + - Error message indicates comparison failure between integer and string types + - Unable to perform incremental loading until type mismatch is resolved + + - **Possible Solutions:** + + - Use string timestamp for incremental loading. + - Convert source data using “add_map”. + - If you need to use timestamps for comparison but want to preserve the original format, create a separate column. + +## Normalize stage + +Failures during the **Normalize** stage commonly arise from memory limitations, parallelization issues, or schema inference errors. + +### Memory errors + +Memory-intensive operations may fail during normalization. + +### Common scenarios and possible solutions + +1. **Large dataset in one resource:** + + - **Scenario:** + + - Large datasets exhaust memory during processing. + + - **Possible solution:** + + - Enable file rotation using `file_max_items` or `file_max_bytes`. + - Increase parallel workers for better processing. [Read more about parallel processing.](./performance#parallelism-within-a-pipeline) + +2. **Storage memory shortages** + + - **Scenario:** + + - When lots of files are being processed, the available storage space might be insufficient. + + - **Possible solution:** + + - If your storage reaches its limit, you can mount an external cloud storage location and set the `DLT_DATA_DIR` environment variable to point to it. This ensures that dlt uses the mounted storage as its data directory instead of local disk space. [Read more here.](./performance#keep-pipeline-working-folder-in-a-bucket-on-constrained-environments) + +### Parallelization issues + +Improper configuration of workers may lead to inefficiencies or failures. + +### Common scenarios and possible solutions + +1. **Resource exhaustion or underutilization:** + + - **Scenario:** + + - Too many workers may exhaust resources; too few may underutilize capacity. + + - **Possible solution:** + + - Adjust worker settings in the `config.toml` file. [Read more about parallel processing.](./performance#parallelism-within-a-pipeline) + +2. **Threading conflicts:** + + - **Scenario:** + + - The `fork` process spawning method (default on Linux) conflicts with threaded libraries. + + - **Possible solution:** + + - Switch to the `spawn` method for process pool creation. 
[Learn more about process spawning.](./performance#normalize) + +### Schema inference errors + +Complex or inconsistent data structures can cause schema inference failures. + +### Common scenarios and possible solutions + +1. **Inconsistent data types:** + + - **Scenario:** + ```py + # First pipeline run + data_run_1 = [ + {"id": 1, "value": 42}, # value is integer + {"id": 2, "value": 123} + ] + + # Second pipeline run + data_run_2 = [ + {"id": 3, "value": "high"}, # value changes to text + {"id": 4, "value": "low"} + ] + + # Third pipeline run + data_run_3 = [ + {"id": 5, "value": 789}, # back to integer + {"id": 6, "value": "medium"} # mixed types + ] + ``` + + - **Impact:** + + - Original column remains as is. + - New variant column `value__v_text` created for text values. + - May require additional data handling in downstream processes. + + - **Possible solutions:** + + - Enforce Type Consistency + - You can enforce type consistency using the `apply_hints` method. This ensures all values in a column follow the specified data type. + + ```python + # Assuming 'resource' is your data resource + resource.apply_hints(columns={ + "value": {"data_type": "text"}, # Enforce 'value' to be of type 'text' + }) + ``` + + - In this example, the `value` column is always treated as text, even if the original data contains integers or mixed types. + + - Handle multiple types with separate columns. + - The `dlt` library automatically handles mixed data types by creating variant columns. If a column contains different data types, `dlt` generates a separate column for each type. + - For example, if a column named `value` contains both integers and strings, `dlt` creates a new column called `value__v_text` for the string values. + - After processing multiple runs, the schema will be: + + ```python + | name | data_type | nullable | + |---------------|---------------|----------| + | id | bigint | true | + | value | bigint | true | + | value__v_text | text | true | + ``` + + - Use Type validation **to Ensure Consistency** + - When processing pipeline runs with mixed data types, type validation can be applied to enforce strict type rules. + + - **Example:** + ```py + def validate_value(value): + if not isinstance(value, (int, str)): # Allow only integers and strings + raise TypeError(f"Invalid type: {type(value)}. Expected int or str.") + return str(value) # Convert all values to a consistent type (e.g., text) + + # First pipeline run + data_run_1 = [{"id": 1, "value": validate_value(42)}, + {"id": 2, "value": validate_value(123)}] + + # Second pipeline run + data_run_2 = [{"id": 3, "value": validate_value("high")}, + {"id": 4, "value": validate_value("low")}] + + # Third pipeline run + data_run_3 = [{"id": 7, "value": validate_value([1, 2, 3])}] + ``` + + In this example, data_run_4 contains an invalid value (a list) instead of an integer or string. When the pipeline runs with data_run_4, the validate_value function raises a TypeError. + +2. **Nested data challenges:** + + - **Scenario:** + + - Issues arise due to deep nesting, inconsistent nesting, or unsupported types. + + - **Possible solution:** + + - Simplify nested structures or preprocess data. [Read about nested tables.](../general-usage/destination-tables#nested-tables) + - You can limit unnesting level with `max_table_nesting`. + +## Load stage + +Failures in the **Load** stage often relate to authentication issues, schema changes, datatype mismatches or memory problems. 
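For example, if a load is interrupted partway through, the pending load packages can usually be completed by re-running the load from the same pipeline working directory, as described in the sections below. A minimal sketch, assuming placeholder pipeline and destination names:

```py
import dlt

# Re-create the pipeline with the same name so dlt picks up the existing working directory
pipeline = dlt.pipeline(pipeline_name="my_pipeline", destination="duckdb")

# Complete any load packages left over from the interrupted run
load_info = pipeline.load()
print(load_info)
```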
+ +### Authentication and connection failures + +### Common scenarios and possible solutions + +- **Scenario:** + + - Incorrect credentials. + - Data loading is interrupted due to connection issues or database downtime. This may leave some tables partially loaded or completely empty, halting the pipeline process. + +- **Possible solution:** + + - Verify credentials and follow proper setup instructions. [Credential setup guide.](../general-usage/credentials/setup) + - If the connection is restored, you can resume the load process using the `pipeline.load()` method. This ensures the pipeline picks up from where it stopped, reloading any remaining data packages. + - If data was **partially loaded**, check the `dlt_loads` table. If a `load_id` is missing from this table, it means the corresponding load **failed**. You can then remove partially loaded data by deleting any records associated with `load_id` values that do not exist in `dlt_loads`. [More details here](../general-usage/destination-tables#load-packages-and-load-ids). + +### Schema changes (e.g., column renaming, Datatype mismatches) + +### Common scenarios and possible solutions + +- **Scenario:** + + - Renamed columns create variant columns in the destination schema. + - Incoming datatypes that the destination doesn’t support result in variant columns. + +- **Possible solution:** + + - Use schema evolution to handle column renaming. [Read more about schema evolution.](../general-usage/schema-evolution#evolving-the-schema) + +### Memory management issues + +- **Scenario:** + + - Loading large datasets without file rotation enabled. This would make dlt try to upload a huge data set into destination at once. *(Note: Rotation is disabled by default.)* + +- **Impact:** + + - Pipeline failures due to out-of-memory errors. + +- **Solution:** + + - Enable file rotation. [Read more about it here.](./performance#controlling-intermediary-file-size-and-rotation) + +By identifying potential failure scenarios and applying the suggested mitigation strategies, you can ensure reliable and efficient pipeline performance. \ No newline at end of file diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js index d4fd9d4341..8432834706 100644 --- a/docs/website/sidebars.js +++ b/docs/website/sidebars.js @@ -463,6 +463,7 @@ const sidebars = { 'dlt-ecosystem/table-formats/iceberg', ] }, + 'reference/troubleshooting', 'reference/frequently-asked-questions', ], }, From 2a269b875253898fe84fb45122b4aa4af76c9ec9 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Sun, 16 Feb 2025 03:20:42 +0000 Subject: [PATCH 2/7] Updated doc --- docs/website/docs/reference/troubleshooting.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/docs/website/docs/reference/troubleshooting.md b/docs/website/docs/reference/troubleshooting.md index c70964df22..087b0f3efa 100644 --- a/docs/website/docs/reference/troubleshooting.md +++ b/docs/website/docs/reference/troubleshooting.md @@ -311,7 +311,7 @@ Complex or inconsistent data structures can cause schema inference failures. - Enforce Type Consistency - You can enforce type consistency using the `apply_hints` method. This ensures all values in a column follow the specified data type. - ```python + ```py # Assuming 'resource' is your data resource resource.apply_hints(columns={ "value": {"data_type": "text"}, # Enforce 'value' to be of type 'text' @@ -325,7 +325,7 @@ Complex or inconsistent data structures can cause schema inference failures. 
- For example, if a column named `value` contains both integers and strings, `dlt` creates a new column called `value__v_text` for the string values. - After processing multiple runs, the schema will be: - ```python + ```text | name | data_type | nullable | |---------------|---------------|----------| | id | bigint | true | From 7c939e9318a23476a4501a0a1da514e352075fd7 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Tue, 18 Feb 2025 13:09:03 +0000 Subject: [PATCH 3/7] Updated as per comments --- docs/website/docs/reference/troubleshooting.md | 8 +++++--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/docs/website/docs/reference/troubleshooting.md b/docs/website/docs/reference/troubleshooting.md index 087b0f3efa..a974d696b7 100644 --- a/docs/website/docs/reference/troubleshooting.md +++ b/docs/website/docs/reference/troubleshooting.md @@ -1,9 +1,11 @@ --- -title: Pipeline common failure scenarios and mitigation measures +title: Troubleshooting description: Doc explaining the common failure scenarios in extract, transform and load stage and their mitigation measures keywords: [faq, usage information, technical help] --- +# Pipeline common failure scenarios and mitigation measures + This guide outlines common failure scenarios during the Extract, Normalize, and Load stages of a data pipeline. ## Extract stage @@ -96,7 +98,7 @@ Source errors typically result from rate limits, invalid credentials, or misconf Read [REST API’s source here.](../dlt-ecosystem/verified-sources/rest_api/) -## Memory errors +### Memory errors Memory issues can disrupt extraction processes. @@ -124,7 +126,7 @@ Memory issues can disrupt extraction processes. - If your storage reaches its limit, you can mount an external cloud storage location and set the `DLT_DATA_DIR` environment variable to point to it. This ensures that dlt uses the mounted storage as its data directory instead of local disk space. [Read more here.](./performance) -## Unsupported timestamps +### Unsupported timestamps Timestamp issues occur when formats are incompatible with the destination or inconsistent across pipeline runs. From 4b4bb68a18b7a830e058e637608eefa37933cc07 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Sun, 23 Feb 2025 09:33:23 +0000 Subject: [PATCH 4/7] Updated for filenotfound error --- .../website/docs/reference/troubleshooting.md | 35 +++++++++++++++++-- 1 file changed, 33 insertions(+), 2 deletions(-) diff --git a/docs/website/docs/reference/troubleshooting.md b/docs/website/docs/reference/troubleshooting.md index a974d696b7..4be54ab679 100644 --- a/docs/website/docs/reference/troubleshooting.md +++ b/docs/website/docs/reference/troubleshooting.md @@ -181,7 +181,7 @@ Timestamp issues occur when formats are incompatible with the destination or inc - Standardize timestamp formats across all runs to maintain consistent schema inference and avoid the creation of variant columns. -3. Inconsistent formats for incremental loading +3. **Inconsistent formats for incremental loading** - **Scenario:** @@ -402,6 +402,37 @@ Failures in the **Load** stage often relate to authentication issues, schema cha - Use schema evolution to handle column renaming. 
[Read more about schema evolution.](../general-usage/schema-evolution#evolving-the-schema) +### **`FileNotFoundError` for 'schema_updates.json' in parallel runs** + +- **Scenario** + When running the same pipeline name multiple times in parallel (e.g., via Airflow), `dlt` may fail at the load stage with an error like: + + > `FileNotFoundError: schema_updates.json not found` + + This happens because `schema_updates.json` is generated during normalization. Concurrent runs using the same pipeline name may overwrite or lock access to this file, causing failures. + +- **Possible Solutions** + + 1. **Use unique pipeline names for each parallel run** + + If calling `pipeline.run()` multiple times within the same workflow (e.g., once per resource), assign a unique `pipeline_name` for each run. This ensures separate working directories, preventing file conflicts. + + 2. **Leverage dlt’s concurrency management or Airflow helpers** + + dlt’s Airflow integration “serializes” resources into separate tasks while safely handling concurrency. To parallelize resource extraction without file conflicts, use: + ```py + decompose="serialize" + ``` + More details are available in the [Airflow documentation](../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-valueerror-can-only-decompose-dlt-source). + + 3. **Disable dev mode to prevent multiple destination datasets** + + When `dev_mode=True`, dlt generates unique dataset names (`_`) for each run. To maintain a consistent dataset, set: + ```py + dev_mode=False + ``` + Read more about this in the [dev mode documentation](../general-usage/pipeline#do-experiments-with-dev-mode). + ### Memory management issues - **Scenario:** @@ -412,7 +443,7 @@ Failures in the **Load** stage often relate to authentication issues, schema cha - Pipeline failures due to out-of-memory errors. -- **Solution:** +- **Possible Solution:** - Enable file rotation. [Read more about it here.](./performance#controlling-intermediary-file-size-and-rotation) From 94dfe8202c6c084617ac211a34d3806db9d32048 Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 24 Feb 2025 13:59:19 +0000 Subject: [PATCH 5/7] Updated --- docs/website/docs/reference/troubleshooting.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/website/docs/reference/troubleshooting.md b/docs/website/docs/reference/troubleshooting.md index 4be54ab679..2fe8065378 100644 --- a/docs/website/docs/reference/troubleshooting.md +++ b/docs/website/docs/reference/troubleshooting.md @@ -402,7 +402,7 @@ Failures in the **Load** stage often relate to authentication issues, schema cha - Use schema evolution to handle column renaming. [Read more about schema evolution.](../general-usage/schema-evolution#evolving-the-schema) -### **`FileNotFoundError` for 'schema_updates.json' in parallel runs** +### `FileNotFoundError` for 'schema_updates.json' in parallel runs - **Scenario** When running the same pipeline name multiple times in parallel (e.g., via Airflow), `dlt` may fail at the load stage with an error like: From 72dcc99354cc39a09a3337bf0661cf63018cf0fb Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 3 Mar 2025 13:02:49 +0000 Subject: [PATCH 6/7] Added sections of troubleshooting to performance, schema and schema evolution. 
--- .../docs/general-usage/schema-evolution.md | 77 +++++++++++++++++++ docs/website/docs/general-usage/schema.md | 56 ++++++++++++++ docs/website/docs/reference/performance.md | 12 ++- .../docs/walkthroughs/run-a-pipeline.md | 22 ++++++ 4 files changed, 166 insertions(+), 1 deletion(-) diff --git a/docs/website/docs/general-usage/schema-evolution.md b/docs/website/docs/general-usage/schema-evolution.md index 6ef638886d..897e2ada36 100644 --- a/docs/website/docs/general-usage/schema-evolution.md +++ b/docs/website/docs/general-usage/schema-evolution.md @@ -213,3 +213,80 @@ Demonstrating schema evolution without talking about schema and data contracts i Schema and data contracts can be applied to entities such as ‘tables’, ‘columns’, and ‘data_types’ using contract modes such as ‘evolve’, ‘freeze’, ‘discard_rows’, and ‘discard_columns’ to tell dlt how to apply contracts for a particular entity. To read more about **schema and data contracts**, read our [documentation](./schema-contracts). +## Troubleshooting +This section addresses common schema evolution issues. + +1. #### Inconsistent data types: + - Data sources that vary in data type between pipeline runs may result in additional variant columns and may require extra handling. For example, consider the following pipeline runs: + ```py + # First pipeline run: "value" is an integer + data_run_1 = [ + {"id": 1, "value": 42}, + {"id": 2, "value": 123} + ] + + # Second pipeline run: "value" changes to text + data_run_2 = [ + {"id": 3, "value": "high"}, + {"id": 4, "value": "low"} + ] + + # Third pipeline run: Mixed types in "value" + data_run_3 = [ + {"id": 5, "value": 789}, # back to integer + {"id": 6, "value": "medium"} # mixed types + ] + ``` + + - As a result, the original column remains unchanged and a new variant column value__v_text is created for text values, requiring downstream processes to handle both columns appropriately. + + - **Recommended solutions:** + - **Enforce Type consistency** + - You can enforce type consistency using the `apply_hints` method. This ensure that all values in the column adhere to a specified data type. For example: + + ```py + # Assuming 'resource' is your data resource + resource.apply_hints(columns={ + "value": {"data_type": "text"}, # Enforce 'value' to be of type 'text' + }) + ``` + In this example, the `value` column is always treated as text, even if the original data contains integers or mixed types. + + - **Handle multiple types with separate columns** + - The dlt library automatically handles mixed data types by creating variant columns. If a column contains different data types, dlt generates a separate column for each type. + - For example, if a column named `value` contains both integers and strings, dlt creates a new column called `value__v_text` for the string values. + - After processing multiple runs, the schema will be: + ```text + | name | data_type | nullable | + |--------------|--------------|----------| + | id | bigint | true | + | value | bigint | true | + | value__v_text| text | true | + ``` + + - **Apply Type validation** + - Validate incoming data to ensure that only expected types are processed. For example: + ```py + def validate_value(value): + if not isinstance(value, (int, str)): # Allow only integers and strings + raise TypeError(f"Invalid type: {type(value)}. 
Expected int or str.") + return str(value) # Convert all values to a consistent type (e.g., text) + + # First pipeline run + data_run_1 = [{"id": 1, "value": validate_value(42)}, + {"id": 2, "value": validate_value(123)}] + + # Second pipeline run + data_run_2 = [{"id": 3, "value": validate_value("high")}, + {"id": 4, "value": validate_value("low")}] + + # Third pipeline run + data_run_3 = [{"id": 7, "value": validate_value([1, 2, 3])}] + ``` + + In this example, `data_run_3` contains an invalid value (a list) instead of an integer or string. When the pipeline runs with `data_run_3`, the `validate_value` function raises a `TypeError`. + +#### 2. Nested data challenges: +- Issues arise due to deep nesting, inconsistent nesting, or unsupported types. + +- To avoid this, you can simplify nested structures or preprocess data [see nested tables](../general-usage/destination-tables#nested-tables) or limit the unnesting level with max_table_nesting. \ No newline at end of file diff --git a/docs/website/docs/general-usage/schema.md b/docs/website/docs/general-usage/schema.md index 32850699f1..94374ab544 100644 --- a/docs/website/docs/general-usage/schema.md +++ b/docs/website/docs/general-usage/schema.md @@ -441,3 +441,59 @@ def textual(nesting_level: int): return dlt.resource([]) ``` +## Troubleshooting + +This section addresses common datatype issues. + +### Unsupported timestamps and format issues + +Timestamp issues can occur when the formats are incompatible with the destination or when they change inconsistently between pipeline runs. + +#### 1. Unsupported formats or features +- Combining `precision` and `timezone` in timestamps causes errors in specific destinations (e.g., DuckDB). +- You can simplify the timestamp format to exclude unsupported features. For example: + + ```py + import dlt + + @dlt.resource( + columns={"event_tstamp": {"data_type": "timestamp", "precision": 3, "timezone": False}}, + primary_key="event_id", + ) + def events(): + yield [{"event_id": 1, "event_tstamp": "2024-07-30T10:00:00.123+00:00"}] + + pipeline = dlt.pipeline(destination="duckdb") + pipeline.run(events()) + ``` + +#### 2. Inconsistent formats across runs +- Different pipeline runs use varying timestamp formats (e.g., `YYYY-MM-DD HH:MM:SS` vs. ISO 8601 vs. non-standard formats). +- As a result, the destination (e.g., BigQuery) might infer the timestamp column in one run, but later runs with incompatible formats (like `20-08-2024` or `04th of January 2024`) result in the creation of variant columns (e.g., `end_date__v_text`). +- It is best practice to standardize timestamp formats across all pipeline runs to maintain consistent column datatype inference. + +#### 3. Inconsistent formats for incremental loading +- Data source returns string timestamps but incremental loading is configured with an integer timestamp value. + - Example: + ```py + # API response + data = [ + {"id": 1, "name": "Item 1", "created_at": "2024-01-01 00:00:00"}, + ] + + # Incorrect configuration (type mismatch) + @dlt.resource(primary_key="id") + def my_data( + created_at=dlt.sources.incremental( + "created_at", + initial_value= 9999 + ) + ): + yield data + ``` +- This makes the pipeline fails with an `IncrementalCursorInvalidCoercion` error because it cannot compare an integer (`initial_value` of 9999) with a string timestamp. The error indicates a type mismatch between the expected and actual data formats. +- To solve this, you can: + - Use string timestamp for incremental loading. + - Convert source data using “add_map”. 
+ - If you need to use timestamps for comparison but want to preserve the original format, create a separate column. + diff --git a/docs/website/docs/reference/performance.md b/docs/website/docs/reference/performance.md index f7773ff83f..23834f04d7 100644 --- a/docs/website/docs/reference/performance.md +++ b/docs/website/docs/reference/performance.md @@ -122,10 +122,20 @@ Below, we set files to rotate after 100,000 items written or when the filesize e +:::note NOTE +When working with a single resource that handles a very large dataset, memory exhaustion may occur during processing. To mitigate this, enable file rotation by configuring `file_max_items` or `file_max_bytes` to split the data into smaller chunks and consider increasing the number of parallel workers for better processing. Read more about [parallel processing.](#parallelism-within-a-pipeline) +::: + ### Disabling and enabling file compression Several [text file formats](../dlt-ecosystem/file-formats/) have `gzip` compression enabled by default. If you wish that your load packages have uncompressed files (e.g., to debug the content easily), change `data_writer.disable_compression` in config.toml. The entry below will disable the compression of the files processed in the `normalize` stage. +### Handling insufficient RAM for in-memory operations +If your available RAM is not sufficient for in-memory operations, consider these optimizations: + +Adjust the `buffer_max_items` setting to fine-tune the size of in-memory buffers. This helps to prevent memory overconsumption when processing large datasets. For more details, [see the buffer configuration guide.](#controlling-in-memory-buffers) + +For handling big data efficiently, process your data in **chunks** rather than loading it entirely into memory. This batching approach allows for more effective resource management and can significantly reduce memory usage. ### Freeing disk space after loading @@ -197,7 +207,7 @@ The default is to not parallelize normalization and to perform it in the main pr ::: :::note -Normalization is CPU-bound and can easily saturate all your cores. Never allow `dlt` to use all cores on your local machine. +Normalization is CPU-bound and can easily saturate all your cores if not configured properly. Too many workers may exhaust resources; too few may underutilize capacity. Never allow dlt to use all available cores on your local machine, adjust the worker settings in your `config.toml` accordingly. ::: :::caution diff --git a/docs/website/docs/walkthroughs/run-a-pipeline.md b/docs/website/docs/walkthroughs/run-a-pipeline.md index 49b5cb33e1..208a52f747 100644 --- a/docs/website/docs/walkthroughs/run-a-pipeline.md +++ b/docs/website/docs/walkthroughs/run-a-pipeline.md @@ -282,6 +282,28 @@ should tell you what went wrong. The most probable cause of the failed job is **the data in the job file**. You can inspect the file using the **JOB file path** provided. +### Exceeding API rate limits + +If your pipeline triggers an HTTP `Error 429`, this means that the API has temporarily blocked your requests due to exceeding the allowed rate limits. Here are some steps to help you troubleshoot and resolve the issue: + +- Ensure that your API credentials are set up correctly so that your requests are properly authenticated. + +- Check the API’s guidelines on rate limits. Look for headers such as `Retry-After` in the response to determine how long you should wait before retrying. 
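  A minimal sketch of honoring `Retry-After` with plain `requests`, falling back to exponential backoff when the header is missing (the URL is a placeholder); this also covers the delay and backoff points below:

  ```py
  import time
  import requests

  def get_with_retry(url, max_retries=5):
      for attempt in range(max_retries):
          response = requests.get(url)
          if response.status_code != 429:
              return response
          # Wait as long as the API asks, or back off exponentially if no header is sent
          time.sleep(int(response.headers.get("Retry-After", 2 ** attempt)))
      raise RuntimeError(f"Still rate limited after {max_retries} retries")

  response = get_with_retry("https://api.example.com/items")
  ```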
+ +- Use tools like `time.sleep()` or libraries such as `ratelimiter` to introduce delays between requests. This helps in staying within the allowed limits. + +- Incorporate exponential backoff strategies in your code. This means if a request fails with a `429`, you wait for a short period and then try again, increasing the wait time on subsequent failures. + +- Consider batching requests or caching results to reduce the number of API calls needed during your data load process. + +### Connection failures + +Data loading can be interrupted due to connection issues or database downtime. When this happens, some tables might be partially loaded or even empty, which halts the pipeline process. + +- If the connection is restored, you can resume the load process using the `pipeline.load()` method. This method will pick up from where the previous load stopped and will reload any remaining data packages. + +- In the event that data was partially loaded, check the `dlt_loads` table. If a specific `load_id` is missing from this table, it indicates that the corresponding load has failed. You can then remove any partially loaded data by deleting records associated with those `load_id` values that do not exist in `dlt_loads`. More details can be found in the destination [tables documentation.](../general-usage/destination-tables#load-packages-and-load-ids) + ## Further readings - [Beef up your script for production](../running-in-production/running.md), easily add alerting, From 8ca85f98f6210c43c1120190d0758d8a147d2bec Mon Sep 17 00:00:00 2001 From: dat-a-man <98139823+dat-a-man@users.noreply.github.com> Date: Mon, 3 Mar 2025 13:04:39 +0000 Subject: [PATCH 7/7] removed trouble shooting doc --- .../website/docs/reference/troubleshooting.md | 450 ------------------ docs/website/sidebars.js | 1 - 2 files changed, 451 deletions(-) delete mode 100644 docs/website/docs/reference/troubleshooting.md diff --git a/docs/website/docs/reference/troubleshooting.md b/docs/website/docs/reference/troubleshooting.md deleted file mode 100644 index 2fe8065378..0000000000 --- a/docs/website/docs/reference/troubleshooting.md +++ /dev/null @@ -1,450 +0,0 @@ ---- -title: Troubleshooting -description: Doc explaining the common failure scenarios in extract, transform and load stage and their mitigation measures -keywords: [faq, usage information, technical help] ---- - -# Pipeline common failure scenarios and mitigation measures - -This guide outlines common failure scenarios during the Extract, Normalize, and Load stages of a data pipeline. - -## Extract stage - -Failures during the **Extract** stage often stem from source errors, memory limitations, or timestamp-related issues. Below are common scenarios and possible solutions. - -### Source errors - -Source errors typically result from rate limits, invalid credentials, or misconfigured settings. - -### Common scenarios and possible solutions - -1. **Rate limits (Error 429):** - - - **Scenario:** - - Exceeding API rate limits triggers `Error 429`. - - - **Possible solution:** - - Verify that authentication is functioning as expected. - - Increase the API rate limit if permissible. - - Review the API documentation to understand rate limits and examine headers such as `Retry-After`. - - Implement request delays using functions like `time.sleep()` or libraries such as `ratelimiter` to ensure compliance with rate limits. - - Handle "Too Many Requests" (`429`) responses by implementing retry logic with exponential backoff strategies. 
- - Optimize API usage by batching requests when possible and caching results to reduce the number of calls. - -2. **Invalid credentials (Error 401, 403, or `ConfigFieldMissingException`):** - - - **Scenario:** - - Missing or invalid credentials cause these errors. - - - **Possible solution:** - - Verify credentials and ensure proper scopes/permissions are enabled. For more on how to set up credentials: [Read our docs](../general-usage/credentials/setup). - - If dlt expects a configuration of secrets value but cannot find it, it will output the `ConfigFieldMissingException`. [Read more about the exceptions here.](../general-usage/credentials/setup#understanding-the-exceptions) - -3. **Source configuration errors (`DictValidationException`):** - - - **Scenario:** - - Incorrect field placement (e.g., `params` outside the `endpoint` field). - - Unexpected fields in the configuration. - - For example, this is the incorrect configuration: - ```py - # ERROR 2: Method outside endpoint - source = rest_api_source( - config={ - "client": { - "base_url": "https://jsonplaceholder.typicode.com/" - }, - "resources": [ - { - "name": "posts", - # Wrong: method should be inside endpoint - "method": "GET", - "endpoint": { - "path": "posts", - "params": { - "_limit": 5 - } - } - } - ] - } - ) - ``` - - Correct configuration: - - ```py - # Create the source first - source = rest_api_source( - config={ - "client": { - "base_url": "https://jsonplaceholder.typicode.com/" - }, - "resources": [ # Add this line - { - "name": "posts", - "endpoint": { - "path": "posts", - "method": "GET", - "params": { - "_limit": 5 - } - } - } - ] - } - ) - ``` - - **Possible solution:** - - Review and validate the code configuration structure against the source documentation. - - Read [REST API’s source here.](../dlt-ecosystem/verified-sources/rest_api/) - -### Memory errors - -Memory issues can disrupt extraction processes. - -### Common scenarios and possible solutions - - 1. **RAM exhaustion:** - - - **Scenario:** - - Available RAM is insufficient for in-memory operations. - - - **Possible solution:** - - 1. **Buffer Size Management:** - - Adjust `max_buffer_items` to limit buffer size. [Learn about buffer configuration.](./performance#controlling-in-memory-buffers) - 2. Streaming Processing - - Big data should be processed in chunks for efficient handling. - - 2. **Storage memory shortages:** - - - **Scenario:** - - - Intermediate files exceed available storage space. - - - **Possible solution:** - - - If your storage reaches its limit, you can mount an external cloud storage location and set the `DLT_DATA_DIR` environment variable to point to it. This ensures that dlt uses the mounted storage as its data directory instead of local disk space. [Read more here.](./performance) - -### Unsupported timestamps - -Timestamp issues occur when formats are incompatible with the destination or inconsistent across pipeline runs. - -### Common scenarios and possible solutions - -1. **Unsupported formats or features:** - - - **Scenario:** - - - Combining `precision` and `timezone` in timestamps causes errors in specific destinations (e.g., DuckDB). - - - **Possible solution:** - - - Simplify the timestamp format to exclude unsupported features. 
Example: - - ```py - import dlt - - @dlt.resource( - columns={"event_tstamp": {"data_type": "timestamp", "precision": 3, "timezone": False}}, - primary_key="event_id", - ) - def events(): - yield [{"event_id": 1, "event_tstamp": "2024-07-30T10:00:00.123+00:00"}] - - pipeline = dlt.pipeline(destination="duckdb") - pipeline.run(events()) - ``` - -2. **Inconsistent formats across runs:** - - - **Scenario:** - - - Different pipeline runs use varying timestamp formats, affecting column datatype inference at the destination. - - - **Impact:** - - - For instance: - - **1st pipeline run:** `{"id": 1, "end_date": "2024-02-28 00:00:00"}` - - **2nd pipeline run:** `{"id": 2, "end_date": "2024/02/28"}` - - **3rd pipeline run:** `{"id": 3, "end_date": "2024-07-30T10:00:00.123456789"}` - - - If the first run uses a timestamp-compatible format (e.g., `YYYY-MM-DD HH:MM:SS`), the destination (BigQuery) infers the column as a `TIMESTAMP`. Subsequent runs using compatible formats are automatically converted to this type. - - - However, introducing incompatible formats later, such as: - - **4th pipeline run:** `{"id": 4, "end_date": "20-08-2024"}` (DD-MM-YYYY) - - **5th pipeline run:** `{"id": 5, "end_date": "04th of January 2024"}` - - - BigQuery will interpret these as text and create a new variant column (`end_date__v_text`) to store the incompatible values. This preserves the schema consistency while accommodating all data. - - - **Possible solution:** - - - Standardize timestamp formats across all runs to maintain consistent schema inference and avoid the creation of variant columns. - -3. **Inconsistent formats for incremental loading** - - - **Scenario:** - - - Data source returns string timestamps but incremental loading is configured with an integer timestamp value. - - Example: - ```py - # API response - data = [ - {"id": 1, "name": "Item 1", "created_at": "2024-01-01 00:00:00"}, - ] - - # Incorrect configuration (type mismatch) - @dlt.resource(primary_key="id") - def my_data( - created_at=dlt.sources.incremental( - "created_at", - initial_value= 9999 - ) - ): - yield data - ``` - - - **Impact:** - - - Pipeline fails with `IncrementalCursorInvalidCoercion` error - - Error message indicates comparison failure between integer and string types - - Unable to perform incremental loading until type mismatch is resolved - - - **Possible Solutions:** - - - Use string timestamp for incremental loading. - - Convert source data using “add_map”. - - If you need to use timestamps for comparison but want to preserve the original format, create a separate column. - -## Normalize stage - -Failures during the **Normalize** stage commonly arise from memory limitations, parallelization issues, or schema inference errors. - -### Memory errors - -Memory-intensive operations may fail during normalization. - -### Common scenarios and possible solutions - -1. **Large dataset in one resource:** - - - **Scenario:** - - - Large datasets exhaust memory during processing. - - - **Possible solution:** - - - Enable file rotation using `file_max_items` or `file_max_bytes`. - - Increase parallel workers for better processing. [Read more about parallel processing.](./performance#parallelism-within-a-pipeline) - -2. **Storage memory shortages** - - - **Scenario:** - - - When lots of files are being processed, the available storage space might be insufficient. 
- - - **Possible solution:** - - - If your storage reaches its limit, you can mount an external cloud storage location and set the `DLT_DATA_DIR` environment variable to point to it. This ensures that dlt uses the mounted storage as its data directory instead of local disk space. [Read more here.](./performance#keep-pipeline-working-folder-in-a-bucket-on-constrained-environments) - -### Parallelization issues - -Improper configuration of workers may lead to inefficiencies or failures. - -### Common scenarios and possible solutions - -1. **Resource exhaustion or underutilization:** - - - **Scenario:** - - - Too many workers may exhaust resources; too few may underutilize capacity. - - - **Possible solution:** - - - Adjust worker settings in the `config.toml` file. [Read more about parallel processing.](./performance#parallelism-within-a-pipeline) - -2. **Threading conflicts:** - - - **Scenario:** - - - The `fork` process spawning method (default on Linux) conflicts with threaded libraries. - - - **Possible solution:** - - - Switch to the `spawn` method for process pool creation. [Learn more about process spawning.](./performance#normalize) - -### Schema inference errors - -Complex or inconsistent data structures can cause schema inference failures. - -### Common scenarios and possible solutions - -1. **Inconsistent data types:** - - - **Scenario:** - ```py - # First pipeline run - data_run_1 = [ - {"id": 1, "value": 42}, # value is integer - {"id": 2, "value": 123} - ] - - # Second pipeline run - data_run_2 = [ - {"id": 3, "value": "high"}, # value changes to text - {"id": 4, "value": "low"} - ] - - # Third pipeline run - data_run_3 = [ - {"id": 5, "value": 789}, # back to integer - {"id": 6, "value": "medium"} # mixed types - ] - ``` - - - **Impact:** - - - Original column remains as is. - - New variant column `value__v_text` created for text values. - - May require additional data handling in downstream processes. - - - **Possible solutions:** - - - Enforce Type Consistency - - You can enforce type consistency using the `apply_hints` method. This ensures all values in a column follow the specified data type. - - ```py - # Assuming 'resource' is your data resource - resource.apply_hints(columns={ - "value": {"data_type": "text"}, # Enforce 'value' to be of type 'text' - }) - ``` - - - In this example, the `value` column is always treated as text, even if the original data contains integers or mixed types. - - - Handle multiple types with separate columns. - - The `dlt` library automatically handles mixed data types by creating variant columns. If a column contains different data types, `dlt` generates a separate column for each type. - - For example, if a column named `value` contains both integers and strings, `dlt` creates a new column called `value__v_text` for the string values. - - After processing multiple runs, the schema will be: - - ```text - | name | data_type | nullable | - |---------------|---------------|----------| - | id | bigint | true | - | value | bigint | true | - | value__v_text | text | true | - ``` - - - Use Type validation **to Ensure Consistency** - - When processing pipeline runs with mixed data types, type validation can be applied to enforce strict type rules. - - - **Example:** - ```py - def validate_value(value): - if not isinstance(value, (int, str)): # Allow only integers and strings - raise TypeError(f"Invalid type: {type(value)}. 
Expected int or str.") - return str(value) # Convert all values to a consistent type (e.g., text) - - # First pipeline run - data_run_1 = [{"id": 1, "value": validate_value(42)}, - {"id": 2, "value": validate_value(123)}] - - # Second pipeline run - data_run_2 = [{"id": 3, "value": validate_value("high")}, - {"id": 4, "value": validate_value("low")}] - - # Third pipeline run - data_run_3 = [{"id": 7, "value": validate_value([1, 2, 3])}] - ``` - - In this example, data_run_4 contains an invalid value (a list) instead of an integer or string. When the pipeline runs with data_run_4, the validate_value function raises a TypeError. - -2. **Nested data challenges:** - - - **Scenario:** - - - Issues arise due to deep nesting, inconsistent nesting, or unsupported types. - - - **Possible solution:** - - - Simplify nested structures or preprocess data. [Read about nested tables.](../general-usage/destination-tables#nested-tables) - - You can limit unnesting level with `max_table_nesting`. - -## Load stage - -Failures in the **Load** stage often relate to authentication issues, schema changes, datatype mismatches or memory problems. - -### Authentication and connection failures - -### Common scenarios and possible solutions - -- **Scenario:** - - - Incorrect credentials. - - Data loading is interrupted due to connection issues or database downtime. This may leave some tables partially loaded or completely empty, halting the pipeline process. - -- **Possible solution:** - - - Verify credentials and follow proper setup instructions. [Credential setup guide.](../general-usage/credentials/setup) - - If the connection is restored, you can resume the load process using the `pipeline.load()` method. This ensures the pipeline picks up from where it stopped, reloading any remaining data packages. - - If data was **partially loaded**, check the `dlt_loads` table. If a `load_id` is missing from this table, it means the corresponding load **failed**. You can then remove partially loaded data by deleting any records associated with `load_id` values that do not exist in `dlt_loads`. [More details here](../general-usage/destination-tables#load-packages-and-load-ids). - -### Schema changes (e.g., column renaming, Datatype mismatches) - -### Common scenarios and possible solutions - -- **Scenario:** - - - Renamed columns create variant columns in the destination schema. - - Incoming datatypes that the destination doesn’t support result in variant columns. - -- **Possible solution:** - - - Use schema evolution to handle column renaming. [Read more about schema evolution.](../general-usage/schema-evolution#evolving-the-schema) - -### `FileNotFoundError` for 'schema_updates.json' in parallel runs - -- **Scenario** - When running the same pipeline name multiple times in parallel (e.g., via Airflow), `dlt` may fail at the load stage with an error like: - - > `FileNotFoundError: schema_updates.json not found` - - This happens because `schema_updates.json` is generated during normalization. Concurrent runs using the same pipeline name may overwrite or lock access to this file, causing failures. - -- **Possible Solutions** - - 1. **Use unique pipeline names for each parallel run** - - If calling `pipeline.run()` multiple times within the same workflow (e.g., once per resource), assign a unique `pipeline_name` for each run. This ensures separate working directories, preventing file conflicts. - - 2. 
**Leverage dlt’s concurrency management or Airflow helpers** - - dlt’s Airflow integration “serializes” resources into separate tasks while safely handling concurrency. To parallelize resource extraction without file conflicts, use: - ```py - decompose="serialize" - ``` - More details are available in the [Airflow documentation](../walkthroughs/deploy-a-pipeline/deploy-with-airflow-composer#2-valueerror-can-only-decompose-dlt-source). - - 3. **Disable dev mode to prevent multiple destination datasets** - - When `dev_mode=True`, dlt generates unique dataset names (`_`) for each run. To maintain a consistent dataset, set: - ```py - dev_mode=False - ``` - Read more about this in the [dev mode documentation](../general-usage/pipeline#do-experiments-with-dev-mode). - -### Memory management issues - -- **Scenario:** - - - Loading large datasets without file rotation enabled. This would make dlt try to upload a huge data set into destination at once. *(Note: Rotation is disabled by default.)* - -- **Impact:** - - - Pipeline failures due to out-of-memory errors. - -- **Possible Solution:** - - - Enable file rotation. [Read more about it here.](./performance#controlling-intermediary-file-size-and-rotation) - -By identifying potential failure scenarios and applying the suggested mitigation strategies, you can ensure reliable and efficient pipeline performance. \ No newline at end of file diff --git a/docs/website/sidebars.js b/docs/website/sidebars.js index 8432834706..d4fd9d4341 100644 --- a/docs/website/sidebars.js +++ b/docs/website/sidebars.js @@ -463,7 +463,6 @@ const sidebars = { 'dlt-ecosystem/table-formats/iceberg', ] }, - 'reference/troubleshooting', 'reference/frequently-asked-questions', ], },