|
59 | 59 | import os
|
60 | 60 | import re
|
61 | 61 | import sys
|
| 62 | +import time |
62 | 63 | from concurrent.futures import ThreadPoolExecutor, as_completed
|
63 | 64 | from typing import Dict, List, Optional, Tuple
|
64 | 65 |
|
@@ -242,6 +243,10 @@ def check_link_validity(
|
242 | 243 | if not HAS_REQUESTS:
|
243 | 244 | return url, False, "requests module not installed", None
|
244 | 245 |
|
| 246 | + # Skip GitHub links |
| 247 | + if "github.com" in url: |
| 248 | + return url, True, "GitHub link validation skipped", None |
| 249 | + |
245 | 250 | # Clean up escaped characters in URLs
|
246 | 251 | # This helps with Markdown URLs that have escaped underscores, etc.
|
247 | 252 | cleaned_url = clean_url(url)
|
@@ -326,32 +331,49 @@ def validate_urls(
|
326 | 331 | Returns:
|
327 | 332 | Dictionary of {url: (is_valid, error_message, status_code)}
|
328 | 333 | """
|
| 334 | + if not urls: |
| 335 | + return {} |
| 336 | + |
329 | 337 | results = {}
|
330 | 338 |
|
| 339 | + # Count and report GitHub links that will be skipped in validation |
| 340 | + from urllib.parse import urlparse |
| 341 | + github_urls = [url for url in urls if urlparse(url).hostname and urlparse(url).hostname.endswith("github.com")] |
| 342 | + other_urls = [url for url in urls if urlparse(url).hostname and not urlparse(url).hostname.endswith("github.com")] |
| 343 | + |
331 | 344 | print(f"Validating {len(urls)} links...")
|
332 |  | -
333 |  | -    with ThreadPoolExecutor(max_workers=max_workers) as executor:
334 |  | -        future_to_url = {
335 |  | -            executor.submit(check_link_validity, url): url for url in urls
336 |  | -        }
337 |  | -
| 345 | + print(f"Note: {len(github_urls)} GitHub links will be automatically marked as valid (skipping validation)") |
| 346 | + |
| 347 | + # Use moderate settings for non-GitHub URLs |
| 348 | + actual_max_workers = min(6, max_workers) |
| 349 | + |
| 350 | + print(f"Using {actual_max_workers} workers for remaining {len(other_urls)} links...") |
| 351 | + |
| 352 | + with ThreadPoolExecutor(max_workers=actual_max_workers) as executor: |
| 353 | + future_to_url = {} |
| 354 | + |
| 355 | + # Submit all URLs (GitHub links will be auto-skipped in check_link_validity) |
| 356 | + for url in urls: |
| 357 | + future_to_url[executor.submit(check_link_validity, url, timeout=15)] = url |
| 358 | + |
| 359 | + # Process results |
338 | 360 | for i, future in enumerate(as_completed(future_to_url), 1):
|
339 | 361 | url = future_to_url[future]
|
340 | 362 | try:
|
341 | 363 | _, is_valid, error_message, status_code = future.result()
|
342 | 364 | results[url] = (is_valid, error_message, status_code)
|
343 |  | -
344 |  | -                # Print progress indicator
345 |  | -                if i % 10 == 0 or i == len(urls):
346 |  | -                    print(
347 |  | -                        f"  Checked {i}/{len(urls)} links",
348 |  | -                        end="\r",
349 |  | -                        flush=True,
350 |  | -                    )
| 365 | + |
| 366 | + if "github.com" in url: |
| 367 | + print(f" Checked URL {i}/{len(urls)} [github.com]: ✓ Skipped (automatically marked valid)") |
| 368 | + else: |
| 369 | + status = "✅ Valid" if is_valid else f"❌ {error_message}" |
| 370 | + domain = url.split('/')[2] if '://' in url and '/' in url.split('://', 1)[1] else 'unknown' |
| 371 | + print(f" Checked URL {i}/{len(urls)} [{domain}]: {status}") |
| 372 | + |
351 | 373 | except Exception as e:
|
352 | 374 | results[url] = (False, str(e), None)
|
| 375 | + print(f" Error checking URL {i}/{len(urls)}: {e}") |
353 | 376 |
|
354 |  | -    print()  # New line after progress
355 | 377 | return results
|
356 | 378 |
|
357 | 379 |
|
|
0 commit comments