Commit b883aa5

Update link validation to skip GitHub links and improve progress tracking (#3641)
* Update link validation to skip GitHub links and improve progress tracking

* Potential fix for code scanning alert no. 588: Incomplete URL substring sanitization

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

---------

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>

(cherry picked from commit 6691d40)
1 parent bc803d6 commit b883aa5

10 files changed, +46 -24 lines changed

docs/book/component-guide/artifact-stores/custom.md

+1 -1

@@ -156,7 +156,7 @@ zenml artifact-store flavor register flavors.my_flavor.MyArtifactStoreFlavor
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually, it's better to not have to rely on this mechanism and initialize zenml at the root.
 {% endhint %}

docs/book/component-guide/container-registries/custom.md

+1 -1

@@ -98,7 +98,7 @@ zenml container-registry flavor register flavors.my_flavor.MyContainerRegistryFl
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually it's better to not have to rely on this mechanism, and initialize zenml at the root.
 {% endhint %}

docs/book/component-guide/data-validators/custom.md

+1 -1

@@ -40,7 +40,7 @@ zenml data-validator flavor register flavors.my_flavor.MyDataValidatorFlavor
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually it's better to not have to rely on this mechanism, and initialize zenml at the root.
 {% endhint %}

docs/book/component-guide/experiment-trackers/custom.md

+1 -1

@@ -37,7 +37,7 @@ zenml experiment-tracker flavor register flavors.my_flavor.MyExperimentTrackerFl
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually, it's better to not have to rely on this mechanism and initialize zenml at the root.
 {% endhint %}

docs/book/component-guide/image-builders/custom.md

+1 -1

@@ -88,7 +88,7 @@ zenml image-builder flavor register flavors.my_flavor.MyImageBuilderFlavor
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually it's better to not have to rely on this mechanism, and initialize zenml at the root.
 {% endhint %}

docs/book/component-guide/model-deployers/custom.md

+1 -1

@@ -143,7 +143,7 @@ zenml model-deployer flavor register flavors.my_flavor.MyModelDeployerFlavor
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually, it's better to not have to rely on this mechanism and initialize zenml at the root.
 {% endhint %}

docs/book/component-guide/orchestrators/custom.md

+1 -1

@@ -101,7 +101,7 @@ zenml orchestrator flavor register flavors.my_flavor.MyOrchestratorFlavor
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually, it's better to not have to rely on this mechanism and initialize zenml at the root.
 {% endhint %}

docs/book/component-guide/step-operators/custom.md

+1 -1

@@ -97,7 +97,7 @@ zenml step-operator flavor register flavors.my_flavor.MyStepOperatorFlavor
 ```
 
 {% hint style="warning" %}
-ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/how-to/infrastructure-deployment/infrastructure-as-code/best-practices) of initializing zenml at the root of your repository.
+ZenML resolves the flavor class by taking the path where you initialized zenml (via `zenml init`) as the starting point of resolution. Therefore, please ensure you follow [the best practice](https://docs.zenml.io/user-guides/best-practices/iac) of initializing zenml at the root of your repository.
 
 If ZenML does not find an initialized ZenML repository in any parent directory, it will default to the current working directory, but usually, it's better to not have to rely on this mechanism and initialize zenml at the root.
 {% endhint %}

docs/book/user-guide/best-practices/set-up-your-repository.md

+1 -1

@@ -115,7 +115,7 @@ While it doesn't matter how you structure your ZenML project, here is a recommen
 └── run.py
 ```
 
-All ZenML [Project templates](using-project-templates.md#generating-project-from-a-project-template) are modeled around this basic structure. The `steps` and `pipelines` folders contain the steps and pipelines defined in your project. If your project is simpler you can also just keep your steps at the top level of the `steps` folder without the need so structure them in subfolders.
+All ZenML [Project templates](https://docs.zenml.io/user-guides/best-practices/project-templates) are modeled around this basic structure. The `steps` and `pipelines` folders contain the steps and pipelines defined in your project. If your project is simpler you can also just keep your steps at the top level of the `steps` folder without the need so structure them in subfolders.
 
 {% hint style="info" %}
 It might also make sense to register your repository as a code repository. These enable ZenML to keep track of the code version that you use for your pipeline runs. Additionally, running a pipeline that is tracked in [a registered code repository](https://docs.zenml.io/user-guides/production-guide/connect-code-repository) can speed up the Docker image building for containerized stack components by eliminating the need to rebuild Docker images each time you change one of your source code files. Learn more about these in [connecting your Git repository](https://docs.zenml.io/concepts/code-repositories).

docs/link_checker.py

+37 -15

@@ -59,6 +59,7 @@
 import os
 import re
 import sys
+import time
 from concurrent.futures import ThreadPoolExecutor, as_completed
 from typing import Dict, List, Optional, Tuple
 
@@ -242,6 +243,10 @@ def check_link_validity(
     if not HAS_REQUESTS:
         return url, False, "requests module not installed", None
 
+    # Skip GitHub links
+    if "github.com" in url:
+        return url, True, "GitHub link validation skipped", None
+
     # Clean up escaped characters in URLs
     # This helps with Markdown URLs that have escaped underscores, etc.
     cleaned_url = clean_url(url)
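
Review note: the substring test `"github.com" in url` in this hunk also matches URLs that only mention `github.com` in a path, a query string, or a look-alike host, which is the pattern named in the code scanning alert from the commit message. A minimal sketch of a hostname-based check follows. The helper name `is_github_url` is hypothetical and not part of `link_checker.py`; treating subdomains such as `gist.github.com` as GitHub links is also an assumption.

```python
from urllib.parse import urlparse


def is_github_url(url: str) -> bool:
    """Return True only when the URL's host is github.com or a subdomain of it.

    Hypothetical helper for illustration; not part of link_checker.py.
    """
    try:
        # .hostname is already lowercased and excludes port and credentials.
        hostname = urlparse(url).hostname
    except ValueError:
        # urlparse raises ValueError for some malformed URLs (e.g. bad IPv6 hosts).
        return False
    if hostname is None:
        return False
    # Exact match or a dot-separated subdomain; a bare endswith("github.com")
    # would also accept hosts such as "evilgithub.com".
    return hostname == "github.com" or hostname.endswith(".github.com")
```

Under these assumptions, `is_github_url(url)` could stand in for the substring test in `check_link_validity`.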
@@ -326,32 +331,49 @@ def validate_urls(
     Returns:
         Dictionary of {url: (is_valid, error_message, status_code)}
     """
+    if not urls:
+        return {}
+
     results = {}
 
+    # Count and report GitHub links that will be skipped in validation
+    from urllib.parse import urlparse
+    github_urls = [url for url in urls if urlparse(url).hostname and urlparse(url).hostname.endswith("github.com")]
+    other_urls = [url for url in urls if urlparse(url).hostname and not urlparse(url).hostname.endswith("github.com")]
+
     print(f"Validating {len(urls)} links...")
-
-    with ThreadPoolExecutor(max_workers=max_workers) as executor:
-        future_to_url = {
-            executor.submit(check_link_validity, url): url for url in urls
-        }
-
+    print(f"Note: {len(github_urls)} GitHub links will be automatically marked as valid (skipping validation)")
+
+    # Use moderate settings for non-GitHub URLs
+    actual_max_workers = min(6, max_workers)
+
+    print(f"Using {actual_max_workers} workers for remaining {len(other_urls)} links...")
+
+    with ThreadPoolExecutor(max_workers=actual_max_workers) as executor:
+        future_to_url = {}
+
+        # Submit all URLs (GitHub links will be auto-skipped in check_link_validity)
+        for url in urls:
+            future_to_url[executor.submit(check_link_validity, url, timeout=15)] = url
+
+        # Process results
         for i, future in enumerate(as_completed(future_to_url), 1):
             url = future_to_url[future]
             try:
                 _, is_valid, error_message, status_code = future.result()
                 results[url] = (is_valid, error_message, status_code)
-
-                # Print progress indicator
-                if i % 10 == 0 or i == len(urls):
-                    print(
-                        f" Checked {i}/{len(urls)} links",
-                        end="\r",
-                        flush=True,
-                    )
+
+                if "github.com" in url:
+                    print(f" Checked URL {i}/{len(urls)} [github.com]: ✓ Skipped (automatically marked valid)")
+                else:
+                    status = "✅ Valid" if is_valid else f"❌ {error_message}"
+                    domain = url.split('/')[2] if '://' in url and '/' in url.split('://', 1)[1] else 'unknown'
+                    print(f" Checked URL {i}/{len(urls)} [{domain}]: {status}")
+
             except Exception as e:
                 results[url] = (False, str(e), None)
+                print(f" Error checking URL {i}/{len(urls)}: {e}")
 
-    print()  # New line after progress
     return results
 
 
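
Review note: in the `validate_urls` hunk, each list comprehension calls `urlparse` twice per URL, URLs without a hostname land in neither `github_urls` nor `other_urls`, and the progress line derives the domain by splitting the URL string. A minimal sketch of parsing each URL once is shown below. The names `host_of` and `partition_urls` are hypothetical and not part of `link_checker.py`, and putting hostless URLs into `other_urls` (so the two counts add up to `len(urls)`) is a design assumption.

```python
from typing import List, Tuple
from urllib.parse import urlparse


def host_of(url: str) -> str:
    """Return the lowercase hostname of url, or 'unknown' if it has none."""
    try:
        return urlparse(url).hostname or "unknown"
    except ValueError:
        # Malformed URLs (e.g. broken IPv6 literals) have no usable host.
        return "unknown"


def partition_urls(urls: List[str]) -> Tuple[List[str], List[str]]:
    """Split urls into (github_urls, other_urls), parsing each URL once."""
    github_urls: List[str] = []
    other_urls: List[str] = []
    for url in urls:
        host = host_of(url)
        if host == "github.com" or host.endswith(".github.com"):
            github_urls.append(url)
        else:
            # Includes URLs without a hostname, so no URL is dropped from the counts.
            other_urls.append(url)
    return github_urls, other_urls
```

With a split like this, only `other_urls` would need to be submitted to the executor, and `host_of(url)` could replace the `url.split('/')[2]` expression in the progress line.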