feat(script): Verify the existence of checker config doc_url pages and find appropriate older releases for gone (removed, dealpha, etc.) checkers #4207
Merged
bruntib merged 1 commit on May 22, 2024
bruntib approved these changes on Apr 24, 2024
The checker label configuration at `/config/labels/analyzers` most often contains a `doc_url` entry which points to the documentation URL of the checker, as shown in the UI. When the user clicks this link, the browser takes them to that page. However, these external links are very susceptible to link rot, especially when analysers entirely decommission checkers (e.g., `clang-tidy/cert-dcl21-cpp`) or checkers change name during a dealphafication (e.g., `alpha.cplusplus.EnumCastOutOfRange` -> `optin.core.EnumCastOutOfRange`). In these cases, older analysis results stored with the older (or still extant) checker will have a `doc_url` that points to nowhere in the upstream.

In addition, there were several identified cases where the links were recognised as broken (both by this tool and by an actual browser) even though the checker was still extant, simply because of a typo: `cplusplus.PlacementNew`, `#wdeprecated-deprecated-coroutine` (instead of `#wdeprecated-coroutine`), `#wclang-diagnostic-unsafe-buffer-usage` (instead of `#wunsafe-buffer-usage`).

This patch adds an opt-in, developer-only tool under `/scripts/labels`, which automatically checks (by way of HTTP requests and HTML DOM scraping) whether the existing URLs still point to live pages, and reports this status. If there is additional analyser-specific knowledge (as of now, this is implemented for ClangSA and Clang-Tidy), it uses additional heuristics (most of which are available as reusable library components for future development!) to figure out a fixed version of the `doc_url`: by normalising `#anchors` to fix typos, and by looking up earlier releases in which the checker under verification was still extant.
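A link like `.../DiagnosticsReference.html#wdeprecated-deprecated-coroutine` can fail in a subtle way: the page itself loads fine, but the `#anchor` does not exist on it. The HTML scraping part of such a check can be sketched with the standard library's `html.parser`, assuming the page's HTML has already been fetched over HTTP. `AnchorCollector` and `anchor_exists` are hypothetical names for illustration, not the tool's actual components, and the example URL and HTML snippet are made up.

```python
from html.parser import HTMLParser
from typing import Set
from urllib.parse import urldefrag


class AnchorCollector(HTMLParser):
    """Collects the values of every `id` attribute (and `name` on `<a>` tags),
    i.e., the targets that a `#fragment` in a URL can jump to."""

    def __init__(self) -> None:
        super().__init__()
        self.anchors: Set[str] = set()

    def handle_starttag(self, tag, attrs):
        # `attrs` is a list of (name, value) pairs; value is None for bare attributes.
        for key, value in attrs:
            if value and (key == "id" or (tag == "a" and key == "name")):
                self.anchors.add(value)


def anchor_exists(url: str, html: str) -> bool:
    """Returns whether the `#fragment` of `url` names an anchor in `html`.

    Only the in-page part is checked here: fetching `html` (and treating a
    failed HTTP request as a dead link) is assumed to happen beforehand.
    """
    _, fragment = urldefrag(url)
    if not fragment:
        return True  # No anchor: the page existing is enough.
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    return fragment in collector.anchors


if __name__ == "__main__":
    page = '<section id="wdeprecated-coroutine"><h3>-Wdeprecated-coroutine</h3></section>'
    base = "https://example.org/DiagnosticsReference.html"
    print(anchor_exists(base + "#wdeprecated-coroutine", page))  # True
    print(anchor_exists(base + "#wdeprecated-deprecated-coroutine", page))  # False
```

Checking the fragment separately from the HTTP status matters because browsers silently land at the top of the page for a missing anchor, so a status-code check alone would report such typos as alive.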
The checker label configuration files most often contain a link to a documentation page, which we suggest to the user when viewing the details of a report. These JSON files are always baked into a released package, and the server serves information based on what is available in the deployed image. As all of these links point to external resources, they are very susceptible to link rot.
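The first step of any such verification is reading the links out of the label files. The sketch below assumes that each file maps checker names to a list of `"key:value"` label strings under a `"labels"` key, with the link stored as a `doc_url:` label; this layout is an assumption for illustration and should be checked against the actual files under `/config/labels/analyzers`. `doc_urls` is a hypothetical helper.

```python
import json
from typing import Dict


def doc_urls(config_text: str) -> Dict[str, str]:
    """Maps each checker name to its `doc_url` in one label configuration file.

    Assumed layout (illustrative):
        {"analyzer": "...", "labels": {"<checker>": ["doc_url:<url>", "severity:HIGH", ...]}}
    Checkers without a `doc_url` label are skipped.
    """
    config = json.loads(config_text)
    urls: Dict[str, str] = {}
    for checker, labels in config.get("labels", {}).items():
        for label in labels:
            # Split on the first ':' only, since the URL itself contains "https:".
            key, _, value = label.partition(":")
            if key == "doc_url":
                urls[checker] = value
                break
    return urls


if __name__ == "__main__":
    example = """{
      "analyzer": "clangsa",
      "labels": {
        "alpha.Foo": ["doc_url:https://example.org/alpha/Foo.html", "severity:LOW"],
        "core.Bar": ["severity:HIGH"]
      }
    }"""
    print(doc_urls(example))  # {'alpha.Foo': 'https://example.org/alpha/Foo.html'}
```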
For example, suppose that an analysis was stored with the `alpha.Foo` checker, with the URL pointing to `.../alpha/Foo.html`. Once the underlying analyser's documentation changes (usually for one of two reasons: the checker is improved and moved out of alpha, or the checker is removed from upstream entirely), this link is dead. Newer reports stored with `core.Foo` (`.../core/Foo.html`) will point to proper documentation, but re-routing `alpha.Foo`'s documentation page to `core.Foo`'s would be invalid, as the behaviour of the checker might have changed in the meantime, rendering the contents of the new document inapplicable to the old report! In addition, nothing prevents the user from running an older analyser with/through a newer CodeChecker package, and uploading new results from the `alpha.` version even after the analyser's `core.` release.

This patch introduces an opt-in tool which reads the configuration files and verifies whether each URL is available to a hypothetical user. If not, it employs a heuristic pipeline to find a URL that corresponds to the checker with the currently dead link: first by fixing typos in the URL, and, if that is still unsuccessful, by trying the documentation sites of older releases. For now, this fixing logic is only implemented for the LLVM-based analysers, Clang SA and Clang-Tidy, as implementing it requires an accurate understanding of the documentation structure of the specific analyser.
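The fallback order (the URL as-is, then with its anchor repaired, then the pages of older releases) can be sketched as below. This is a minimal illustration under stated assumptions, not the tool's implementation: `fetch`, `normalise`, `fix_anchor` and `resolve` are hypothetical names; the anchor repair here only handles case and punctuation typos (such as `cplusplus.PlacementNew` vs. a `cplusplus-placementnew` id), not every typo class listed earlier; and the caller is assumed to supply the older releases' URLs, newest first, because how to build them is analyser-specific.

```python
import re
from typing import Callable, Iterable, Optional, Set
from urllib.parse import urldefrag

# Hypothetical hook: returns the anchors (id/name values) of the page at `url`,
# or None when the page itself is unreachable (e.g., HTTP 404). A real tool
# would implement this with an HTTP request plus HTML scraping.
FetchAnchors = Callable[[str], Optional[Set[str]]]


def normalise(anchor: str) -> str:
    """Lower-cases and turns punctuation runs into '-', similar to how
    documentation generators commonly derive ids from headings."""
    return re.sub(r"[^a-z0-9]+", "-", anchor.lower()).strip("-")


def fix_anchor(url: str, anchors: Set[str]) -> Optional[str]:
    """Returns `url` with its fragment matched against `anchors`, or None."""
    base, fragment = urldefrag(url)
    if not fragment or fragment in anchors:
        return url
    by_normal_form = {normalise(a): a for a in anchors}
    match = by_normal_form.get(normalise(fragment))
    return f"{base}#{match}" if match else None


def resolve(url: str, older_release_urls: Iterable[str],
            fetch: FetchAnchors) -> Optional[str]:
    """Tries the URL itself, then each older release's URL (newest first),
    accepting a candidate if its anchor exists exactly or after normalising."""
    for candidate in [url, *older_release_urls]:
        anchors = fetch(urldefrag(candidate)[0])
        if anchors is None:
            continue  # The page does not exist in this release at all.
        fixed = fix_anchor(candidate, anchors)
        if fixed is not None:
            return fixed
    return None  # Nothing found: report the link as dead.


if __name__ == "__main__":
    # A fake, offline "web": page URL -> anchors on that page.
    site = {
        "https://example.org/18/checkers.html": {"optin-core-enumcastoutofrange"},
        "https://example.org/17/checkers.html": {
            "alpha-cplusplus-enumcastoutofrange", "cplusplus-placementnew"},
    }
    print(resolve("https://example.org/17/checkers.html#cplusplus.PlacementNew",
                  [], site.get))
    # https://example.org/17/checkers.html#cplusplus-placementnew
```

The sketch deliberately only accepts the checker's own (normalised) anchor and never a "similar-looking" one, so a dead `alpha.Foo` link is redirected to an older page still documenting `alpha.Foo`, never to `core.Foo`'s current documentation.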