-
Notifications
You must be signed in to change notification settings - Fork 29
Add markdown heading anchor(TS-2595) #1177
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Add the plugins pacakge for MarkdownIt which includes the anchors plugin for automatically generating anchor id's to headings. Refs. TS-2595
Add anchors_plugin to the markdownfiy logic. Add "id" to the list of allowed attributes for h1-h6 html tags. Refs. TS-2595
Note Other AI code review bot(s) detectedCodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review. WalkthroughAdds the mdit-py-plugins dependency, enables the anchors plugin with a custom slugger in MarkdownIt (producing heading ids prefixed with Changes
Sequence Diagram(s)sequenceDiagram
autonumber
participant View as Server View / Template
participant MarkdownTag as markdownify tag
participant MD as MarkdownIt ("gfm-like")
participant Anchors as anchors_plugin
participant Sanitizer as HTML sanitizer
View->>MarkdownTag: render_markdown(markdown)
MarkdownTag->>MD: init MarkdownIt("gfm-like")
MarkdownTag->>MD: use(Anchors, slug_func=slugger, max_level=6)
MD->>Anchors: detect headings
Anchors-->>MD: attach generated id attrs (e.g. user-content-...)
MD-->>MarkdownTag: HTML with heading ids
MarkdownTag->>Sanitizer: sanitize (allow id on h1–h6)
Sanitizer-->>View: sanitized HTML
sequenceDiagram
autonumber
participant Browser as User Browser
participant Page as base.html JS
participant DOM as Document
Browser->>Page: page load / hashchange
Page->>DOM: slugger(hash) -> try getElementById(slugged or raw)
DOM-->>Page: element found?
alt element found
Page->>DOM: element.scrollIntoView()
Page->>Browser: history.pushState/update URL
else not found
Page-->>Browser: no-op
end
Browser->>Page: user clicks internal #anchor link
Page->>DOM: intercept, compute slug, scroll as above
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Poem
📜 Recent review detailsConfiguration used: CodeRabbit UI Review profile: CHILL Plan: Pro 💡 Knowledge Base configuration:
You can enable these sources in your CodeRabbit configuration. 📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
✨ Finishing Touches🧪 Generate unit tests
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR/Issue comments)Type Other keywords and placeholders
CodeRabbit Configuration File (
|
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #1177 +/- ##
=======================================
Coverage 92.23% 92.24%
=======================================
Files 331 331
Lines 10088 10094 +6
Branches 927 927
=======================================
+ Hits 9305 9311 +6
Misses 657 657
Partials 126 126 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
django/thunderstore/markdown/allowed_tags.py (1)
1-10
: Remove invalid HTML tags h7 and h8 from ALLOWED_TAGS.They’re not standard elements and won’t be emitted by Markdown. Small cleanup.
ALLOWED_TAGS = [ "h1", "h2", "h3", "h4", "h5", "h6", - "h7", - "h8", "br",django/thunderstore/markdown/templatetags/markdownify.py (1)
6-6
: Enabling anchors_plugin at init is good; consider guarding against cross-page id collisions.If there are existing DOM ids elsewhere (templates, components), heading ids could collide. Optional: prefix slugs (e.g., md-). Keeps headings unique across the full document without hurting UX.
-from mdit_py_plugins.anchors import anchors_plugin +from mdit_py_plugins.anchors import anchors_plugin, slugify as _slugify @@ -md = MarkdownIt("gfm-like").use(anchors_plugin, max_level=6) +def _prefixed_slugify(s, idx): + return f"md-{_slugify(s, idx)}" + +md = MarkdownIt("gfm-like").use(anchors_plugin, max_level=6, slugify=_prefixed_slugify)Add a quick test to ensure ids survive Bleach and remain distinct:
def test_markdown_headings_have_unique_ids(): html = render_markdown("# A\n\n# A\n\n## B") from lxml import html as lh doc = lh.fromstring(html) ids = [h.get("id") for h in doc.xpath("//h1|//h2|//h3|//h4|//h5|//h6")] assert all(ids), ids assert len(ids) == len(set(ids)), idsAlso applies to: 15-15
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
⛔ Files ignored due to path filters (1)
django/poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
django/pyproject.toml
(1 hunks)django/thunderstore/markdown/allowed_tags.py
(1 hunks)django/thunderstore/markdown/templatetags/markdownify.py
(2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (8)
- GitHub Check: Build docker image
- GitHub Check: Analyze (javascript)
- GitHub Check: Test pytest (5)
- GitHub Check: Test pytest (6)
- GitHub Check: Test pytest (4)
- GitHub Check: Test pytest (3)
- GitHub Check: Test pytest (2)
- GitHub Check: Test pytest (1)
🔇 Additional comments (2)
django/thunderstore/markdown/allowed_tags.py (1)
82-87
: Allowing id on h1–h6 is correct and aligns with anchors_plugin.This enables heading anchors to survive Bleach sanitization. LGTM.
django/pyproject.toml (1)
35-35
: Anchors plugin import verified at runtimeThe
mdit-py-plugins = "0.4.2"
pin has been validated locally: importinganchors_plugin
succeeds under markdown-it-py 3.0.0 as expected. No further changes are needed—please ensure your lockfile and CI images are updated to include the new wheel.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (2)
django/thunderstore/api/cyberstorm/tests/test_markdown.py (2)
66-74
: Reduce brittleness: assert 200 and ignore trailing whitespaceAvoid failures from incidental newline differences and assert status explicitly.
response = api_client.get( f"/api/cyberstorm/package/{v.package.namespace}/{v.package.name}/latest/readme/", ) - actual = response.json() - expected_html = ( - '<h1 id="very-strong-header">Very <strong>strong</strong> header</h1>\n' - ) - assert actual["html"] == expected_html + assert response.status_code == 200 + actual = response.json() + expected_html = '<h1 id="very-strong-header">Very <strong>strong</strong> header</h1>' + assert actual["html"].strip() == expected_html
62-74
: Optional: add coverage for duplicate and non-ASCII headingsHelps prevent regressions in slug/uniqueness behavior.
Example tests to consider:
@pytest.mark.django_db def test_readme_heading_ids__deduplicate_on_conflict(api_client: APIClient) -> None: v = PackageVersionFactory(readme="# Title\n# Title") r = api_client.get(f"/api/cyberstorm/package/{v.package.namespace}/{v.package.name}/latest/readme/") assert r.status_code == 200 html = r.json()["html"] assert 'id="title"' in html # Ensure a distinct id was generated for the duplicate assert html.count('id="title') >= 2 and 'id="title-' in html @pytest.mark.django_db def test_readme_heading_ids__non_ascii_slug(api_client: APIClient) -> None: v = PackageVersionFactory(readme="# Über café") r = api_client.get(f"/api/cyberstorm/package/{v.package.namespace}/{v.package.name}/latest/readme/") assert r.status_code == 200 html = r.json()["html"] # Expect a stable, URL-safe slug; adjust assertion if your slugify rules differ assert 'id=' in html
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (1)
django/thunderstore/api/cyberstorm/tests/test_markdown.py
(1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: Analyze (python)
- GitHub Check: Analyze (javascript)
- GitHub Check: Build docker image
- GitHub Check: Build docker image
🔇 Additional comments (1)
django/thunderstore/api/cyberstorm/tests/test_markdown.py (1)
70-73
: LGTM: heading id expectation is correctThe expected anchor id matches the slug generated from the heading text and preserves inline .
Refs. TS-2595
4022395
to
ec17152
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (3)
django/thunderstore/frontend/api/experimental/tests/test_markdown.py (3)
31-32
: Reduce brittleness of HTML assertionComparing the whole HTML string is fragile to minor formatting changes. Assert on key parts instead.
Apply this diff:
- assert response.json() == {"html": MARKDOWN_RENDERED} + data = response.json() + html = data["html"] + assert 'id="test-markdown"' in html + assert "<p>this is a description</p>" in html
46-46
: Remove debug print from testLeftover
Apply this diff:
- print(response.content)
1-53
: Add coverage for more cases (h2–h6, duplicates, punctuation/unicode)Extend tests to ensure ids are present and unique across levels and repeated headings; also cover punctuation/unicode.
Example additional tests:
import pytest @pytest.mark.django_db @pytest.mark.parametrize("heading_tag, md", [ ("h1", "# A Title"), ("h2", "## Sub Title"), ("h3", "### Deeper"), ("h4", "#### Deeper"), ("h5", "##### Deeper"), ("h6", "###### Deeper"), ]) def test_headings_all_levels_have_ids(api_client, heading_tag, md): resp = api_client.post( reverse("api:experimental:frontend.render-markdown"), json.dumps({"markdown": md}), content_type="application/json", ) html = resp.json()["html"] assert f"<{heading_tag} id=" in html @pytest.mark.django_db def test_duplicate_headings_get_unique_ids(api_client): md = "# Foo\n\n# Foo\n" resp = api_client.post( reverse("api:experimental:frontend.render-markdown"), json.dumps({"markdown": md}), content_type="application/json", ) html = resp.json()["html"] # Expect two different ids containing 'foo' ids = sorted(set(__import__("re").findall(r'<h1 id="([^"]+)"', html))) assert len(ids) == 2 and ids[0].startswith("foo") and ids[1].startswith("foo") and ids[0] != ids[1] @pytest.mark.django_db def test_heading_slug_punctuation_and_unicode(api_client): md = "# Héllo, world! — v2.0" resp = api_client.post( reverse("api:experimental:frontend.render-markdown"), json.dumps({"markdown": md}), content_type="application/json", ) html = resp.json()["html"] assert "<h1 " in html and 'id="' in html # do not lock to a specific slug algorithm
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
django/thunderstore/api/cyberstorm/tests/test_markdown.py
(1 hunks)django/thunderstore/frontend/api/experimental/tests/test_markdown.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- django/thunderstore/api/cyberstorm/tests/test_markdown.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Analyze (javascript)
- GitHub Check: Build docker image
- GitHub Check: Build docker image
🔇 Additional comments (1)
django/thunderstore/frontend/api/experimental/tests/test_markdown.py (1)
15-17
: LGTM: expected heading id matches anchors plugin sluggingThe updated expectation for
<h1 id="test-markdown">
looks correct and aligned with enabling anchors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Give me a bit more time to look at this, no merging yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (2)
django/thunderstore/frontend/templates/base.html (2)
35-39
: Harden slugger for broader browser support and trim/normalize input.Unicode property escapes are widely supported but not universal; add a safe ASCII fallback. Also trim input to avoid leading/trailing dashes.
- function slugger(text) { - let slug = text.replace(/\s+/g, "-"); - slug = slug.replace(/[^\p{L}\p{N}-]+/gu, ""); - return "user-content-" + slug.toLowerCase(); - } + function slugger(text) { + let slug = String(text).trim(); + try { + slug = slug.replace(/\s+/g, "-").replace(/[^\p{L}\p{N}-]+/gu, ""); + } catch (_) { + // Fallback for older engines without Unicode property escapes + slug = slug.replace(/\s+/g, "-").replace(/[^A-Za-z0-9-]+/g, ""); + } + return "user-content-" + slug.toLowerCase(); + }
68-70
: Consider relying on native hash navigation and only augment when needed.Native behavior already scrolls to elements with matching
id
. Keeping the listeners is fine, but ensure they only act when native scrolling fails (handled by your updatedscrollToHash
returning false). Optionally, remove theDOMContentLoaded
call if you find double-scrolling in some browsers.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
💡 Knowledge Base configuration:
- MCP integration is disabled by default for public repositories
- Jira integration is disabled by default for public repositories
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
django/thunderstore/frontend/templates/base.html
(1 hunks)django/thunderstore/markdown/templatetags/markdownify.py
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- django/thunderstore/markdown/templatetags/markdownify.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Analyze (javascript)
- GitHub Check: Build docker image
- GitHub Check: Build docker image
🔇 Additional comments (1)
django/thunderstore/frontend/templates/base.html (1)
33-70
: Verify parity between backend and frontend slug algorithms.The server uses
anchors_plugin
with a custom slugger (per PR description). Any drift from this JS slugger will break cross-linking. Please confirm both produce identical ids, including for diacritics, punctuation, emoji, CJK, and duplicate-heading suffixing behavior.Would you like me to generate a small test comparing Python slug outputs against this JS version over a seeded set of headings?
Update markdown tests to account for user-content slug change
Improve anchor navigation by attempting to find normal anchors first rather than user-content prepended anchors
TODO: Put the javascript somewhere more reasonable than base.html |
Ensures all headings (# through ######) get unique ids for anchor navigation.
Refs. TS-2595
Summary by CodeRabbit
New Features
Chores
Tests