
Fix updateinfo v2 timeout for RL8 AppStream by filtering prefetch#78

Merged
brianclemens merged 1 commit into resf:main from rockythorn:fix/updateinfo-appstream-timeout on Apr 18, 2026

Conversation

@rockythorn (Collaborator) commented Mar 18, 2026

The get_updateinfo_v2 endpoint was loading all advisory_packages rows for every matching advisory (all repos, arches, mirrors), then discarding ~95% of them in Python. For RL8 AppStream without a minor_version filter, this meant ~300k rows loaded to produce ~20k, consistently exceeding the 30-second server worker timeout.

Changes

  • Use Prefetch with a filtered queryset on advisory__packages so only packages matching the requested repo and supported_product are loaded from the DB
  • Add composite index on advisory_packages(advisory_id, repo_name, supported_product_id) to support the filtered prefetch query
  • Add composite index on advisory_affected_products(supported_product_id, major_version, arch) to speed up the initial filter query
  • Add migration 20260318000000_add_updateinfo_perf_indexes.sql to apply the new indexes to production

Closes #77
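The Prefetch change above can be sketched with plain-Python stand-ins for the ORM calls (the rows, field names, and helper functions here are illustrative, not the actual apollo code):

```python
# Hypothetical rows for one advisory across repos and products. The real
# table holds ~385k rows, of which only a small fraction match a given
# request's repo + supported_product filter.
ADVISORY_PACKAGES = [
    {"advisory_id": 1, "repo_name": "AppStream", "supported_product_id": 1, "nevra": "pkg-a-1.0"},
    {"advisory_id": 1, "repo_name": "AppStream", "supported_product_id": 2, "nevra": "pkg-a-1.0"},
    {"advisory_id": 1, "repo_name": "BaseOS", "supported_product_id": 1, "nevra": "pkg-b-2.0"},
]

def old_prefetch(advisory_id):
    # Old behaviour: prefetch_related("packages") loads every package row
    # for the advisory; the view then discards non-matching ones in Python.
    return [r for r in ADVISORY_PACKAGES if r["advisory_id"] == advisory_id]

def new_prefetch(advisory_id, repo_name, supported_product_id):
    # New behaviour: Prefetch("packages", queryset=...filter(repo_name=...,
    # supported_product_id=...)) pushes the filter into SQL, so only
    # matching rows are ever loaded from the database.
    return [
        r
        for r in ADVISORY_PACKAGES
        if r["advisory_id"] == advisory_id
        and r["repo_name"] == repo_name
        and r["supported_product_id"] == supported_product_id
    ]

print(len(old_prefetch(1)))                  # 3 rows loaded, 2 discarded later
print(len(new_prefetch(1, "AppStream", 1)))  # 1 row loaded
```

At production scale, this same shape of change is what turns ~300k loaded rows into ~20k.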

Testing

Confirmed the endpoint was timing out on production before this fix:

| Endpoint | Status | Time |
| --- | --- | --- |
| RL8 AppStream x86_64 (production, pre-fix) | ❌ 500 | 28.9s |
| RL8 AppStream aarch64 (production, pre-fix) | ❌ 500 | 29.4s |
| RL8 BaseOS x86_64 (production, pre-fix) | ✅ 200 | 8.4s |

Local benchmarks were run against a DB restored from the April 2025 production dump (6,905 advisories, 384,990 packages) across three scenarios:

| Scenario | RL8 AppStream x86_64 | RL8 AppStream aarch64 | RL8 BaseOS x86_64 |
| --- | --- | --- | --- |
| main, no indexes (current production state) | ✅ 200 @ 8.3s | ✅ 200 @ 8.3s | ✅ 200 @ 1.6s |
| main, with indexes only | ✅ 200 @ 8.2s | ✅ 200 @ 8.2s | ✅ 200 @ 1.4s |
| fix branch, Prefetch + indexes | ✅ 200 @ 8.1s | ✅ 200 @ 8.1s | ✅ 200 @ 1.4s |

Local benchmarks did not reproduce the timing differences seen in production. We suspect this is primarily because the local hardware is significantly faster than the Kubernetes cluster that production runs on: the local machine's NVMe SSD likely masks the I/O cost, which would be more pronounced on production storage under concurrent load.

EXPLAIN ANALYZE gives a more hardware-independent view, showing the new indexes reduce buffer reads by 12x:

| | Buffers read | Buffers from cache |
| --- | --- | --- |
| Without indexes | 5,604 | 11,529 |
| With indexes | 468 | 3,768 |

On production storage where data is less likely to be fully cached in memory, that 12x I/O reduction should translate directly into wall-clock time savings.
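A rough way to see the composite index at work without a Postgres instance is SQLite's EXPLAIN QUERY PLAN, the nearest stdlib analogue of Postgres EXPLAIN. The schema below is a simplified sketch of advisory_packages, not the actual migration:

```python
import sqlite3

# In-memory SQLite stand-in for the Postgres table, with the new composite
# index on (advisory_id, repo_name, supported_product_id).
con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE advisory_packages (
        id INTEGER PRIMARY KEY,
        advisory_id INTEGER,
        repo_name TEXT,
        supported_product_id INTEGER,
        nevra TEXT
    )
""")
con.execute("""
    CREATE INDEX advisory_packages_adv_repo_product
    ON advisory_packages (advisory_id, repo_name, supported_product_id)
""")

# The filtered prefetch query hits all three leading index columns with
# equality predicates, so the planner can satisfy it with an index search
# instead of a full table scan.
plan = con.execute("""
    EXPLAIN QUERY PLAN
    SELECT nevra FROM advisory_packages
    WHERE advisory_id = ? AND repo_name = ? AND supported_product_id = ?
""", (1, "AppStream", 1)).fetchall()

detail = plan[0][3]
print(detail)  # a SEARCH ... USING INDEX line, rather than SCAN
```

The same reasoning applies to the advisory_affected_products index: equality filters on the leading columns let Postgres walk the index instead of reading the whole table.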


@jdieter jdieter left a comment

This looks sane to me, and the numbers speak for themselves!

@brianclemens brianclemens merged commit 8eb7b71 into resf:main Apr 18, 2026
1 check passed

jdieter commented Apr 18, 2026

Thanks, @brianclemens!


Development

Successfully merging this pull request may close these issues: Updateinfo.xml Timeouts

5 participants