Refs #38685 - Clean duplicate erratum packages as a migration#11497
Refs #38685 - Clean duplicate erratum packages as a migration#11497sjha4 wants to merge 1 commit intoKatello:masterfrom
Conversation
Reviewer's GuideThis migration now embeds the rake task logic directly to clean up duplicate erratum_packages and their associations before converting the id column to bigint, ensuring data consistency without relying on external tasks. Entity relationship diagram for updated erratum package cleanup migrationerDiagram
KATELLO_ERRATUM_PACKAGE {
bigint id
string nvrea
integer erratum_id
string name
string filename
}
KATELLO_MODULE_STREAM_ERRATUM_PACKAGE {
integer id
bigint erratum_package_id
integer module_stream_id
}
KATELLO_ERRATUM_PACKAGE ||--o{ KATELLO_MODULE_STREAM_ERRATUM_PACKAGE : "erratum_package_id"
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
There was a problem hiding this comment.
Hey there - I've reviewed your changes and they look great!
Prompt for AI Agents
Please address the comments from this code review:
## Individual Comments
### Comment 1
<location> `db/migrate/20250613210050_use_big_int_for_erratum_packages_id.rb:46` </location>
<code_context>
+
+ update_mappings.each_slice(1000) do |batch|
+ batch.each do |old_id, new_id|
+ Katello::ModuleStreamErratumPackage
+ .where(erratum_package_id: old_id)
+ .where(
+ module_stream_id: Katello::ModuleStreamErratumPackage
+ .where(erratum_package_id: new_id)
+ .select(:module_stream_id)
+ )
+ .delete_all
+
+ Katello::ModuleStreamErratumPackage
</code_context>
<issue_to_address>
The delete_all query may be inefficient due to nested where clauses.
This approach may trigger a subquery for each batch item, degrading performance with large datasets. Please consider optimizing by precomputing module_stream_ids or batching deletions.
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
| Katello::ModuleStreamErratumPackage | ||
| .where(erratum_package_id: old_id) | ||
| .where( | ||
| module_stream_id: Katello::ModuleStreamErratumPackage | ||
| .where(erratum_package_id: new_id) | ||
| .select(:module_stream_id) | ||
| ) | ||
| .delete_all |
There was a problem hiding this comment.
suggestion (performance): The delete_all query may be inefficient due to nested where clauses.
This approach may trigger a subquery for each batch item, degrading performance with large datasets. Please consider optimizing by precomputing module_stream_ids or batching deletions.
|
Did some performance testing on a DB with the following counts: |
|
After discussions offline, we decided against having this as a migration which runs for all upgrades. The rake task itself can live independently for support purposes while not being an upgrade blocker. Closing this for reasons above. |
What are the changes introduced in this pull request?
We introduced a rake task as part of: #11480 .. However, to properly run the upgrade without depending on the rake task as upstream doesn't have f-m and the pre-upgrade hook in f-m can't access the rake task without backporting the task in time, a migration file change would be the change easier to deliver.
Considerations taken when implementing this change?
Transfer rake task logic to migration file for bigint migration which in a case triggered the bad index error.
What are the testing steps for this pull request?
Refer to steps in #11480 (comment).
However, you'll want to run bundle exec rails db:rollback before/after you've created all the test data to be able to rerun the updated migration file with bundle exec rails db:migrate.
Summary by Sourcery
Incorporate duplicate erratum package cleanup into the bigint migration to ensure a smooth upgrade without relying on an external rake task
Enhancements: