Add source_field to the enrichments model #1494

Merged
wendelfabianchinsamy merged 1 commit into master from add-source-id-attribute-to-enrichment-mnodel on Mar 11, 2026

Conversation

wendelfabianchinsamy (Contributor) commented Mar 10, 2026

Purpose

closes: https://github.com/datacite/product-backlog/issues/689

Approach

Open Questions and Pre-Merge TODOs

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

Summary by CodeRabbit

  • New Features
    • Enrichment batch processing now requires a SOURCE_ID parameter to track data origins.
    • Source identifiers are automatically normalized for consistency across records.
    • Enrichment records now include source attribution for improved data traceability.
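
The SOURCE_ID requirement in the summary above could be enforced along these lines. This is a minimal sketch, not the code from this PR: the helper name `fetch_source_id` and the error message are hypothetical, and the actual rake task may check the variable differently.

```ruby
# Hypothetical helper (not taken from lib/tasks/enrichment.rake):
# reads SOURCE_ID from an ENV-like object and fails fast when it is
# missing or blank, so the job is never enqueued without a source.
def fetch_source_id(env = ENV)
  value = env["SOURCE_ID"].to_s.strip
  raise ArgumentError, "SOURCE_ID is required" if value.empty?

  value
end
```

Passing a plain hash instead of `ENV` keeps the check easy to exercise in isolation, for example `fetch_source_id("SOURCE_ID" => "comet")`.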

coderabbitai bot commented Mar 10, 2026

📝 Walkthrough

This pull request introduces a source_id field throughout the enrichment system. Changes include a database migration adding the column, model-level validation and normalization, job parameter updates to accept and pass the identifier, and rake task modifications to require and forward the parameter.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Database Schema: db/migrate/20260310084109_add_source_id_to_enrichments.rb, db/schema.rb | New migration adds non-null string column source_id (limit 255) to enrichments table; schema version bumped accordingly. |
| Model Validation & Normalization: app/models/enrichment.rb | Added before_validation callback to normalize source_id (trim and uppercase); new presence validation for source_id field. |
| Job & Task Updates: app/jobs/enrichment_batch_process_job.rb, lib/tasks/enrichment.rake | Job perform method signature extended to accept source_id parameter; rake task now requires SOURCE_ID environment variable and forwards it to the job enqueue call. |
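
The normalization and presence check summarized for app/models/enrichment.rb can be pictured with a plain-Ruby stand-in. This is only a sketch under stated assumptions: the class `EnrichmentSketch` and its method names are illustrative, it does not use ActiveRecord, and the real model's callback may treat nil differently.

```ruby
# Plain-Ruby stand-in for the behaviour described above. The class and
# method names are hypothetical; the real model is an ActiveRecord class.
class EnrichmentSketch
  attr_accessor :source_id

  def initialize(source_id)
    @source_id = source_id
  end

  # Plays the role of the before_validation callback:
  # trim surrounding whitespace, then uppercase.
  def normalize_source_id
    self.source_id = source_id.to_s.strip.upcase
  end

  # Plays the role of the presence validation: normalize first,
  # then reject values that are empty after trimming.
  def valid?
    normalize_source_id
    !source_id.empty?
  end
end
```

Normalizing before validating means that inputs such as "comet" and " COMET " end up stored as the same identifier, while whitespace-only input is rejected.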

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (2 warnings)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Title check | ⚠️ Warning | The title mentions 'source_field' but the actual changes add 'source_id' throughout the codebase, creating a terminology mismatch. | Update the title to 'Add source_id to the enrichments model' to accurately reflect the actual implementation and field name used. |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (1 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |

coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
app/models/enrichment.rb (1)

5-6: Validate source_id against the column limit.

source_id is now required, but the model still allows values longer than the 255-byte column. That won't fail until save, which can turn an input mistake into a DB error instead of a normal validation failure.

Proposed fix
-  validates :uuid, presence: true, uniqueness: true
-  validates :source_id, presence: true
+  validates :uuid, presence: true, uniqueness: true
+  validates :source_id, presence: true, length: { maximum: 255 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/enrichment.rb` around lines 5 - 6, The model Enrichment validates
presence of :source_id but not its maximum length, which can allow values
exceeding the 255-byte DB column; add a length validation to the model (e.g.,
update the validates for :source_id in Enrichment to include length: { maximum:
255 } or implement a custom validator that checks source_id.bytesize <= 255) so
invalid inputs fail model validation instead of raising a DB error on save.
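
The two options mentioned in this comment (a character-length check versus a `bytesize` check) can disagree for multibyte input. The sketch below illustrates that difference in plain Ruby; the helper names are hypothetical, and whether the column limit is measured in characters or bytes depends on the database, which this PR does not state.

```ruby
# Hypothetical helpers contrasting the two checks named in the comment.
# String#length counts characters; String#bytesize counts bytes.
def within_char_limit?(value, max = 255)
  value.length <= max
end

def within_byte_limit?(value, max = 255)
  value.bytesize <= max
end

ascii    = "A" * 255       # 255 characters, 255 bytes in UTF-8
accented = "\u00C9" * 255  # 255 characters, 2 bytes each in UTF-8

within_char_limit?(ascii)     # passes both checks
within_byte_limit?(ascii)
within_char_limit?(accented)  # passes the character check only
within_byte_limit?(accented)
```

For identifiers that are expected to be short ASCII codes the two checks are equivalent, so the simpler length check is likely sufficient.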
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/jobs/enrichment_batch_process_job.rb`:
- Line 8: The perform method signature change breaks already-enqueued jobs;
update the perform definition in enrichment_batch_process_job.rb (the perform
method) to give source_id a safe default (e.g., def perform(lines, filename,
source_id = nil)) so older jobs without the third arg won't raise ArgumentError,
and ensure any downstream code inside perform handles a nil/absent source_id
appropriately.

In `@db/migrate/20260310084109_add_source_id_to_enrichments.rb`:
- Around line 4-6: The migration currently calls add_column :enrichments,
:source_id, :string, limit: 255, null: false which will fail if enrichments
already contains rows; change this to a safe two-step pattern: first add_column
:enrichments, :source_id, :string, limit: 255, null: true (or in a separate
migration create the column nullable), then backfill existing rows (e.g.
Enrichment.reset_column_information; Enrichment.where(source_id:
nil).update_all(source_id: '<appropriate value>' or computed values)), and
finally run change_column_null :enrichments, :source_id, false (and optionally
add an index) in a follow-up migration to enforce NOT NULL.
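
The backward-compatible signature suggested for the job can be sketched in plain Ruby. This is an illustration only: `EnrichmentBatchProcessJobSketch` is a stand-in that does not inherit from any Rails job class, and the handling of a missing source_id (returning `:skipped`) is a hypothetical choice, not something this PR specifies.

```ruby
# Stand-in for the job class. Only the argument handling is shown;
# the return values are placeholders for the real processing.
class EnrichmentBatchProcessJobSketch
  # source_id defaults to nil so that jobs enqueued before the change,
  # which carry only two arguments, do not raise ArgumentError.
  def perform(lines, filename, source_id = nil)
    # Hypothetical policy: skip old-style jobs that have no source.
    return :skipped if source_id.nil?

    :processed
  end
end

job = EnrichmentBatchProcessJobSketch.new
old_style = job.perform(["line"], "batch.jsonl")          # two arguments
new_style = job.perform(["line"], "batch.jsonl", "COMET") # three arguments
```

Without the default, the two-argument call would fail with ArgumentError before any of the job body runs.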

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2b9f3566-3c3e-4502-8e3d-70d7b4267889

📥 Commits

Reviewing files that changed from the base of the PR and between 31afb9e and 404d07d.

📒 Files selected for processing (5)
  • app/jobs/enrichment_batch_process_job.rb
  • app/models/enrichment.rb
  • db/migrate/20260310084109_add_source_id_to_enrichments.rb
  • db/schema.rb
  • lib/tasks/enrichment.rake

@wendelfabianchinsamy wendelfabianchinsamy requested a review from a team March 10, 2026 14:10
@wendelfabianchinsamy wendelfabianchinsamy merged commit 2e3cbc0 into master Mar 11, 2026
19 checks passed
@wendelfabianchinsamy wendelfabianchinsamy deleted the add-source-id-attribute-to-enrichment-mnodel branch March 11, 2026 14:45