Skip to content

Refactor ReindexByDoiJob to include Shoryuken::Worker and update perform method#1495

Merged
codycooperross merged 3 commits intomasterfrom
reindex-job-fix
Mar 12, 2026
Merged

Refactor ReindexByDoiJob to include Shoryuken::Worker and update perform method#1495
codycooperross merged 3 commits intomasterfrom
reindex-job-fix

Conversation

@codycooperross
Copy link
Contributor

@codycooperross codycooperross commented Mar 11, 2026

Purpose

closes: Add github issue that originated this PR

Approach

Open Questions and Pre-Merge TODOs

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

Summary by CodeRabbit

  • Chores

    • Updated DOI reindexing background processing to a new queueing mode with environment-aware queues and automatic cleanup; job now accepts payload-based messages for more flexible delivery.
  • Tests

    • Updated job specs to use the new payload-style invocation and assertions so test coverage aligns with the updated job behavior.

@coderabbitai
Copy link

coderabbitai bot commented Mar 11, 2026

📝 Walkthrough

Walkthrough

ReindexByDoiJob now includes Shoryuken worker integration, adds shoryuken_options with a dynamic queue and auto_delete, changes the perform signature to accept (sqs_message = nil, data = nil), and parses a JSON payload to extract doi; specs updated to call the job with the new two-argument form.

Changes

Cohort / File(s) Summary
Job Implementation
app/jobs/reindex_by_doi_job.rb
Adds include Shoryuken::Worker, retains queue_as :lupo_background and adds shoryuken_options queue: -> { "#{ENV["RAILS_ENV"]}_lupo_background" }, auto_delete: true. Changes perform signature to perform(sqs_message = nil, data = nil) and parses data JSON to extract doi (uses parsed["doi"]), returning early if absent; downstream logic for Datacite vs other DOI imports is preserved but uses the extracted doi.
Job Specs
spec/jobs/reindex_by_doi_job_spec.rb
Updates tests to invoke the job with two arguments (first arg nil/SQS message, second arg the JSON payload containing doi); replaces single-arg perform_now/perform_later calls with the two-argument interface and adjusts enqueued-job expectations accordingly.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main changes: refactoring ReindexByDoiJob to include Shoryuken::Worker and updating the perform method signature.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch reindex-job-fix

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/jobs/reindex_by_doi_job.rb`:
- Around line 8-10: The perform signature currently def perform(sqs_message =
nil, doi_id = nil) causes single-argument calls (perform_later/perform_now) to
place the DOI into sqs_message and leave doi_id nil; update perform in
ReindexByDoiJob so it normalizes inputs: if doi_id is nil then detect whether
sqs_message is the DOI string (e.g., String) or a Shoryuken/SQS payload (e.g.,
Hash with "body" or :body) and set doi_id accordingly before calling
Doi.find_by(doi: doi_id); this preserves support for perform_later/perform_now
and direct Shoryuken invocation while keeping calls to Doi.find_by unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 67b5ecd6-357c-47e1-8bf1-9cb52e7f4d67

📥 Commits

Reviewing files that changed from the base of the PR and between 36d27d9 and 99f77b6.

📒 Files selected for processing (1)
  • app/jobs/reindex_by_doi_job.rb

Comment on lines 8 to 10
def perform(sqs_message = nil, doi_id = nil)
doi = Doi.find_by(doi: doi_id)
return unless doi.present?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Description: Find all places where ReindexByDoiJob is invoked to verify argument compatibility

echo "=== Direct invocations of ReindexByDoiJob ==="
rg -n -C3 'ReindexByDoiJob\.(perform_later|perform_now|perform_async|set)' --type ruby

echo ""
echo "=== Test file contents ==="
cat spec/jobs/reindex_by_doi_job_spec.rb 2>/dev/null || echo "Test file not found"

Repository: datacite/lupo

Length of output: 2860


🏁 Script executed:

cat -n app/jobs/reindex_by_doi_job.rb

Repository: datacite/lupo

Length of output: 720


🏁 Script executed:

rg -n 'ReindexByDoiJob' --type ruby -B2 -A2

Repository: datacite/lupo

Length of output: 1931


Critical: Method signature breaks all existing perform_later and perform_now invocations.

The current signature def perform(sqs_message = nil, doi_id = nil) causes all single-argument invocations to fail. When called with perform_later(doi_string) or perform_now(doi_string), the DOI value is assigned to the sqs_message parameter, leaving doi_id as nil. This causes Doi.find_by(doi: nil) to return nil, and the job silently exits without processing.

All three test invocations in spec/jobs/reindex_by_doi_job_spec.rb are broken:

  • Line 8: ReindexByDoiJob.perform_later(datacite_doi.doi)
  • Line 17: ReindexByDoiJob.perform_now(datacite_doi.doi)
  • Line 25: ReindexByDoiJob.perform_now(other_doi.doi)
Proposed fix to support both ActiveJob and Shoryuken

If this job needs to support both ActiveJob (perform_later) and direct Shoryuken invocation, detect how it was invoked:

  def perform(sqs_message = nil, doi_id = nil)
+   # When called via perform_later, sqs_message contains the doi_id
+   # When called directly by Shoryuken, doi_id is the second argument
+   resolved_doi_id = doi_id || sqs_message
-   doi = Doi.find_by(doi: doi_id)
+   doi = Doi.find_by(doi: resolved_doi_id)

Alternatively, if this job is only invoked via Shoryuken, update all callers to pass the second argument:

-  subject(:job) { ReindexByDoiJob.perform_later(datacite_doi.doi) }
+  subject(:job) { ReindexByDoiJob.perform_later(nil, datacite_doi.doi) }

and

-  ReindexByDoiJob.perform_now(datacite_doi.doi)
+  ReindexByDoiJob.perform_now(nil, datacite_doi.doi)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def perform(sqs_message = nil, doi_id = nil)
doi = Doi.find_by(doi: doi_id)
return unless doi.present?
def perform(sqs_message = nil, doi_id = nil)
# When called via perform_later, sqs_message contains the doi_id
# When called directly by Shoryuken, doi_id is the second argument
resolved_doi_id = doi_id || sqs_message
doi = Doi.find_by(doi: resolved_doi_id)
return unless doi.present?
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/jobs/reindex_by_doi_job.rb` around lines 8 - 10, The perform signature
currently def perform(sqs_message = nil, doi_id = nil) causes single-argument
calls (perform_later/perform_now) to place the DOI into sqs_message and leave
doi_id nil; update perform in ReindexByDoiJob so it normalizes inputs: if doi_id
is nil then detect whether sqs_message is the DOI string (e.g., String) or a
Shoryuken/SQS payload (e.g., Hash with "body" or :body) and set doi_id
accordingly before calling Doi.find_by(doi: doi_id); this preserves support for
perform_later/perform_now and direct Shoryuken invocation while keeping calls to
Doi.find_by unchanged.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
app/jobs/reindex_by_doi_job.rb (2)

6-7: Redundant/conflicting queue declarations may cause confusion.

You have both queue_as :lupo_background (ActiveJob) and shoryuken_options queue: (Shoryuken). Looking at OtherDoiJob in the codebase, it only uses shoryuken_options without queue_as. With Shoryuken.active_job_queue_name_prefixing = true set in the initializer, ActiveJob will prefix the queue_as value, while shoryuken_options has its own explicit prefix logic.

Consider removing queue_as :lupo_background to match the pattern used by OtherDoiJob and avoid potential queue routing ambiguity.

♻️ Suggested fix
 class ReindexByDoiJob < ApplicationJob
   include Shoryuken::Worker

-  queue_as :lupo_background
   shoryuken_options queue: -> { "#{ENV["RAILS_ENV"]}_lupo_background" }, auto_delete: true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/jobs/reindex_by_doi_job.rb` around lines 6 - 7, Remove the redundant
ActiveJob queue declaration to avoid conflicting queue naming: delete the call
to queue_as :lupo_background from ReindexByDoiJob so it only uses the
Shoryuken-specific shoryuken_options queue: -> {
"#{ENV["RAILS_ENV"]}_lupo_background" } configuration (matching OtherDoiJob) and
rely on the existing Shoryuken.active_job_queue_name_prefixing logic instead.

9-11: Remove unused sqs_message parameter or document its purpose.

The sqs_message parameter is declared but never used in the method body. All current callers pass nil for this argument, and there are no production code invocations of this job outside the test suite.

If this job is strictly for Shoryuken-only usage, remove the unused parameter. If sqs_message is intentionally reserved for future queue-specific logic, document this clearly.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/jobs/reindex_by_doi_job.rb` around lines 9 - 11, The perform method in
ReindexByDoiJob declares an unused sqs_message parameter (perform(sqs_message =
nil, doi_id = nil)) — either remove sqs_message from the signature and update
tests/call sites to call perform(doi_id) (or perform_async with only doi_id), or
keep it but mark it as intentionally unused by renaming to _sqs_message and
adding a short comment documenting that it’s reserved for Shoryuken/SQS
metadata; make the change in perform and ensure the Doi.find_by(doi: doi_id)
usage remains unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@app/jobs/reindex_by_doi_job.rb`:
- Around line 6-7: Remove the redundant ActiveJob queue declaration to avoid
conflicting queue naming: delete the call to queue_as :lupo_background from
ReindexByDoiJob so it only uses the Shoryuken-specific shoryuken_options queue:
-> { "#{ENV["RAILS_ENV"]}_lupo_background" } configuration (matching
OtherDoiJob) and rely on the existing Shoryuken.active_job_queue_name_prefixing
logic instead.
- Around line 9-11: The perform method in ReindexByDoiJob declares an unused
sqs_message parameter (perform(sqs_message = nil, doi_id = nil)) — either remove
sqs_message from the signature and update tests/call sites to call
perform(doi_id) (or perform_async with only doi_id), or keep it but mark it as
intentionally unused by renaming to _sqs_message and adding a short comment
documenting that it’s reserved for Shoryuken/SQS metadata; make the change in
perform and ensure the Doi.find_by(doi: doi_id) usage remains unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 80515640-eed9-4f63-8c00-20a05fc395f7

📥 Commits

Reviewing files that changed from the base of the PR and between 99f77b6 and d1205e4.

📒 Files selected for processing (2)
  • app/jobs/reindex_by_doi_job.rb
  • spec/jobs/reindex_by_doi_job_spec.rb

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
spec/jobs/reindex_by_doi_job_spec.rb (1)

10-14: Add a regression for the Shoryuken queue binding.

This assertion only covers queue_as; it will still pass if the new shoryuken_options queue points at the wrong SQS queue. Since the production change is the Shoryuken integration, a small spec around the resolved worker queue name would make this much safer.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/jobs/reindex_by_doi_job_spec.rb` around lines 10 - 14, The spec
currently only checks ActiveJob's queue_as but misses the Shoryuken binding; add
an assertion that the worker's Shoryuken queue is set to the same resolved queue
name. In the spec for ReindexByDoiJob, after the existing enqueue assertion, add
a line that inspects the job class's Shoryuken options (e.g.
ReindexByDoiJob.shoryuken_options[:queue] or
ReindexByDoiJob.get_shoryuken_options[:queue]) and expect it to equal the
intended queue name ("test_lupo_background" or the resolved production queue
name used by the app) so the spec fails if shoryuken_options points at the wrong
SQS queue.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/jobs/reindex_by_doi_job.rb`:
- Line 7: The shoryuken queue name is hard-coded to ENV["RAILS_ENV"] in the
shoryuken_options call; change it to use the same prefix logic as the publisher
(ENV["SQS_PREFIX"].presence || Rails.env) so the worker listens on the same
"<prefix>_lupo_background" queue as producers; update the lambda passed to
shoryuken_options (the shoryuken_options call) to build the queue name from
ENV["SQS_PREFIX"].presence || Rails.env instead of ENV["RAILS_ENV"].
- Around line 9-14: The perform method calls JSON.parse(data) directly which
will raise for nil or malformed JSON; change perform to first guard against
nil/empty payload (e.g. return unless data.present?) and then safely parse
inside a begin/rescue that catches JSON::ParserError and TypeError, returning
(and optionally logging) when parsing fails; after the safe parse ensure
parsed.is_a?(Hash) before accessing parsed["doi"] so invalid payloads do not
cause exceptions in perform.

---

Nitpick comments:
In `@spec/jobs/reindex_by_doi_job_spec.rb`:
- Around line 10-14: The spec currently only checks ActiveJob's queue_as but
misses the Shoryuken binding; add an assertion that the worker's Shoryuken queue
is set to the same resolved queue name. In the spec for ReindexByDoiJob, after
the existing enqueue assertion, add a line that inspects the job class's
Shoryuken options (e.g. ReindexByDoiJob.shoryuken_options[:queue] or
ReindexByDoiJob.get_shoryuken_options[:queue]) and expect it to equal the
intended queue name ("test_lupo_background" or the resolved production queue
name used by the app) so the spec fails if shoryuken_options points at the wrong
SQS queue.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e25e49a5-5be5-465c-9541-8bc6fc65e30e

📥 Commits

Reviewing files that changed from the base of the PR and between d1205e4 and 49a46ed.

📒 Files selected for processing (2)
  • app/jobs/reindex_by_doi_job.rb
  • spec/jobs/reindex_by_doi_job_spec.rb

include Shoryuken::Worker

queue_as :lupo_background
shoryuken_options queue: -> { "#{ENV["RAILS_ENV"]}_lupo_background" }, auto_delete: true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Keep the Shoryuken queue name aligned with the publisher.

Line 7 hard-codes ENV["RAILS_ENV"], but app/services/sqs_message_service.rb:6-7 publishes to ENV["SQS_PREFIX"].presence || Rails.env. In prefixed environments, producers will send to <prefix>_lupo_background while this worker listens on <RAILS_ENV>_lupo_background, so it will never see those messages.

💡 Proposed fix
-  shoryuken_options queue: -> { "#{ENV["RAILS_ENV"]}_lupo_background" }, auto_delete: true
+  shoryuken_options queue: -> { "#{ENV["SQS_PREFIX"].presence || Rails.env}_lupo_background" }, auto_delete: true
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
shoryuken_options queue: -> { "#{ENV["RAILS_ENV"]}_lupo_background" }, auto_delete: true
shoryuken_options queue: -> { "#{ENV["SQS_PREFIX"].presence || Rails.env}_lupo_background" }, auto_delete: true
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/jobs/reindex_by_doi_job.rb` at line 7, The shoryuken queue name is
hard-coded to ENV["RAILS_ENV"] in the shoryuken_options call; change it to use
the same prefix logic as the publisher (ENV["SQS_PREFIX"].presence || Rails.env)
so the worker listens on the same "<prefix>_lupo_background" queue as producers;
update the lambda passed to shoryuken_options (the shoryuken_options call) to
build the queue name from ENV["SQS_PREFIX"].presence || Rails.env instead of
ENV["RAILS_ENV"].

Comment on lines +9 to +14
def perform(sqs_message = nil, data = nil)
parsed = JSON.parse(data)
return unless parsed.is_a?(Hash)

doi_id = parsed["doi"]
return if doi_id.blank?
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Handle invalid payloads before parsing.

JSON.parse(data) can raise on nil, malformed JSON, or a Hash, so Line 11 never runs. On a queue consumer, that turns one bad payload into an exception path before the worker can decide how to handle it.

💡 Proposed fix
   def perform(sqs_message = nil, data = nil)
-    parsed = JSON.parse(data)
+    parsed =
+      case data
+      when Hash
+        data
+      when String
+        JSON.parse(data)
+      else
+        return
+      end
     return unless parsed.is_a?(Hash)
 
-    doi_id = parsed["doi"]
+    doi_id = parsed["doi"] || parsed[:doi]
     return if doi_id.blank?
 
     doi = Doi.find_by(doi: doi_id)
     return unless doi.present?
 
     if doi.agency == "datacite"
       DataciteDoiImportInBulkJob.perform_later([doi.id], index: DataciteDoi.active_index)
     else
       OtherDoiImportInBulkJob.perform_later([doi.id], index: OtherDoi.active_index)
     end
+  rescue JSON::ParserError
+    return
   end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def perform(sqs_message = nil, data = nil)
parsed = JSON.parse(data)
return unless parsed.is_a?(Hash)
doi_id = parsed["doi"]
return if doi_id.blank?
def perform(sqs_message = nil, data = nil)
parsed =
case data
when Hash
data
when String
JSON.parse(data)
else
return
end
return unless parsed.is_a?(Hash)
doi_id = parsed["doi"] || parsed[:doi]
return if doi_id.blank?
doi = Doi.find_by(doi: doi_id)
return unless doi.present?
if doi.agency == "datacite"
DataciteDoiImportInBulkJob.perform_later([doi.id], index: DataciteDoi.active_index)
else
OtherDoiImportInBulkJob.perform_later([doi.id], index: OtherDoi.active_index)
end
rescue JSON::ParserError
return
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/jobs/reindex_by_doi_job.rb` around lines 9 - 14, The perform method calls
JSON.parse(data) directly which will raise for nil or malformed JSON; change
perform to first guard against nil/empty payload (e.g. return unless
data.present?) and then safely parse inside a begin/rescue that catches
JSON::ParserError and TypeError, returning (and optionally logging) when parsing
fails; after the safe parse ensure parsed.is_a?(Hash) before accessing
parsed["doi"] so invalid payloads do not cause exceptions in perform.

@codycooperross codycooperross merged commit 9969471 into master Mar 12, 2026
56 of 57 checks passed
@codycooperross codycooperross deleted the reindex-job-fix branch March 12, 2026 12:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants