Skip to content

Optimize DOI event performance with bulk-preloading#1449

Open
jrhoads wants to merge 24 commits intomasterfrom
events-optimization
Open

Optimize DOI event performance with bulk-preloading#1449
jrhoads wants to merge 24 commits intomasterfrom
events-optimization

Conversation

@jrhoads
Copy link
Contributor

@jrhoads jrhoads commented Jan 13, 2026

Purpose

This PR addresses N+1 query performance issues when fetching or importing DOIs that have associated event data (citations, views, downloads, references, etc.). Previously, each relationship type for every DOI in a collection triggered a separate database query.

Approach

The optimization introduces a "bulk-preload" strategy for events:

  1. EventsPreloader Service: A new service that fetches all relevant events for a batch of DOIs in just 1 or 2 queries.
  2. PreloadedEventRelation Concern: A wrapper class that mimics the ActiveRecord::Relation API (supporting pluck, map, select, etc.), allowing existing code to operate on in-memory arrays as if they were database relations.
  3. Model Integration: The Doi model’s association methods (e.g., view_events, citation_events) are overridden to check if preloaded_events is present. If so, they filter the in-memory array; otherwise, they fall back to the standard database scope.

Key Modifications

  • app/services/events_preloader.rb: New service to batch-fetch events based on source_doi or target_doi.
  • app/models/concerns/preloaded_event_relation.rb: New wrapper for in-memory event collections to ensure backward compatibility with relation-based methods.
  • app/models/doi.rb: Added preloaded_events accessor and logic to switch between preloaded data and real-time database queries.
  • app/controllers/datacite_dois_controller.rb: Integrated preloading for the :detail view to speed up serialization.
  • app/models/datacite_doi.rb & app/models/other_doi.rb: Integrated preloading into import_in_bulk to optimize Elasticsearch indexing processes.

Important Technical Details

  • Memory vs. DB: The preloader maps events to DOI objects in memory. This significantly reduces high-latency database rounds but increases memory usage slightly for very large batches (mitigated by CHUNK_SIZE = 1000).
  • Case Sensitivity: The preloader handles DOI matching using .upcase to ensure consistency between database records and memory mapping.
  • Transparency: If preloaded_events is never initialized, the application behaves exactly as before, ensuring safety for code paths not yet optimized.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

Summary by CodeRabbit

  • New Features

    • Batch event preloading and an in-memory, queryable event collection to speed DOI detail responses and bulk indexing.
  • Refactor

    • Bulk import/indexing updated to use batched event preloading for fewer queries and improved performance while preserving behavior.
  • Bug Fixes

    • Case-insensitive provider filter added for reference lookups.
  • Tests

    • Expanded test coverage for preloading, in-memory relation behavior, chunking, case-insensitive matching, and query-count regressions.

@jrhoads jrhoads force-pushed the events-optimization branch from 17e9f41 to 46a7e33 Compare January 14, 2026 13:21
@jrhoads jrhoads force-pushed the events-optimization branch from 83f6c29 to 99492b7 Compare January 20, 2026 08:35
@jrhoads jrhoads force-pushed the events-optimization branch from 99492b7 to 13f8d15 Compare January 29, 2026 11:41
@jrhoads jrhoads force-pushed the events-optimization branch from 13f8d15 to e998678 Compare February 13, 2026 09:46
@coderabbitai
Copy link

coderabbitai bot commented Feb 13, 2026

📝 Walkthrough

Walkthrough

Adds an EventsPreloader service that batch-populates DOIs with in-memory event arrays, a PreloadedEventRelation wrapper to mimic ActiveRecord relations, integrates preloading into batch imports and controller show, adds a provider_id filter, and updates tests for preloading and relation behavior.

Changes

Cohort / File(s) Summary
Event preloader service
app/services/events_preloader.rb
New EventsPreloader that chunks DOI lists, queries Events by source_doi/target_doi, and assigns matched Event objects into each DOI's preloaded_events array.
In-memory relation wrapper
app/models/concerns/preloaded_event_relation.rb
New PreloadedEventRelation implementing Enumerable and relation-like methods (pluck, select, compact, uniq, sort_by, group_by, to_a, etc.) over arrays of event objects.
Doi model changes
app/models/doi.rb
Adds attr_accessor :preloaded_events, requires the wrapper, and overrides event accessors (view_events, download_events, reference_events, citation_events, part_events, part_of_events, version_events, version_of_events) to return filtered PreloadedEventRelation when preloaded data exists, otherwise falls back to associations.
Batch import integration
app/models/datacite_doi.rb, app/models/other_doi.rb
Removed per-record eager-loading of event associations and invoke EventsPreloader.new(dois).preload! per batch inside find_in_batches/import_in_bulk.
Controller change
app/controllers/datacite_dois_controller.rb
show now calls EventsPreloader.new([doi]).preload! before rendering JSON so serializers can access preloaded events.
Reference repository filter
app/models/reference_repository.rb
Adds case-insensitive provider_id term filter when options[:provider_id] is provided.
Specs and tests
spec/models/doi_related_spec.rb, spec/models/preloaded_event_relation_spec.rb, spec/services/events_preloader_spec.rb, spec/requests/repositories_spec.rb
Adds comprehensive tests for EventsPreloader (chunking, case-insensitive matching, integration), PreloadedEventRelation behavior, Doi preloading integration and backward-compatibility; updates index reset sequence.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'Optimize DOI event performance with bulk-preloading' directly describes the main objective: introducing bulk-preloading to optimize DOI event performance and reduce N+1 query issues.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings (stacked PR)
  • 📝 Generate docstrings (commit on current branch)
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch events-optimization
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In `@app/controllers/datacite_dois_controller.rb`:
- Around line 452-454: The guard `params[:detail] != false` is always truthy
because params are strings, so remove the misleading conditional and always
preload: call EventsPreloader.new([doi]).preload! unconditionally (keep the
existing comment about preloading) so the intent matches behavior;
alternatively, if you want to preserve conditional behavior, change the check to
string comparison (e.g., params[:detail] != "false" or
ActiveModel::Type::Boolean.new.cast(params[:detail]) ) but the simpler fix is to
drop the `if` and always invoke EventsPreloader.new([doi]).preload!.

In `@app/services/events_preloader.rb`:
- Around line 37-49: When distributing events in the all_events loop, avoid
appending the same event twice when event.source_doi == event.target_doi by
checking for equality before pushing to the same DOI's preloaded_events: look up
source_doi_obj and target_doi_obj from `@doi_map` (used in the current loop), and
if they refer to the same object, append the event only once (otherwise append
to each if present); adjust the logic around source_doi_obj.preloaded_events <<
event and target_doi_obj.preloaded_events << event to perform this dedupe during
distribution (or add a single guard to skip the second append when source and
target DOIs match).
🧹 Nitpick comments (7)
app/models/concerns/preloaded_event_relation.rb (2)

69-71: inject override doesn't support the symbol form.

Ruby's Enumerable#inject supports both inject(:+) (symbol) and inject(0) { |sum, x| sum + x } (block) forms. This override only passes initial and &block to Array#inject, so calling relation.inject(:+) would pass :+ as initial with a nil block, raising a TypeError.

Currently, inject is only called on the result of pluck (which returns a plain Array), so this isn't actively broken. But it's a latent trap if the API surface grows.

♻️ Suggested fix
-  def inject(initial = nil, &block)
-    `@events.inject`(initial, &block)
+  def inject(*args, &block)
+    `@events.inject`(*args, &block)
   end

94-96: Remove to_ary — it contradicts the design and is never used.

The to_ary method is never called explicitly in the codebase and no tests cover it. Since wrapper-preserving methods like select, compact, and uniq return new PreloadedEventRelation instances, having to_ary (which enables implicit unwrapping via array concatenation, splat, or Array() calls) contradicts the design intent. Keep only to_a for explicit conversion.

app/services/events_preloader.rb (1)

51-54: Redundant initialization — already handled in initialize.

Line 15 in initialize already sets doi.preloaded_events ||= [] for every DOI. This block on lines 52-54 is a no-op unless a DOI's preloaded_events was somehow set to nil after initialization, which shouldn't happen in normal flow.

spec/models/preloaded_event_relation_spec.rb (2)

65-71: sort_by test doesn't verify actual reordering.

The events are already in ascending total order ([10, 20, 30]), so sort_by(&:total) produces the same order as the input. This test passes trivially without proving that sorting actually works. Consider sorting in reverse or using a differently-ordered input.

🔧 Suggested improvement
   describe "#sort_by" do
-    it "sorts events" do
-      result = relation.sort_by(&:total)
+    it "sorts events by attribute" do
+      reversed = PreloadedEventRelation.new([event3, event1, event2])
+      result = reversed.sort_by(&:total)
       expect(result).to be_a(PreloadedEventRelation)
       expect(result.to_a.map(&:total)).to eq([10, 20, 30])
     end
   end

133-145: Consider testing first/last/count on an empty relation too.

The Enumerable methods block only tests the happy path. Since PreloadedEventRelation wraps an array and delegates these, it would be good to verify edge behavior (e.g., first on empty returns nil, count returns 0).

spec/services/events_preloader_spec.rb (1)

89-107: Large batch test creates 1100+ DB records — significant test suite slowdown.

create_list(:doi, EventsPreloader::CHUNK_SIZE + 100, ...) with CHUNK_SIZE = 1000 inserts 1100 DOI records plus associated data. This will be very slow (potentially minutes) and could dominate the test suite runtime.

Consider using build_stubbed_list or mocking the chunking behavior instead of hitting the database, since the test only verifies that no error is raised and preloaded_events is not nil.

♻️ Suggested alternative approach
     context "with large batch" do
       it "chunks large DOI lists" do
-        # Create more than CHUNK_SIZE DOIs
-        large_batch = create_list(:doi, EventsPreloader::CHUNK_SIZE + 100, client: client, aasm_state: "findable")
-
-        # Create events for some of them
-        create(:event_for_crossref, {
-          subj_id: "https://doi.org/#{large_batch.first.doi}",
-          obj_id: "https://doi.org/#{large_batch.last.doi}",
-          relation_type_id: "references",
-        })
-
-        expect {
-          EventsPreloader.new(large_batch).preload!
-        }.not_to raise_error
-
-        expect(large_batch.first.preloaded_events).not_to be_nil
+        # Verify chunking by checking that Event.where is called multiple times
+        large_batch = build_stubbed_list(:doi, EventsPreloader::CHUNK_SIZE + 100, client: client)
+        large_batch.each { |d| allow(d).to receive(:preloaded_events=) }
+        large_batch.each { |d| allow(d).to receive(:preloaded_events).and_return([]) }
+
+        relation = double("relation", to_a: [])
+        expect(Event).to receive(:where).twice.and_return(relation)
+
+        EventsPreloader.new(large_batch).preload!
       end
     end
spec/models/doi_related_spec.rb (1)

42-65: Test doesn't actually verify preloading — it verifies events via a fresh DB query.

fresh_doi1 = DataciteDoi.find(doi1.id) creates a new instance with preloaded_events = nil, so reference_events and citation_events fall back to database queries. This test confirms events exist in the DB but does not validate that EventsPreloader was used during import_in_bulk.

To verify preloading actually occurred, consider spying on EventsPreloader or asserting on the in-batch DOI objects:

♻️ Suggested approach
-        allow(DataciteDoi).to receive(:upload_to_elasticsearch)
-
-        DataciteDoi.import_in_bulk(ids)
-
-        # Verify that events were preloaded (check via a fresh query)
-        fresh_doi1 = DataciteDoi.find(doi1.id)
-        expect(fresh_doi1.reference_events.count).to eq(1)
-        expect(fresh_doi1.citation_events.count).to eq(1)
+        preloader_spy = instance_double(EventsPreloader, preload!: nil)
+        expect(EventsPreloader).to receive(:new).and_return(preloader_spy)
+        expect(preloader_spy).to receive(:preload!).once
+
+        allow(DataciteDoi).to receive(:upload_to_elasticsearch)
+
+        DataciteDoi.import_in_bulk(ids)

@jrhoads jrhoads force-pushed the events-optimization branch from e998678 to 7c3032f Compare February 25, 2026 15:05
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
app/services/events_preloader.rb (1)

37-49: Duplicate event appended when source_doi == target_doi.

This issue was already flagged in a prior review. The fix proposed there (check target_doi_obj != source_doi_obj before the second append) remains unaddressed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/events_preloader.rb` around lines 37 - 49, When iterating
all_events, avoid appending the same event twice when event.source_doi ==
event.target_doi by checking that the resolved target_doi_obj is different from
source_doi_obj before pushing to preloaded_events; specifically, after resolving
source_doi_obj via `@doi_map` and appending to source_doi_obj.preloaded_events,
resolve target_doi_obj from `@doi_map` for event.target_doi and only append to
target_doi_obj.preloaded_events if target_doi_obj exists AND target_doi_obj !=
source_doi_obj (using the existing source_doi_obj and target_doi_obj local
variables).
🧹 Nitpick comments (2)
app/services/events_preloader.rb (1)

51-54: Redundant second ||= [] loop — can be removed.

initialize already calls doi.preloaded_events ||= [] for every DOI. After preload! appends events to those already-initialized arrays, no DOI's preloaded_events can be nil, making this second loop dead code.

♻️ Proposed cleanup
-   # Ensure all DOIs have an array (even if empty)
-   `@dois.each` do |doi|
-     doi.preloaded_events ||= []
-   end
-
    self
  end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/events_preloader.rb` around lines 51 - 54, Remove the redundant
loop that sets doi.preloaded_events ||= [] inside preload! (the `@dois.each` block
shown), since initialize already ensures every DOI's preloaded_events is
initialized; locate the `@dois.each` |doi| ... doi.preloaded_events ||= [] in
EventsPreloader (or the preload! method) and delete that loop so preload! only
appends events to existing arrays.
spec/models/preloaded_event_relation_spec.rb (1)

81-86: Consider adding a no-argument inject test to expose the latent edge case.

The implementation defaults initial to nil and always passes it explicitly: @events.inject(nil, &block). When called without an initial value (relation.inject { |acc, e| ... }), this passes nil as the accumulator instead of using the first element as Array's native inject would. A test covering the no-argument form would catch this deviation.

🧪 Suggested additional test
+    it "reduces events without an explicit initial value" do
+      result = relation.inject { |acc, e| acc.total > e.total ? acc : e }
+      expect(result).to eq(event3)  # event3 has the highest total (30)
+    end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/preloaded_event_relation_spec.rb` around lines 81 - 86, Add a
spec that calls relation.inject without an explicit initial to catch the edge
case where the implementation currently passes nil as the initial accumulator;
e.g. add a test that invokes relation.inject { |sum, e| sum + e.total }
(referencing the relation.inject behavior in PreloadedEventRelation) and assert
the expected aggregated result (60) so the failing behavior is exposed.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/services/events_preloader.rb`:
- Around line 28-34: The Event query inside the doi_identifiers.each_slice block
is unordered (Event.where(...)) so consumers of PreloadedEventRelation#first or
`#last` get nondeterministic results; update the query used in this block (the
Event.where(...) call that feeds all_events.concat) to include a stable order
(for example .order(:occurred_at) or .order(:id)) so the in-memory all_events
array has a predictable, repeatable ordering across runs.
- Around line 12-16: Guard against nil DOI identifiers before calling .upcase by
checking or normalizing the value first: in the loop that populates `@doi_map`
(the block iterating `@dois` and setting `@doi_map`[doi.doi.upcase] = doi) and in
the preload! method where doi.doi is used again, replace the direct .upcase call
with a nil-safe approach (e.g., skip entries where doi.doi is nil/blank or call
to_s.upcase) so you never call .upcase on nil; also keep the existing
initialization of doi.preloaded_events ||= [] unchanged. Ensure you update both
the `@dois.each` block and the preload! code path that references doi.doi to use
the same nil-safe logic.

In `@spec/models/doi_related_spec.rb`:
- Around line 42-65: The test currently only checks DB state; instead verify
that EventsPreloader actually preloaded events during DataciteDoi.import_in_bulk
by spying on EventsPreloader#preload! or capturing yielded DOI instances from
the batch callback: e.g. set an expectation that EventsPreloader receives
:preload! when DataciteDoi.import_in_bulk(ids) runs (or wrap import_in_bulk with
a block to capture the batch DOIs and assert those instances have non-nil
preloaded_events and their reference_events/citation_events are available
without extra queries), referencing DataciteDoi.import_in_bulk,
EventsPreloader#preload!, fresh_doi1/doi1, and the
reference_events/citation_events accessors to locate the relevant code.

---

Duplicate comments:
In `@app/services/events_preloader.rb`:
- Around line 37-49: When iterating all_events, avoid appending the same event
twice when event.source_doi == event.target_doi by checking that the resolved
target_doi_obj is different from source_doi_obj before pushing to
preloaded_events; specifically, after resolving source_doi_obj via `@doi_map` and
appending to source_doi_obj.preloaded_events, resolve target_doi_obj from
`@doi_map` for event.target_doi and only append to target_doi_obj.preloaded_events
if target_doi_obj exists AND target_doi_obj != source_doi_obj (using the
existing source_doi_obj and target_doi_obj local variables).

---

Nitpick comments:
In `@app/services/events_preloader.rb`:
- Around line 51-54: Remove the redundant loop that sets doi.preloaded_events
||= [] inside preload! (the `@dois.each` block shown), since initialize already
ensures every DOI's preloaded_events is initialized; locate the `@dois.each` |doi|
... doi.preloaded_events ||= [] in EventsPreloader (or the preload! method) and
delete that loop so preload! only appends events to existing arrays.

In `@spec/models/preloaded_event_relation_spec.rb`:
- Around line 81-86: Add a spec that calls relation.inject without an explicit
initial to catch the edge case where the implementation currently passes nil as
the initial accumulator; e.g. add a test that invokes relation.inject { |sum, e|
sum + e.total } (referencing the relation.inject behavior in
PreloadedEventRelation) and assert the expected aggregated result (60) so the
failing behavior is exposed.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between e998678 and 7c3032f.

📒 Files selected for processing (11)
  • app/controllers/datacite_dois_controller.rb
  • app/models/concerns/preloaded_event_relation.rb
  • app/models/datacite_doi.rb
  • app/models/doi.rb
  • app/models/other_doi.rb
  • app/models/reference_repository.rb
  • app/services/events_preloader.rb
  • spec/models/doi_related_spec.rb
  • spec/models/preloaded_event_relation_spec.rb
  • spec/requests/repositories_spec.rb
  • spec/services/events_preloader_spec.rb
🚧 Files skipped from review as they are similar to previous changes (6)
  • app/controllers/datacite_dois_controller.rb
  • app/models/datacite_doi.rb
  • spec/services/events_preloader_spec.rb
  • app/models/reference_repository.rb
  • app/models/doi.rb
  • app/models/concerns/preloaded_event_relation.rb

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@spec/requests/reference_repositories_spec.rb`:
- Around line 56-61: The tear-down is too broad: instead of calling
Client.destroy_all and Provider.destroy_all, only delete the specific records
created in this spec (the records set up in before :all, e.g. the `@client` and
any `@provider` instances) — track those instances or their ids in before :all and
call destroy on them in after :all (or use an array of created records) so you
don't wipe all Client/Provider rows; keep ReferenceRepository.destroy_all and
Prefix.destroy_all as-is since they're local to this spec.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7c3032f and 46c2e6b.

📒 Files selected for processing (1)
  • spec/requests/reference_repositories_spec.rb

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
spec/services/events_preloader_spec.rb (1)

104-122: Consider reducing test data size or using stubbing for the chunking test.

Creating 1,100 DOIs (CHUNK_SIZE + 100) in this test is expensive and will slow down the test suite. The test only verifies that chunking doesn't raise an error and that preloaded_events is set.

Consider either:

  • Stubbing CHUNK_SIZE to a smaller value (e.g., 10) for this test.
  • Reducing the batch size and verifying the chunking logic through a unit test approach.
♻️ Suggested approach using stub
     context "with large batch" do
       it "chunks large DOI lists" do
-        # Create more than CHUNK_SIZE DOIs
-        large_batch = create_list(:doi, EventsPreloader::CHUNK_SIZE + 100, client: client, aasm_state: "findable")
+        # Stub CHUNK_SIZE to a smaller value to avoid creating thousands of records
+        stub_const("EventsPreloader::CHUNK_SIZE", 10)
+        large_batch = create_list(:doi, 15, client: client, aasm_state: "findable")

         # Create events for some of them
         create(:event_for_crossref, {
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/services/events_preloader_spec.rb` around lines 104 - 122, The test
creates a very large fixture list (EventsPreloader::CHUNK_SIZE + 100) which
slows the suite; change the spec to avoid heavy data by stubbing
EventsPreloader::CHUNK_SIZE to a small value (e.g., allow(EventsPreloader).to
receive(:const_get).with("CHUNK_SIZE").and_return(10)) or otherwise temporarily
set CHUNK_SIZE within the example, then create a much smaller list (e.g., 15)
and keep calling EventsPreloader.new(large_batch).preload! and asserting
large_batch.first.preloaded_events; ensure the stub is scoped to this example so
other tests are unaffected and reference EventsPreloader, CHUNK_SIZE, preload!
and the spec example that currently creates the large_batch.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@spec/services/events_preloader_spec.rb`:
- Around line 11-41: Remove the duplicated and unclosed spec blocks to fix the
syntax error: keep a single describe "#initialize" and a single describe
"#preload!" block, remove the repeated describe "#initialize" clone and the
duplicate describe "#preload!", ensure the original describe "#initialize" has
its matching end, and either implement or remove the empty context "with source
and target events" (context "with source and target events") so every
describe/context has a closing end; verify the spec now contains one
EventsPreloader.new(dois) test and the preload! contexts are properly closed.

---

Nitpick comments:
In `@spec/services/events_preloader_spec.rb`:
- Around line 104-122: The test creates a very large fixture list
(EventsPreloader::CHUNK_SIZE + 100) which slows the suite; change the spec to
avoid heavy data by stubbing EventsPreloader::CHUNK_SIZE to a small value (e.g.,
allow(EventsPreloader).to receive(:const_get).with("CHUNK_SIZE").and_return(10))
or otherwise temporarily set CHUNK_SIZE within the example, then create a much
smaller list (e.g., 15) and keep calling
EventsPreloader.new(large_batch).preload! and asserting
large_batch.first.preloaded_events; ensure the stub is scoped to this example so
other tests are unaffected and reference EventsPreloader, CHUNK_SIZE, preload!
and the spec example that currently creates the large_batch.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 46c2e6b and 7f17493.

📒 Files selected for processing (2)
  • app/models/concerns/preloaded_event_relation.rb
  • spec/services/events_preloader_spec.rb
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/models/concerns/preloaded_event_relation.rb

@jrhoads jrhoads force-pushed the events-optimization branch from 7f17493 to 6ae2a83 Compare February 26, 2026 13:27
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (3)
app/models/concerns/preloaded_event_relation.rb (1)

89-96: Consider returning a defensive copy from to_a and to_ary.

Both methods return the internal @events array directly, which allows callers to mutate the wrapper's internal state. While this matches AR behavior and is likely fine in this context, it's worth noting.

# If defensive copies are ever needed:
def to_a
  `@events.dup`
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/concerns/preloaded_event_relation.rb` around lines 89 - 96, to_a
and to_ary currently expose the internal `@events` array directly which allows
external callers to mutate internal state; change both methods (to_a and to_ary
in PreloadedEventRelation) to return a defensive copy of `@events` (e.g. use dup)
so callers get a copy rather than the original, ensuring internal state isn't
modified by external code while preserving current behavior for read-only use;
if nested mutability is a concern consider using map(&:dup) or deep duplication
where appropriate.
app/models/doi.rb (1)

106-184: Optional: Consider extracting duplicate filter logic into a helper method.

All eight methods follow the same pattern. A private helper could reduce duplication:

♻️ Optional refactor to reduce duplication
private

def filtered_preloaded_events(relation_type_key, relation_type_value, doi_key)
  PreloadedEventRelation.new(
    preloaded_events.select do |e|
      e.public_send(relation_type_key) == relation_type_value &&
        e.public_send(doi_key)&.upcase == doi.upcase
    end
  )
end

# Then in each method:
def view_events
  if !preloaded_events.nil?
    filtered_preloaded_events(:target_relation_type_id, "views", :target_doi)
  else
    association(:view_events).scope
  end
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/doi.rb` around lines 106 - 184, All eight event accessor methods
(view_events, download_events, reference_events, citation_events, part_events,
part_of_events, version_events, version_of_events) duplicate the same
preloaded_events filtering logic; extract a private helper (eg.
filtered_preloaded_events(relation_type_key, relation_type_value, doi_key)) that
builds and returns the PreloadedEventRelation for the given keys and value using
preloaded_events.select, then replace each method to call that helper when
preloaded_events is present and fall back to association(...).scope otherwise,
keeping the existing symbols PreloadedEventRelation, preloaded_events, doi and
association(...) to locate and wire up the change.
spec/models/preloaded_event_relation_spec.rb (1)

127-131: Consider adding a test for to_ary method.

The class implements both to_a and to_ary, but only to_a is tested. to_ary enables implicit array coercion (e.g., in splat operations).

🧪 Optional test addition
describe "#to_ary" do
  it "enables implicit array coercion" do
    expect(relation.to_ary).to eq(events)
    # Test splat behavior
    first, *rest = relation
    expect(first).to eq(event1)
    expect(rest).to eq([event2, event3])
  end
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/preloaded_event_relation_spec.rb` around lines 127 - 131, Add a
spec for the to_ary behavior: create a new describe block for "#to_ary" that
asserts relation.to_ary equals the underlying events array and verifies implicit
array coercion (splat/destructuring) by doing something like "first, *rest =
relation" and asserting first equals event1 and rest equals [event2, event3];
reference the existing relation and event fixtures used in the "#to_a" spec so
the new test mirrors that setup.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@app/models/concerns/preloaded_event_relation.rb`:
- Around line 89-96: to_a and to_ary currently expose the internal `@events` array
directly which allows external callers to mutate internal state; change both
methods (to_a and to_ary in PreloadedEventRelation) to return a defensive copy
of `@events` (e.g. use dup) so callers get a copy rather than the original,
ensuring internal state isn't modified by external code while preserving current
behavior for read-only use; if nested mutability is a concern consider using
map(&:dup) or deep duplication where appropriate.

In `@app/models/doi.rb`:
- Around line 106-184: All eight event accessor methods (view_events,
download_events, reference_events, citation_events, part_events, part_of_events,
version_events, version_of_events) duplicate the same preloaded_events filtering
logic; extract a private helper (eg.
filtered_preloaded_events(relation_type_key, relation_type_value, doi_key)) that
builds and returns the PreloadedEventRelation for the given keys and value using
preloaded_events.select, then replace each method to call that helper when
preloaded_events is present and fall back to association(...).scope otherwise,
keeping the existing symbols PreloadedEventRelation, preloaded_events, doi and
association(...) to locate and wire up the change.

In `@spec/models/preloaded_event_relation_spec.rb`:
- Around line 127-131: Add a spec for the to_ary behavior: create a new describe
block for "#to_ary" that asserts relation.to_ary equals the underlying events
array and verifies implicit array coercion (splat/destructuring) by doing
something like "first, *rest = relation" and asserting first equals event1 and
rest equals [event2, event3]; reference the existing relation and event fixtures
used in the "#to_a" spec so the new test mirrors that setup.

ℹ️ Review info

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Cache: Disabled due to data retention organization setting

Knowledge base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between 7f17493 and 6ae2a83.

📒 Files selected for processing (11)
  • app/controllers/datacite_dois_controller.rb
  • app/models/concerns/preloaded_event_relation.rb
  • app/models/datacite_doi.rb
  • app/models/doi.rb
  • app/models/other_doi.rb
  • app/models/reference_repository.rb
  • app/services/events_preloader.rb
  • spec/models/doi_related_spec.rb
  • spec/models/preloaded_event_relation_spec.rb
  • spec/requests/repositories_spec.rb
  • spec/services/events_preloader_spec.rb
🚧 Files skipped from review as they are similar to previous changes (6)
  • app/models/other_doi.rb
  • app/models/reference_repository.rb
  • app/services/events_preloader.rb
  • spec/services/events_preloader_spec.rb
  • app/models/datacite_doi.rb
  • spec/requests/repositories_spec.rb

@jrhoads jrhoads requested a review from a team February 26, 2026 16:27
@jrhoads jrhoads force-pushed the events-optimization branch from 6ae2a83 to 55b93dc Compare March 13, 2026 12:23
@jrhoads
Copy link
Contributor Author

jrhoads commented Mar 13, 2026

@coderabbitai please review again

@coderabbitai
Copy link

coderabbitai bot commented Mar 13, 2026

@jrhoads Sure! I'll re-review the PR now.

✅ Actions performed

Full review triggered.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (1)
spec/models/doi_related_spec.rb (1)

42-65: ⚠️ Potential issue | 🟡 Minor

This test still doesn’t verify preloading actually ran.

fresh_doi1 on Line 62 is a new instance with preloaded_events = nil, so Lines 63-64 validate DB-backed associations, not EventsPreloader usage in import_in_bulk.

🧪 Suggested fix
-      it "should preload events for all DOIs in batch" do
+      it "should invoke EventsPreloader during import_in_bulk" do
         # Create events for the DOIs
         create(:event_for_crossref, {
           subj_id: "https://doi.org/#{doi1.doi}",
           obj_id: "https://doi.org/#{doi2.doi}",
           relation_type_id: "references",
         })
         create(:event_for_datacite_crossref, {
           subj_id: "https://doi.org/#{doi1.doi}",
           obj_id: "https://doi.org/#{doi3.doi}",
           relation_type_id: "is-referenced-by",
         })
-
+        preloader = instance_double(EventsPreloader, preload!: nil)
+        allow(EventsPreloader).to receive(:new).and_return(preloader)
         allow(DataciteDoi).to receive(:upload_to_elasticsearch)
-
         DataciteDoi.import_in_bulk(ids)
-
-        # Verify that events were preloaded (check via a fresh query)
-        fresh_doi1 = DataciteDoi.find(doi1.id)
-        expect(fresh_doi1.reference_events.count).to eq(1)
-        expect(fresh_doi1.citation_events.count).to eq(1)
+        expect(EventsPreloader).to have_received(:new)
+        expect(preloader).to have_received(:preload!)
       end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 42 - 65, The test currently
checks DB-backed associations on a fresh instance (fresh_doi1) which doesn't
prove EventsPreloader ran; change the assertion to either spy on the preloader
or check the original in-memory instance after import. Specifically, in the test
for DataciteDoi.import_in_bulk, add an expectation that EventsPreloader.preload
(or the actual preloader class/method used inside DataciteDoi.import_in_bulk)
receives the expected DOIs, or after calling DataciteDoi.import_in_bulk(ids)
assert that doi1.association(:reference_events).loaded? and
doi1.association(:citation_events).loaded? are true (or use
expect(EventsPreloader).to
have_received(:preload).with(an_array_including(doi1.id))). This targets
DataciteDoi.import_in_bulk and the EventsPreloader invocation so the test
verifies preloading ran rather than just DB counts on a fresh record.
🧹 Nitpick comments (3)
spec/models/preloaded_event_relation_spec.rb (1)

81-85: Add #inject symbol-form coverage to match the varargs change.

Line 83 only exercises block form. Add symbol-arg cases to lock behavior for inject(:+) and inject(initial, :+).

✅ Suggested spec extension
   describe "#inject" do
     it "reduces events" do
       result = relation.inject(0) { |sum, e| sum + e.total }
       expect(result).to eq(60)
     end
+
+    it "supports symbol form" do
+      expect(relation.pluck(:total).inject(:+)).to eq(60)
+      expect(relation.pluck(:total).inject(5, :+)).to eq(65)
+    end
   end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/preloaded_event_relation_spec.rb` around lines 81 - 85, Add
symbol-form test coverage for the `#inject` method: in
spec/models/preloaded_event_relation_spec.rb extend the describe "#inject"
examples to include cases calling relation.inject(:+) and relation.inject(10,
:+) (or another nonzero initial value) and assert their results match the block
form (e.g., sum of totals and total+initial). Locate the existing examples that
use relation.inject(0) { |sum, e| ... } and add two new it blocks (or combine
assertions) to verify symbol-arg behavior for inject and inject with an initial
value.
spec/services/events_preloader_spec.rb (1)

71-77: Relax the strict query-count assertion to avoid brittle failures.

Given the design allows 1–2 preload queries, asserting Line 77 as <= 1 can fail valid implementations.

🔧 Suggested change
-        expect {
-          preloader.preload!
-        }.not_to exceed_query_limit(1)
+        expect {
+          preloader.preload!
+        }.not_to exceed_query_limit(2)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/services/events_preloader_spec.rb` around lines 71 - 77, The test's
strict DB query assertion is brittle; update the expectation in
spec/services/events_preloader_spec.rb to allow up to two queries by changing
the matcher argument from exceed_query_limit(1) to exceed_query_limit(2) (the
case around the EventsPreloader.new(dois) and preloader.preload! call).
app/models/doi.rb (1)

108-170: Using super in fallback paths is a semantic improvement but not required—current code is functionally correct.

The event methods (view_events, download_events, etc.) currently return association(...).scope in the fallback path, which successfully enables both preloaded and database-backed scenarios. Tests validate that both code paths work correctly with .to_a enumeration. While using super would preserve the native association reader semantics, the functional behavior is already sound and no proxy-specific APIs are used anywhere in the codebase.

This refactor is optional for consistency.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/doi.rb` around lines 108 - 170, The current event reader methods
(view_events, download_events, reference_events, citation_events, part_events,
part_of_events, version_events, version_of_events) fall back to
association(...).scope when preloaded_events is nil; change those fallback
branches to call super instead to preserve native association reader
semantics—keep the preloaded path using
filtered_preloaded_events(:target_relation_type_id / :source_relation_type_id,
"<type>", :target_doi / :source_doi) and only replace the else branches with
super so the methods delegate to the original association reader when not using
preloaded_events.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/concerns/preloaded_event_relation.rb`:
- Around line 19-29: The wrapper methods first, last, and count currently accept
no arguments or blocks; update them to mirror Array/Enumerable signatures by
forwarding any arguments and blocks to `@events`. Replace def first;
`@events.first`; end with def first(*args, &blk); `@events.first`(*args, &blk); end,
do the same for last (def last(*args, &blk); `@events.last`(*args, &blk); end),
and implement count to accept an optional item and/or block (def count(item =
nil, &blk); `@events.count`(item, &blk); end) so behavior matches Array/Enumerable
and ActiveRecord::Relation.

In `@app/services/events_preloader.rb`:
- Around line 22-26: In preload! change the early exits so they return self
instead of nil and preserve the final array initialization: replace the two bare
returns after checking `@dois.empty`? and doi_identifiers.empty? with returns of
self, and ensure the method still initializes `@doi_events` (e.g. via `@doi_events`
||= []) on the early-exit path so the final array initialization behavior
matches the normal path; update references inside preload! (the `@dois` check,
doi_identifiers variable and final `@doi_events` initialization) accordingly.
- Around line 29-35: The current loop over doi_identifiers slices (using
CHUNK_SIZE) concatenates each chunk's query results via
all_events.concat(events), which can produce duplicate Event objects when the
same Event matches DOIs across multiple chunks; change the logic in the loop
that builds all_events (the block using doi_identifiers.each_slice, Event.where,
and all_events.concat) to deduplicate by a stable identifier (e.g., Event#id)
before adding to all_events—either by tracking seen IDs in a Set and only
pushing new events, or by merging and uniquing by id after the loop—so each
Event appears only once.

---

Duplicate comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 42-65: The test currently checks DB-backed associations on a fresh
instance (fresh_doi1) which doesn't prove EventsPreloader ran; change the
assertion to either spy on the preloader or check the original in-memory
instance after import. Specifically, in the test for DataciteDoi.import_in_bulk,
add an expectation that EventsPreloader.preload (or the actual preloader
class/method used inside DataciteDoi.import_in_bulk) receives the expected DOIs,
or after calling DataciteDoi.import_in_bulk(ids) assert that
doi1.association(:reference_events).loaded? and
doi1.association(:citation_events).loaded? are true (or use
expect(EventsPreloader).to
have_received(:preload).with(an_array_including(doi1.id))). This targets
DataciteDoi.import_in_bulk and the EventsPreloader invocation so the test
verifies preloading ran rather than just DB counts on a fresh record.

---

Nitpick comments:
In `@app/models/doi.rb`:
- Around line 108-170: The current event reader methods (view_events,
download_events, reference_events, citation_events, part_events, part_of_events,
version_events, version_of_events) fall back to association(...).scope when
preloaded_events is nil; change those fallback branches to call super instead to
preserve native association reader semantics—keep the preloaded path using
filtered_preloaded_events(:target_relation_type_id / :source_relation_type_id,
"<type>", :target_doi / :source_doi) and only replace the else branches with
super so the methods delegate to the original association reader when not using
preloaded_events.

In `@spec/models/preloaded_event_relation_spec.rb`:
- Around line 81-85: Add symbol-form test coverage for the `#inject` method: in
spec/models/preloaded_event_relation_spec.rb extend the describe "#inject"
examples to include cases calling relation.inject(:+) and relation.inject(10,
:+) (or another nonzero initial value) and assert their results match the block
form (e.g., sum of totals and total+initial). Locate the existing examples that
use relation.inject(0) { |sum, e| ... } and add two new it blocks (or combine
assertions) to verify symbol-arg behavior for inject and inject with an initial
value.

In `@spec/services/events_preloader_spec.rb`:
- Around line 71-77: The test's strict DB query assertion is brittle; update the
expectation in spec/services/events_preloader_spec.rb to allow up to two queries
by changing the matcher argument from exceed_query_limit(1) to
exceed_query_limit(2) (the case around the EventsPreloader.new(dois) and
preloader.preload! call).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2c11f7fd-2f48-491e-af01-c2f1f44b9473

📥 Commits

Reviewing files that changed from the base of the PR and between 9969471 and 45eb43e.

📒 Files selected for processing (11)
  • app/controllers/datacite_dois_controller.rb
  • app/models/concerns/preloaded_event_relation.rb
  • app/models/datacite_doi.rb
  • app/models/doi.rb
  • app/models/other_doi.rb
  • app/models/reference_repository.rb
  • app/services/events_preloader.rb
  • spec/models/doi_related_spec.rb
  • spec/models/preloaded_event_relation_spec.rb
  • spec/requests/repositories_spec.rb
  • spec/services/events_preloader_spec.rb

Comment on lines +19 to +29
def first
@events.first
end

def last
@events.last
end

def count
@events.count
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Find potential call sites that pass args/blocks to these methods.
# Expected: Any matches indicate current zero-arity methods are risky.
rg -nP '\b(view_events|download_events|reference_events|citation_events|part_events|part_of_events|version_events|version_of_events)\.(first|last)\s*\(\s*[^)]'
rg -nP '\b(view_events|download_events|reference_events|citation_events|part_events|part_of_events|version_events|version_of_events)\.count\s*\{'

Repository: datacite/lupo

Length of output: 39


🏁 Script executed:

# Read the full implementation file
cat -n app/models/concerns/preloaded_event_relation.rb

Repository: datacite/lupo

Length of output: 2619


🏁 Script executed:

# Check if `@events` is an Array or something else
rg -A 5 -B 5 '@events\s*=' app/models/concerns/preloaded_event_relation.rb

Repository: datacite/lupo

Length of output: 313


🏁 Script executed:

# Look for all method definitions to understand the pattern
rg 'def ' app/models/concerns/preloaded_event_relation.rb

Repository: datacite/lupo

Length of output: 377


🏁 Script executed:

# Check if this concern is included anywhere
rg 'PreloadedEventRelation|preloaded_event_relation' --type ruby

Repository: datacite/lupo

Length of output: 1875


🏁 Script executed:

# Check how PreloadedEventRelation is used in doi.rb
rg -A 10 -B 10 'PreloadedEventRelation' app/models/doi.rb

Repository: datacite/lupo

Length of output: 802


🏁 Script executed:

# Check the test file to see if args are tested
cat -n spec/models/preloaded_event_relation_spec.rb

Repository: datacite/lupo

Length of output: 5202


🏁 Script executed:

# Check if Array#first, Array#last, Array#count can accept args in Ruby
# Verify their actual signatures in the standard library
python3 << 'EOF'
# Check Ruby's Array method signatures
import subprocess
result = subprocess.run(['ruby', '-e', '''
puts "Array#first signature:"
puts Array.method(:first).arity rescue puts "N/A"

arr = [1, 2, 3, 4, 5]
puts "arr.first(2): #{arr.first(2).inspect}"
puts "arr.last(2): #{arr.last(2).inspect}"
puts "arr.count: #{arr.count}"
puts "arr.count { |x| x > 2 }: #{arr.count { |x| x > 2 }}"
'''], capture_output=True, text=True)
print(result.stdout)
if result.stderr:
    print("STDERR:", result.stderr)
EOF

Repository: datacite/lupo

Length of output: 165


Match first/last/count arity with Enumerable and Array behavior.

The current zero-arity signatures reject valid arguments and blocks that Array and Enumerable support. Ruby's Array methods accept optional parameters: first(n), last(n), and count { |item| ... }. As a drop-in replacement for ActiveRecord::Relation, this wrapper must support the same signatures to maintain compatibility.

🐛 Fix
-  def first
-    `@events.first`
+  def first(*args)
+    `@events.first`(*args)
   end

-  def last
-    `@events.last`
+  def last(*args)
+    `@events.last`(*args)
   end

-  def count
-    `@events.count`
+  def count(*args, &block)
+    `@events.count`(*args, &block)
   end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
def first
@events.first
end
def last
@events.last
end
def count
@events.count
end
def first(*args)
`@events.first`(*args)
end
def last(*args)
`@events.last`(*args)
end
def count(*args, &block)
`@events.count`(*args, &block)
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/concerns/preloaded_event_relation.rb` around lines 19 - 29, The
wrapper methods first, last, and count currently accept no arguments or blocks;
update them to mirror Array/Enumerable signatures by forwarding any arguments
and blocks to `@events`. Replace def first; `@events.first`; end with def
first(*args, &blk); `@events.first`(*args, &blk); end, do the same for last (def
last(*args, &blk); `@events.last`(*args, &blk); end), and implement count to
accept an optional item and/or block (def count(item = nil, &blk);
`@events.count`(item, &blk); end) so behavior matches Array/Enumerable and
ActiveRecord::Relation.

Comment on lines +22 to +26
return if @dois.empty?

doi_identifiers = @dois.filter_map { |doi| doi.doi&.upcase }.uniq
return if doi_identifiers.empty?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Keep preload! return behavior consistent on early exits.

Line 22 and Line 25 return nil, but the normal path returns self. This also skips the final array initialization path for all-blank DOI inputs.

🛠️ Proposed fix
   def preload!
-    return if `@dois.empty`?
+    return self if `@dois.empty`?

     doi_identifiers = `@dois.filter_map` { |doi| doi.doi&.upcase }.uniq
-    return if doi_identifiers.empty?
+    if doi_identifiers.empty?
+      `@dois.each` { |doi| doi.preloaded_events ||= [] }
+      return self
+    end
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
return if @dois.empty?
doi_identifiers = @dois.filter_map { |doi| doi.doi&.upcase }.uniq
return if doi_identifiers.empty?
return self if `@dois.empty`?
doi_identifiers = `@dois.filter_map` { |doi| doi.doi&.upcase }.uniq
if doi_identifiers.empty?
`@dois.each` { |doi| doi.preloaded_events ||= [] }
return self
end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/services/events_preloader.rb` around lines 22 - 26, In preload! change
the early exits so they return self instead of nil and preserve the final array
initialization: replace the two bare returns after checking `@dois.empty`? and
doi_identifiers.empty? with returns of self, and ensure the method still
initializes `@doi_events` (e.g. via `@doi_events` ||= []) on the early-exit path so
the final array initialization behavior matches the normal path; update
references inside preload! (the `@dois` check, doi_identifiers variable and final
`@doi_events` initialization) accordingly.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (2)
spec/models/doi_related_spec.rb (2)

32-40: Consider documenting the query budget breakdown.

The query limit of 6 is reasonable for a ceiling test, but the comment only partially explains the expected queries. If associations or implementation change, this test may silently pass with different query patterns or unexpectedly fail.

Consider documenting the expected breakdown more explicitly (e.g., "1 DOI fetch + 1 EventsPreloader query + 4 association loads") or extracting the limit to a named constant if this threshold is referenced elsewhere.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 32 - 40, The test using
DataciteDoi.import_in_bulk currently asserts a query cap of 6 but the comment is
vague; update the example to document the expected query budget breakdown (e.g.,
"1 DOI fetch + 1 EventsPreloader query + 4 association loads") and extract the
numeric limit into a clearly named local constant or let (e.g.,
EXPECTED_QUERY_LIMIT or expected_query_limit) used in the expect { ... }.not_to
exceed_query_limit(...) call so the intent is explicit and reusable, and update
the comment to reference the named constant and the specific components (DOIs,
EventsPreloader, client, media, metadata, allocator) to make future changes
clearer.

65-79: Test validates empty-events path but could be more comprehensive.

This test verifies that as_indexed_json makes no additional queries when preloaded_events is set, but since no events are created, it only tests the empty-events scenario. The in-memory filtering path with actual event data isn't exercised here.

Consider adding events to the DOI fixture to verify that event data is correctly retrieved from preloaded_events without hitting the database:

🧪 Suggested enhancement
 describe ".as_indexed_json" do
   let(:client) { create(:client) }
   let(:doi) { create(:doi, client: client, aasm_state: "findable") }
+  let!(:view_event) { create(:event_for_datacite_investigations, obj_id: "https://doi.org/#{doi.doi}", relation_type_id: "unique-dataset-investigations-regular", total: 25) }

   it "uses preloaded_events-backed relations for event metrics" do
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The test only covers
the empty-events case; update the spec to also create one or more Event records
associated with the DOI fixture so the in-memory filtering path is exercised:
build the relation with DataciteDoi.where(...).includes(...), preload events via
EventsPreloader.new(dois.to_a).preload!, assert dois.first.preloaded_events is
present, then call dois.first.as_indexed_json and assert it does not
exceed_query_limit(0) while also asserting the indexed JSON contains the
expected event data (derived from the created Event records) to ensure
as_indexed_json reads from preloaded_events rather than hitting the DB.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@spec/models/doi_related_spec.rb`:
- Line 80: Rename the example description string in the spec "it" block that
currently reads "should make few db call" to the correct plural form "should
make few db calls" in spec/models/doi_related_spec.rb (the `it` example
containing that title) so the test name uses proper grammar.

---

Nitpick comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 32-40: The test using DataciteDoi.import_in_bulk currently asserts
a query cap of 6 but the comment is vague; update the example to document the
expected query budget breakdown (e.g., "1 DOI fetch + 1 EventsPreloader query +
4 association loads") and extract the numeric limit into a clearly named local
constant or let (e.g., EXPECTED_QUERY_LIMIT or expected_query_limit) used in the
expect { ... }.not_to exceed_query_limit(...) call so the intent is explicit and
reusable, and update the comment to reference the named constant and the
specific components (DOIs, EventsPreloader, client, media, metadata, allocator)
to make future changes clearer.
- Around line 65-79: The test only covers the empty-events case; update the spec
to also create one or more Event records associated with the DOI fixture so the
in-memory filtering path is exercised: build the relation with
DataciteDoi.where(...).includes(...), preload events via
EventsPreloader.new(dois.to_a).preload!, assert dois.first.preloaded_events is
present, then call dois.first.as_indexed_json and assert it does not
exceed_query_limit(0) while also asserting the indexed JSON contains the
expected event data (derived from the created Event records) to ensure
as_indexed_json reads from preloaded_events rather than hitting the DB.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4504c1d7-3df4-438d-bd1b-509456292890

📥 Commits

Reviewing files that changed from the base of the PR and between cf567e8 and 6e5bdee.

📒 Files selected for processing (1)
  • spec/models/doi_related_spec.rb

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/other_doi.rb`:
- Around line 150-154: The includes hash in the OtherDoi query (the
selected_dois assignment using OtherDoi.where(...).includes) violates RuboCop's
Layout/SpaceInsideHashLiteralBraces; update the hash literal spacing to add
spaces inside the braces (e.g., change {client: :provider} to have a space after
{ and before }) so the includes call reads with properly spaced braces.

In `@spec/models/doi_related_spec.rb`:
- Around line 65-79: The test currently only exercises the empty-preload path;
modify the example to create one or more real event records associated with the
doi (using the existing event factory/model), then build the relation
(DataciteDoi.where(...).includes(...)), run
EventsPreloader.new(dois.to_a).preload!, and call dois.first.as_indexed_json to
assert that (1) it does not issue extra queries and (2) computed metrics
returned from as_indexed_json match expected values derived from the created
events (reference DataciteDoi, EventsPreloader, preloaded_events,
as_indexed_json and any event factory/model names to locate code).
- Line 68: The RuboCop brace-spacing offense is in the includes call on the
DataciteDoi query (variable dois); change the hash spacing inside the braces so
it reads includes({ client: :provider }, :media, :metadata) instead of
includes({client: :provider}, :media, :metadata) to satisfy the style rule.
- Around line 81-96: The spec is not using the production eager-load graph so it
misses the provider preload; update the query in
DataciteDoi.where(...).includes(...) to match the production preload used by
import_in_bulk (e.g. include client: :provider along with :media and :metadata),
so EventsPreloader.new(dois.to_a).preload! and the call to
dois.first.as_indexed_json exercise the same eager-loading path and the
expect(...).not_to exceed_query_limit(13) accurately catches regressions.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 03c1b369-232d-4c4d-94bd-57d56424c29e

📥 Commits

Reviewing files that changed from the base of the PR and between 6e5bdee and 671d5e7.

📒 Files selected for processing (3)
  • app/models/datacite_doi.rb
  • app/models/other_doi.rb
  • spec/models/doi_related_spec.rb
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/models/datacite_doi.rb

Comment on lines +65 to +79
it "uses preloaded_events-backed relations for event metrics" do
allow(DataciteDoi).to receive(:upload_to_elasticsearch)
# Build the relation with includes, as in production
dois = DataciteDoi.where(id: doi.id).includes({client: :provider}, :media, :metadata)
# Preload events explicitly
preloader = EventsPreloader.new(dois.to_a)
preloader.preload!
# Sanity: events are preloaded on the instance we’ll index
expect(dois.first.preloaded_events).not_to be_nil
# When we call as_indexed_json, the overridden *_events methods should
# hit the preloaded array, not issue extra queries for events.
expect {
dois.first.as_indexed_json
}.not_to exceed_query_limit(0) # if all associations were pre-included
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

This only covers the empty-preload path.

No events are created here, so the example proves zero extra queries for preloaded_events = [] but not that as_indexed_json computes real metrics correctly from populated PreloadedEventRelations.

🧰 Tools
🪛 GitHub Actions: Test Pull Request

[error] 68-68: RuboCop: Layout/SpaceInsideHashLiteralBraces: Space inside { missing.


[error] 68-68: RuboCop: Layout/SpaceInsideHashLiteralBraces: Space inside } missing.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The test currently
only exercises the empty-preload path; modify the example to create one or more
real event records associated with the doi (using the existing event
factory/model), then build the relation (DataciteDoi.where(...).includes(...)),
run EventsPreloader.new(dois.to_a).preload!, and call dois.first.as_indexed_json
to assert that (1) it does not issue extra queries and (2) computed metrics
returned from as_indexed_json match expected values derived from the created
events (reference DataciteDoi, EventsPreloader, preloaded_events,
as_indexed_json and any event factory/model names to locate code).

Comment on lines +81 to 96
it "should make few db calls" do
allow(DataciteDoi).to receive(:upload_to_elasticsearch)
dois = DataciteDoi.where(id: doi.id).includes(
:client,
:media,
:view_events,
:download_events,
:citation_events,
:reference_events,
:part_events,
:part_of_events,
:version_events,
:version_of_events,
:metadata
)

# Preload events to avoid N+1 queries
EventsPreloader.new(dois.to_a).preload!

# Test the maximum number of queries made by the method
# With EventsPreloader, we should have fewer queries
expect {
dois.first.as_indexed_json
}.not_to exceed_query_limit(13)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Use the production eager-load graph in this query-budget spec.

import_in_bulk now preloads { client: :provider }, but this example still only includes :client. That leaves the provider query unaccounted for and makes the 13-query ceiling too loose to catch regressions in the new path.

🧪 Suggested update
         dois = DataciteDoi.where(id: doi.id).includes(
-          :client,
+          { client: :provider },
           :media,
           :metadata
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 81 - 96, The spec is not using
the production eager-load graph so it misses the provider preload; update the
query in DataciteDoi.where(...).includes(...) to match the production preload
used by import_in_bulk (e.g. include client: :provider along with :media and
:metadata), so EventsPreloader.new(dois.to_a).preload! and the call to
dois.first.as_indexed_json exercise the same eager-loading path and the
expect(...).not_to exceed_query_limit(13) accurately catches regressions.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
spec/models/doi_related_spec.rb (2)

83-87: ⚠️ Potential issue | 🟡 Minor

Use the production include graph in this query-budget spec.

Line 84 still uses :client, but the production path now includes { client: :provider }. Keeping the spec aligned is important so provider-related regressions are caught.

Suggested fix
 dois = DataciteDoi.where(id: doi.id).includes(
-  :client,
+  { client: :provider },
   :media,
   :metadata
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 83 - 87, Update the
ActiveRecord include graph used when loading DataciteDoi records in the test:
change the includes call on the `dois = DataciteDoi.where(id:
doi.id).includes(:client, :media, :metadata)` expression to use the production
shape by replacing `:client` with a nested association `client: :provider`
(i.e., `.includes(client: :provider, :media, :metadata)`) so provider
associations are eagerly loaded in the spec.

65-79: ⚠️ Potential issue | 🟡 Minor

Populate real events before asserting preloaded metric behavior.

At Line 70 preloading runs, but this example doesn't create events, so it only validates the empty-preload path. Add event fixtures and assert metric outputs from preloaded data to protect the non-empty path.

Suggested test adjustment
 it "uses preloaded_events-backed relations for event metrics" do
   allow(DataciteDoi).to receive(:upload_to_elasticsearch)
+  target = create(:doi, client: client, aasm_state: "findable")
+  source = create(:doi, client: client, aasm_state: "findable")
+  create(:event_for_crossref,
+         subj_id: "https://doi.org/#{doi.doi}",
+         obj_id: "https://doi.org/#{target.doi}",
+         relation_type_id: "references")
+  create(:event_for_datacite_crossref,
+         subj_id: "https://doi.org/#{doi.doi}",
+         obj_id: "https://doi.org/#{source.doi}",
+         relation_type_id: "is-referenced-by")
+
   dois = DataciteDoi.where(id: doi.id).includes({ client: :provider }, :media, :metadata)
   preloader = EventsPreloader.new(dois.to_a)
   preloader.preload!

   expect(dois.first.preloaded_events).not_to be_nil
   expect {
-    dois.first.as_indexed_json
+    dois.first.as_indexed_json
+    expect(dois.first.reference_count).to eq(1)
+    expect(dois.first.citation_count).to eq(1)
   }.not_to exceed_query_limit(0)
 end
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The test for preloaded
event metrics (using DataciteDoi, EventsPreloader, preloaded_events and
as_indexed_json) only preloads an empty events set; create and attach real event
fixtures to the DOI before calling EventsPreloader.preload! (e.g., create one or
more Event/Metric records associated with doi), re-run preloader, then assert
that the metric fields returned by dois.first.as_indexed_json reflect those
populated events and still do not perform extra queries (retain the expect { ...
}.not_to exceed_query_limit(0)); this ensures the non-empty path is validated
and metrics are asserted against preloaded data.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 83-87: Update the ActiveRecord include graph used when loading
DataciteDoi records in the test: change the includes call on the `dois =
DataciteDoi.where(id: doi.id).includes(:client, :media, :metadata)` expression
to use the production shape by replacing `:client` with a nested association
`client: :provider` (i.e., `.includes(client: :provider, :media, :metadata)`) so
provider associations are eagerly loaded in the spec.
- Around line 65-79: The test for preloaded event metrics (using DataciteDoi,
EventsPreloader, preloaded_events and as_indexed_json) only preloads an empty
events set; create and attach real event fixtures to the DOI before calling
EventsPreloader.preload! (e.g., create one or more Event/Metric records
associated with doi), re-run preloader, then assert that the metric fields
returned by dois.first.as_indexed_json reflect those populated events and still
do not perform extra queries (retain the expect { ... }.not_to
exceed_query_limit(0)); this ensures the non-empty path is validated and metrics
are asserted against preloaded data.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 54b253de-906d-4adf-b596-0676a1beba23

📥 Commits

Reviewing files that changed from the base of the PR and between 671d5e7 and 5d230c6.

📒 Files selected for processing (3)
  • app/models/datacite_doi.rb
  • app/models/other_doi.rb
  • spec/models/doi_related_spec.rb
🚧 Files skipped from review as they are similar to previous changes (1)
  • app/models/other_doi.rb

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
spec/models/doi_related_spec.rb (2)

81-87: ⚠️ Potential issue | 🟡 Minor

Use the production eager-load graph in this query-budget spec.

Line 84 still uses :client, while import_in_bulk now preloads { client: :provider }. This makes the query budget less representative of production behavior.

Suggested fix
-        dois = DataciteDoi.where(id: doi.id).includes(
-          :client,
+        dois = DataciteDoi.where(id: doi.id).includes(
+          { client: :provider },
           :media,
           :metadata
         )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 81 - 87, The spec's
eager-loading doesn't match production: replace the includes(...) call in the
"should make few db calls" example (the DataciteDoi.where(...).includes call) to
use the same production preload graph used by import_in_bulk by swapping :client
for a nested preload { client: :provider } (keep :media and :metadata as-is) so
the query-budget spec reflects real-world eager loading.

65-79: ⚠️ Potential issue | 🟡 Minor

This example still validates the empty-preload path only.

No events are created in this spec, so it proves query behavior for preloaded_events = [], not correctness of populated event metrics during as_indexed_json. Add at least one reference/citation fixture and assert metric values from the indexed payload.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 65 - 79, The spec currently
only verifies the empty-preload path; update the test for
DataciteDoi#as_indexed_json and EventsPreloader to create at least one
event-related fixture (e.g., a Reference/Citation and associated Event model)
linked to the doi, run preloader (EventsPreloader.new(dois.to_a).preload!), call
dois.first.as_indexed_json and add assertions that the returned payload contains
the expected event metrics (counts/timestamps/fields) for that created event,
while keeping the existing expect { dois.first.as_indexed_json }.not_to
exceed_query_limit(0) to ensure no extra queries.
🧹 Nitpick comments (1)
spec/models/doi_related_spec.rb (1)

52-55: Assertion is stricter than the comment implies.

Line 54 says IDs may be a subset for multiple batches, but match_array(ids) requires the full set. Consider either passing an explicit single-batch batch_size in this spec or updating the matcher to accept per-batch subsets.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@spec/models/doi_related_spec.rb` around lines 52 - 55, The test currently
asserts EventsPreloader.new receives a full set of IDs
(expect(dois_arg.map(&:id)).to match_array(ids)) but the comment warns that
multiple batches may produce only a subset; fix by either forcing a single-batch
scenario (invoke the code under test or stub/configure the batch size so
batch_size >= ids.length) or relax the assertion to accept per-batch subsets
(e.g., change the matcher to assert that dois_arg.map(&:id) is a subset of ids
or uses include matcher). Update the spec around EventsPreloader and DataciteDoi
accordingly and reference the ids and batch_size configuration used by the code
under test.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 81-87: The spec's eager-loading doesn't match production: replace
the includes(...) call in the "should make few db calls" example (the
DataciteDoi.where(...).includes call) to use the same production preload graph
used by import_in_bulk by swapping :client for a nested preload { client:
:provider } (keep :media and :metadata as-is) so the query-budget spec reflects
real-world eager loading.
- Around line 65-79: The spec currently only verifies the empty-preload path;
update the test for DataciteDoi#as_indexed_json and EventsPreloader to create at
least one event-related fixture (e.g., a Reference/Citation and associated Event
model) linked to the doi, run preloader
(EventsPreloader.new(dois.to_a).preload!), call dois.first.as_indexed_json and
add assertions that the returned payload contains the expected event metrics
(counts/timestamps/fields) for that created event, while keeping the existing
expect { dois.first.as_indexed_json }.not_to exceed_query_limit(0) to ensure no
extra queries.

---

Nitpick comments:
In `@spec/models/doi_related_spec.rb`:
- Around line 52-55: The test currently asserts EventsPreloader.new receives a
full set of IDs (expect(dois_arg.map(&:id)).to match_array(ids)) but the comment
warns that multiple batches may produce only a subset; fix by either forcing a
single-batch scenario (invoke the code under test or stub/configure the batch
size so batch_size >= ids.length) or relax the assertion to accept per-batch
subsets (e.g., change the matcher to assert that dois_arg.map(&:id) is a subset
of ids or uses include matcher). Update the spec around EventsPreloader and
DataciteDoi accordingly and reference the ids and batch_size configuration used
by the code under test.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 2da26e76-40ec-4430-8bfc-62520fa068aa

📥 Commits

Reviewing files that changed from the base of the PR and between 671d5e7 and 5d230c6.

📒 Files selected for processing (3)
  • app/models/datacite_doi.rb
  • app/models/other_doi.rb
  • spec/models/doi_related_spec.rb

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant