Conversation
📝 WalkthroughWalkthroughThe changes update the Crossref API integration by switching the endpoint from event data service to the main Crossref API, renaming date parameters from collected-date to created-date, removing the scholix parameter, and refactoring pagination from cursor-based to page-based semantics. Changes
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 1 | ❌ 2❌ Failed checks (1 warning, 1 inconclusive)
✅ Passed checks (1 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@app/models/crossref.rb`:
- Around line 32-39: The VCR cassettes are failing because tests now hit the new
Crossref endpoint path "/beta/datacitations" with the query params
("from-created-date"/"until-created-date") built in the URL in the method that
constructs the request URL (the string
"#{ENV['CROSSREF_QUERY_URL']}/beta/datacitations?#{URI.encode_www_form(params)}"
in app/models/crossref.rb); re-record the failing VCR cassettes (e.g.,
Crossref/_import/.../*.yml) against the updated API calls so they contain
recordings for the new endpoint and query parameters, or manually edit those
cassette YAMLs to reflect the new request URL and query params used by the
Crossref request builder.
- Around line 58-70: The variable page returned from get_total is being shadowed
by the loop variable in the (0...total_pages).each loop and thus never used;
rename the loop variable (e.g., page_index or page_num) in the loop that sets
options[:offset], options[:total], and options[:page] so the original page value
remains available, remove or use the unused count returned from process_data (or
ignore assignment if not needed), and reconcile pagination mode: if push_data
still returns a next-cursor, refactor the pagination to use cursor-driven
iteration (consume and pass next-cursor from push_data) instead of page-based
math, updating process_data/push_data call sites accordingly; check
job_batch_size and options[:offset]/[:page] usage to ensure consistency.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 4b4709b2-e181-4596-afc0-58aa3233836e
📒 Files selected for processing (2)
app/models/crossref.rbconfig/application.rb
| "from-created-date" => options[:from_date], | ||
| "until-created-date" => options[:until_date], | ||
| mailto: "[email protected]", | ||
| scholix: true, | ||
| rows: options[:rows], | ||
| cursor: options[:cursor], | ||
| page: options[:page], | ||
| }.compact | ||
|
|
||
| "#{ENV['CROSSREF_QUERY_URL']}/v1/events?#{URI.encode_www_form(params)}" | ||
| "#{ENV['CROSSREF_QUERY_URL']}/beta/datacitations?#{URI.encode_www_form(params)}" |
There was a problem hiding this comment.
Pipeline failures: VCR cassettes need re-recording.
The test failures indicate that VCR cassettes don't have recordings for the new API endpoint (/beta/datacitations with from-created-date/until-created-date parameters). Re-record the cassettes or update them manually to match the new request format.
Affected cassettes based on failures:
Crossref/_import/with_valid_parameters/queues_jobs_for_DOIs_updated_within_the_specified_date_range.ymlCrossref/_import/with_missing_parameters/queues_jobs_for_the_default_date_range_yesterday_to_today_.yml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/crossref.rb` around lines 32 - 39, The VCR cassettes are failing
because tests now hit the new Crossref endpoint path "/beta/datacitations" with
the query params ("from-created-date"/"until-created-date") built in the URL in
the method that constructs the request URL (the string
"#{ENV['CROSSREF_QUERY_URL']}/beta/datacitations?#{URI.encode_www_form(params)}"
in app/models/crossref.rb); re-record the failing VCR cassettes (e.g.,
Crossref/_import/.../*.yml) against the updated API calls so they contain
recordings for the new endpoint and query parameters, or manually edit those
cassette YAMLs to reflect the new request URL and query params used by the
Crossref request builder.
| total, page = get_total(options) | ||
|
|
||
| if total.positive? | ||
| # walk through results paginated via cursor | ||
| # walk through results paginated via page | ||
| total_pages = (total.to_f / job_batch_size).ceil | ||
| error_total = 0 | ||
|
|
||
| (0...total_pages).each do |page| | ||
| options[:offset] = page * job_batch_size | ||
| options[:total] = total | ||
| options[:cursor] = cursor | ||
| count, cursor = process_data(options) | ||
| options[:page] = page | ||
| count, page = process_data(options) | ||
| end |
There was a problem hiding this comment.
Critical: Variable shadowing causes page from get_total to be unused.
At line 58, page is assigned from get_total, but at line 65, the loop variable page shadows it. The initial page value is never used.
Additionally:
- Line 69:
countis assigned but never used. push_data(line 107) still returnsnext-cursor, creating an inconsistency with page-based pagination.
If the loop should iterate using sequential page numbers (0, 1, 2...), rename the loop variable to avoid confusion. If the API's next-page value should drive pagination, the logic needs restructuring.
🐛 Proposed fix for variable shadowing
- total, page = get_total(options)
+ total, _next_page = get_total(options)
if total.positive?
# walk through results paginated via page
total_pages = (total.to_f / job_batch_size).ceil
error_total = 0
- (0...total_pages).each do |page|
- options[:offset] = page * job_batch_size
+ (0...total_pages).each do |page_num|
+ options[:offset] = page_num * job_batch_size
options[:total] = total
- options[:page] = page
- count, page = process_data(options)
+ options[:page] = page_num
+ process_data(options)
end🧰 Tools
🪛 GitHub Actions: Test Pull Request
[error] 58-58: RSpec failure due to VCR: Unhandled HTTP request for Crossref import query; request was not found in cassette /home/runner/work/levriero/levriero/spec/fixtures/vcr_cassettes/Crossref/_import/with_valid_parameters/queues_jobs_for_DOIs_updated_within_the_specified_date_range.yml
[error] 58-58: RSpec failure due to VCR: Request not found in cassette /home/runner/work/levriero/levriero/spec/fixtures/vcr_cassettes/Crossref/import/with_missing_parameters/queues_jobs_for_the_default_date_range_yesterday_to_today.yml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@app/models/crossref.rb` around lines 58 - 70, The variable page returned from
get_total is being shadowed by the loop variable in the (0...total_pages).each
loop and thus never used; rename the loop variable (e.g., page_index or
page_num) in the loop that sets options[:offset], options[:total], and
options[:page] so the original page value remains available, remove or use the
unused count returned from process_data (or ignore assignment if not needed),
and reconcile pagination mode: if push_data still returns a next-cursor,
refactor the pagination to use cursor-driven iteration (consume and pass
next-cursor from push_data) instead of page-based math, updating
process_data/push_data call sites accordingly; check job_batch_size and
options[:offset]/[:page] usage to ensure consistency.
Purpose
closes: Add github issue that originated this PR
Approach
Open Questions and Pre-Merge TODOs
Learning
Types of changes
Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Reviewer, please remember our guidelines:
Summary by CodeRabbit