Skip to content

Update the crossref api#243

Open
kaysiz wants to merge 1 commit intomasterfrom
ks-681-replace-crossref-api
Open

Update the crossref api#243
kaysiz wants to merge 1 commit intomasterfrom
ks-681-replace-crossref-api

Conversation

@kaysiz
Copy link
Member

@kaysiz kaysiz commented Mar 24, 2026

Purpose

closes: Add github issue that originated this PR

Approach

Open Questions and Pre-Merge TODOs

Learning

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

  • New feature (non-breaking change which adds functionality)

  • Breaking change (fix or feature that would cause existing functionality to change)

Reviewer, please remember our guidelines:

  • Be humble in the language and feedback you give, ask don't tell.
  • Consider using positive language as opposed to neutral when offering feedback. This is to avoid the negative bias that can occur with neutral language appearing negative.
  • Offer suggestions on how to improve code e.g. simplification or expanding clarity.
  • Ensure you give reasons for the changes you are proposing.

Summary by CodeRabbit

  • Chores
    • Updated Crossref API endpoint integration to access current data sources
    • Modified query parameters for improved data retrieval
    • Optimized pagination mechanism for enhanced API communication

@kaysiz kaysiz self-assigned this Mar 24, 2026
@coderabbitai
Copy link

coderabbitai bot commented Mar 24, 2026

📝 Walkthrough

Walkthrough

The changes update the Crossref API integration by switching the endpoint from event data service to the main Crossref API, renaming date parameters from collected-date to created-date, removing the scholix parameter, and refactoring pagination from cursor-based to page-based semantics.

Changes

Cohort / File(s) Summary
Crossref API Integration
app/models/crossref.rb, config/application.rb
Updated API endpoint URL from https://api.eventdata.crossref.org to https://api.crossref.org, replaced query parameters from-collected-date/until-collected-date with from-created-date/until-created-date, removed scholix: true, and refactored pagination from cursor-based (cursor, next-cursor) to page-based (page, next-page) logic in query construction and job queuing.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 1 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Title check ❓ Inconclusive The title 'Update the crossref api' is vague and does not clearly convey the specific nature of the changes, which involve switching API endpoints, changing query parameters, and modifying pagination semantics. Consider a more specific title such as 'Switch Crossref API endpoint and update query parameters' or 'Update Crossref API endpoint and pagination logic' to better reflect the actual changes.
✅ Passed checks (1 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch ks-681-replace-crossref-api

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@app/models/crossref.rb`:
- Around line 32-39: The VCR cassettes are failing because tests now hit the new
Crossref endpoint path "/beta/datacitations" with the query params
("from-created-date"/"until-created-date") built in the URL in the method that
constructs the request URL (the string
"#{ENV['CROSSREF_QUERY_URL']}/beta/datacitations?#{URI.encode_www_form(params)}"
in app/models/crossref.rb); re-record the failing VCR cassettes (e.g.,
Crossref/_import/.../*.yml) against the updated API calls so they contain
recordings for the new endpoint and query parameters, or manually edit those
cassette YAMLs to reflect the new request URL and query params used by the
Crossref request builder.
- Around line 58-70: The variable page returned from get_total is being shadowed
by the loop variable in the (0...total_pages).each loop and thus never used;
rename the loop variable (e.g., page_index or page_num) in the loop that sets
options[:offset], options[:total], and options[:page] so the original page value
remains available, remove or use the unused count returned from process_data (or
ignore assignment if not needed), and reconcile pagination mode: if push_data
still returns a next-cursor, refactor the pagination to use cursor-driven
iteration (consume and pass next-cursor from push_data) instead of page-based
math, updating process_data/push_data call sites accordingly; check
job_batch_size and options[:offset]/[:page] usage to ensure consistency.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 4b4709b2-e181-4596-afc0-58aa3233836e

📥 Commits

Reviewing files that changed from the base of the PR and between 65c8077 and 51e6d13.

📒 Files selected for processing (2)
  • app/models/crossref.rb
  • config/application.rb

Comment on lines +32 to +39
"from-created-date" => options[:from_date],
"until-created-date" => options[:until_date],
mailto: "[email protected]",
scholix: true,
rows: options[:rows],
cursor: options[:cursor],
page: options[:page],
}.compact

"#{ENV['CROSSREF_QUERY_URL']}/v1/events?#{URI.encode_www_form(params)}"
"#{ENV['CROSSREF_QUERY_URL']}/beta/datacitations?#{URI.encode_www_form(params)}"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Pipeline failures: VCR cassettes need re-recording.

The test failures indicate that VCR cassettes don't have recordings for the new API endpoint (/beta/datacitations with from-created-date/until-created-date parameters). Re-record the cassettes or update them manually to match the new request format.

Affected cassettes based on failures:

  • Crossref/_import/with_valid_parameters/queues_jobs_for_DOIs_updated_within_the_specified_date_range.yml
  • Crossref/_import/with_missing_parameters/queues_jobs_for_the_default_date_range_yesterday_to_today_.yml
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/crossref.rb` around lines 32 - 39, The VCR cassettes are failing
because tests now hit the new Crossref endpoint path "/beta/datacitations" with
the query params ("from-created-date"/"until-created-date") built in the URL in
the method that constructs the request URL (the string
"#{ENV['CROSSREF_QUERY_URL']}/beta/datacitations?#{URI.encode_www_form(params)}"
in app/models/crossref.rb); re-record the failing VCR cassettes (e.g.,
Crossref/_import/.../*.yml) against the updated API calls so they contain
recordings for the new endpoint and query parameters, or manually edit those
cassette YAMLs to reflect the new request URL and query params used by the
Crossref request builder.

Comment on lines +58 to 70
total, page = get_total(options)

if total.positive?
# walk through results paginated via cursor
# walk through results paginated via page
total_pages = (total.to_f / job_batch_size).ceil
error_total = 0

(0...total_pages).each do |page|
options[:offset] = page * job_batch_size
options[:total] = total
options[:cursor] = cursor
count, cursor = process_data(options)
options[:page] = page
count, page = process_data(options)
end
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: Variable shadowing causes page from get_total to be unused.

At line 58, page is assigned from get_total, but at line 65, the loop variable page shadows it. The initial page value is never used.

Additionally:

  1. Line 69: count is assigned but never used.
  2. push_data (line 107) still returns next-cursor, creating an inconsistency with page-based pagination.

If the loop should iterate using sequential page numbers (0, 1, 2...), rename the loop variable to avoid confusion. If the API's next-page value should drive pagination, the logic needs restructuring.

🐛 Proposed fix for variable shadowing
-    total, page = get_total(options)
+    total, _next_page = get_total(options)

     if total.positive?
       # walk through results paginated via page
       total_pages = (total.to_f / job_batch_size).ceil
       error_total = 0

-      (0...total_pages).each do |page|
-        options[:offset] = page * job_batch_size
+      (0...total_pages).each do |page_num|
+        options[:offset] = page_num * job_batch_size
         options[:total] = total
-        options[:page] = page
-        count, page = process_data(options)
+        options[:page] = page_num
+        process_data(options)
       end
🧰 Tools
🪛 GitHub Actions: Test Pull Request

[error] 58-58: RSpec failure due to VCR: Unhandled HTTP request for Crossref import query; request was not found in cassette /home/runner/work/levriero/levriero/spec/fixtures/vcr_cassettes/Crossref/_import/with_valid_parameters/queues_jobs_for_DOIs_updated_within_the_specified_date_range.yml


[error] 58-58: RSpec failure due to VCR: Request not found in cassette /home/runner/work/levriero/levriero/spec/fixtures/vcr_cassettes/Crossref/import/with_missing_parameters/queues_jobs_for_the_default_date_range_yesterday_to_today.yml

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@app/models/crossref.rb` around lines 58 - 70, The variable page returned from
get_total is being shadowed by the loop variable in the (0...total_pages).each
loop and thus never used; rename the loop variable (e.g., page_index or
page_num) in the loop that sets options[:offset], options[:total], and
options[:page] so the original page value remains available, remove or use the
unused count returned from process_data (or ignore assignment if not needed),
and reconcile pagination mode: if push_data still returns a next-cursor,
refactor the pagination to use cursor-driven iteration (consume and pass
next-cursor from push_data) instead of page-based math, updating
process_data/push_data call sites accordingly; check job_batch_size and
options[:offset]/[:page] usage to ensure consistency.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant