Update the crossref api #243
Changes from all commits
51e6d13
ffc0157
9098d49
9f292da
7e5e36a
0e4a6fe
50a94f5
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -29,23 +29,21 @@ def source_id | |
|
|
||
| def get_query_url(options = {}) | ||
| params = { | ||
| source: "crossref", | ||
| "from-collected-date" => options[:from_date], | ||
| "until-collected-date" => options[:until_date], | ||
| "from-created-date" => options[:from_date], | ||
| "until-created-date" => options[:until_date], | ||
| mailto: "info@datacite.org", | ||
| scholix: true, | ||
| rows: options[:rows], | ||
| cursor: options[:cursor], | ||
| page: options[:page], | ||
| }.compact | ||
|
|
||
| "#{ENV['CROSSREF_QUERY_URL']}/v1/events?#{URI.encode_www_form(params)}" | ||
| "#{ENV['CROSSREF_QUERY_URL']}/beta/datacitations?#{URI.encode_www_form(params)}" | ||
| end | ||
|
|
||
| def get_total(options = {}) | ||
| query_url = get_query_url(options.merge(rows: 0)) | ||
| result = Maremma.get(query_url, options) | ||
| message = result.body.dig("data", "message").to_h | ||
| [message["total-results"].to_i, message["next-cursor"]] | ||
| message["total-results"].to_i | ||
| end | ||
|
|
||
| def queue_jobs(options = {}) | ||
|
|
@@ -57,18 +55,18 @@ def queue_jobs(options = {}) | |
| options[:until_date].presence || Time.now.to_date.iso8601 | ||
| options[:content_type] = "json" | ||
|
|
||
| total, cursor = get_total(options) | ||
| total = get_total(options) | ||
|
|
||
| if total.positive? | ||
| # walk through results paginated via cursor | ||
| # walk through results paginated via page | ||
| total_pages = (total.to_f / job_batch_size).ceil | ||
| error_total = 0 | ||
|
|
||
| (0...total_pages).each do |page| | ||
| options[:offset] = page * job_batch_size | ||
| (0...total_pages).each do |page_num| | ||
| options[:offset] = page_num * job_batch_size | ||
| options[:total] = total | ||
| options[:cursor] = cursor | ||
| count, cursor = process_data(options) | ||
| options[:page] = page_num | ||
| process_data(options) | ||
|
Comment on lines +65 to +69

🧩 Analysis chain

🌐 Web query:

💡 Result: The Crossref beta Data Citations API, accessible at https://api.crossref.org/beta/datacitations/, supports pagination using the standard Crossref REST API parameters: rows (default 20, max 1000) to specify the number of results per page, and cursor (start with cursor=* for the first page, then use the next-cursor from the response) for efficient pagination through large result sets. Stop paginating when the number of returned items is less than the requested rows. Cursors expire after 5 minutes of inactivity. Include the mailto parameter during beta for contact. Full details, including all parameters, are in the official Swagger documentation referenced in the API docs.

🏁 Script executed:
# Find the crossref.rb file and examine the pagination implementation
find . -name "crossref.rb" -type f | head -5
Repository: datacite/levriero
Length of output: 85

🏁 Script executed:
# Once located, read the file to understand the full context of pagination
cat -n app/models/crossref.rb | head -100
Repository: datacite/levriero
Length of output: 4288

🏁 Script executed:
# Also search for the get_query_url method implementation
rg "def get_query_url" -A 20 app/models/
Repository: datacite/levriero
Length of output: 7738

🏁 Script executed:
# Search for process_data method
rg "def process_data" -A 15 app/models/
Repository: datacite/levriero
Length of output: 688

Replace page-based pagination with cursor-based pagination for Crossref

The Crossref [...] The loop should instead: [...] Additionally, [...] |
||
| end | ||
| text = "Queued import for #{total} DOIs updated #{options[:from_date]} - #{options[:until_date]}." | ||
| else | ||
|
|
@@ -99,34 +97,28 @@ def push_data(result, _options = {}) | |
| return result.body.fetch("errors") if result.body.fetch("errors", | ||
| nil).present? | ||
|
|
||
| items = result.body.dig("data", "message", "events") | ||
| items = result.body.dig("data", "message", "items") | ||
| # Rails.logger.info "Extracting related identifiers for #{items.size} DOIs updated from #{options[:from_date]} until #{options[:until_date]}." | ||
|
|
||
| Array.wrap(items).map do |item| | ||
| CrossrefImportJob.perform_later(item) | ||
| end | ||
|
|
||
| [items.length, result.body.dig("data", "message", "next-cursor")] | ||
| end | ||
|
|
||
| def self.push_item(item) | ||
| subj = cached_crossref_response(item["subj_id"]) | ||
| obj = cached_datacite_response(item["obj_id"]) | ||
| subj = cached_crossref_response(item["subject"]["id"]) | ||
| obj = cached_datacite_response(item["object"]["id"]) | ||
|
|
||
| data = { | ||
| "data" => { | ||
| "id" => item["id"], | ||
| "type" => "events", | ||
| "attributes" => { | ||
| "messageAction" => item["action"], | ||
| "subjId" => item["subj_id"], | ||
| "objId" => item["obj_id"], | ||
| "relationTypeId" => item["relation_type_id"].to_s.dasherize, | ||
| "sourceId" => item["source_id"].to_s.dasherize, | ||
| "sourceToken" => item["source_token"], | ||
| "occurredAt" => item["occurred_at"], | ||
| "subjId" => item["subject"]["id"], | ||
| "objId" => item["object"]["id"], | ||
| "relationTypeId" => item["relation"].to_s.dasherize, | ||
| "sourceId" => "crossref", | ||
| "sourceToken" => ENV["CROSSREF_SOURCE_TOKEN"], | ||
| "timestamp" => item["timestamp"], | ||
| "license" => item["license"], | ||
| "subj" => subj, | ||
| "obj" => obj, | ||
| }, | ||
|
|
@@ -135,6 +127,6 @@ def self.push_item(item) | |
|
|
||
| send_event_import_message(data) | ||
|
|
||
| Rails.logger.info "[Event Data] #{item['subj_id']} #{item['relation_type_id']} #{item['obj_id']} sent to the events queue." | ||
| Rails.logger.info "[Event Data] #{item["subject"]["id"]} #{item["relation"]} #{item["object"]["id"]} sent to the events queue." | ||
| end | ||
| end | ||
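
The cursor-based pagination described in the review comment on lines +65 to +69 (start with `cursor=*`, follow `next-cursor`, stop when a page returns fewer items than `rows`) could look roughly like the sketch below. It is an illustration only, not the PR's code: `each_cursor_page`, `fake_fetch_page`, and `FAKE_RECORDS` are hypothetical names, and the fake fetcher stands in for the real HTTP request so the example runs offline. It assumes each response carries `items` and `next-cursor` keys, as the diff's `push_data` and the earlier `get_total` suggest.

```ruby
# Hypothetical helper (not part of the PR): walks a cursor-paginated endpoint.
# `fetch_page` is any callable taking (cursor, rows) and returning a Hash shaped
# like the response "message": { "items" => [...], "next-cursor" => "..." }.
def each_cursor_page(fetch_page, rows: 1000)
  cursor = "*" # the first request uses cursor=*
  loop do
    message = fetch_page.call(cursor, rows).to_h
    items = Array(message["items"])
    yield items unless items.empty?

    # A page shorter than `rows` means the result set is exhausted.
    break if items.length < rows

    # Otherwise continue with the cursor returned by the previous response.
    cursor = message["next-cursor"]
    break if cursor.nil? || cursor.empty?
  end
end

# Stand-in data and fetcher (assumptions for illustration): five fake records,
# with the cursor encoded as a plain offset string.
FAKE_RECORDS = (1..5).map { |i| { "id" => "record-#{i}" } }.freeze

def fake_fetch_page(cursor, rows)
  offset = cursor == "*" ? 0 : cursor.to_i
  {
    "items" => FAKE_RECORDS[offset, rows] || [],
    "next-cursor" => (offset + rows).to_s,
  }
end

pages = []
each_cursor_page(method(:fake_fetch_page), rows: 2) { |items| pages << items }
puts pages.map(&:length).inspect # prints [2, 2, 1]
```

In a real job the callable would wrap the request built by `get_query_url` with `cursor:` set. Because the review comment notes that cursors expire after 5 minutes of inactivity, each page would likely need to be fetched promptly rather than computing every offset up front.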