Identify MuckRock in calls to Accounts and DocumentCloud #2181

Open: duckduckgrayduck wants to merge 2 commits into master from identify_muckrock

@duckduckgrayduck duckduckgrayduck commented May 20, 2026

  • The same DocumentCloud client was being constructed in several files, sometimes multiple times within a single file, always with the same user. This PR consolidates that into one client helper in core/utils.py that anyone can call in the future to use the MuckRock Accounts service account. The import_to_documentcloud method uses a different client, so I preserved that behavior as-is and documented it. That client isn't reused anywhere else yet, so I didn't see a reason to add a helper method for it.

  • The old user agent (which reports the python-requests version) is preserved, and a custom user agent that we can set in environment variables is appended to it.

  • This PR also identifies MuckRock in calls to Accounts.
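
The user agent behavior described above (keep the default, append a custom identifier from the environment) could be sketched roughly as follows. This is only an illustration, not the code in this PR: the environment variable name MUCKROCK_USER_AGENT, the function name build_user_agent, and the example values are all hypothetical.

```python
import os

# Hypothetical variable name; the PR does not state which one is actually used.
USER_AGENT_ENV_VAR = "MUCKROCK_USER_AGENT"


def build_user_agent(default_user_agent, environ=None):
    """Return the default user agent with the custom identifier appended.

    If the environment variable is unset or blank, the default user agent
    is returned unchanged, so existing behavior is preserved.
    """
    environ = os.environ if environ is None else environ
    custom = environ.get(USER_AGENT_ENV_VAR, "").strip()
    if not custom:
        return default_user_agent
    return f"{default_user_agent} {custom}"


if __name__ == "__main__":
    # Example values only; a real default would come from the HTTP library.
    print(build_user_agent("python-requests/2.31.0", {USER_AGENT_ENV_VAR: "MuckRock"}))
```

Passing the environment in as an optional mapping keeps the function easy to exercise from a shell or a unit test without touching the real process environment.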

@duckduckgrayduck duckduckgrayduck temporarily deployed to muckrock-pip-identify-m-ez0wwj May 20, 2026 22:44 Inactive
@duckduckgrayduck duckduckgrayduck temporarily deployed to muckrock-pip-identify-m-ez0wwj May 20, 2026 23:27 Inactive
@duckduckgrayduck duckduckgrayduck temporarily deployed to muckrock-pip-identify-m-ez0wwj May 20, 2026 23:28 Inactive
@duckduckgrayduck duckduckgrayduck temporarily deployed to muckrock-pip-identify-m-ez0wwj May 20, 2026 23:30 Inactive
@duckduckgrayduck (Contributor, Author) commented

I tested the following from python manage.py shell in the deploy preview:

  • import_doccloud_file
  • upload_document_cloud
  • upload_user_document_cloud
  • foia_file_delete_dc
  • get_text_ocr
  • fetch_and_load_documentcloud_stats
  • squarelet / get_squarelet_access_token UA changes
  • set_document_cloud_pages
  • noindex_documentcloud

Not tested (these are complicated to test, and there are no functional differences in the client, so I don't think testing them is necessary):

  • datum_per_page (crowdsource)
  • import_doccloud_proj (crowdsource)

You can see that the user agent works on DocumentCloud here.

Squarelet staging isn't proxied behind Cloudflare, so we can't see it there, but the Heroku logs show that the user agent is set there as well.

Comment thread muckrock/core/utils.py
def get_dc_client():
    """Get a DocumentCloud client for the MuckRock User Account"""
    client = DocumentCloud(
        username=settings.DOCUMENTCLOUD_BETA_USERNAME,
Contributor

We should rename these settings at some point. Doesn't have to be now.
