Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Headerset #563

Open
wants to merge 4 commits into
base: develop
Choose a base branch
from
Open

Conversation

Divyanshi750
Copy link
Contributor

Summary

This pull request enhances the User-Agent header generation mechanism in the Ricecooker download utility. The main changes made are:

  • Introduced a new configure_download_session() function to generate User-Agent headers
  • Modified the make_request() function to support optional user email identification

References

Fixes: #483

Reviewer guidance

Added test cases to:

  • Generate requests with various scenarios:
    • No email provided
    • Custom email supplied
    • Different Ricecooker versions

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this, this is a good start!

This can be done more simply, and more globally by leveraging the information we get from authenticating the user against the Studio server. Please ask any follow up questions about the directions I have suggested.

Also, please make these commits independently of the commits from your other PR.

)
if response.status_code == 200:
return
url_response = config.SESSION.post(config.get_upload_url(), json=data)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why this code block has been dedented? It wasn't in your other PR.

@@ -207,6 +223,9 @@ def make_request(
retry_count = 0
max_retries = 5
request_headers = DEFAULT_HEADERS

configure_download_session(sess, user_email)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note, I think we should be setting this more globally on config.DOWNLOAD_SESSION, not just within the the downloader utility. While it is most acute when we are downloading HTML pages, it is better we do this more broadly.

Once we have authenticated against Studio with the token, we should be able to use the user email from there, rather than requiring it be explicitly passed in.

See here where we receive the username (which is the email address) once we have authenticated with Studio: https://github.com/learningequality/ricecooker/blob/develop/ricecooker/commands.py#L90

Once we have that information in hand, we can do the required DOWNLOAD_SESSION update.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, passing an email manually every time can be tedious. I will work on an approach and make sure that the code is more of Don't Repeat Yourself and improves overall authentication and session management. But, I would like to ask if there any specific requirements or constraints for setting the download session that I need to take care of?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Divyanshi750 - I'm sorry, I don't understand what isn't clear here - I have outlined which session and needs to be updated, and also shown where in the code this should happen.

@rtibbles rtibbles self-assigned this Jan 25, 2025
@Divyanshi750
Copy link
Contributor Author

Improved session configuration by adding global email setup after Studio authentication.

Changes made:

  • Added AuthenticationError class in exceptions.py for better error handling in case the authentication fails
  • Updated download session configuration to use authenticated user's email when available
  • Moved session configuration logic to be more globally accessible

This makes the download session configuration more consistent and eliminates the need
to manually pass email for each download operation as requested.Please look into the changes suggested and review the changes made. @rtibbles

Copy link
Member

@rtibbles rtibbles left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In spite of tests being present, the code is full of issues that are evident from a code read through.

This has the appearance of being generated with a lot of LLM assistance, and not sufficient critical reflection of the generated code.

It is also not fixing the issue in the life cycle of a SushiChef class being executed. The header needs to be set on the central DOWNLOAD_SESSION in the config module - and should be done in the place where the user has already been authenticated with a token with the Studio server.

@@ -136,3 +136,4 @@ video_cache_py3.sqlite
cache.sqlite

chefdata/
audio_cache.sqlite
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been added in develop from your other PR - you are still committing the file itself here as well too, so that needs to be removed from the commit history.

@@ -0,0 +1,11 @@
[[source]]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still needs to be removed, as does the file below.

@@ -26,6 +26,7 @@
"console_scripts": [
"corrections = ricecooker.utils.corrections:correctionsmain",
"jiro = ricecooker.cli:main",
"ricecooker = ricecooker.cli:main",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't need to be added.

)


def test_performance_overhead():
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this test doing?

baseline_time = timeit.timeit(baseline_request, number=100)
custom_email_time = timeit.timeit(custom_email_request, number=100)

assert custom_email_time - baseline_time < 0.01
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we making assertions in the module level code? This looks like it was generated by an LLM and unthinkingly pasted in here. Please focus only on testing the code you have added around updating the user agent.



def configure_download_session(session, user_email=None):
import ricecooker
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being imported twice?

@@ -28,6 +28,8 @@
from ricecooker.utils.html import download_file
from ricecooker.utils.html import replace_links
from ricecooker.utils.zip import create_predictable_zip
import logging
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why has logging been added here?

Studio authentication method that also configures the download session.
"""
try:
auth_response = authenticate_studio(username, password)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this function making a recursive call here?

@@ -48,6 +50,14 @@
}



from . import configure_download_session
def download_file(url, clear_cookies=False, headers=None, timeout=60,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is this new method for?

@@ -197,7 +207,7 @@ def read(


def make_request(
url, clear_cookies=False, headers=None, timeout=60, session=None, *args, **kwargs
url, clear_cookies=False, headers=None, timeout=60, session=None, user_email=None, *args, **kwargs
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to pass this in, we instead copy the user agent from the globally configured DOWNLOAD_SESSION that should have had its user agent headers set properly. So far this PR is not doing this though, so is missing the main point of this issue.

@rtibbles
Copy link
Member

The majority of comments here are still unaddressed - the main approach needs to be tweaked to focus on the config.DOWNLOAD_SESSION and to be triggered when the tree manager authenticates to Studio.

@rtibbles
Copy link
Member

rtibbles commented Mar 1, 2025

This has been force pushed - but there are still conflicts to be resolved, and the principle feedback given is still unaddressed.

@Divyanshi750
Copy link
Contributor Author

This has been force pushed - but there are still conflicts to be resolved, and the principle feedback given is still unaddressed.

Hi @rtibbles , I was thinking to approach the problem in this way:
Implementing a SessionManager class directly in the tree.py file. It would handle HTTP sessions for downloads and API requests. And then create function to get the current download session or create a new one.
Inside TreeMAnager class, can we implement a function to authenticate to Studio using the provided token and initialize download session.
In downloader.py, a function that would download a file from a URL and save it to a destination or return the contents. And then return the downloaded file.
Let me know if any refinements are needed!

@rtibbles
Copy link
Member

We just need to update the session on the core config object, and have that update when authentication has happened with Studio, so we have the registered email address. That's all that is needed here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

No explicit header set for the DOWNLOAD_SESSION
2 participants