Feature request: parse and store custom user agent in BigQuery public dataset #94
Can you give us an example of the user agent you tried to use?
We tried to introduce user agent tracking in our pip installation process in https://github.com/napari/napari/pull/5135/files, where we used the environment variable PIP_USER_AGENT_USER_DATA to set the user agent; an example value of this string is:
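For context, a minimal sketch of how an application might set this variable before shelling out to pip; the payload keys and the package name below are illustrative, not the values napari actually uses:

```python
import json
import os
import subprocess
import sys

# Hypothetical payload -- the real value napari sets differs and is not
# reproduced here.
user_data = json.dumps({"application": "napari", "install_context": "plugin_manager"})

# pip reads PIP_USER_AGENT_USER_DATA and embeds it as the "user_data" field
# of the JSON blob in its User-Agent header.
env = dict(os.environ, PIP_USER_AGENT_USER_DATA=user_data)

subprocess.run(
    [sys.executable, "-m", "pip", "install", "some-plugin"],  # placeholder package
    env=env,
    check=True,
)
```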
I think the problem is that
So the default user agent is something like the following. There are a lot of fields in this JSON blob that we do include in the BigQuery dataset; my recommendation would be to choose one of them to override/modify with something napari-specific.
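The exact blob from this comment wasn't captured above, but pip's user agent generally has the shape `pip/<version> <json>`; a rough illustration of the fields in question (all values are made up):

```python
import json

# Illustrative only -- versions and values below are made up.
ua = (
    'pip/23.1 '
    '{"installer": {"name": "pip", "version": "23.1"}, '
    '"python": "3.10.9", '
    '"implementation": {"name": "CPython", "version": "3.10.9"}, '
    '"system": {"name": "Linux", "release": "5.15.0"}, '
    '"cpu": "x86_64", '
    '"user_data": "set from PIP_USER_AGENT_USER_DATA, if any"}'
)

# Split off the JSON portion and look at the available fields.
_, _, blob = ua.partition(" ")
details = json.loads(blob)
print(sorted(details))       # fields a downstream consumer could choose from
print(details["installer"])  # {'name': 'pip', 'version': '23.1'}
```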
Ah! I didn't realize the user agent data is only processed to take certain keys. I tested parsing a custom user agent, and this seems to be working. I think we can override the installer part by setting the user agent data accordingly.
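If it helps to reproduce that kind of check, a local test against a checkout of this repo might look roughly like the following; the module path and the custom installer value are assumptions, so adjust them to the actual code:

```python
# Assumes a checkout of linehaul-cloud-function with its dependencies installed;
# the module path below may differ between revisions.
from linehaul.ua import parser

# Hypothetical user agent whose JSON blob carries a napari-flavored installer.
ua = (
    'pip/23.1 {"installer": {"name": "napari", "version": "0.4.17"}, '
    '"python": "3.10.9", "implementation": {"name": "CPython", "version": "3.10.9"}}'
)

try:
    result = parser.parse(ua)
    print(result)  # inspect which fields survived parsing
except Exception as exc:  # e.g. an unknown/unparseable user agent
    print(f"parse failed: {exc!r}")
```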
Great, shall we close this then?
Hi there, I investigated a bit, and I could not see how user data is used in the flow. Curious if someone can help me with this:
From there on, there seems to be some processing that transforms the string so it can be parsed by the parser, assuming the details dict does not change:
The details would be None because of the parsing error. If the user data is preprocessed with JSON loading, then it will parse correctly; for example (notice the user data is loaded as a dict here, whereas in the example above the user data is a string):
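The snippet originally attached to this comment isn't preserved, but the string-vs-dict distinction it describes can be illustrated like this (values made up):

```python
import json

# pip copies PIP_USER_AGENT_USER_DATA into the blob as a plain string, so any
# JSON the application put there arrives double-encoded:
raw = (
    '{"installer": {"name": "pip", "version": "23.1"}, '
    '"user_data": "{\\"application\\": \\"napari\\"}"}'
)

details = json.loads(raw)
print(type(details["user_data"]))  # <class 'str'> -- still a string

# Preprocessing the field with a second json.loads yields the dict that
# strict downstream structuring would need:
details["user_data"] = json.loads(details["user_data"])
print(type(details["user_data"]))  # <class 'dict'>
```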
I want to confirm that the user_data from pip is indeed preprocessed correctly for parsing; otherwise the user_data specified by pip will not be correctly recorded, and in fact it would corrupt the whole record due to the parsing error.
Another issue: it seems that in https://github.com/pypa/linehaul-cloud-function/blob/a964b841b2718635efe3fa975093a7997a96be01/linehaul/events/parser.py#L205-L239 the user data is not used. Without modifying the system-level info (like compiling a specific CPython), I don't see a good way to override any column currently being tracked. The feature request here is to allow overriding the hardcoded columns using user data.
Ah, yeah, seems like I misinterpreted what that code does. I'm a little bit wary of us adding a column to include user_data.
I think this might be best as a feature request to pip instead -- we just parse the fields that pip already sends.
Understandable. I would not suggest doing that either. Alternatively, since the parser here does parse user_data correctly, I wonder if it is less concerning to use the user_data to overwrite existing columns when it is valid. For example, if user_data specifies a valid "installer" structure, the resulting parse could use that to overwrite the "installer" part that comes from the non-user-data part of the user agent.
That would get parsed out so that the installer name reads y instead of x. Does this sound more reasonable?
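To make the proposal concrete, here is a rough sketch of the requested behavior; this is not how linehaul works today, and the field names are illustrative:

```python
import json

# Hypothetical: fields parsed from the base user agent, installer name "x" ...
parsed = {"installer": {"name": "x", "version": "1.0"}, "python": "3.10.9"}

# ... and a user_data blob supplying a valid installer structure naming "y".
user_data = {"installer": {"name": "y", "version": "0.4.17"}}

# Only well-formed, explicitly allowed fields would be permitted to win.
ALLOWED_OVERRIDES = {"installer"}

for key in ALLOWED_OVERRIDES & user_data.keys():
    parsed[key] = user_data[key]

print(json.dumps(parsed))  # installer name is now "y" instead of "x"
```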
I think that would be a feature request for the pip project as well.
@bnelson-czi let's close this issue then. I have opened a feature request on the pip side as suggested |
Hi linehaul devs!
Our team maintains an application with an extensive plugin ecosystem. Plugins can be installed within or outside the application via pip, and we would like to understand where users are installing plugins.
We tried customizing the application's user agent, thinking that linehaul would parse/stream the data to the file_downloads table under the details data structure. Unfortunately, it didn't work. We saw some functions in your codebase that parse user agent data, but we don't know if those data actually get stored anywhere. Any guidance or thoughts on enabling this? Thanks!
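For what it's worth, the fields that already land in the public dataset can be inspected directly; a small sketch of querying the existing details struct with the BigQuery client (the project filter and date window are illustrative):

```python
from google.cloud import bigquery  # pip install google-cloud-bigquery

client = bigquery.Client()

# Counts downloads of an example project by installer name over the last 30 days.
query = """
SELECT details.installer.name AS installer, COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE project = 'napari'
  AND DATE(timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
GROUP BY installer
ORDER BY downloads DESC
"""

for row in client.query(query).result():
    print(row.installer, row.downloads)
```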