
[Feature] Filter standard library packages out of Python models' packages config #9875

Closed
3 tasks done
gwenwindflower opened this issue Apr 8, 2024 · 6 comments
Labels: enhancement, python_models, Refinement, stale

Comments

@gwenwindflower

gwenwindflower commented Apr 8, 2024

Is this your first time submitting a feature request?

  • I have read the expectations for open source contributors
  • I have searched the existing issues, and I could not find an existing issue for this feature
  • I am requesting a straightforward extension of existing dbt functionality, rather than a Big Idea better suited to a discussion

Describe the feature

Right now, if a user wants to use `re`, `os`, etc. in a Python model, they might reasonably think they need to add these to the model's `packages` config. However, listing packages that aren't third-party makes the run fail with a 'package not found' error. The Right Way at present is to just import and use them, but we don't flag that anywhere in the docs. It would be good to filter standard library packages out of the list and perhaps throw a warning instead of an error, letting people know that listing them isn't necessary while still proceeding with the run.

At present you need to do this, which is not super obvious:

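```python
import pandas as pd
import numpy as np
import re  # standard library: import it, but don't list it in packages

def model(dbt, session):
    # dbt configuration: list only third-party packages
    dbt.config(packages=["pandas", "numpy"])

    df = pd.DataFrame({"hello": ["world"]})
    return df
```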

Describe alternatives you've considered

  • Updating the docs to make this more clear
  • Throwing a clearer error
  • Filtering the packages and not throwing a warning at all, just ignoring the extra code
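The proposed behavior and the alternatives above could be sketched as a small pre-processing step on the `packages` list before it is sent to the warehouse. This is a minimal sketch, not dbt's implementation: the function name `filter_stdlib_packages`, its `on_stdlib` parameter, and the message text are hypothetical. It assumes Python 3.10+ for `sys.stdlib_module_names`; older interpreters fall back to `sys.builtin_module_names`, which misses pure-Python modules such as `re`.

```python
import re
import sys
import warnings

# Python 3.10+ exposes the full standard-library name list. The fallback,
# sys.builtin_module_names, only covers modules compiled into the interpreter
# (it includes "sys" but not pure-Python modules such as "re" or "os").
STDLIB_NAMES = frozenset(
    getattr(sys, "stdlib_module_names", sys.builtin_module_names)
)


def _base_name(spec):
    """Strip a version pin or extras, e.g. 'numpy==1.26.4' -> 'numpy'."""
    return re.split(r"[\s\[=<>!~]", spec.strip(), maxsplit=1)[0].lower()


def filter_stdlib_packages(packages, on_stdlib="warn"):
    """Hypothetical helper: remove standard-library names from a packages list.

    on_stdlib selects one of the behaviors discussed in this issue:
      "warn"   - drop the entries and emit a warning (the proposal)
      "error"  - raise a clear error that names the entries
      "ignore" - drop the entries silently
    """
    if on_stdlib not in ("warn", "error", "ignore"):
        raise ValueError(f"unknown on_stdlib mode: {on_stdlib!r}")

    kept, stdlib = [], []
    for spec in packages:
        # Compare the bare package name against the standard-library set.
        if _base_name(spec) in STDLIB_NAMES:
            stdlib.append(spec)
        else:
            kept.append(spec)

    if stdlib:
        message = (
            "Standard library module(s) need no entry in packages; "
            f"import them directly instead: {', '.join(stdlib)}"
        )
        if on_stdlib == "error":
            raise ValueError(message)
        if on_stdlib == "warn":
            warnings.warn(message, UserWarning, stacklevel=2)
    return kept
```

On Python 3.10+, `filter_stdlib_packages(["pandas", "numpy", "re"])` would return `["pandas", "numpy"]` and warn about `re`. One design caveat: a few PyPI distributions share a name with a standard-library module (for example the `typing` and `dataclasses` backports), which is one argument for warning rather than silently dropping entries.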

Who will this benefit?

Users of Python models.

Are you interested in contributing this feature?

No

Anything else?

@gwenwindflower gwenwindflower added enhancement New feature or request triage labels Apr 8, 2024
@dbeatty10 dbeatty10 self-assigned this Apr 8, 2024
@dbeatty10
Contributor

Thanks for opening this, @gwenwindflower!

Which adapter did you use? Could you provide a simple dbt python model that exhibits this issue?

Was it dbt-snowflake with a model like this, by any chance?

```python
import pandas as pd
import numpy as np
import re

def model(dbt, session):
    dbt.config(packages=["pandas", "numpy", "re"])

    df = pd.DataFrame({"hello": ["world"]})
    return df
```

And an error like this?

```
00:23:57    Database Error in model my_python_model (models/my_python_model.py)
  100357 (P0000): Cannot create a Python function with the specified packages. Please check your packages specification and try again.
  compiled Code at target/run/my_project/models/my_python_model.py
```

@dbeatty10 dbeatty10 removed their assignment Apr 9, 2024
@gwenwindflower
Author

gwenwindflower commented Apr 9, 2024

Hey @dbeatty10, sorry for the lack of a firsthand repro. I reported this based on a report from a user in the Community, so I didn't get the error myself! @aranke suggested it could be worthwhile to just fix this rather than update the docs, and I tend to agree, particularly given the idea of a clear warning instead of a mysterious error. Based on my conversation with the Community member, this looks like exactly a simplified version of the model he was creating and the error that confused him. Here's a link to the thread.

@dbeatty10
Contributor

@aranke could you share the details of your proposed approach for this scenario?

If you can provide links to the relevant area(s) of the source code, that would be even better.

@dbeatty10 dbeatty10 added Refinement Maintainer input needed python_models and removed triage labels Apr 10, 2024
@aranke
Member

aranke commented Apr 15, 2024

Code: TK

Python built-in modules: https://docs.python.org/3/library/sys.html#sys.builtin_module_names
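One caveat on the linked list: `sys.builtin_module_names` only contains modules compiled into the interpreter, so it misses most of the standard library, including pure-Python modules such as `re` and `os`. On Python 3.10+, `sys.stdlib_module_names` is the more complete source. The small check below compares the two (the function name `classify_module_names` is hypothetical); on CPython 3.10+ it should report `re` and `os` as standard library but not built-in.

```python
import sys


def classify_module_names(names):
    """Report whether each name is compiled into the interpreter and/or in the stdlib."""
    # sys.stdlib_module_names only exists on Python 3.10+; treat it as empty before that.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return {
        name: {
            "builtin": name in sys.builtin_module_names,
            "stdlib": name in stdlib,
        }
        for name in names
    }


if __name__ == "__main__":
    for name, flags in classify_module_names(["sys", "re", "os", "pandas"]).items():
        print(name, flags)
```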

Contributor

This issue has been marked as Stale because it has been open for 180 days with no activity. If you would like the issue to remain open, please comment on the issue or else it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues that have gone stale label Oct 13, 2024
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment to notify the maintainers.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Oct 20, 2024