ENH introduce SKRUB_DATA_DIRECTORY envar to control the data directory #1216
Hey @thomass-dev, thank you for the suggestion. Although this is not a big change, I'm not 100% sure we need this, as we use …
I think that's definitely useful, even for individual users (actually, I would have guessed we already had that). Some users will want to choose the location of the data directory, e.g. to put it in a location shared with other users, on different storage, somewhere that is not backed up, or simply to avoid cluttering their home directory. As this is a preference of the user or the local machine, allowing it to be controlled from the Python code is not a solution; it should be a configuration file or an environment variable (I prefer the environment variable).
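To make the proposal concrete, here is a minimal sketch of how a library could resolve the data directory using the environment variable named in the issue title. The function name `resolve_data_dir`, the default `~/skrub_data`, and the precedence order (explicit argument, then `SKRUB_DATA_DIRECTORY`, then the default) are assumptions for illustration, not skrub's actual API.

```python
import os
from pathlib import Path
from typing import Optional, Union

# Name of the environment variable proposed in this issue.
ENV_VAR = "SKRUB_DATA_DIRECTORY"

# Hypothetical default location; the real skrub default may differ.
DEFAULT_DATA_DIR = Path("~") / "skrub_data"


def resolve_data_dir(data_home: Optional[Union[str, Path]] = None) -> Path:
    """Return the directory where downloaded datasets should be cached.

    Precedence assumed for this sketch:
    1. an explicit ``data_home`` argument passed from Python code,
    2. the SKRUB_DATA_DIRECTORY environment variable, if set and non-empty,
    3. a default directory under the user's home.
    The directory is only computed here, not created.
    """
    if data_home is not None:
        path = Path(data_home)
    elif os.environ.get(ENV_VAR):
        path = Path(os.environ[ENV_VAR])
    else:
        path = DEFAULT_DATA_DIR
    # Expand "~" so users can write e.g. SKRUB_DATA_DIRECTORY=~/scratch/skrub.
    return path.expanduser()
```

Treating an empty value like an unset variable is a deliberate choice in this sketch: it lets a user "unset" the override with `SKRUB_DATA_DIRECTORY=` without the library falling back to the current working directory.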
I'm not sure I understand this bit. What do you mean?
I mean the Python code may be shared with other developers, whereas the directory is a user-specific preference. For example, if I want to run the skrub examples but have them store the data in …
Ok, I understand your point. Just for the sake of argument, couldn't the shared Python code also accept a directory parameter from the user and route that to skrub?
Yes, you are right. Basically, the options for passing that information to a program are a command-line argument, an environment variable, or a config file. As a user, I want to set it once and forget about it, not pass it every time I invoke a script that uses skrub. As a developer, I don't want to add boilerplate to all my scripts to expose that argument (for example, the skrub examples don't have it, and if they did, Sphinx wouldn't pass it when building the docs).
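The "set it once and forget about it" argument relies on environment variables being inherited by child processes. The sketch below illustrates this with a hypothetical stand-in script run through `subprocess`: the script never declares a command-line option for the data directory, yet it sees the directory chosen by the caller. Both `run_script_with_env` and the stand-in script are made up for this example.

```python
import os
import subprocess
import sys

# Stand-in for any script that uses skrub: it reads the variable itself and
# exposes no command-line argument for the data directory.
CHILD_SCRIPT = "import os; print(os.environ.get('SKRUB_DATA_DIRECTORY', '<unset>'))"


def run_script_with_env(data_dir: str) -> str:
    """Run the stand-in script in a child process with the variable set.

    In practice a user would export the variable once (e.g. in a shell
    profile); here the child's environment is built explicitly instead.
    """
    env = dict(os.environ, SKRUB_DATA_DIRECTORY=data_dir)
    result = subprocess.run(
        [sys.executable, "-c", CHILD_SCRIPT],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The same mechanism is what lets a documentation build pick up the setting: the builder never has to forward an argument, because every process it spawns inherits the environment.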
Ok, that makes sense. Thanks for detailing your thoughts.
I agree that the feature makes sense. Thanks!!
This is exactly the purpose of this issue. Thanks @jeromedockes for reading my mind 😄. To be explicit, my use case is: In …
Nice, could you share the link to this dev?
It would be great to be able to control the data cache directory from somewhere other than the Python code itself,
especially in a CI context, where we need to ensure that the data ends up in the right place.
I propose using an environment variable as an additional way to define the data directory.
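For the CI use case, a test suite could redirect the cache to a throwaway directory without touching any example code. A minimal sketch, assuming a hypothetical `data_dir()` helper that honours `SKRUB_DATA_DIRECTORY` and otherwise falls back to `~/skrub_data`; it uses `unittest.mock.patch.dict` so the original environment is restored when the block exits.

```python
import os
import tempfile
from pathlib import Path
from unittest import mock


def data_dir() -> Path:
    """Hypothetical helper: cache directory, honouring SKRUB_DATA_DIRECTORY."""
    return Path(os.environ.get("SKRUB_DATA_DIRECTORY") or Path.home() / "skrub_data")


def check_ci_redirect() -> bool:
    """Point the cache at a temporary directory, as a CI job might."""
    with tempfile.TemporaryDirectory() as tmp:
        with mock.patch.dict(os.environ, {"SKRUB_DATA_DIRECTORY": tmp}):
            redirected = data_dir() == Path(tmp)
        # patch.dict restores the previous environment on exit.
        restored = os.environ.get("SKRUB_DATA_DIRECTORY") != tmp
    return redirected and restored
```

In an actual CI configuration the variable would more likely be set once at the job level, which is exactly the "set it once" property discussed above.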