- 
          
- 
                Notifications
    You must be signed in to change notification settings 
- Fork 700
[GSoC 2025] Load model JSON files from URLs #5137
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
| I've updated the  Please let me know if any part of this approach needs changing or improvisation. | 
| @medha-14 can you change the base branch here? Would be much easier to review | 
        
          
                src/pybamm/dispatch/entry_points.py
              
                Outdated
          
        
      | def get_cache_path(url): | ||
| cache_dir = Path.home() / ".pybamm_cache" / "pybamm" / "models" | ||
| cache_dir.mkdir(parents=True, exist_ok=True) | ||
| file_hash = hashlib.md5(url.encode()).hexdigest() | ||
| return cache_dir / f"{file_hash}.json" | ||
|  | ||
|  | ||
| def clear_model_cache(): | ||
| cache_dir = Path.home() / ".pybamm_cache" / "pybamm" / "models" | ||
| if cache_dir.exists(): | ||
| for file in cache_dir.glob("*.json"): | ||
| file.unlink() | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be handled using platformdirs.user_cache_dir: https://platformdirs.readthedocs.io/en/latest/api.html#cache-directory
| 
 Hi @medha-14, could you please create a PR within your fork from this link? medha-14/PyBaMM@GSoC...medha-14:PyBaMM:load_json_from_url. I see a better diff there. We could add comments on that PR until #5056 is merged. Otherwise, I think it is on the right track, modulo that we need to use  | 
| Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@             Coverage Diff             @@
##           develop    #5137      +/-   ##
===========================================
- Coverage    98.88%   98.86%   -0.03%     
===========================================
  Files          320      320              
  Lines        26949    26988      +39     
===========================================
+ Hits         26648    26681      +33     
- Misses         301      307       +6     ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
 | 
| from pybamm.expression_tree.operations.serialise import Serialise | ||
|  | ||
| APP_NAME = "pybamm" | ||
| APP_AUTHOR = "pybamm" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can keep just APP_NAME and drop APP_AUTHOR, and we can set Path(user_cache_dir(APP_NAME)) / "models" as a constant here instead.
| def get_cache_path(url: str) -> Path: | ||
| cache_dir = _get_cache_dir() | ||
| file_hash = hashlib.md5(url.encode()).hexdigest() | ||
| return cache_dir / f"{file_hash}.json" | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why MD5 and not SHA-256? :)
| try: | ||
| file.unlink() | ||
| except Exception as e: | ||
| # Optional: log error instead of failing silently | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| # Optional: log error instead of failing silently | 
| def clear_model_cache() -> None: | ||
| cache_dir = _get_cache_dir() | ||
| for file in cache_dir.glob("*.json"): | ||
| try: | ||
| file.unlink() | ||
| except Exception as e: | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note, we also store some of PyBaMM's data files in the cache dir using a pooch registry: 
PyBaMM/src/pybamm/pybamm_data.py
Lines 126 to 139 in a1aa02c
| def get_data(self, filename: str): | |
| """ | |
| Fetches the data file from upstream and stores it in the local cache directory under pybamm directory. | |
| Parameters | |
| ---------- | |
| filename : str | |
| Name of the data file to be fetched from the registry. | |
| Returns | |
| ------- | |
| pathlib.PurePath | |
| """ | |
| self.registry.fetch(filename) | |
| return pathlib.Path(f"{self.path}/{self.version}/{filename}") | 
I wonder if we could reuse some of the code here, because clearing the cache directory of JSON files could have unintended side effects. We do have JSON files there: https://github.com/pybamm-team/pybamm-data/releases/tag/v1.0.1
Could we perhaps rewrite the PR to use pooch instead? That will also provide safer defaults than using urllib.request.urlretrieve() directly, and we could rely on it for the checksum/caching bits to see if we need to download the model JSON again or not.
| def Model( | ||
| model=None, | ||
| url=None, | ||
| force_download=False, | ||
| *args, | ||
| **kwargs, | ||
| ): | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can have better typing here.
Description
This PR is based on #5056 and contains changes that build upon it.
Once #5056 is merged, the diff for this PR will automatically update to show only the relevant changes.
Type of change
Please add a line in the relevant section of CHANGELOG.md to document the change (include PR #)
Important checks:
Please confirm the following before marking the PR as ready for review:
nox -s pre-commitnox -s testsnox -s doctests