-
Notifications
You must be signed in to change notification settings - Fork 1
fix(manual_entries): handle empty or invalid YAML files gracefully #1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Previously, startup would fail with a Pydantic ValidationError if manual_entries.yaml was empty or missing required fields (e.g. `entries`). This commit adds a fallback to initialize the model with `entries=[]` if the file is empty or contains no data. Additionally: - A warning is logged when an empty manual_entries file is detected. - Robust parsing ensures the app does not crash on invalid or missing input. This improves resilience during project bootstrapping and API usage.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR improves resilience when loading manual entries by providing defaults and better error handling, and refactors project initialization to use dedicated load methods.
- Adds fallback to an empty
entries=[]model and warning on empty manual entries files - Wraps parsing logic in
try/except, adds support for JSON, and rejects unsupported extensions - Refactors
Projectconstructor into_load_*methods, initializes all attributes, and ensures manual entries file is touched and written
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| service/src/structure_comparer/manual_entries.py | Handle empty/invalid YAML by defaulting to entries=[], log warnings, catch exceptions, refine item access |
| service/src/structure_comparer/data/project.py | Replace inline setup with _load_* methods, initialize attributes, consistently touch/write manual_entries, add setters |
Comments suppressed due to low confidence (4)
service/src/structure_comparer/data/project.py:30
- [nitpick] Comment is in German; consider translating to English to keep code comments consistent with the rest of the codebase.
# <- erstellt Verzeichnis dank Property
service/src/structure_comparer/data/project.py:36
- The variable name
dirshadows the built-indir()function. Consider renaming it to something likesubdirorfolderfor clarity.
for dir in self.data_dir.iterdir():
service/src/structure_comparer/data/project.py:69
- [nitpick] This call will rewrite the manual entries file on every project load, even if no changes occurred. Consider only writing when defaults were applied or data was modified to avoid unnecessary file churn.
self.manual_entries.write()
service/src/structure_comparer/manual_entries.py:45
- [nitpick] Currently only
.yamlextension is supported; it may be helpful to also accept.ymlto accommodate common YAML file naming conventions.
elif self._file.suffix == ".yaml":
…ileField - Ensure ref_types handles missing targetProfile safely - Improve error logging in from_json and model conversion methods - Rename internal dict methods to _to_dict() and _to_pkg_dict() for clarity - Add type annotations and Optional return types where appropriate - Simplify boolean logic (e.g. mustSupport and path detection) - Ensure __eq__ properly type-checks before comparison
cybernop
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you use the Black formatter as defined in the recommanded project options? It defines the non-imbiguous project formatting
| return ProjectModel( | ||
| name=self.name, mappings=mappings, comparisons=comparisons, packages=pkgs | ||
| name=self.name, | ||
| mappings=[m.to_base_model() for m in self.mappings.values()], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you tell me what the reasoning was here to inline those definitions? I think the previous way it was better readable and better debugable as one can see the values before there were passed to the constructor.
I would suggest to change it back
cybernop
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am really thankful of your input, but would really appreciate if you would not change the style and readability of not related code and use the existing style and formatter
|
|
||
| def __has_pkg(self, name: str, version: str) -> bool: | ||
| return any([p.name == name and p.version == version for p in self.pkgs]) | ||
| def _has_package(self, name: str, version: str) -> bool: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you change this back as it does not improve readability and uses a common abbreviation. It only blows up the code for an internal method
| def data_dir(self) -> Path: | ||
| return self.dir / self.config.data_dir | ||
| data_path = self.dir / self.config.data_dir | ||
| data_path.mkdir(parents=True, exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think a getter function should not create a directory. I think it would be better to do this during some initialization.
| config_data = ProjectConfig(name=project_name) | ||
| config_data._file_path = path / "config.json" | ||
| config_data.write() | ||
| config = ProjectConfig(name=project_name) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why did you remove the comment? I think it was helpful for understanding whats going on.
| manual_entries_file = self.dir / self.config.manual_entries_file | ||
| def _read_manual_entries(self) -> None: | ||
| manual_file = self.dir / self.config.manual_entries_file | ||
| manual_file.touch(exist_ok=True) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would prefer to not touch the file every time the server is started as this modifies the the file by updating its modification time.
If this should be done during the read it should only be done when the file does not exist
| } | ||
|
|
||
| def __load_mappings(self): | ||
| def _load_mappings(self) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need the file to be accessable from the outside? This Python notation makes it clear, that this is purely designed to called internally.
| if self._file.suffix == ".json": | ||
| self._data = ManualEntriesModel.model_validate_json(content) | ||
| elif self._file.suffix == ".yaml": | ||
| parsed = yaml.safe_load(content) or {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my opinion the project is not valid if the file is not build correctly. It should be ensured that the project is created correclty instead of adding this.
| if self._file is None: | ||
| raise NotInitialized("ManualEntries file was not set") | ||
|
|
||
| if self._data is None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please re-add those empty lines as these improve readablility in my opinion.
| elif self._file.suffix == ".yaml": | ||
| content = yaml.safe_dump(self._data.model_dump()) | ||
| else: | ||
| raise ValueError(f"Unsupported file extension: {self._file.suffix}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this could crash the whole application depending from which part the write is called.
| else: | ||
| self.entries[i] = value | ||
| def __getitem__(self, key: str) -> ManualEntriesMappingModel: | ||
| for entry in self.entries: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason to switch away from next? I think its better readable is more performant
| raise KeyError(f"No entry found with ID: {key}") | ||
|
|
||
| def __setitem__(self, key: str, value: ManualEntriesMappingModel) -> None: | ||
| for i, entry in enumerate(self.entries): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as before regarding next(). In .data.project.py you introduce this notation and here you remove it.
cybernop
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please also rebase on the latest state of main to enforce checks
Previously, startup would fail with a Pydantic ValidationError if manual_entries.yaml was empty or missing required fields (e.g.
entries). This commit adds a fallback to initialize the model withentries=[]if the file is empty or contains no data.Additionally:
This improves resilience during project bootstrapping and API usage.