Skip to content

Conversation

@SvenSommer
Copy link
Contributor

Previously, startup would fail with a Pydantic ValidationError if manual_entries.yaml was empty or missing required fields (e.g. entries). This commit adds a fallback to initialize the model with entries=[] if the file is empty or contains no data.

Additionally:

  • A warning is logged when an empty manual_entries file is detected.
  • Robust parsing ensures the app does not crash on invalid or missing input.

This improves resilience during project bootstrapping and API usage.

Previously, startup would fail with a Pydantic ValidationError if manual_entries.yaml
was empty or missing required fields (e.g. `entries`). This commit adds a fallback to
initialize the model with `entries=[]` if the file is empty or contains no data.

Additionally:
- A warning is logged when an empty manual_entries file is detected.
- Robust parsing ensures the app does not crash on invalid or missing input.

This improves resilience during project bootstrapping and API usage.
@SvenSommer SvenSommer requested review from Copilot and cybernop June 6, 2025 15:12
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR improves resilience when loading manual entries by providing defaults and better error handling, and refactors project initialization to use dedicated load methods.

  • Adds fallback to an empty entries=[] model and warning on empty manual entries files
  • Wraps parsing logic in try/except, adds support for JSON, and rejects unsupported extensions
  • Refactors Project constructor into _load_* methods, initializes all attributes, and ensures manual entries file is touched and written

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
service/src/structure_comparer/manual_entries.py Handle empty/invalid YAML by defaulting to entries=[], log warnings, catch exceptions, refine item access
service/src/structure_comparer/data/project.py Replace inline setup with _load_* methods, initialize attributes, consistently touch/write manual_entries, add setters
Comments suppressed due to low confidence (4)

service/src/structure_comparer/data/project.py:30

  • [nitpick] Comment is in German; consider translating to English to keep code comments consistent with the rest of the codebase.
# <- erstellt Verzeichnis dank Property

service/src/structure_comparer/data/project.py:36

  • The variable name dir shadows the built-in dir() function. Consider renaming it to something like subdir or folder for clarity.
for dir in self.data_dir.iterdir():

service/src/structure_comparer/data/project.py:69

  • [nitpick] This call will rewrite the manual entries file on every project load, even if no changes occurred. Consider only writing when defaults were applied or data was modified to avoid unnecessary file churn.
self.manual_entries.write()

service/src/structure_comparer/manual_entries.py:45

  • [nitpick] Currently only .yaml extension is supported; it may be helpful to also accept .yml to accommodate common YAML file naming conventions.
elif self._file.suffix == ".yaml":

…ileField

- Ensure ref_types handles missing targetProfile safely
- Improve error logging in from_json and model conversion methods
- Rename internal dict methods to _to_dict() and _to_pkg_dict() for clarity
- Add type annotations and Optional return types where appropriate
- Simplify boolean logic (e.g. mustSupport and path detection)
- Ensure __eq__ properly type-checks before comparison
Copy link
Contributor

@cybernop cybernop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you use the Black formatter as defined in the recommanded project options? It defines the non-imbiguous project formatting

return ProjectModel(
name=self.name, mappings=mappings, comparisons=comparisons, packages=pkgs
name=self.name,
mappings=[m.to_base_model() for m in self.mappings.values()],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you tell me what the reasoning was here to inline those definitions? I think the previous way it was better readable and better debugable as one can see the values before there were passed to the constructor.

I would suggest to change it back

Copy link
Contributor

@cybernop cybernop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am really thankful of your input, but would really appreciate if you would not change the style and readability of not related code and use the existing style and formatter


def __has_pkg(self, name: str, version: str) -> bool:
return any([p.name == name and p.version == version for p in self.pkgs])
def _has_package(self, name: str, version: str) -> bool:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you change this back as it does not improve readability and uses a common abbreviation. It only blows up the code for an internal method

def data_dir(self) -> Path:
return self.dir / self.config.data_dir
data_path = self.dir / self.config.data_dir
data_path.mkdir(parents=True, exist_ok=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think a getter function should not create a directory. I think it would be better to do this during some initialization.

config_data = ProjectConfig(name=project_name)
config_data._file_path = path / "config.json"
config_data.write()
config = ProjectConfig(name=project_name)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove the comment? I think it was helpful for understanding whats going on.

manual_entries_file = self.dir / self.config.manual_entries_file
def _read_manual_entries(self) -> None:
manual_file = self.dir / self.config.manual_entries_file
manual_file.touch(exist_ok=True)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer to not touch the file every time the server is started as this modifies the the file by updating its modification time.

If this should be done during the read it should only be done when the file does not exist

}

def __load_mappings(self):
def _load_mappings(self) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do you need the file to be accessable from the outside? This Python notation makes it clear, that this is purely designed to called internally.

if self._file.suffix == ".json":
self._data = ManualEntriesModel.model_validate_json(content)
elif self._file.suffix == ".yaml":
parsed = yaml.safe_load(content) or {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my opinion the project is not valid if the file is not build correctly. It should be ensured that the project is created correclty instead of adding this.

if self._file is None:
raise NotInitialized("ManualEntries file was not set")

if self._data is None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please re-add those empty lines as these improve readablility in my opinion.

elif self._file.suffix == ".yaml":
content = yaml.safe_dump(self._data.model_dump())
else:
raise ValueError(f"Unsupported file extension: {self._file.suffix}")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this could crash the whole application depending from which part the write is called.

else:
self.entries[i] = value
def __getitem__(self, key: str) -> ManualEntriesMappingModel:
for entry in self.entries:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason to switch away from next? I think its better readable is more performant

raise KeyError(f"No entry found with ID: {key}")

def __setitem__(self, key: str, value: ManualEntriesMappingModel) -> None:
for i, entry in enumerate(self.entries):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as before regarding next(). In .data.project.py you introduce this notation and here you remove it.

Copy link
Contributor

@cybernop cybernop left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also rebase on the latest state of main to enforce checks

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants