Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider moving zim::Metadata into libzim #785

Open
veloman-yunkan opened this issue Apr 27, 2023 · 7 comments
Open

Consider moving zim::Metadata into libzim #785

veloman-yunkan opened this issue Apr 27, 2023 · 7 comments

Comments

@veloman-yunkan
Copy link
Collaborator

openzim/zim-tools#339 introduced zim::Metadata in zim-tools source tree so that it could be immediately used as a shared utility between zimwriterfs and zimcheck. However libzim is a more logical home for zim::Metadata. Then (after an easy enhancement of zim::Metadata) simple constraints (involving a single metadata item) can be checked in zim::writer::Creator::addMetadata() and full checks can be run in zim::writer::Creator::finishZimCreation().

@kelson42
Copy link
Contributor

kelson42 commented May 2, 2023

@mgautierfr Any opinion? I have nothing against.

@mgautierfr
Copy link
Collaborator

I'm against that.
libzim doesn't know about metadata. All the api is agnostic to the metadata (you have a getMetadata(std::string name) but no getTitle()).

This is something already discussed, but libzim is content agnostic. I consider the metadata described in libzim format more as kiwix specification than zim ones. Simply because zim files without these metadata can correctly being read by libzim (but maybe not by kiwix readers). Checking the metadata is at same level than checking the url inside the articles.
It may break how we use zim files but not how we parse them.

@kelson42
Copy link
Contributor

kelson42 commented May 3, 2023

I consider the metadata described in libzim format more as kiwix specification than zim ones.

@mgautierfr I understand the way how you make the difference, but this is pretty much a personal POV. Metadata specs are part of the ZIM specification like the rest.

Beside the question of the principle, do you see any bad effect that such a move would generate?

@mgautierfr
Copy link
Collaborator

We already had this discussion in openzim/zim-tools#336 which is exactly the root issue leading to zim::Metadata.
I let you read again my comment (openzim/zim-tools#336 (comment)) you were agree with (to the point you moved the issue from libzim to zim-tools)

@kelson42
Copy link
Contributor

kelson42 commented May 19, 2023

@mgautierfr I understand your opinion (about the separation of duties between the libzim and the zim-tools) which is the one we have pursued since a long time. At this stage, I either fully disagree or agree with it. Actually, so far I have been supportive of your POV like you have mentioned (but I disagree - and always have AFAIK - with the assertion that metadata size limits are not part of the ZIM specification).

But I see that this metadata checking is currently under implementation at multiple levels in scrapers and binding and zim-tools. I'm concerned about this redundancy. This new reality is why I have asked "Beside the question of the principle, do you see any bad effect that such a move would generate?" (which is not a question you have already answered in the past AFAIK).

@rgaudin
Copy link
Member

rgaudin commented May 19, 2023

Just to be clear, redundancy-wise (already wrote that somewhere else), the scrapers needs to validate Metadata early to provide direct feedback as most of those important metadata are user-provided.

This means having direct access to checks (could be possible via libzim) but also complete checks.

We are quite happy with scraperlib and I think that in this case it's more practical to have redundancy in scraperlib and JS-binding rather than bringing ISO-639-3 and PNG library to libzim.

The point is sort of the same: if the spec allows setting invalid metadata, it's hard to justify such requirements just to check them. Similar to the separate writer/reader libraries topic.

@kelson42
Copy link
Contributor

Just to be clear, redundancy-wise (already wrote that somewhere else), the scrapers needs to validate Metadata early to provide direct feedback as most of those important metadata are user-provided.

This is a good point I have forgotten to say. We might be more interested in the definition of the Metadata constraints in the libzim than in the check themselves. This is at least one of the altnernative I consider.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants