Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove old (v0) titleindex (title pointer list) #800

Open
kelson42 opened this issue May 24, 2023 · 10 comments
Open

Remove old (v0) titleindex (title pointer list) #800

kelson42 opened this issue May 24, 2023 · 10 comments
Assignees
Milestone

Comments

@kelson42
Copy link
Contributor

At the time of "removal" of namespaces, the "old" titleindex has been superseeded by the new "titleindex which is available as a normal item via X/listing/titleordered/v1.

We should considering removing the old pointer list at . This should not be used anymore because deprecated 2 years ago and use storage "for nothing".

The move is easy just put 00 in the header in place on the title pointer list pointer and stop writing the old version of this pointer list.

But before doing that we should better check wit third party ZIM readers to verify that they don't use anymore:

  • The old index
  • The header pointer to the index
@kelson42 kelson42 self-assigned this May 24, 2023
@kelson42 kelson42 added this to the 9.0.0 milestone Jun 2, 2023
@Jaifroid
Copy link

Given the discussions on the complexities of supporting X/lisiting/titleOrdered/v1, and in particular on the fallbacks that are necessary if we don't find data in that listing (see kiwix/kiwix-js#597 and kiwix/kiwix-js#708), I would just urge a very cautious approach towards deprecating the titlePtrList, and a relatively long period of testing given the range of scrapers involved (i.e. different ZIM varieties).

@kelson42
Copy link
Contributor Author

kelson42 commented Jul 29, 2023

@Jaifroid I don't understand your comment: do we have any problem with the current ZIM spec and/or implementation on any part of our software portfolio?

@Jaifroid
Copy link

@kelson42 No, but this is a low-level potentially breaking change, and because we still use a custom JS backend in Kiwix JS, I would like to be sure to have enough time to make any low-level adjustments if needed.

@kelson42
Copy link
Contributor Author

@Jaifroid It's not breaking anything regarding current ZIM specification... if implementation of reader is OK. You are IMO the only one in position to check if kiwix JS implementation is OK and such a ticket won't break anything there. We need to be cautious with FUD effect.

@kelson42
Copy link
Contributor Author

@Jaifroid Should I assume all good on your side and we could consider us ready for this issue version 10 of the libzim?

@Jaifroid
Copy link

@kelson42 Apologies, I'm on holiday so not responding as fast as normally. I can't say for sure that there would be no consequences without a ZIM to test. From memory, we use the urlPtrPos to find the listing for X/titleOrdered/v1, and fall back to X/titleOrdered/v0, then fall back to titlePtrPos in that order, so we should be OK. However, what I can say for sure is that if there are any issues, we can easily fix those with a test ZIM. So do go ahead.

@mgautierfr
Copy link
Collaborator

Will do this.

  • Removing will be done only on the writing part.
  • titlePtrPos in zim header will be filled with 0
  • Minor version will be updated. We don't need to bump major version as reader compatible with current major version should already read X/titleOrdered/v1 first and so should be compatible with new version.
  • Reading will still try to load titlePtrPos if it doesn't found v1 version. (for compatibility with older zim files)
  • Not being able to load neither X/titleOrdered/v1 nor v0/titlePtrPos will be considered as hard error (invalid zim file) as it should never happen.

Testing:

New zim file without titlePtrPos/v0 will be created and added to our test zim file (https://github.com/openzim/zim-testing-suite). Old zim files there will be keep.

@mgautierfr mgautierfr assigned mgautierfr and unassigned kelson42 Jan 16, 2025
@Jaifroid
Copy link

@mgautierfr Thanks, I'd be grateful if you could let me know when a test ZIM is available. The only part of the plan I need to check if I need to update support for is the minor version bump, but it would be an easy fix if an update is necessary in our backend.

@mgautierfr
Copy link
Collaborator

The only part of the plan I need to check if I need to update support for is the minor version bump, but it would be an easy fix if an update is necessary in our backend.

I don't know how your reader is implemented but, if implemented as major/minor versions are intended to be used, you have nothing to do.

  • Major version is a way to tell readers that if they don't know how to read version M they should not try (and stop as soon as possible)
  • Minor version is a way to tell readers that new features has been added but they can ignore that and continue as zim file is compatible.

Say otherwise, a reader implemented with format/feature of zim version M.N (major.minor) when opening a zim:

  • If major version of the zim is >M => Quit telling that it doesn't know how to read it.
  • If minor version is >N => Continue as usual
  • If minor version <N or major version <M => May (depending of implementation) detect that some feature will not be available in zim file and deactivate it. May also tell that zim file is too old and quit.

@Jaifroid
Copy link

@mgautierfr Thanks! I think it works like that. I did code that part, so can unpick any issues if I made wrong assumptions. I think there won't be a problem as I remember coding it to start with the X/... indices, falling back from one to the other, and only fall back to the pointer list if nothing else found. Since there should be no ZIM with missing X/ indices and a missing pointer list, I shouldn't have to handle that situation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants