Backend: implement PopularPages KPI #33
Comments
@kelson42 @rgaudin @Popolechien (sorry, the discussion is a bit technical, but it also has "business" sides) Do you have any input on what should be considered a page in this indicator? We want to get the number of visits per page (package + page name), but only "real" pages should be counted, not all the "dummy" assets needed to display the page (images, CSS, JS, ...). The situation is that we base our

My first instinct was to only consider URLs whose content-type contains "html", "epub" or "pdf", but in fact we also have videos, audio and maybe other kinds of content. A single object of interest might even need multiple HTML requests to load the whole content (e.g. with iframes, ...).

Maybe we should consider only "html" content (instead of "html" + "epub" + ...), since otherwise we count the same object twice on a lot of occasions (e.g. once for the "html" page holding the video and once for the video itself). But this does not hold when the ZIM contains an application (e.g. freecodecamp, sooner or later Kolibri, ...) which has only one page and loads assets dynamically: we would get only one "html" page per client, and it would have a generic name.

I don't know how we could detect this properly (or at least for 80% of the cases). Probably what we want is to have only (and all) ZIM items marked as
Given the limited progress in our reflections here, I dug a bit deeper and opened kiwix/libkiwix#1026.
The more we dive into this issue and into kiwix/libkiwix#1026, the more doubts I have about the pertinence of this KPI. Measuring it correctly has many software impacts (see kiwix/libkiwix#1026, which only covers the ZIM case, not all the other apps). This KPI also has a large impact on storage on the offspot (I guess it might account for at least 80% of the DB size), plus a significant impact on the live performance of the metrics subsystem (many logs will need to be analyzed in fine detail to decide whether each request is for a page or an asset). Finally, I heard in some of our discussions that this might not cover the real need, which is more about specific pages / assets (APKs, ...) for which we need details but which might never make it to the top 50. So I ask the question pretty boldly: are we really sure we want to include this KPI in metrics v1?
@benoit74 Just to be clear, the question is about whether we monitor individual article metrics. This does not impact the measurement at the ZIM level, correct?
Yes, measurement at ZIM level is done (with two KPIs so far: usage in minutes and number of visits).
OK, so let's park this one for v2 or until we think this through a little better.
This issue has been automatically marked as stale because it has not had recent activity. It will now be reviewed manually. Thank you for your contributions.
Implement backend logic to create PopularPages KPI:
Note: this is in fact already mostly ready on the main branch, but needs to be revisited following recent discussions and the name change.
Important to discuss / adjust: how do we make the distinction between a real object and an asset? Currently we assume that everything with an html/epub/pdf content-type is a real object (and hence tracked) and everything else is an asset (and hence ignored).
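To make the heuristic under discussion concrete, here is a minimal sketch of the content-type rule described above, followed by a per-page visit count. It is not the code on the main branch: the `Request` record, the function names, the sample package and path names, and the default of 50 results are all assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple

# Content-type fragments treated as "real objects" by the rule described above.
TRACKED_MARKERS = ("html", "epub", "pdf")


class Request(NamedTuple):
    """Hypothetical log record; the real log format may differ."""
    package: str
    path: str
    content_type: str


def is_real_object(content_type: str, markers: tuple = TRACKED_MARKERS) -> bool:
    """Return True when the content-type contains one of the tracked markers."""
    lowered = content_type.lower()
    return any(marker in lowered for marker in markers)


def popular_pages(requests: Iterable[Request], top: int = 50,
                  markers: tuple = TRACKED_MARKERS) -> list:
    """Count visits per (package, path), ignoring anything classified as an asset."""
    counts = Counter(
        (req.package, req.path)
        for req in requests
        if is_real_object(req.content_type, markers)
    )
    return counts.most_common(top)


# Made-up sample requests, for illustration only.
sample = [
    Request("wikipedia_en", "A/Kiwix", "text/html; charset=utf-8"),
    Request("wikipedia_en", "A/Kiwix", "text/html; charset=utf-8"),
    Request("wikipedia_en", "I/logo.png", "image/png"),
    Request("wikipedia_en", "-/style.css", "text/css"),
    Request("gutenberg_fr", "book.epub", "application/epub+zip"),
    Request("ted_talks", "talk.html", "text/html"),
    Request("ted_talks", "talk.webm", "video/webm"),
]

current_rule = popular_pages(sample)
html_only = popular_pages(sample, markers=("html",))
with_video = popular_pages(sample, markers=TRACKED_MARKERS + ("video",))
```

With this sample, adding "video" to the markers makes the talk appear twice (once for its HTML page, once for the video file), which is the duplication problem raised in the comments. Restricting to "html" drops the EPUB entirely. Neither variant handles a single-page application that loads its content dynamically.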