Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

/manage_catalogView takes ages to open in large sites #155

Open
sauzher opened this issue Sep 26, 2024 · 12 comments
Open

/manage_catalogView takes ages to open in large sites #155

sauzher opened this issue Sep 26, 2024 · 12 comments

Comments

@sauzher
Copy link

sauzher commented Sep 26, 2024

BUG/PROBLEM REPORT and Proposal of Improvement/Solution

What I did:

Populate a Plone (portal_catalog) with tens of thousands objects (15.000 is enough to experiment the issue)

What I expect to happen:

In the Zcatalog 2.13 the manage_catalogView opens almost immediately regardless of cardinality of the catalog.

I expect the same behavior today.

What actually happened:

The dtml page takes too much time to load. In a production enviroment it could easly lead to a proxy error.

What version of Python and Zope/Addons I am using:

Zope => 4, Plone >= 5.2

Some Insight

The line of code that takes 99% of the time to be executed is:

https://github.com/zopefoundation/DocumentTemplate/blob/9496ce63234a849961ef7bcd85890632bb14bf3d/src/DocumentTemplate/DT_Let.py#L85

in 2.13 the document tempate procedure to expand records (render_blocks method) was written in C, here

From 3.x onwards, the render_blocks method is written in pure python

Proposal of Improvement

Simply do not render the results Table if no path is specified.

In that way, the manager has still the opportunity

  1. to read how many records are in the catalog
  2. to specify a path to start a search with. Giving '/' will search the full catalog taking the time it needs. But most of the time the manager wants to access to a specific folder, or even a specific brain, to fire an update or a remove operation.

alessandro (from the Salamina PloneSprint 2024)

@sauzher
Copy link
Author

sauzher commented Sep 26, 2024

the idea:

immagine

@d-maurer
Copy link
Contributor

in 2.13 the document tempate procedure to expand records (render_blocks method) was written in C, here

From 3.x onwards, the render_blocks method is written in pure python

I am almost sure that this is not the primary problem problem cause.

@d-maurer
Copy link
Contributor

Simply do not render the results Table if no path is specified.

In that way, the manager has still the opportunity

1. to read how  many records are in the catalog

2. to specify a path to start a search with. Giving '/' will search the full catalog taking the time it needs. But most of the time the manager wants to access to a specific folder, or even a specific brain, to fire an update or a remove operation.

I suggest you use profiling to find out in more detail which operations are responsible for the huge time (one possibility would be to use dm.zope.profile):
if no path is specified, the list is determined by a searchAll: this operation should be fast, even for huge catalogs. The remaining view uses batching (with size 20) and should get the initial pages independent of the catalog size; time should increase linearly with the accessed page number.
If you specify a path, then the path index is used. This is a quite costly index - but not so much for path with few components.

Thus, in principle, the current catalogView should behave decently even for huge catalogs. There is no need to change the principle of its operation (e.g. handle cases with and without path specially). If you observe huge times, some details gets wrong.

@davisagli
Copy link
Member

I have also noticed this problem. I agree with @d-maurer that in principle the operation should be lazy and therefore fast, so it would be good if someone can investigate why that is not the case.

However I also agree that it would make sense to not do any query when the view is first loaded without any parameters. It's unlikely the user is looking for an item that appears in the first batch, so showing results at this point doesn't add much value.

@d-maurer
Copy link
Contributor

d-maurer commented Sep 27, 2024 via email

@sauzher
Copy link
Author

sauzher commented Sep 27, 2024

@d-maurer

I am almost sure that this is not the primary problem problem cause.

you're absolutely right: the problem is not there. Further investigetion shown me that the lazymap is interated at least 4 times in the process of populating dtml-in batch. So, with this commit @hannosch choose to unpack it one time only as soon as possible and let iterating over the List type.

The batch does not work as expected and it should be fixed, but as @davisagli pointed out:

[...] It's unlikely the user is looking for an item that appears in the first batch, so showing results at this point doesn't add much value.

So, is the workaround I proposed still valuable? I'm preparing a PR even for 5.x branch. It's almost harmless.

sauzher added a commit that referenced this issue Sep 27, 2024
@drfho
Copy link
Contributor

drfho commented Sep 27, 2024

@d-maurer, @sauzher thank you very for your interesting findings. Actually we saw this slow-down, too.
Just for inspiration:
The fix 783f0be may help quickly but as long Zope allows inserting a ZCatalog object without a path-attribute this patch may end up in invisibility of the catalog items.

[...] It's unlikely the user is looking for an item that appears in the first batch, so showing results at this point doesn't add much value.

IMHO the initial page has a high value because you get a quick overwiew of the quality of the indexed data. So, it would be great to avoid the the formly introduced list conversion and maintain this first glance to the data.

@d-maurer
Copy link
Contributor

d-maurer commented Sep 28, 2024 via email

@d-maurer
Copy link
Contributor

d-maurer commented Oct 1, 2024

I suggest the following change:

  • ZCatalog gets a new method catalogued_objects(self, min=None, max=None) returning self._catalog.uids.items(min, max), i.e. a lazy sequence of catalogued (uid, rid) pairs with min <= uid <= max (for min/max not None).
  • manage_CatalogView drops the Type information and uses the new catalogued_objects to present the object selection: it displays the uid and uses the rid to generate the link to the catalog details for the object
  • to be fully efficient, support lazy batching again, support general iterators DocumentTemplate#76 needs to be used.

@davisagli
Copy link
Member

@d-maurer I guess you are proposing to only make this change for when filterpath is empty? The suggestion makes sense to me, but I don't particularly like the name catalogued_objects since it only returns record ids and not any full result object. My suggestion would be:

def get_uids(self):
    return self._catalog.uids

and then use self.get_uids().items(min, max)

@d-maurer
Copy link
Contributor

d-maurer commented Oct 2, 2024 via email

@davisagli
Copy link
Member

@d-maurer Okay, that all makes sense

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants