`/manage_catalogView` takes ages to open in large sites #155

sauzher · 2024-09-26T10:18:16Z

BUG/PROBLEM REPORT and Proposal of Improvement/Solution

What I did:

Populate a Plone site (portal_catalog) with tens of thousands of objects (15,000 is enough to reproduce the issue).

What I expect to happen:

In ZCatalog 2.13, `manage_catalogView` opens almost immediately regardless of the cardinality of the catalog.

I expect the same behavior today.

What actually happened:

The DTML page takes too long to load. In a production environment this could easily lead to a proxy error.

What version of Python and Zope/Addons I am using:

Zope >= 4, Plone >= 5.2

Some Insight

The line of code that takes 99% of the time to be executed is:

https://github.com/zopefoundation/DocumentTemplate/blob/9496ce63234a849961ef7bcd85890632bb14bf3d/src/DocumentTemplate/DT_Let.py#L85

In 2.13 the document template procedure to expand records (render_blocks method) was written in C, here

From 3.x onwards, the render_blocks method is written in pure python

Proposal of Improvement

Simply do not render the results Table if no path is specified.

That way, the manager still has the opportunity

1. to read how many records are in the catalog
2. to specify a path to start a search with. Giving '/' will search the full catalog, taking the time it needs. But most of the time the manager wants to access a specific folder, or even a specific brain, to fire an update or a remove operation.
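A minimal sketch of the proposed behaviour, in plain Python rather than DTML. The function name `view_model` and the prefix match on paths are assumptions made for illustration; the real view queries the path index instead.

```python
def view_model(paths, filterpath=None):
    """Return (total, rows); rows stay empty until a path filter is given.

    Hypothetical helper: `paths` stands in for the catalogued paths and a
    simple prefix test stands in for the path index query.
    """
    total = len(paths)  # the record count is always shown
    if not filterpath:
        return total, []  # no filter: skip the results table entirely
    rows = [path for path in paths if path.startswith(filterpath)]
    return total, rows


paths = ["/site/a/doc1", "/site/a/doc2", "/site/b/doc3"]
no_filter = view_model(paths)
folder_only = view_model(paths, "/site/a")
everything = view_model(paths, "/")
```

With no filter only the count is available; passing `/` still lists every entry, at whatever cost the full search has.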

alessandro (from the Salamina PloneSprint 2024)


sauzher · 2024-09-26T14:17:29Z

the idea:

d-maurer · 2024-09-26T14:46:43Z

> in 2.13 the document template procedure to expand records (render_blocks method) was written in C, here
>
> From 3.x onwards, the render_blocks method is written in pure python

I am almost sure that this is not the primary cause of the problem.

d-maurer · 2024-09-26T15:14:24Z

> Simply do not render the results Table if no path is specified.
>
> In that way, the manager has still the opportunity
>
> 1. to read how many records are in the catalog
> 2. to specify a path to start a search with. Giving '/' will search the full catalog taking the time it needs. But most of the time the manager wants to access to a specific folder, or even a specific brain, to fire an update or a remove operation.

I suggest you use profiling to find out in more detail which operations are responsible for the huge time (one possibility would be to use dm.zope.profile):

1. If no path is specified, the list is determined by a searchAll: this operation should be fast, even for huge catalogs. The remaining view uses batching (with size 20) and should get the initial pages independent of the catalog size; time should increase linearly with the accessed page number.
2. If you specify a path, then the path index is used. This is a quite costly index, but not so much for paths with few components.
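The interface of `dm.zope.profile` is not shown in this thread, so as a generic stand-in here is a small profiling sketch with the standard library's `cProfile` and `pstats`. The function `build_rows` is a hypothetical placeholder for whatever renders the view.

```python
import cProfile
import io
import pstats


def build_rows(count):
    """Hypothetical stand-in for the code that renders the view."""
    return [f"row-{index}" for index in range(count)]


profiler = cProfile.Profile()
profiler.enable()
build_rows(10_000)
profiler.disable()

# Collect the report in a string, sorted by cumulative time.
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
print(report)
```

Sorting by cumulative time puts the callers that dominate the request at the top, which is the kind of breakdown asked for here.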

Thus, in principle, the current catalogView should behave decently even for huge catalogs. There is no need to change the principle of its operation (e.g. handle cases with and without path specially). If you observe huge times, some detail is going wrong.
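To illustrate why lazy results plus batching should not depend on catalog size, here is a minimal sketch. `LazyMap` and `batch` are hypothetical names written for this example, not the ZCatalog `Lazy` classes or the `dtml-in` batching code, and the sketch is idealized: each page costs the same, whereas the comment above expects time to grow with the page number.

```python
class LazyMap:
    """Sequence that applies `func` to a key only when that item is read."""

    def __init__(self, func, keys):
        self._func = func
        self._keys = keys
        self.calls = 0  # number of items actually computed

    def __len__(self):
        return len(self._keys)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Compute only the items that fall inside the slice.
            return [self[i] for i in range(*index.indices(len(self)))]
        self.calls += 1
        return self._func(self._keys[index])


def batch(seq, start, size=20):
    """Return the items of one page, touching only that page."""
    return seq[start:start + size]


results = LazyMap(lambda rid: f"brain-{rid}", range(100_000))
first_page = batch(results, 0)
fifth_page = batch(results, 80)
```

After both pages are fetched, `results.calls` is 40 even though the sequence has 100,000 entries.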

davisagli · 2024-09-26T15:20:44Z

I have also noticed this problem. I agree with @d-maurer that in principle the operation should be lazy and therefore fast, so it would be good if someone can investigate why that is not the case.

However I also agree that it would make sense to not do any query when the view is first loaded without any parameters. It's unlikely the user is looking for an item that appears in the first batch, so showing results at this point doesn't add much value.

d-maurer · 2024-09-27T06:24:19Z

David Glick wrote at 2024-9-26 08:21 -0700:

> I have also noticed this problem. I agree with @d-maurer that in principle the operation should be lazy and therefore fast, so it would be good if someone can investigate why that is not the case.

I have used `dm.zope.profile` to profile `manage_catalogView` (Plone 5.2, Python 3.10, Products.ZCatalog 5.4):

| Catalog size | Time | Lazy.getitem (#/Time) |
| --- | --- | --- |
| 40 | 0.131 | n/a |
| 1776 | 0.246 | 5331/0.118 |
| 2641 | 0.344 | 7926/0.223 |

We can conclude from this that the batching does not work: `manage_catalogView` uses `dtml-in` for its batching; it uses 3 `dtml-in`: to access the content, for `previous` and `next`. Apparently, each of them calls `Lazy.getitem` for each element in the list.
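The reported call counts equal 3 × (catalog size + 1), which is what three complete walks over a `__getitem__`-based sequence would produce. The sketch below reproduces that count with a hypothetical `CountingLazy` class; it illustrates the arithmetic only and is not a trace of the DocumentTemplate code.

```python
class CountingLazy:
    """Sequence without __iter__ that counts every __getitem__ call."""

    def __init__(self, size):
        self._size = size
        self.getitem_calls = 0

    def __len__(self):
        return self._size

    def __getitem__(self, index):
        self.getitem_calls += 1
        if not 0 <= index < self._size:
            raise IndexError(index)
        return index


def full_pass(seq):
    """Walk the sequence with a plain for loop (falls back to __getitem__)."""
    for _ in seq:
        pass


def one_page(seq, start=0, size=20):
    """Read a single batch by index, without walking the rest."""
    stop = min(start + size, len(seq))
    return [seq[i] for i in range(start, stop)]


walked = CountingLazy(1776)
for _ in range(3):  # one walk per dtml-in in the view
    full_pass(walked)
calls_three_passes = walked.getitem_calls

paged = CountingLazy(1776)
one_page(paged)
calls_one_page = paged.getitem_calls
```

Each walk makes one extra call because the loop only stops when `__getitem__` raises `IndexError`. Reading one page by index needs 20 calls regardless of the sequence length.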

sauzher · 2024-09-27T08:05:28Z

@d-maurer

> I am almost sure that this is not the primary cause of the problem.

You're absolutely right: the problem is not there. Further investigation showed me that the lazymap is iterated at least 4 times in the process of populating the dtml-in batch. So, with this commit, @hannosch chose to unpack it only once, as early as possible, and to iterate over the resulting list instead.

The batch does not work as expected and it should be fixed, but as @davisagli pointed out:

> [...] It's unlikely the user is looking for an item that appears in the first batch, so showing results at this point doesn't add much value.

So, is the workaround I proposed still valuable? I'm preparing a PR for the 5.x branch as well. It's almost harmless.


drfho · 2024-09-27T11:17:35Z

@d-maurer, @sauzher thank you very much for your interesting findings. Actually we saw this slow-down, too.
Just for inspiration:
The fix 783f0be may help quickly, but as long as Zope allows inserting a ZCatalog object without a path attribute, this patch may make the catalog items invisible.

> [...] It's unlikely the user is looking for an item that appears in the first batch, so showing results at this point doesn't add much value.

IMHO the initial page has a high value because you get a quick overview of the quality of the indexed data. So, it would be great to avoid the formerly introduced list conversion and keep this first glance at the data.

d-maurer · 2024-09-28T09:07:32Z

***@***.*** wrote at 2024-9-27 08:24 +0200:

> ... I have used `dm.zope.profile` to profile `manage_catalogView` (Plone 5.2, Python 3.10, Products.ZCatalog 5.4):
>
> | Catalog size | Time | Lazy.getitem (#/Time) |
> | --- | --- | --- |
> | ... | ... | ... |
> | 2641 | 0.344 | 7926/0.223 |

We have redone the profiling with zopefoundation/DocumentTemplate#76:

| Catalog size | Time | Lazy.getitem (#/Time) |
| --- | --- | --- |
| 2641 | 0.102 | n/a |

Almost surely, the PR above fixes the problem.

d-maurer · 2024-10-01T10:20:46Z

I suggest the following change:

1. ZCatalog gets a new method `catalogued_objects(self, min=None, max=None)` returning `self._catalog.uids.items(min, max)`, i.e. a lazy sequence of catalogued (uid, rid) pairs with min <= uid <= max (for min/max not None).
2. `manage_catalogView` drops the Type information and uses the new `catalogued_objects` to present the object selection: it displays the uid and uses the rid to generate the link to the catalog details for the object.
3. To be fully efficient, zopefoundation/DocumentTemplate#76 ("support lazy batching again, support general iterators") needs to be used.
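A rough sketch of what the suggested method could look like. It is a stand-in, not the actual implementation: `SketchCatalog` is a hypothetical class, and a sorted list with `bisect` emulates the ordered range lookup that `uids.items(min, max)` is expected to provide on the real catalog.

```python
from bisect import bisect_left, bisect_right
from itertools import islice


class SketchCatalog:
    """Hypothetical stand-in for a catalog whose `uids` maps uid -> rid."""

    def __init__(self, uids):
        # Sorted (uid, rid) pairs emulate the key order of the real mapping.
        self._items = sorted(uids.items())
        self._keys = [uid for uid, _ in self._items]

    def catalogued_objects(self, min=None, max=None):
        """Lazily yield (uid, rid) pairs with min <= uid <= max."""
        first = 0 if min is None else bisect_left(self._keys, min)
        last = len(self._keys) if max is None else bisect_right(self._keys, max)
        return (self._items[i] for i in range(first, last))


catalog = SketchCatalog({f"/site/doc-{i:05d}": i for i in range(50_000)})

# Take one page of 20 entries starting at a given uid.
page = list(islice(catalog.catalogued_objects(min="/site/doc-00100"), 20))
small_range = list(
    catalog.catalogued_objects(min="/site/doc-00001", max="/site/doc-00003")
)
```

Because the result is a generator, taking one page with `islice` touches only the entries on that page once the start position has been found.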

davisagli · 2024-10-02T03:56:23Z

@d-maurer I guess you are proposing to only make this change for when filterpath is empty? The suggestion makes sense to me, but I don't particularly like the name catalogued_objects since it only returns record ids and not any full result object. My suggestion would be:

def get_uids(self):
    return self._catalog.uids

and then use self.get_uids().items(min, max)

d-maurer · 2024-10-02T05:11:49Z

David Glick wrote at 2024-10-1 20:56 -0700:

> @d-maurer I guess you are proposing to only make this change for when filterpath is empty?

No. I propose this change to be used always and implement the "filterpath" functionality with the `min` parameter of `catalogued_objects`. Likely `catalogued_objects(min=REQUEST.get('filterpath', ''))` can be used. Remember: the time you have observed with the `DocumentTemplate` PR will not change significantly when you use a filter (unless this filter is very specific).

> The suggestion makes sense to me, but I don't particularly like the name `catalogued_objects` since it only returns record ids and not any full result object. My suggestion would be:
>
> ```
> def get_uids(self):
>     return self._catalog.uids
> ```

I do not feel strongly about the name `catalogued_objects` and I do not plan to implement my suggestion; if someone implements it, he/she can fill in the details. I do not think we should expose the mutable `uids` however, but an immutable `uids.items(min, max)`. And I believe we should not have separate implementations for the "filtered" and "not filtered" cases but implement the "filtered" case with the `min` parameter (and otherwise use the same view logic).
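A small sketch of the single code path described here, where the filtered and unfiltered cases differ only in the `min` bound. The function `catalog_view_page`, the plain `dict` used as the request, and the sorted list of uids are all assumptions for illustration.

```python
from bisect import bisect_left


def catalog_view_page(sorted_uids, request, size=20):
    """Return one page of uids, starting at request['filterpath'] if given.

    Hypothetical helper: `sorted_uids` stands in for the ordered uids of
    the catalog and `request` for the Zope REQUEST.
    """
    # "" sorts before every other string, so "no filter" means "from the start".
    start_key = request.get("filterpath", "")
    first = bisect_left(sorted_uids, start_key)
    return sorted_uids[first:first + size]


uids = ["/site/about", "/site/news/a", "/site/news/b", "/site/zoo"]
unfiltered = catalog_view_page(uids, {})
filtered = catalog_view_page(uids, {"filterpath": "/site/news"}, size=2)
past_prefix = catalog_view_page(uids, {"filterpath": "/site/news"}, size=3)
```

In this sketch the `min` bound marks where the listing starts rather than acting as a strict prefix filter, so a large enough page continues past the requested path (here `past_prefix` ends with `/site/zoo`).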

davisagli · 2024-10-02T06:53:04Z

@d-maurer Okay, that all makes sense

d-maurer mentioned this issue Sep 27, 2024

dtml-in batching accesses the complete list zopefoundation/DocumentTemplate#75

Closed

sauzher added a commit that referenced this issue Sep 27, 2024

this renders the results table only if a path is specified, avoiding …

783f0be

…unnecessary loading time (proxy error) #155

sauzher mentioned this issue Oct 1, 2024

support lazy batching again, support general iterators zopefoundation/DocumentTemplate#76

Merged
