Optimize project activity storage #1220

Closed
wants to merge 5 commits into from


urubens (Contributor) commented May 11, 2022

Currently, the api/project.json endpoint with the withLastActivity parameter has to join against a GROUP BY over the command_history table. On Cytomine instances with large datasets and a long command history, this join becomes very slow.

However, the web UI uses this request very frequently, to list projects by last activity (the default sort order). This PR proposes another solution: an intermediary table that stores each project's last activity. It is much faster than the current query.

The drawback is that it relies on an SQL trigger on command_history, which probably slows down every add/edit/delete command, although this has not been measured.

In any case, we need to solve this performance issue, because the current implementation makes the web UI unresponsive and unusable.

One alternative worth studying could be a PostgreSQL view. I don't know how views are handled internally, or whether one would be more efficient than a trigger.

geektortoise requested a review from loic911 May 11, 2022 12:43
geektortoise (Contributor)

@loic911: I didn't pick up this domain because I didn't like the "PostgreSQL cache of Mongo data" solution, and since we didn't have the performance problem on our server, it wasn't a priority for me.

I'm pinging you because this could feed into your current thinking about commands, triggers, and optimization.

urubens (Contributor, Author) commented May 11, 2022

This is not a cache of MongoDB data. The commands and command_history tables are stored in the PostgreSQL database.

geektortoise (Contributor) commented May 11, 2022

Yes! Sorry, I mixed up the concepts.

urubens (Contributor, Author) commented May 11, 2022

That said, command history could be a good candidate for storage in a NoSQL database (though I don't have the whole command system in mind, so maybe not). However, that would raise the question of how to efficiently join data from the SQL database with data from the NoSQL database :/

loic911 (Contributor) commented May 11, 2022

> However, the web UI uses this request very frequently, to list projects by last activity (the default sort order). This PR proposes another solution: an intermediary table that stores each project's last activity. It is much faster than the current query.

It seems to be the best solution, but it will indeed probably slow down add/edit/delete.

I see a potential problem here (to be confirmed by testing):
The transaction does not lock the last_activity row.
This means that if two requests run in parallel on the same project:

  • You may get two INSERTs for a single project.
  • You may have issues with UPDATE, because the two transactions will perform the update at the same time.
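To make both hazards concrete, here is a deterministic simulation (Python's `sqlite3` again, with a hypothetical `last_activity(project_id, last_date)` table) of two transactions that each check for the row before writing: both see nothing, so both insert. Declaring `project_id` as the primary key and writing with a single `INSERT ... ON CONFLICT ... DO UPDATE` statement removes the check-then-act window, and taking the maximum keeps the newest date even when the transaction carrying the older timestamp writes last. PostgreSQL's documentation states that `ON CONFLICT DO UPDATE` guarantees an atomic insert-or-update outcome under concurrency. This is a sketch of one possible fix, not something run against the benchmark.

```python
import sqlite3


def make_db(with_pk: bool) -> sqlite3.Connection:
    # Hypothetical table; with_pk toggles the uniqueness guarantee on project_id.
    conn = sqlite3.connect(":memory:")
    key = "PRIMARY KEY" if with_pk else ""
    conn.execute(
        f"CREATE TABLE last_activity (project_id INTEGER {key}, last_date TEXT NOT NULL)"
    )
    return conn


def naive_interleaved(conn, project_id, dates):
    """Simulate concurrent check-then-write: every 'transaction' reads first,
    and only afterwards do they write, so none sees the others' rows."""
    seen = [
        conn.execute(
            "SELECT last_date FROM last_activity WHERE project_id = ?", (project_id,)
        ).fetchone()
        for _ in dates
    ]
    for row, date in zip(seen, dates):
        if row is None:
            conn.execute("INSERT INTO last_activity VALUES (?, ?)", (project_id, date))
        else:
            conn.execute(
                "UPDATE last_activity SET last_date = ? WHERE project_id = ?",
                (date, project_id),
            )


# Single atomic statement; needs a unique constraint on project_id.
UPSERT = """
INSERT INTO last_activity (project_id, last_date) VALUES (?, ?)
ON CONFLICT (project_id) DO UPDATE
    SET last_date = max(last_date, excluded.last_date)
"""


def upsert_interleaved(conn, project_id, dates):
    """Each write is one statement; there is no separate read step to race on."""
    for date in dates:
        conn.execute(UPSERT, (project_id, date))


def rows(conn):
    return conn.execute(
        "SELECT project_id, last_date FROM last_activity ORDER BY last_date"
    ).fetchall()
```

With `dates = ["2022-05-11T12:00:01", "2022-05-11T12:00:00"]` (the older write arriving last), the naive version leaves two rows for project 1, while the upsert leaves one row holding the newer date. In PostgreSQL the two-argument `max` would be `GREATEST`.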

One way to test this is to run the annotation benchmark written by Ba Thien, since it inserts lots of annotations from multiple threads.

> One alternative worth studying could be a PostgreSQL view. I don't know how views are handled internally, or whether one would be more efficient than a trigger.

I don't think so. A view is only a way to "encapsulate" a query, so you would simply replace this:
from += "LEFT OUTER JOIN " + "( SELECT project_id, MAX(created) max_date " + " FROM command_history " + " GROUP BY project_id " + ") activities ON p.id = activities.project_id "

with something like this:
from += "LEFT OUTER JOIN last_activity_view activities ON p.id = activities.project_id "
This is not a performance improvement, only a readability (less redundancy) improvement.
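The point can be checked on a toy database: an ordinary (non-materialized) view stores only the query text, which is expanded into the statement that uses it, so the GROUP BY over command_history still runs on every request. The sketch below uses Python's `sqlite3`; `last_activity_view` is the name from the comment above, while the rest of the schema is assumed. It only shows that both forms return the same rows, not anything about PostgreSQL's planner.

```python
import sqlite3

SCHEMA = """
CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE command_history (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL,
    created TEXT NOT NULL
);
-- The view only names the aggregate query; nothing is precomputed or stored.
CREATE VIEW last_activity_view AS
    SELECT project_id, MAX(created) AS max_date
    FROM command_history
    GROUP BY project_id;
"""

# Current form: inline GROUP BY subquery.
INLINE = """
SELECT p.id, activities.max_date
FROM project p
LEFT OUTER JOIN (
    SELECT project_id, MAX(created) AS max_date
    FROM command_history
    GROUP BY project_id
) activities ON p.id = activities.project_id
ORDER BY p.id
"""

# Same query through the view, aliased so activities.* still resolves.
VIA_VIEW = """
SELECT p.id, activities.max_date
FROM project p
LEFT OUTER JOIN last_activity_view activities ON p.id = activities.project_id
ORDER BY p.id
"""


def build() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO project (id, name) VALUES (?, ?)", [(1, "A"), (2, "B")])
    conn.executemany(
        "INSERT INTO command_history (project_id, created) VALUES (?, ?)",
        [(1, "2022-05-01"), (1, "2022-05-09")],
    )
    return conn
```

Because the aggregate is recomputed per request in both forms, only storing the result, as the intermediary table does, avoids that cost.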

Another possibility is to avoid triggers altogether: keep an in-memory structure that maps each project to its last modification date, and sync it to the database frequently (let's say every minute).
BTW, that's the only solution I see for removing the triggers behind the (user/algo/reviewed) annotation counts.
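A rough sketch of this trigger-free variant: record the newest date per project in memory and write it out in one batch. `ActivityTracker`, its method names, and the `sqlite3` `last_activity(project_id, last_date)` table are hypothetical; a real deployment would call `flush()` from a scheduler roughly every minute.

```python
import sqlite3
import threading
from datetime import datetime


class ActivityTracker:
    """Hypothetical in-memory map: project id -> most recent activity date."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._pending: dict = {}

    def touch(self, project_id: int, when: datetime) -> None:
        # Called on each add/edit/delete command instead of a DB trigger.
        with self._lock:
            current = self._pending.get(project_id)
            if current is None or when > current:
                self._pending[project_id] = when

    def flush(self, conn: sqlite3.Connection) -> int:
        # Swap the map out under the lock, then write without holding it.
        with self._lock:
            batch, self._pending = self._pending, {}
        try:
            with conn:  # one transaction for the whole batch
                conn.executemany(
                    """INSERT INTO last_activity (project_id, last_date) VALUES (?, ?)
                       ON CONFLICT (project_id) DO UPDATE
                           SET last_date = max(last_date, excluded.last_date)""",
                    [(pid, when.isoformat()) for pid, when in batch.items()],
                )
        except sqlite3.Error:
            # Put the batch back so the next flush retries it.
            for pid, when in batch.items():
                self.touch(pid, when)
            raise
        return len(batch)
```

The max-based upsert means several application instances can flush into the same table without a stale date overwriting a newer one; the trade-off is that up to one interval of updates is lost if the process dies before flushing.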

waliens commented Jun 29, 2023

This needs to be migrated to Spring (or it already has been; this needs to be checked).

waliens closed this Jun 29, 2023
bathienle deleted the feat-project-activity branch July 6, 2023 09:05

4 participants