-
-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Making live-queries (ie. subscription-lds) more efficient/scalable #725
Comments
Desired optimizations: 1) Cache subscription-query results, and apply row-changes to its results directly (ie. mutating the cache)Rather than re-running the full query for a collection each time a row changes, instead read the change-data within the replication-stream entry, find the subscription-queries that are affected by the change, and directly apply those changes in memory. Of course, this is not possible for every query (eg. if the query depends on computed-values from functions), but it is possible for a large portion of queries, and -- at least in my project -- that subset is worth the effort to support. (in fact, for my current codebase, 100% of the live-queries should be supportable in this way, due to the much less expressive querying that Firestore supports) Some posts/repos that are relevant:
Granted, this caching system is expected to add complications with regard to permissions; you have to make sure that the cache is invalidated when the user's permissions change, otherwise it may result in unauthorized data leakage/persistence. The InvaliDB papers discuss this issue, and the solution it uses. (two of the papers: short version, long version) The permission issue is likely an area where the solution settled on will not work universally (due to the flexibility of the Postgres RLS system). However, if the solution found can handle the use-cases of 80+% of projects/queries (for example, by declaring certain rules the developer has to follow for reliable usage), it can still be useful. Admittedly, this non-universality will likely keep the advanced live-query system in the domain of a third-party addition, rather than something in the core; this thread is still relevant, however, in that it can guide discussion on what "primitives" are necessary within the subscription-lds and lds packages in order to support a third-party solution. 2) Batch equivalent subscriptions from multiple usersGiven the above caching system, it's expected that there will be many concurrent queries amassed, for systems with many users. This implies significant overhead, as each user (and for each subscribed collection) will require their own "query match" check, when a row is changed. To help reduce this query-matching overhead, it is important to be able to merge equivalent subscriptions wherever possible. For example, if there are 1000 users and each subscribes to a shared list of events for the month, we want only one subscription made instead of 1000. Again, this functionality requires that the permission system be constrained enough that the system can meaningfully batch subscriptions without breaking security rules. This means that the solution will not work for every project (at least not without certain restrictions or hints being added). While this makes it problematic to include within Postgraphile proper, the optimization can be very beneficial for projects (or portions of them) where maximum performance is more important than RLS's full flexibility, and the developers are willing to use a third-party solution for it. 3) Add partitioning system, so the query-matching itself doesn't overload the live-query serversAs a project scales, the number of concurrent users will make it infeasible for a single server to manage the subscriptions for every client (assuming that they access different parts of the database, and/or that the user count is so high that merely pushing out the deltas of shared tables to everyone is too much). Thus, it will be necessary to support having multiple live-query servers running, each handling a subset of the subscriptions. That's not enough though; you have to also partition the notifications of changes to each live-query server, so that the query-matching itself does not create a bottleneck. The fully horizontally-scalable solution (with multiple servers, partitioned by both query and object-id) is briefly described and visualized here. 4) When a subscription-query's result changes, send only the delta to the client, rather than the whole listThis may require holding a buffer of recent changes, to reliably send it to all of the subscribed clients. This would have some overhead; but, one way or another, Firestore and InvaliDB have proven that the delta-sending approach is workable. (I've confirmed with chrome's dev-tools that Firestore only sends the deltas, and I know the same is true for InvaliDB.) It also requires a change on the client side, to be able to receive the "delta" and apply it to its own in-browser cache. (it's possible there exists an Apollo library for this sort of thing that could be utilized) That said, this optimization still seems to be the easiest of the four to implement, as it doesn't require special permission checks, nor does it require communication between the database server and "live query" servers. It's also arguably the most relevant for early usage, as you're going to see immediate reductions in network bandwidth/costs (eg. by 99%, for single-row changes to a 100 item collection that has live-query subscriptions), versus performance limits you'd merely hit later down the road (relating to query-matching and such within your servers, as optimized by entries 1-3). All in all, there's quite a lot of work required to get the performance of Postgraphile's live-queries on par with that of Firestore or InvaliDB. Realistically, I will probably give up before I implement all four. 😅 That said, even if only one or two are managed to be implemented, they should still provide some meaningful performance benefits -- perhaps enough to make live-queries more generally usable. Anyway, if anyone's interested in taking on one or more or these tasks, let me know and I'll focus on the others. In the meantime, I'll look into what steps are required for implementing optimization 4. (since it seems the most straightforward) |
It's been clear to me for a while that the main issue with live queries as they currently stand is that they send the entire response rather than a delta. This prevents a tonne of optimisations, for example we could easily say "user 21's username is now 'Flibble'" but if that was a subset of a larger query then we can't send the rest of the data without recalculating it for the various reasons you outline (and doing so may "roll back" changes that the client's cache has already received due to mutations they've performed since the live query started). This is also clear to others working on live queries in the GraphQL ecosystem, and there's even beginning to be discussions around using JSON Patch for this. I very much encourage you to engage with the other people working on this problem in the GraphQL ecosystem so that we end up with a standard live queries solution rather than a vendor-specific one; I build live queries in the way that I did for PostGraphile because it maximized compatibility - all GraphQL clients that support GraphQL subscriptions automatically support PostGraphile live queries - but the trade-offs are heavy. This is great stuff; I should warn you that what I'm working on for V5 may invalidate work that you might take on attempting to fix optimizations 1-3 in V4; but optimization 4 will be beneficial for both (and for the GraphQL ecosystem in general!) so I definitely encourage you to work on that. |
Ah yes, good point; I recall reading some discussions in the GraphQL repos relating to live queries and JSON patches. I'll look for those discussions again, and link the relevant ones below. (will update this post as I find them) Relevant discussions:
Relevant repos: I'll also keep in mind your note about there being changes in V5 that may affect optimizations 1-3. |
I'm starting work on trying to integrate the graphql-live-query-patch library (and its related packages) with Postgraphile. The library is designed to make live-query result-streaming more efficient by only sending json patches of the "deltas" between the result set, each time it changes. (so optimization no. 4 above) The integration instructions, on the server side, say to wrap/run a function on the results of the graphql execute function. However, I do not know how to do this for Postgraphile. I did a search in the postgraphile repo for uses of the graphql execute function: https://github.com/graphile/postgraphile/search?q=execute This search result, on line 87 of liveSubscribe.ts, seems like it may be one of the places where the call/result needs to be wrapped. However, I do not know what the recommended approach is for adding that wrapping/intercepting. Should I:
No. 3 seems ideal to me, but I don't currently know of any "hooks" in Postgraphile that do what I need. (wrapping/intercepting the result of the graphql execute function) |
Please add a PR that adds just a new pluginHook call wherever you need it; see the pluginHooks.ts file for existing hooks. (You may want to experiment via a fork before doing this, but having your project run via a plugin means you won’t require my authorisation to get changes merged which means you can probably advance it faster/further and release new major versions over time that don’t have to sync with PostGraphile’s majors.) |
Taking a closer look, it seems like the least-intrusive way to integrate the functionality needed would be to let the user supply a custom value for the I noticed that there are two different websocket-server options:
Anyway, it looks like letting the user supply a custom Does this integration approach sound fine? If so, I'll put together a pull-request with a new option for it. (and then I can adjust the option name and transferring code, if needed, based on your feedback/review) |
I've confirmed that adding that single custom field to the postgraphile options (to change the Here is a screenshot of the network data for my new test app: The entry second from the bottom is a regular "full result set" supplied by postgraphile. The entry at the very bottom is a "json patch" result-set, describing only the delta betwen the initial "full result set", and the latest results. I'll now work on integrating the graphql-live-query-patch library on the client-side. |
It works! I've got json-patches being generated on the server, and applied on the client, for the updated result-sets of live-queries (as triggered automatically by the subscription-lds plugin). Here is a screenshot of the initial result-set, transferred over websockets, followed by three "json patch" updates (which were applied successfully): I tried to make use of existing libraries where possible, relying on graphql-live-query-patch for the patch generation and application logic, and not changing any of the default transport protocols used by Apollo and Postgraphile. The only change needed in the Postgraphile codebase is:
And the only thing the user will need to do (once I create the helper package) is:
I'll add a comment here once I've created the helper package, and link to the instructions on how to use it. |
Very cool; instead of adding a subscribe option, could you add a generic pluginHook which can supply/augment it instead? |
Working on it. I think I'm almost there; the only issue is that the postgraphile How should I make the |
I've finished updating the code to use a new It's working fine, however I'm currently using a hack to make the I'll wait until @benjie recommends a transfer approach before proceeding, as there are a few approaches with different side-effects. Example approaches:
No. 3 is currently my preferred, though it seems like there should still be a better way. For reference, here is the current delta between my branch and upstream: graphile/crystal@main...Venryx:main |
I've made a Github repo for the Postgraphile and Apollo plugins: https://github.com/pg-lq/postgraphile-live-query The two plugins are published as the following packages (see above repo for more info): And here is a small "todo app" demo repo that uses the plugins above: https://github.com/Venryx/postgraphile-live-query-demos The main difficulty of installation atm is that the new postgraphile hook the plugin relies on has not been merged into postgraphile's upstream yet. Once benjie has time to look the changes over, I'll put together a pull-request, so the plugin becomes easy to use for anyone without messing with postgraphile forks. |
It seems like |
I've done the refactoring in graphile/crystal#1482 to help your PR stay small; so you just need to add the relevant plugin hooks. |
Awesome, thanks; I will do that soon. (in a day or two most likely) |
Basic implementation here: graphile/crystal#1483 I did rename the hook because it's not strictly |
Sounds good. After renaming the hook in my plugin's code, it looks like it'll be ready to go in future Postgraphile versions. :) |
Feature description
The live-query feature is impressive, but it appears to be lacking a number of optimizations needed for high scalability.
I know that most functionality that's achieved through live-queries can be approximated through use of the more efficient TRIGGER/NOTIFY subscriptions; however, the live-query approach is much more "elegant" in my view, and it requires less "wiring" to set up each time.
Because live-queries are not a high-priority feature for most developers, I expect that I will need to develop most of the optimizations myself. This will probably be a lot of work, but I'd like to give a try at it, as if I manage it, I can then benefit from the optimizations in all my future projects. (as well as the ~half dozen projects I currently have on Firestore that I plan to migrate)
This thread is my attempt to list those potential optimizations, discuss them, and eventually attempt to develop support for them. (either by enhancing the existing subscription-lds plugin, or creating a new plugin that sits on top of it)
Motivating example
What motivated this feature request for me, is that I am attempting to transition a project from Google Firestore to Postgraphile + Postgres, and its codebase is heavily reliant on the live-query functionality that Firestore supports.
I'm not in a rush to complete the transition (the current site works fine for now), but I would like to gradually extend Postgraphile until its live-query support can scale to a level comparable to Firestore. (granted, it will likely be more expensive and require horizontal-scaling/dedicated-live-query-servers, but that's acceptable)
Like mentioned, most developers seem to get by with the simpler TRIGGER/NOTIFY subscriptions. However, I think if the live-query functionality were optimized enough that you could use it on larger production sites, a lot more developers would be interested in using it.
From what I understand, Firebase/Firestore and Baqend/InvaliDB are the only two database solutions which have thoroughly
scalable live-queries at the moment. And as Firestore and InvaliDB are both proprietary, this makes scalable live-query support in Postgraphile particularly desirable.
The Medium article above, written by Wolfram Wingerath (the developer of InvaliDB's live-query solution), explains what some of those necessary optimizations are. (beyond the Medium article, he's also written papers on the subject, which describe the challenges and solutions in greater depth)
Breaking changes
It depends on the implementation details. The optimizations will likely require some architectural extensions (eg. the concept of "partitioned live-query servers"), but most of these changes should be able to be implemented in external packages, rather than the subscription-lds and lds packages themselves. (For now, I'll just maintain an independent fork with the changes needed.)
Supporting development
I [tick all that apply]:
My next post will attempt to describe what some of the desired optimizations are.
The text was updated successfully, but these errors were encountered: