graph/db: use /*SLICE:<field_name>*/ to optimise various graph queries #10081
base: master
Conversation
Summary of Changes
Hello @ellemouton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request introduces significant optimizations to graph database queries by leveraging SQL's IN clause with a custom /*SLICE:<field_name>*/ directive. To manage potentially large parameter lists, a new ExecutePagedQuery helper has been implemented, allowing these batch queries to be processed in a paginated fashion. This enhancement aims to reduce the number of database calls and improve overall query performance, especially for operations involving multiple graph entities.
Highlights
- Database Query Optimization: Implemented the /*SLICE:<field_name>*/ directive to allow for efficient WHERE x IN <list> SQL queries, significantly reducing the number of database calls by batching multiple lookups into a single query (a sketch of the runtime substitution follows this list).
- Paginated Query Execution: Introduced a new ExecutePagedQuery helper function that enables queries with large lists of parameters to be executed in paginated batches. This prevents database parameter limits from being hit and ensures robust handling of extensive data sets.
- Refactored Graph Store Queries: Applied the new paginated query mechanism to key methods within the SQLStore, specifically FetchChanInfos and PruneGraph, to improve their performance and efficiency when processing multiple channel IDs or outpoints.
- SQLC Tooling Enhancement: Added a post-generation script to scripts/gen_sqlc_docker.sh to correctly handle sqlc.slice() directives for Postgres. This ensures proper parameter substitution in the generated Go code, addressing a limitation in sqlc's default behavior for IN clauses.
- Test Suite Improvements: Updated various graph tests to use require.NoError, require.Len, and require.Empty from testify/require for more concise and robust error and assertion handling, improving test readability and maintainability.
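To make the mechanism concrete, here is a minimal, self-contained Go sketch of the substitution idea. The query text, field name, and variable names are assumptions for illustration only and are not the exact code in this PR: the generated query keeps the /*SLICE:outpoints*/ marker, and before execution the marker is swapped for the right number of positional parameters.

package main

import (
	"fmt"
	"strings"
)

func main() {
	// Hypothetical generated query: the SLICE marker is left in place by
	// sqlc and replaced at execution time.
	query := `SELECT * FROM channels WHERE outpoint IN (/*SLICE:outpoints*/?)`

	// The slice of values we want to match against.
	outpoints := []string{"aaaa:0", "bbbb:1", "cccc:0"}

	// Build the "$1,$2,$3" placeholder list and the matching args slice.
	placeholders := make([]string, len(outpoints))
	args := make([]any, len(outpoints))
	for i, op := range outpoints {
		placeholders[i] = fmt.Sprintf("$%d", i+1)
		args[i] = op
	}

	// Splice the placeholders into the query in place of the marker.
	expanded := strings.Replace(
		query, "/*SLICE:outpoints*/?", strings.Join(placeholders, ","), 1,
	)

	// Prints:
	// SELECT * FROM channels WHERE outpoint IN ($1,$2,$3) [aaaa:0 bbbb:1 cccc:0]
	fmt.Println(expanded, args)
}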
Code Review
This pull request introduces a generic, paginated query helper to optimize database queries. This is a great performance improvement that reduces the number of database calls by using IN
clauses, while also preventing issues with an unbound number of parameters. The implementation is solid, with good test coverage for the new functionality.
result := "" | ||
for i := diff + 1; i <= numTotalArgs; i++ { | ||
if i == numTotalArgs { | ||
result += fmt.Sprintf("$%d", i) | ||
|
||
continue | ||
} | ||
result += fmt.Sprintf("$%d,", i) | ||
} | ||
return result |
The current implementation of makeQueryParams uses string concatenation inside a loop, which can be inefficient in Go as it leads to multiple memory allocations. For better performance, especially given that this function can be called with a page size of 100 or more, I recommend using strings.Builder. This will pre-allocate a buffer and avoid re-allocations in the loop.
if numListArgs == 0 {
	return ""
}

var b strings.Builder

// Pre-allocate a rough estimation of the buffer size to avoid
// re-allocations. A parameter like $1000, takes 6 bytes.
b.Grow(numListArgs * 6)

diff := numTotalArgs - numListArgs
for i := 0; i < numListArgs; i++ {
	if i > 0 {
		b.WriteString(",")
	}
	fmt.Fprintf(&b, "$%d", i+diff+1)
}

return b.String()
Yeah, looks like this is indeed way more performant.
Here are the unit and benchmark tests I added for this (to validate Gemini's claim):
// BenchmarkMakeQueryParams benchmarks the makeQueryParams function for
// various argument sizes. This helps to ensure the function performs
// efficiently when generating SQL query parameter strings for different
// input sizes.
func BenchmarkMakeQueryParams(b *testing.B) {
	cases := []struct {
		totalArgs int
		listArgs  int
	}{
		{totalArgs: 5, listArgs: 2},
		{totalArgs: 10, listArgs: 3},
		{totalArgs: 50, listArgs: 10},
		{totalArgs: 100, listArgs: 20},
	}

	for _, c := range cases {
		name := fmt.Sprintf("totalArgs=%d/listArgs=%d", c.totalArgs,
			c.listArgs)

		b.Run(name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_ = makeQueryParams(c.totalArgs, c.listArgs)
			}
		})
	}
}

// TestMakeQueryParams tests the makeQueryParams function for various
// argument sizes and verifies the output matches the expected SQL
// parameter string. The function is assumed to generate a comma-separated
// list of parameters in the form $N for use in SQL queries.
func TestMakeQueryParams(t *testing.T) {
	testCases := []struct {
		totalArgs int
		listArgs  int
		expected  string
	}{
		{totalArgs: 5, listArgs: 2, expected: "$4,$5"},
		{totalArgs: 10, listArgs: 3, expected: "$8,$9,$10"},
		{totalArgs: 1, listArgs: 1, expected: "$1"},
		{totalArgs: 3, listArgs: 0, expected: ""},
		{totalArgs: 4, listArgs: 4, expected: "$1,$2,$3,$4"},
	}

	for _, tc := range testCases {
		result := makeQueryParams(tc.totalArgs, tc.listArgs)
		require.Equal(
			t, tc.expected, result,
			"unexpected result for totalArgs=%d, listArgs=%d",
			tc.totalArgs, tc.listArgs,
		)
	}
}
nice! cool, updated. It did mean needing to change the replacement script a little so that we can handle the error that now needs to be returned from the helper function 👍
Force-pushed from a3f3f2e to 9ea2ec6.
This copies the workaround introduced in the taproot-assets code base and will allow us to use `WHERE x in <list>` type queries.
Force-pushed from 59ccaf4 to 2661ab2.
Along with a test for it. This helper will allow us to easily create a pagination wrapper for queries that will make use of the new /*SLICE:<field_name>*/ directive. The next commit will add a test showing this.
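As a rough illustration of the kind of helper this commit describes, here is a minimal sketch of a generic paginated query helper. The name, signature, and behaviour are assumptions for illustration only and may differ from the actual sqldb.ExecutePagedQuery implementation in this PR.

package sqldb

import "context"

// executePagedQuerySketch splits a large argument list into fixed-size pages,
// runs the supplied query once per page, and hands every returned row to the
// callback. This keeps the number of SQL parameters per statement bounded.
func executePagedQuerySketch[A, R any](ctx context.Context, pageSize int,
	args []A, query func(context.Context, []A) ([]R, error),
	cb func(context.Context, R) error) error {

	for start := 0; start < len(args); start += pageSize {
		end := start + pageSize
		if end > len(args) {
			end = len(args)
		}

		rows, err := query(ctx, args[start:end])
		if err != nil {
			return err
		}

		for _, row := range rows {
			if err := cb(ctx, row); err != nil {
				return err
			}
		}
	}

	return nil
}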
Here, a new query (GetChannelsByOutpoints) is added which makes use of the /*SLICE:outpoints*/ directive & added workaround. This is then used in a test to demonstrate how the ExecutePagedQuery helper can be used to wrap a query like this such that calls are done in pages. The query that has been added will also be used by live code paths in an upcoming commit.
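A hypothetical usage sketch for the query described above, combining it with the paged helper sketched earlier. The wrapper name, db type, generated query signature, and page size are assumptions; only the GetChannelsByOutpoints query name and the sqlc.GetChannelsByOutpointsRow row type are taken from this PR.

// forEachChannelByOutpoint is a hypothetical wrapper showing how the paged
// helper and the generated GetChannelsByOutpoints query could be combined;
// it is not the actual SQLStore code from this PR.
func forEachChannelByOutpoint(ctx context.Context, db *sqlc.Queries,
	outpointStrings []string,
	handle func(sqlc.GetChannelsByOutpointsRow) error) error {

	return executePagedQuerySketch(
		ctx, 100, outpointStrings,
		func(ctx context.Context, page []string) (
			[]sqlc.GetChannelsByOutpointsRow, error) {

			// Assumed generated signature: one call per page of
			// outpoints, returning the matching channel rows.
			return db.GetChannelsByOutpoints(ctx, page)
		},
		func(ctx context.Context,
			row sqlc.GetChannelsByOutpointsRow) error {

			return handle(row)
		},
	)
}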
Force-pushed from b89b18b to 5d9bf9b.
/gemini review
Code Review
This pull request introduces the use of the /*SLICE:<field_name>*/ directive to optimize graph queries by enabling the use of WHERE x IN <list of x type> SQL queries. It also adds an ExecutePagedQuery helper to handle these queries in a paginated fashion. The code changes include modifications to the graph database schema, query functions, and test cases to support this new feature.
This commit adds a new GetChannelsByOutpoints query which takes a slice of outpoint strings. This lets us then update PruneGraph to use paginated calls to GetChannelsByOutpoints instead of making one DB call per outpoint.
Remove a TODO by making use of the new sqldb.ExecutePagedQuery to fetch channels in batches rather than one by one.
Force-pushed from 5d9bf9b to bf32adb.
Very nice! I assume all of these changed methods have sufficient unit test coverage already?
result := "" | ||
for i := diff + 1; i <= numTotalArgs; i++ { | ||
if i == numTotalArgs { | ||
result += fmt.Sprintf("$%d", i) | ||
|
||
continue | ||
} | ||
result += fmt.Sprintf("$%d,", i) | ||
} | ||
return result |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, looks like this is indeed way more performant.
Here are the unit and benchmark tests I added for this (to validate Gemini's claim):
// BenchmarkMakeQueryParams benchmarks the makeQueryParams function for
// various argument sizes. This helps to ensure the function performs
// efficiently when generating SQL query parameter strings for different
// input sizes.
func BenchmarkMakeQueryParams(b *testing.B) {
cases := []struct {
totalArgs int
listArgs int
}{
{totalArgs: 5, listArgs: 2},
{totalArgs: 10, listArgs: 3},
{totalArgs: 50, listArgs: 10},
{totalArgs: 100, listArgs: 20},
}
for _, c := range cases {
name := fmt.Sprintf("totalArgs=%d/listArgs=%d", c.totalArgs,
c.listArgs)
b.Run(name, func(b *testing.B) {
for i := 0; i < b.N; i++ {
_ = makeQueryParams(c.totalArgs, c.listArgs)
}
})
}
}
// TestMakeQueryParams tests the makeQueryParams function for various
// argument sizes and verifies the output matches the expected SQL
// parameter string. The function is assumed to generate a comma-separated
// list of parameters in the form $N for use in SQL queries.
func TestMakeQueryParams(t *testing.T) {
testCases := []struct {
totalArgs int
listArgs int
expected string
}{
{totalArgs: 5, listArgs: 2, expected: "$4,$5"},
{totalArgs: 10, listArgs: 3, expected: "$8,$9,$10"},
{totalArgs: 1, listArgs: 1, expected: "$1"},
{totalArgs: 3, listArgs: 0, expected: ""},
{totalArgs: 4, listArgs: 4, expected: "$1,$2,$3,$4"},
}
for _, tc := range testCases {
result := makeQueryParams(tc.totalArgs, tc.listArgs)
require.Equal(
t, tc.expected, result,
"unexpected result for totalArgs=%d, listArgs=%d",
tc.totalArgs, tc.listArgs,
)
}
}
@@ -7,6 +7,11 @@ import (
	"testing"
)

// isSQLite is false if the build tag is set to test_db_postgres. It is used in
// tests that compile for both SQLite and Postgres databases to determine
// which database implementation is being used.
Can we add a TODO here noting that with sqldbv2 there will be a DatabaseType on the BaseDB struct that will allow us to query this at runtime?
@@ -2404,9 +2391,19 @@ func (s *SQLStore) PruneGraph(spentOutputs []*wire.OutPoint,
	}

	closedChans = append(closedChans, info)
Do we also need to do a list query for the DeleteChannel call above? Probably not, as we don't expect thousands of channels to be closed by a single block's worth of transactions...
	return fmt.Errorf("unable to fetch channel: %w",
		err)
}

// Define the callback function for processing each channel
nit: missing punctuation in some comments here and below.
// ChannelUpdateInfo. This allows us to quickly delete channels that we
// already know about.
for _, chanInfo := range chansInfo {
	infoLookup[chanInfo.ShortChannelID.ToUint64()] = chanInfo
I wonder if we also need to re-fill this map in the reset callback?
@@ -2446,7 +2496,7 @@ func (s *SQLStore) PruneGraph(spentOutputs []*wire.OutPoint,
// NOTE: this fetches channels for all protocol versions.
func (s *SQLStore) forEachChanInOutpoints(ctx context.Context, db SQLQueries,
	outpoints []*wire.OutPoint, cb func(ctx context.Context,
-	row sqlc.GetChannelsByOutpointsRow) error) error {
+	row sqlc.GetChannelsByOutpointsRow) error) error {
nit: there seems to be some back-and-forth in the formatting of these... I know GoLand sometimes does things it shouldn't which is quite annoying...
In this PR, the /*SLICE:<field_name>*/ directive is made use of so that we can perform WHERE x IN <list of x type> SQL queries. This allows us to perform fewer DB calls. We don't want to give our SQL library an unbounded number of parameters, though, so this PR also adds an ExecutePagedQuery helper that lets us call these new types of SQL queries in a paginated fashion.