graph/db: use /*SLICE:<field_name>*/ to optimise various graph queries #10081

Open · wants to merge 7 commits into master from graphSQLSlices
Conversation

ellemouton
Collaborator

In this PR, the /*SLICE:<field_name>*/ directive is put to use so that we can perform
`WHERE x IN <list of x type>` SQL queries. This allows us to make fewer DB calls.

We don't want to hand our SQL library an unbounded number of parameters, though, so this
PR also adds an ExecutePagedQuery helper that lets us call these new types of
SQL queries in a paginated fashion.
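The pagination idea can be sketched as follows. Note that `executePaged` and its signature are illustrative stand-ins for this description, not lnd's actual ExecutePagedQuery API:

```go
package main

import "fmt"

// executePaged is a minimal sketch of what a paged-query helper does:
// split a large argument list into fixed-size pages and run the query
// callback once per page, so that a single `WHERE x IN (...)` never
// carries an unbounded number of parameters.
func executePaged[T any](items []T, pageSize int,
	query func(page []T) error) error {

	for start := 0; start < len(items); start += pageSize {
		end := start + pageSize
		if end > len(items) {
			end = len(items)
		}

		if err := query(items[start:end]); err != nil {
			return err
		}
	}

	return nil
}

func main() {
	ids := []uint64{1, 2, 3, 4, 5, 6, 7}

	// Each callback invocation would map to one SQL round trip with
	// an IN clause of at most pageSize parameters.
	err := executePaged(ids, 3, func(page []uint64) error {
		fmt.Println(page)
		return nil
	})
	if err != nil {
		panic(err)
	}
}
```

With a page size of 3, the seven IDs above are handled in three DB calls instead of seven.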


@gemini-code-assist gemini-code-assist bot left a comment


Summary of Changes

Hello @ellemouton, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces significant optimizations to graph database queries by leveraging SQL's IN clause with a custom /*SLICE:<field_name>*/ directive. To manage potentially large parameter lists, a new ExecutePagedQuery helper has been implemented, allowing these batch queries to be processed in a paginated fashion. This enhancement aims to reduce the number of database calls and improve overall query performance, especially for operations involving multiple graph entities.

Highlights

  • Database Query Optimization: Implemented the /*SLICE:<field_name>*/ directive to allow for efficient WHERE x IN <list> SQL queries, significantly reducing the number of database calls by batching multiple lookups into a single query.
  • Paginated Query Execution: Introduced a new ExecutePagedQuery helper function that enables queries with large lists of parameters to be executed in paginated batches. This prevents database parameter limits from being hit and ensures robust handling of extensive data sets.
  • Refactored Graph Store Queries: Applied the new paginated query mechanism to key methods within the SQLStore, specifically FetchChanInfos and PruneGraph, to improve their performance and efficiency when processing multiple channel IDs or outpoints.
  • SQLC Tooling Enhancement: Added a post-generation script to scripts/gen_sqlc_docker.sh to correctly handle sqlc.slice() directives for Postgres. This ensures proper parameter substitution in the generated Go code, addressing a limitation in sqlc's default behavior for IN clauses.
  • Test Suite Improvements: Updated various graph tests to use require.NoError, require.Len, and require.Empty from testify/require for more concise and robust error and assertion handling, improving test readability and maintainability.
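As a rough illustration of what the post-generation workaround in the highlights above has to achieve, the sketch below textually expands a slice placeholder into numbered Postgres parameters. The marker format (`/*SLICE:<field_name>*/?`), the parameter numbering, and the `expandSlice` helper are assumptions for illustration, not the script's actual mechanics:

```go
package main

import (
	"fmt"
	"strings"
)

// expandSlice illustrates the kind of substitution the
// /*SLICE:<field_name>*/ workaround performs on a generated Postgres
// query: the single placeholder emitted for the slice argument is
// replaced with one numbered parameter per slice element.
func expandSlice(query, field string, firstParam, numElems int) string {
	params := make([]string, numElems)
	for i := range params {
		params[i] = fmt.Sprintf("$%d", firstParam+i)
	}

	marker := fmt.Sprintf("/*SLICE:%s*/?", field)

	return strings.Replace(query, marker, strings.Join(params, ","), 1)
}

func main() {
	q := "SELECT * FROM channels WHERE version = $1 " +
		"AND outpoint IN (/*SLICE:outpoints*/?)"

	// With three outpoints starting at parameter $2, the IN clause
	// expands to ($2,$3,$4).
	fmt.Println(expandSlice(q, "outpoints", 2, 3))
}
```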


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a generic, paginated query helper to optimize database queries. This is a great performance improvement that reduces the number of database calls by using IN clauses, while also preventing issues with an unbound number of parameters. The implementation is solid, with good test coverage for the new functionality.

Comment on lines +11 to +20
result := ""
for i := diff + 1; i <= numTotalArgs; i++ {
	if i == numTotalArgs {
		result += fmt.Sprintf("$%d", i)

		continue
	}

	result += fmt.Sprintf("$%d,", i)
}

return result


medium

The current implementation of makeQueryParams uses string concatenation inside a loop, which can be inefficient in Go as it leads to multiple memory allocations. For better performance, especially given that this function can be called with a page size of 100 or more, I recommend using strings.Builder. This will pre-allocate a buffer and avoid re-allocations in the loop.

	if numListArgs == 0 {
		return ""
	}

	var b strings.Builder

	// Pre-allocate a rough estimation of the buffer size to avoid
	// re-allocations. A parameter like $1000, takes 6 bytes.
	b.Grow(numListArgs * 6)

	diff := numTotalArgs - numListArgs
	for i := 0; i < numListArgs; i++ {
		if i > 0 {
			b.WriteString(",")
		}
		fmt.Fprintf(&b, "$%d", i+diff+1)
	}
	return b.String()

Collaborator


Yeah, looks like this is indeed way more performant.

Here are the unit and benchmark tests I added for this (to validate Gemini's claim):

// BenchmarkMakeQueryParams benchmarks the makeQueryParams function for
// various argument sizes. This helps to ensure the function performs
// efficiently when generating SQL query parameter strings for different
// input sizes.
func BenchmarkMakeQueryParams(b *testing.B) {
	cases := []struct {
		totalArgs int
		listArgs  int
	}{
		{totalArgs: 5, listArgs: 2},
		{totalArgs: 10, listArgs: 3},
		{totalArgs: 50, listArgs: 10},
		{totalArgs: 100, listArgs: 20},
	}

	for _, c := range cases {
		name := fmt.Sprintf("totalArgs=%d/listArgs=%d", c.totalArgs,
			c.listArgs)
		b.Run(name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				_ = makeQueryParams(c.totalArgs, c.listArgs)
			}
		})
	}
}

// TestMakeQueryParams tests the makeQueryParams function for various
// argument sizes and verifies the output matches the expected SQL
// parameter string. The function is assumed to generate a comma-separated
// list of parameters in the form $N for use in SQL queries.
func TestMakeQueryParams(t *testing.T) {
	testCases := []struct {
		totalArgs int
		listArgs  int
		expected  string
	}{
		{totalArgs: 5, listArgs: 2, expected: "$4,$5"},
		{totalArgs: 10, listArgs: 3, expected: "$8,$9,$10"},
		{totalArgs: 1, listArgs: 1, expected: "$1"},
		{totalArgs: 3, listArgs: 0, expected: ""},
		{totalArgs: 4, listArgs: 4, expected: "$1,$2,$3,$4"},
	}

	for _, tc := range testCases {
		result := makeQueryParams(tc.totalArgs, tc.listArgs)
		require.Equal(
			t, tc.expected, result,
			"unexpected result for totalArgs=%d, listArgs=%d",
			tc.totalArgs, tc.listArgs,
		)
	}
}

Collaborator Author


nice! cool, updated. It did mean I needed to change the replacement script a little so that we can handle the error that now needs to be returned from the helper function 👍

This copies the workaround introduced in the taproot-assets code base
and will allow us to use `WHERE x in <list>` type queries.
@ellemouton force-pushed the graphSQLSlices branch 2 times, most recently from 59ccaf4 to 2661ab2 on July 16, 2025 at 06:43
Along with a test for it. This helper will allow us to easily create a
pagination wrapper for queries that will make use of the new
/*SLICE:<field_name>*/ directive. The next commit will add a test
showing this.
Here, a new query (GetChannelsByOutpoints) is added which makes use of
the /*SLICE:outpoints*/ directive & added workaround. This is then used
in a test to demonstrate how the ExecutePagedQuery helper can be used to
wrap a query like this such that calls are done in pages.

The query that has been added will also be used by live code paths in an
upcoming commit.
@ellemouton force-pushed the graphSQLSlices branch 2 times, most recently from b89b18b to 5d9bf9b on July 16, 2025 at 06:54
@ellemouton
Collaborator Author

/gemini review


@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces the use of the /*SLICE:<field_name>*/ directive to optimize graph queries by enabling the use of WHERE x IN <list of x type> SQL queries. It also adds an ExecutePagedQuery helper to handle these queries in a paginated fashion. The code changes include modifications to the graph database schema, query functions, and test cases to support this new feature.

This commit adds a new GetChannelsByOutpoints query which takes a slice
of outpoint strings. This lets us then update PruneGraph to use
paginated calls to GetChannelsByOutpoints instead of making one DB call
per outpoint.
Remove a TODO by making use of the new sqldb.ExecutePagedQuery to fetch
channels in batches rather than one by one.
Collaborator

@guggero guggero left a comment


Very nice! I assume all of these changed methods have sufficient unit test coverage already?


@@ -7,6 +7,11 @@ import (
"testing"
)

// isSQLite is false if the build tag is set to test_db_postgres. It is used in
// tests that compile for both SQLite and Postgres databases to determine
// which database implementation is being used.
Collaborator


Can we add a TODO here noting that, with sqldbv2, there will be a DatabaseType on the BaseDB struct that will allow us to query this at runtime?

@@ -2404,9 +2391,19 @@ func (s *SQLStore) PruneGraph(spentOutputs []*wire.OutPoint,
}

closedChans = append(closedChans, info)

Collaborator


Do we also need to do a list query for the DeleteChannel call above? Probably not, as we don't expect thousands of channels to be closed by a single block worth of transactions...

return fmt.Errorf("unable to fetch channel: %w",
err)
}
// Define the callback function for processing each channel
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: missing punctuation in some comments here and below.

// ChannelUpdateInfo. This allows us to quickly delete channels that we
// already know about.
for _, chanInfo := range chansInfo {
infoLookup[chanInfo.ShortChannelID.ToUint64()] = chanInfo
Collaborator


I wonder if we also need to re-fill this map in the reset callback?

@@ -2446,7 +2496,7 @@ func (s *SQLStore) PruneGraph(spentOutputs []*wire.OutPoint,
// NOTE: this fetches channels for all protocol versions.
func (s *SQLStore) forEachChanInOutpoints(ctx context.Context, db SQLQueries,
outpoints []*wire.OutPoint, cb func(ctx context.Context,
row sqlc.GetChannelsByOutpointsRow) error) error {
row sqlc.GetChannelsByOutpointsRow) error) error {
Collaborator


nit: there seems to be some back-and-forth in the formatting of these... I know GoLand sometimes does things it shouldn't, which is quite annoying...
