-
Notifications
You must be signed in to change notification settings - Fork 286
Iscp integration, hnsw update improvement #22594
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
PR Compliance Guide 🔍Below is a summary of compliance checks for this PR:
Compliance status legend🟢 - Fully Compliant🟡 - Partial Compliant 🔴 - Not Compliant ⚪ - Requires Further Human Verification 🏷️ - Compliance label |
PR Code Suggestions ✨Explore these optional code suggestions:
|
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue ##21835
What this PR does / why we need it:
load the models once and perform all batches from DataRetriever and finally save model files into database.
This changes save a lot of time for upload/download the model every time there is a 8192 vector block.
PR Type
Enhancement, Tests
Description
• ISCP Integration: Implemented comprehensive Index Sync Consumer Producer (ISCP) system for asynchronous index management with CDC (Change Data Capture) support
• HNSW Update Improvements: Refactored HNSW synchronization from function-based to struct-based architecture with
HnswSync
for better modularity and async operations• SQL Process Abstraction: Introduced
SqlProcess
abstraction layer replacingprocess.Process
across vector index operations for background SQL execution• Async Index Support: Added full async support for HNSW, IVF, and fulltext indexes with CDC task creation and management
• DDL Integration: Enhanced DDL operations (CREATE, ALTER, DROP) with ISCP job registration and cleanup for vector and fulltext indexes
• Performance Optimizations: Replaced JSON library with sonic for better performance and added thread management to HNSW models
• Comprehensive Testing: Added extensive test coverage for async index operations including HNSW f64, IVF, and fulltext indexes
• Transaction Management: Refactored consumers to use direct transaction management with proper timeout configurations
Diagram Walkthrough
File Walkthrough
2 files
sync.go
Refactor HNSW CDC sync to struct-based async architecture
pkg/vectorindex/hnsw/sync.go
• Refactored
CdcSync
function into a struct-based approach withHnswSync
struct• Separated CDC operations into
Update
,Save
, andRunOnce
methods for better modularity• Added
DownloadAll
method forpre-downloading index models
• Changed function signatures to use
sqlexec.SqlProcess
instead ofprocess.Process
mock_consumer.go
Refactor ISCP consumer for direct transaction management
pkg/iscp/mock_consumer.go
• Updated internal SQL consumer to use direct transaction management
•
Added engine and transaction client dependencies for background
operations
• Modified consume methods to work with
client.TxnOperator
instead of
executor.TxnExecutor
• Enhanced error handling and
transaction lifecycle management
23 files
sync_test.go
Update HNSW sync tests for new struct-based API
pkg/vectorindex/hnsw/sync_test.go
• Updated all test functions to use new
HnswSync
struct API• Changed
mock functions to accept
sqlexec.SqlProcess
instead ofprocess.Process
• Added new test for continuous update operations with small capacity
• Modified test setup to create
SqlProcess
instancesindex_consumer_test.go
Enhance ISCP consumer tests with IVF index support
pkg/iscp/index_consumer_test.go
• Added support for IVF index testing with new table definitions and
consumer info
• Replaced mock SQL executor with stub-based approach
using
ExecWithResult
• Added comprehensive tests for IVF snapshot and
tail operations
• Updated HNSW consumer tests to use new SQL writer
approach
cache_test.go
Update vector index cache tests for SQL process abstraction
pkg/vectorindex/cache/cache_test.go
• Updated all cache test methods to use
sqlexec.SqlProcess
instead ofprocess.Process
• Modified mock search implementations to accept new
SQL process parameter
• Added
SqlProcess
creation in all testfunctions
model_test.go
HNSW model tests updated for SqlProcess interface
pkg/vectorindex/hnsw/model_test.go
• Updated test functions to use
sqlexec.SqlProcess
instead ofprocess.Process
• Modified mock functions to accept
SqlProcess
parameter
• Updated all
LoadMetadata
,LoadIndex
, andLoadIndexFromBuffer
calls to use new interfaceiscp_util_test.go
ISCP utility functions test suite implementation
pkg/sql/compile/iscp_util_test.go
• Added comprehensive test suite for ISCP utility functions
•
Implemented mock functions for testing error scenarios in job
registration/unregistration
• Added tests for index CDC validation,
task creation, and deletion operations
search_test.go
IVF-flat search tests updated for SqlProcess interface
pkg/vectorindex/ivfflat/search_test.go
• Updated test mock functions to use
sqlexec.SqlProcess
instead ofprocess.Process
• Modified test setup to create
SqlProcess
wrapperaround
process.Process
• Updated all search function calls to use new
interface
search_test.go
HNSW search tests updated for SqlProcess interface
pkg/vectorindex/hnsw/search_test.go
• Updated mock functions to use
sqlexec.SqlProcess
parameter•
Modified test setup to create
SqlProcess
wrapper• Updated cache
search calls to use new interface
build_test.go
HNSW build tests updated for SqlProcess interface
pkg/vectorindex/hnsw/build_test.go
• Updated test functions to use
sqlexec.SqlProcess
wrapper• Modified
NewHnswBuild
calls to use new interfaceivf_search_test.go
IVF search table function tests updated
pkg/sql/colexec/table_function/ivf_search_test.go
• Updated mock functions and test interfaces to use
sqlexec.SqlProcess
• Modified mock search implementation to work with new interface
build_dml_util_test.go
DML utility async index tests added
pkg/sql/plan/build_dml_util_test.go
• Added test cases for async fulltext index functions
• Implemented
tests for invalid JSON and async flag validation
fulltext_test.go
Fulltext table function tests updated
pkg/sql/colexec/table_function/fulltext_test.go
• Updated mock functions to use
sqlexec.SqlProcess
parameter•
Modified test setup to work with new interface
hnsw_search_test.go
HNSW search table function tests updated
pkg/sql/colexec/table_function/hnsw_search_test.go
• Updated mock search implementation to use
sqlexec.SqlProcess
•
Modified test interfaces for new SqlProcess parameter
hnsw_create_test.go
HNSW create table function tests updated
pkg/sql/colexec/table_function/hnsw_create_test.go
• Updated mock function to use
sqlexec.SqlProcess
parameter• Modified
test setup for new interface
sqlexec_test.go
SQL execution tests updated for SqlProcess
pkg/vectorindex/sqlexec/sqlexec_test.go
• Updated tests to use
NewSqlProcess
wrapper around process• Modified
RunTxn
calls to use SqlProcess interfacetypes_test.go
Vector index types test updated for capacity parameter
pkg/vectorindex/types_test.go
• Updated test to pass capacity parameter to
NewVectorIndexCdc
vector_hnsw_f64_async.sql
Async HNSW vector index integration tests
test/distributed/cases/vector/vector_hnsw_f64_async.sql
• Added comprehensive test cases for async HNSW index functionality
•
Includes tests for insert, update, delete operations with CDC
synchronization
• Tests vector similarity search after async index
updates
fulltext_async.result
Async fulltext index test results
test/distributed/cases/fulltext/fulltext_async.result
• Added expected results for async fulltext index tests
• Shows
successful async index creation and search functionality
fulltext_async.sql
Async fulltext index integration tests
test/distributed/cases/fulltext/fulltext_async.sql
• Added test cases for async fulltext index functionality
• Tests
index creation, data insertion, search, and table operations
vector_ivf_async.result
New test results for asynchronous IVF vector indexing
test/distributed/cases/vector/vector_ivf_async.result
• Added new test result file for asynchronous IVF (Inverted File)
vector index functionality
• Contains test results for creating IVF
indexes with
ASYNC
keyword and vector similarity queries• Includes
tests for
experimental_ivf_index
setting, table creation, indexcreation, and L2 distance queries
• Tests both small datasets and
larger datasets loaded from CSV files with sleep delays for async
operations
vector_hnsw.result
Updated HNSW test results for async index operations
test/distributed/cases/vector/vector_hnsw.result
• Added
sleep(30)
call and corresponding result to allow time forasynchronous index building
• Updated query results to show proper
vector similarity search results after async index completion
• Added
drop table vector_index_01
statement and minor precision changes incosine distance values
vector_ivf_async.sql
New test cases for asynchronous IVF vector indexing
test/distributed/cases/vector/vector_ivf_async.sql
• New test file for asynchronous IVF vector index functionality
•
Tests
experimental_ivf_index
setting, table creation with vectorcolumns, and async index creation
• Includes vector similarity queries
using
L2_DISTANCE
function with sleep delays for async operations•
Tests both small manual datasets and larger CSV file imports with
async index building
vector_hnsw_f64_async.result
New test results for asynchronous HNSW f64 vector indexing
test/distributed/cases/vector/vector_hnsw_f64_async.result
• Added new test result file for asynchronous HNSW vector indexing
with 64-bit float vectors
• Contains test results for CRUD operations
(insert, update, delete) with async HNSW indexes
• Includes vector
similarity search results using
L2_DISTANCE
with sleep delays forasync operations
• Tests both small datasets and larger CSV file
imports with async index building
vector_hnsw.sql
Updated HNSW test cases for async index operations
test/distributed/cases/vector/vector_hnsw.sql
• Added
sleep(30)
call to allow time for asynchronous index buildingbefore queries
• Updated comment from "no result found" to "async
index update so result found"
• Added
drop table vector_index_01
statement for proper test cleanup
26 files
alter.go
Integrate ISCP job management in table alteration workflow
pkg/sql/compile/alter.go
• Added ISCP job management during table alteration operations
•
Implemented index CDC task creation for unaffected indexes during copy
operations
• Enhanced clone logic to handle index table information
with unique and algorithm details
• Added proper cleanup of ISCP jobs
for temporary tables
model.go
Enhance HNSW model with thread management and SQL process support
pkg/vectorindex/hnsw/model.go
• Added
NThread
field toHnswModel
struct for thread management•
Refactored index initialization into separate
initIndex
method•
Enhanced error handling in index creation and loading operations
•
Updated method signatures to use
sqlexec.SqlProcess
instead ofprocess.Process
sqlexec.go
Add SQL execution abstraction for background operations
pkg/vectorindex/sqlexec/sqlexec.go
• Introduced
SqlContext
andSqlProcess
abstractions for background SQLexecution
• Added support for running SQL operations without frontend
process context
• Implemented
RunTxnWithSqlContext
for backgroundtransaction management
• Enhanced SQL execution functions to work with
both frontend and background contexts
ddl.go
Integrate ISCP job management across DDL operations
pkg/sql/compile/ddl.go
• Added ISCP job registration and management for various DDL
operations
• Implemented index CDC task creation for vector and
fulltext indexes
• Added cleanup of ISCP jobs during database and
table drops
• Enhanced IVF index handling with async support and ISCP
integration
index_consumer.go
ISCP index consumer refactoring with async HNSW support
pkg/iscp/index_consumer.go
• Refactored
IndexConsumer
to useengine.Engine
andclient.TxnClient
instead of
executor.SQLExecutor
• Added separate execution paths for
HNSW and other index types with
runHnsw
andrunIndex
functions•
Implemented transaction management using
sqlexec.RunTxnWithSqlContext
with different timeout configurations
• Added support for JSON-based
CDC data processing for HNSW indexes using
sonic.Unmarshal
search.go
IVF-flat search interface updated to SqlProcess
pkg/vectorindex/ivfflat/search.go
• Replaced
process.Process
withsqlexec.SqlProcess
throughout thesearch implementation
• Updated context handling to use
sqlproc.GetContext()
instead ofproc.Ctx
• Modified function
signatures for
LoadIndex
,Search
,findCentroids
, andsearchEntries
methods
build_dml_util.go
DML utility functions enhanced with async index support
pkg/sql/plan/build_dml_util.go
• Added async index checks using
catalog.IsIndexAsync()
to skipsynchronous processing
• Updated comment formatting for better
readability in delete plans documentation
• Added early returns for
async indexes in multiple index building functions
search.go
HNSW search interface updated to SqlProcess
pkg/vectorindex/hnsw/search.go
• Replaced
process.Process
withsqlexec.SqlProcess
in search interface• Updated
LoadMetadata
,LoadIndex
, andSearch
method signatures•
Modified context handling to use
sqlproc.GetContext()
cache.go
Vector index cache updated for SqlProcess interface
pkg/vectorindex/cache/cache.go
• Updated
VectorIndexSearchIf
interface to usesqlexec.SqlProcess
instead of
process.Process
• Modified
Load
andSearch
methodsignatures throughout the cache implementation
• Updated all cache
operations to work with new SqlProcess interface
ddl_index_algo.go
DDL index algorithms enhanced with async support
pkg/sql/compile/ddl_index_algo.go
• Added async index support for fulltext and HNSW indexes
•
Implemented CDC task creation for async indexes instead of direct SQL
execution
• Added conditional logic to handle both sync and async
index creation workflows
index_sqlwriter.go
HNSW SQL writer refactored for JSON-based CDC
pkg/iscp/index_sqlwriter.go
• Modified HNSW SQL writer to return JSON data instead of SQL
statements
• Added
NewSync
method to createHnswSync
instances•
Updated CDC writer capacity configuration
ivf_search.go
IVF search table function updated for SqlProcess
pkg/sql/colexec/table_function/ivf_search.go
• Updated IVF search table function to use
sqlexec.NewSqlProcess(proc)
• Modified
getVersion
and cache search calls to use SqlProcessinterface
build.go
HNSW build interface updated to SqlProcess
pkg/vectorindex/hnsw/build.go
• Updated
NewHnswBuild
and related methods to usesqlexec.SqlProcess
•
Modified context handling to use
sqlproc.GetContext()
hnsw_search.go
HNSW search table function updated for SqlProcess
pkg/sql/colexec/table_function/hnsw_search.go
• Updated HNSW search table function to use
sqlexec.NewSqlProcess(proc)
• Modified cache search call to use
SqlProcess interface
data_retriever.go
Data retriever watermark update refactored
pkg/iscp/data_retriever.go
• Updated
UpdateWatermark
method to use direct SQL execution withtransaction
• Replaced executor interface with context, cnUUID, and
txn parameters
• Added proper timeout and system account context
handling
types.go
Vector index types updated with sonic JSON library
pkg/vectorindex/types.go
• Replaced
json
package withsonic
for better performance• Updated
NewVectorIndexCdc
to accept capacity parameter• Modified JSON
marshaling to use sonic library
hnsw_create.go
HNSW create table function updated for SqlProcess
pkg/sql/colexec/table_function/hnsw_create.go
• Updated HNSW creation functions to use
sqlexec.NewSqlProcess(proc)
•
Modified
NewHnswBuild
calls to use SqlProcess interfaceconsumer.go
Consumer factory updated with engine and transaction client
pkg/iscp/consumer.go
• Updated
NewConsumer
function to acceptengine.Engine
andclient.TxnClient
parameters• Modified consumer creation calls to pass
additional parameters
sql.go
IVF-flat SQL utilities updated for SqlProcess
pkg/vectorindex/ivfflat/sql.go
• Updated
GetVersion
function to usesqlexec.SqlProcess
instead ofprocess.Process
• Modified context handling and error messages
types.go
ISCP types interface updated for direct transaction handling
pkg/iscp/types.go
• Updated
DataRetriever
interface to changeUpdateWatermark
methodsignature
• Removed executor dependency in favor of direct context and
transaction parameters
metadata_scan.go
Metadata scan updated for SqlProcess interface
pkg/sql/colexec/table_function/metadata_scan.go
• Updated metadata scan to use
sqlexec.NewSqlProcess(proc)
for SQLexecution
• Modified
RunSql
call to use SqlProcess interfacefulltext.go
Fulltext table function updated for SqlProcess
pkg/sql/colexec/table_function/fulltext.go
• Updated fulltext functions to use
sqlexec.NewSqlProcess(proc)
•
Modified SQL execution calls to use SqlProcess interface
ivf_create.go
IVF create table function updated for SqlProcess
pkg/sql/colexec/table_function/ivf_create.go
• Updated IVF creation functions to use
sqlexec.NewSqlProcess(proc)
•
Modified version retrieval and SQL execution calls
iteration.go
ISCP iteration updated for new consumer interface
pkg/iscp/iteration.go
• Updated
NewConsumer
call to includecnEngine
andcnTxnClient
parameters
watermark_updater.go
Watermark updater function made mockable
pkg/iscp/watermark_updater.go
• Made
ExecWithResult
function a variable for easier testing/mockingutil.go
Compile utility function error handling improved
pkg/sql/compile/util.go
• Updated
genInsertIndexTableSqlForFullTextIndex
to return erroralongside SQL strings
• Added error handling to function signature
2 files
function_id_test.go
Remove HNSW CDC update function ID
pkg/sql/plan/function/function_id_test.go
• Removed
HNSW_CDC_UPDATE
function ID from predefined function IDs•
Updated
FUNCTION_END_NUMBER
to reflect the removalfunction_id.go
HNSW CDC update function ID removed
pkg/sql/plan/function/function_id.go
• Removed
HNSW_CDC_UPDATE
function ID constant• Decremented
FUNCTION_END_NUMBER
accordingly1 files
iscp_util.go
ISCP utility functions for index CDC management
pkg/sql/compile/iscp_util.go
• Implemented ISCP (Index Sync Consumer Producer) utility functions
for CDC task management
• Added functions for creating, deleting, and
validating index CDC tasks
• Implemented job
registration/unregistration with proper error handling
• Added support
for different sinker types based on index algorithms
5 files