@cpegeric cpegeric commented Sep 16, 2025

User description

What type of PR is this?

  • API-change
  • BUG
  • Improvement
  • Documentation
  • Feature
  • Test and CI
  • Code Refactoring

Which issue(s) this PR fixes:

issue #21835

What this PR does / why we need it:

index update with ISCP


PR Type

Enhancement, Feature, Tests, Bug fix


Description

Major Feature: Implements ISCP (Index Synchronization Change Processing) with CDC (Change Data Capture) support for vector indexes
HNSW Generic Types: Refactors HNSW vector index implementation to use generic types with types.RealNumbers constraint
Async Index Support: Adds ASYNC keyword support for fulltext and vector indexes with asynchronous processing capabilities
CDC Infrastructure: Implements comprehensive CDC synchronization for HNSW, IVF-flat, and fulltext indexes with SQL generation
Index Consumer: Adds IndexConsumer for processing index synchronization data in both snapshot and tail modes
DDL Integration: Integrates ISCP job management into DDL operations (CREATE, DROP, ALTER TABLE) with CDC task lifecycle
SQL Writer Framework: Implements IndexSqlWriter interface with algorithm-specific implementations for different index types
Test Coverage: Adds comprehensive test suites for all new CDC, ISCP, and async functionality
Bug Fixes: Fixes null handling in watermark updater and fulltext index parameter processing
Enhanced Error Messages: Improves vector dimension mismatch error messages for better clarity


Diagram Walkthrough

flowchart LR
  DDL["DDL Operations"] --> ISCP["ISCP Job Manager"]
  ISCP --> CDC["CDC Tasks"]
  CDC --> IndexConsumer["Index Consumer"]
  IndexConsumer --> SQLWriter["Index SQL Writer"]
  SQLWriter --> HNSW["HNSW Generic Model"]
  SQLWriter --> IVF["IVF-flat Index"]
  SQLWriter --> Fulltext["Fulltext Index"]
  HNSW --> Sync["CDC Sync"]
  Parser["MySQL Parser"] --> Async["ASYNC Keyword"]
  Async --> DDL

File Walkthrough

Relevant files
Feature
17 files
index_sqlwriter.go
Add index SQL writer implementations for vector indexes   

pkg/iscp/index_sqlwriter.go

• Implements the IndexSqlWriter interface with three concrete implementations: FulltextSqlWriter, IvfflatSqlWriter, and HnswSqlWriter
• Provides SQL generation for each index algorithm (fulltext, IVFFLAT, HNSW) with CDC operations (insert, upsert, delete)
• Includes the factory function NewIndexSqlWriter to create the appropriate writer based on algorithm type
• Implements generic type support for HNSW with HnswSqlWriter[T types.RealNumbers]

+650/-0 
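
The factory-plus-interface shape described above can be sketched roughly as follows. This is a minimal illustration, not the code in pkg/iscp/index_sqlwriter.go: the reduced method set (Upsert, Delete, ToSql), the lowercase writer types, the local RealNumbers constraint, and the placeholder output format are all assumptions made for the example.

```go
package main

import (
	"fmt"
	"strings"
)

// RealNumbers stands in for the types.RealNumbers constraint named in the PR.
type RealNumbers interface {
	~float32 | ~float64
}

// IndexSqlWriter is a reduced, hypothetical form of the interface.
type IndexSqlWriter interface {
	Upsert(pk int64)
	Delete(pk int64)
	ToSql() string
}

type op struct {
	kind string // "U" for upsert, "D" for delete
	pk   int64
}

// baseWriter buffers CDC operations; every algorithm-specific writer embeds it.
type baseWriter struct{ ops []op }

func (b *baseWriter) Upsert(pk int64) { b.ops = append(b.ops, op{kind: "U", pk: pk}) }
func (b *baseWriter) Delete(pk int64) { b.ops = append(b.ops, op{kind: "D", pk: pk}) }

// render emits a placeholder comment line, not real SQL.
func render(algo string, ops []op) string {
	parts := make([]string, len(ops))
	for i, o := range ops {
		parts[i] = fmt.Sprintf("%s(%d)", o.kind, o.pk)
	}
	return "-- " + algo + ": " + strings.Join(parts, " ")
}

type fulltextSqlWriter struct{ baseWriter }
type ivfflatSqlWriter struct{ baseWriter }
type hnswSqlWriter[T RealNumbers] struct{ baseWriter }

func (w *fulltextSqlWriter) ToSql() string { return render("fulltext", w.ops) }
func (w *ivfflatSqlWriter) ToSql() string  { return render("ivfflat", w.ops) }
func (w *hnswSqlWriter[T]) ToSql() string {
	var zero T // only used to report the element type
	return render(fmt.Sprintf("hnsw<%T>", zero), w.ops)
}

// NewIndexSqlWriter picks a writer by algorithm name (names are assumptions).
func NewIndexSqlWriter(algo string) (IndexSqlWriter, error) {
	switch strings.ToLower(algo) {
	case "fulltext":
		return &fulltextSqlWriter{}, nil
	case "ivfflat":
		return &ivfflatSqlWriter{}, nil
	case "hnsw":
		return &hnswSqlWriter[float32]{}, nil
	default:
		return nil, fmt.Errorf("unsupported index algorithm %q", algo)
	}
}

func main() {
	w, err := NewIndexSqlWriter("hnsw")
	if err != nil {
		panic(err)
	}
	w.Upsert(1)
	w.Delete(2)
	fmt.Println(w.ToSql()) // -- hnsw<float32>: U(1) D(2)
}
```

Keeping the buffering logic in one embedded struct means each algorithm only has to decide how the buffered operations are rendered.
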
sync.go
Add HNSW vector index CDC synchronization support               

pkg/vectorindex/hnsw/sync.go

• Implements CDC synchronization functionality for HNSW vector indexes with CdcSync function
• Provides HnswSync struct for managing index updates, insertions, and deletions
• Includes parallel processing support for bulk operations and sequential updates
• Generates SQL statements for metadata and index table updates

+681/-0 
model.go
Add HNSW vector index model with generic type support       

pkg/vectorindex/hnsw/model.go

• Implements HnswModel struct for HNSW vector index operations with generic type support
• Provides methods for index building, loading, saving, searching, and CDC operations
• Includes file-based persistence with chunked loading from database
• Supports concurrent operations with atomic counters and proper resource management

+590/-0 
index_consumer.go
Implement index consumer for CDC data processing                 

pkg/iscp/index_consumer.go

• Implemented IndexConsumer for processing index synchronization data
• Handles both snapshot and tail (CDC) data processing modes
• Manages SQL execution through channels and transaction handling
• Supports different index algorithms (HNSW, IVF-flat, fulltext)

+441/-0 
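
The "SQL execution through channels" idea can be pictured with the sketch below: a producer pushes generated statements onto a channel and a single goroutine executes them in order, reporting the first failure. The names produceAndConsume, execFn, and failOn are hypothetical, and the real IndexConsumer's transaction handling is not reproduced here.

```go
package main

import (
	"fmt"
	"strings"
)

// execFn is a hypothetical stand-in for running one statement in a transaction.
type execFn func(sql string) error

// failOn returns an execFn that fails on any statement containing substr.
func failOn(substr string) execFn {
	return func(sql string) error {
		if strings.Contains(sql, substr) {
			return fmt.Errorf("exec failed for %q", sql)
		}
		return nil
	}
}

// produceAndConsume sends stmts over a channel to a consumer goroutine and
// returns how many statements ran before the first error, plus that error.
func produceAndConsume(stmts []string, exec execFn) (int, error) {
	type result struct {
		n   int
		err error
	}
	sqlCh := make(chan string)
	done := make(chan result, 1)

	go func() {
		n := 0
		var firstErr error
		for sql := range sqlCh {
			if firstErr != nil {
				continue // keep draining so the producer never blocks
			}
			if err := exec(sql); err != nil {
				firstErr = err
				continue
			}
			n++
		}
		done <- result{n: n, err: firstErr}
	}()

	for _, s := range stmts {
		sqlCh <- s
	}
	close(sqlCh) // signals end of the snapshot or tail batch
	r := <-done
	return r.n, r.err
}

func main() {
	n, err := produceAndConsume([]string{"stmt-1", "stmt-2"}, failOn("bad"))
	fmt.Println(n, err)
}
```

Draining the channel after an error is one way to avoid a deadlocked producer; a context with cancellation would be a reasonable alternative.
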
ddl.go
Integrate ISCP CDC tasks into DDL operations                         

pkg/sql/compile/ddl.go

• Integrated ISCP job management into DDL operations
• Added CDC task creation/deletion for index operations
• Updated table operations (create, drop, truncate, alter) to handle index CDC tasks
• Added support for async index updates with PITR integration

+80/-3   
cdc_util.go
Add CDC utilities for index synchronization management     

pkg/sql/compile/cdc_util.go

• Added utility functions for managing CDC tasks and PITR for indexes
• Implements job registration/unregistration with ISCP system
• Handles creation and deletion of index-specific CDC tasks
• Manages PITR lifecycle for index synchronization

+273/-0 
types.go
Add CDC data structures and async parameter support           

pkg/vectorindex/types.go

• Added CDC-related data structures and constants
• Implemented VectorIndexCdc and VectorIndexCdcEntry with generic types
• Added CDC operation types (INSERT, UPSERT, DELETE)
• Enhanced parameter structures with async support

+88/-0   
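
As a rough picture of generic CDC structures with JSON output, consider the sketch below. The type names VectorIndexCdc and VectorIndexCdcEntry come from the description; the field names, JSON keys, single-letter operation codes, and the ToJson method are assumptions for illustration only.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RealNumbers stands in for the types.RealNumbers constraint named in the PR.
type RealNumbers interface {
	~float32 | ~float64
}

// Operation codes are hypothetical; the PR only names INSERT, UPSERT, DELETE.
const (
	opInsert = "I"
	opUpsert = "U"
	opDelete = "D"
)

// VectorIndexCdcEntry is one change; Vec is omitted for deletes.
type VectorIndexCdcEntry[T RealNumbers] struct {
	Type string `json:"t"`
	PKey int64  `json:"pk"`
	Vec  []T    `json:"v,omitempty"`
}

// VectorIndexCdc batches entries so they can be shipped as one JSON document.
type VectorIndexCdc[T RealNumbers] struct {
	Data []VectorIndexCdcEntry[T] `json:"cdc"`
}

func (c *VectorIndexCdc[T]) Insert(pk int64, v []T) {
	c.Data = append(c.Data, VectorIndexCdcEntry[T]{Type: opInsert, PKey: pk, Vec: v})
}

func (c *VectorIndexCdc[T]) Upsert(pk int64, v []T) {
	c.Data = append(c.Data, VectorIndexCdcEntry[T]{Type: opUpsert, PKey: pk, Vec: v})
}

func (c *VectorIndexCdc[T]) Delete(pk int64) {
	c.Data = append(c.Data, VectorIndexCdcEntry[T]{Type: opDelete, PKey: pk})
}

// ToJson serializes the batch.
func (c *VectorIndexCdc[T]) ToJson() (string, error) {
	b, err := json.Marshal(c)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	var c VectorIndexCdc[float32]
	c.Insert(1, []float32{1, 2.5})
	c.Delete(2)
	s, err := c.ToJson()
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // {"cdc":[{"t":"I","pk":1,"v":[1,2.5]},{"t":"D","pk":2}]}
}
```
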
func_hnsw.go
Add HNSW CDC update function implementation                           

pkg/sql/plan/function/func_hnsw.go

• New function hnswCdcUpdate for handling HNSW CDC (Change Data Capture) updates
• Processes database, table, type, dimension, and CDC JSON parameters
• Calls hnsw.CdcSync for float32 vector types with logging

+91/-0   
ddl_index_algo.go
Implement async index support with CDC task creation         

pkg/sql/compile/ddl_index_algo.go

• Added async index support for fulltext indexes with CDC task creation
• Enhanced HNSW index handling to register CDC update tasks
• Added logging import and async parameter checking

+37/-5   
sqlexec.go
Add transaction execution utility function                             

pkg/vectorindex/sqlexec/sqlexec.go

• Added new RunTxn function for executing transactions with proper context setup
• Handles account ID extraction and SQL executor configuration
• Provides transaction execution with proper options and error handling

+27/-0   
create.go
Add async option support to index creation syntax               

pkg/sql/parsers/tree/create.go

• Added Async boolean field to IndexOption struct
• Enhanced Format method to output "ASYNC " when async flag is true

+4/-0     
list_builtIn.go
Register HNSW CDC update built-in function                             

pkg/sql/plan/function/list_builtIn.go

• Added new HNSW_CDC_UPDATE function definition with proper overload
• Function accepts 5 parameters: database, table, type, dimension, and CDC data
• Returns uint64 type and uses hnswCdcUpdate as execution logic

+21/-0   
function_id.go
Register HNSW CDC update function identifier                         

pkg/sql/plan/function/function_id.go

• Added HNSW_CDC_UPDATE function ID constant (349)
• Updated FUNCTION_END_NUMBER to 350
• Registered "hnsw_cdc_update" function name mapping

+7/-1     
keywords.go
Add ASYNC keyword to MySQL parser                                               

pkg/sql/parsers/dialect/mysql/keywords.go

• Added "async" keyword mapping to ASYNC token
• Registered new keyword in the MySQL dialect parser

+1/-0     
consumer.go
Add index sync consumer type support                                         

pkg/iscp/consumer.go

• Added support for ConsumerType_IndexSync consumer type
• Returns NewIndexConsumer for index synchronization operations

+3/-0     
types.go
Add async parameter to fulltext parser configuration         

pkg/fulltext/types.go

• Added Async field to FullTextParserParam struct
• Enhanced fulltext parser parameters to support async operations

+1/-0     
mysql_sql.y
Add ASYNC syntax support to MySQL grammar                               

pkg/sql/parsers/dialect/mysql/mysql_sql.y

• Added ASYNC token definition and grammar rules
• Enhanced index option parsing to handle async flag
• Added async keyword to non-reserved keywords list

+10/-1   
Enhancement
19 files
util.go
Enable comprehensive data type support in ISCP utilities 

pkg/iscp/util.go

• Uncomments and enables support for additional data types in extractRowFromVector and convertColIntoSql functions
• Adds support for JSON, bit, array types, date/time types, decimal types, UUID, and other specialized types
• Includes appendHex function for binary data formatting
• Improves NULL value handling with proper type casting

+134/-127
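
One plausible shape for a helper like appendHex is shown below. This is a guess at the behavior, not the function from pkg/iscp/util.go: the signature and the MySQL-style x'..' hex literal output are assumptions.

```go
package main

import "fmt"

const hexDigits = "0123456789abcdef"

// appendHex appends src to dst as a hex literal of the form x'..'.
// Both the signature and the literal format are assumed for illustration.
func appendHex(dst, src []byte) []byte {
	dst = append(dst, 'x', '\'')
	for _, b := range src {
		// Emit the high nibble first, then the low nibble.
		dst = append(dst, hexDigits[b>>4], hexDigits[b&0x0f])
	}
	return append(dst, '\'')
}

func main() {
	out := appendHex(nil, []byte{0xde, 0xad, 0x00})
	fmt.Println(string(out)) // x'dead00'
}
```

Encoding binary values as hex avoids quoting and escaping problems when the bytes are embedded in generated SQL text.
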
alter.go
Add ISCP job management for ALTER TABLE operations             

pkg/sql/compile/alter.go

• Adds ISCP job cleanup during ALTER TABLE operations
• Includes DropAllIndexCdcTasks call to remove CDC tasks for temporary tables
• Adds fulltext index handling in the reindex process
• Improves error handling and logging for ALTER TABLE copy operations

+32/-11 
search.go
Refactor HNSW search with generic types and model abstraction

pkg/vectorindex/hnsw/search.go

• Refactored HnswSearch to use generic types with types.RealNumbers constraint
• Replaced HnswSearchIndex with HnswModel[T] for better type safety
• Simplified search implementation by removing file loading logic
• Updated metadata loading to use generic LoadMetadata function

+34/-209
build_dml_util.go
Add async index support to DML operations                               

pkg/sql/plan/build_dml_util.go

• Added async index support to skip synchronous index operations
• Updated multi-table index handling to check for async configuration
• Modified fulltext and IVF index processing to respect async settings
• Enhanced MultiTableIndex structure with IndexAlgoParams field

+53/-4   
fulltext.go
Enhance fulltext tokenization with composite key support 

pkg/sql/plan/fulltext.go

• Enhanced fulltext index tokenization to support both table and values scans
• Added support for composite primary keys in fulltext operations
• Improved parameter handling for different scan types
• Added primary key type extraction for values-based operations

+54/-12 
secondary_index_utils.go
Add async parameter support to index configuration             

pkg/catalog/secondary_index_utils.go

• Added async parameter support to index configuration
• Implemented IsIndexAsync function to check async settings
• Enhanced parameter parsing to handle async flag
• Updated fulltext and vector index parameter handling

+38/-3   
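
A check like IsIndexAsync could look roughly like the following, assuming the index algorithm parameters are stored as a JSON object. The "async" key name, the accepted value forms, and the signature are assumptions; the actual function in pkg/catalog may differ.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// IsIndexAsync reports whether JSON-encoded index parameters request async
// maintenance. The "async" key and its accepted values are assumptions.
func IsIndexAsync(indexAlgoParams string) (bool, error) {
	if indexAlgoParams == "" {
		return false, nil // no parameters means a synchronous index
	}
	var params map[string]any
	if err := json.Unmarshal([]byte(indexAlgoParams), &params); err != nil {
		return false, err
	}
	switch v := params["async"].(type) {
	case bool:
		return v, nil
	case string:
		return strings.EqualFold(v, "true"), nil
	}
	return false, nil
}

func main() {
	ok, err := IsIndexAsync(`{"lists":"10","async":"true"}`)
	fmt.Println(ok, err) // true <nil>
}
```
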
build_show_util.go
Add async parameter support in CREATE TABLE SQL construction

pkg/sql/plan/build_show_util.go

• Enhanced ConstructCreateTableSQL to handle async parameter from index algo params
• Added JSON parsing for the async flag; appends "ASYNC" to the index string when true
• Improved error handling for JSON parsing operations

+23/-10 
func_cast.go
Enhance array casting with dimension validation                   

pkg/sql/plan/function/func_cast.go

• Enhanced strToArray function with dimension validation
• Added bypass for max dimension check and proper error handling
• Improved array conversion with dimension mismatch detection

+11/-2   
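
To make the dimension validation concrete, here is a small sketch of parsing a vector literal and rejecting a mismatched dimension. The helper parseVector is hypothetical and is not the strToArray function from func_cast.go; only the error text follows the format quoted in this PR's updated test results.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVector parses a literal such as "[1, 2, 3]" into float32 values.
// When expectedDim > 0 the parsed length must match it.
func parseVector(s string, expectedDim int) ([]float32, error) {
	s = strings.TrimSpace(s)
	if len(s) < 2 || s[0] != '[' || s[len(s)-1] != ']' {
		return nil, fmt.Errorf("malformed vector literal %q", s)
	}
	body := strings.TrimSpace(s[1 : len(s)-1])
	var out []float32
	if body != "" {
		for _, field := range strings.Split(body, ",") {
			v, err := strconv.ParseFloat(strings.TrimSpace(field), 32)
			if err != nil {
				return nil, err
			}
			out = append(out, float32(v))
		}
	}
	if expectedDim > 0 && len(out) != expectedDim {
		return nil, fmt.Errorf("expected vector dimension %d != actual dimension %d",
			expectedDim, len(out))
	}
	return out, nil
}

func main() {
	_, err := parseVector("[1,2]", 3)
	fmt.Println(err) // expected vector dimension 3 != actual dimension 2
}
```
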
hnsw_create.go
Update HNSW creation to use generic types                               

pkg/sql/colexec/table_function/hnsw_create.go

• Updated hnswCreateState to use generic HnswBuild[float32] type
• Modified NewHnswBuild call to use generic float32 type parameter

+2/-2     
hnsw.go
Relax HNSW query builder node type constraints                     

pkg/sql/plan/hnsw.go

• Commented out TABLE_SCAN node type validation
• Removed strict node type checking for HNSW query building

+6/-4     
mock_consumer.go
Use system account constant in mock consumer                         

pkg/iscp/mock_consumer.go

• Updated context creation to use catalog.System_Account instead of hardcoded uint32(0)
• Improved system account constant usage for consistency

+1/-1     
types.go
Add vector array type description formatting                         

pkg/container/types/types.go

• Added DescString method cases for T_array_float32 and T_array_float64
• Returns formatted vector type descriptions like "VECF32(128)" and "VECF64(128)"

+4/-0     
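
The new DescString cases might be pictured as below. The T and Type definitions here are simplified stand-ins, and storing the vector dimension in a Width field is an assumption; only the "VECF32(128)" and "VECF64(128)" output format comes from the description.

```go
package main

import "fmt"

// T is a simplified stand-in for the type OID enum.
type T uint8

const (
	T_array_float32 T = iota
	T_array_float64
)

// Type pairs an OID with a width; using Width as the dimension is assumed.
type Type struct {
	Oid   T
	Width int32
}

// DescString renders a human-readable type description.
func (t Type) DescString() string {
	switch t.Oid {
	case T_array_float32:
		return fmt.Sprintf("VECF32(%d)", t.Width)
	case T_array_float64:
		return fmt.Sprintf("VECF64(%d)", t.Width)
	}
	return "UNKNOWN"
}

func main() {
	fmt.Println(Type{Oid: T_array_float32, Width: 128}.DescString()) // VECF32(128)
}
```
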
hnsw_search.go
Update HNSW search to use generic types                                   

pkg/sql/colexec/table_function/hnsw_search.go

• Updated newHnswAlgoFn to use generic NewHnswSearch[float32] call
• Modified HNSW search algorithm instantiation with float32 type parameter

+1/-1     
data_retriever.go
Add account and table ID getters to data retriever             

pkg/iscp/data_retriever.go

• Added GetAccountID() and GetTableID() methods to DataRetrieverImpl
• Provides access to account and table identifiers for data retrieval operations

+8/-0     
types.go
Add algorithm parameters to multi-table index structure   

pkg/sql/plan/types.go

• Added IndexAlgoParams field to MultiTableIndex struct
• Enhanced multi-table index structure to store algorithm parameters

+3/-2     
types.go
Extend DataRetriever interface with ID getters                     

pkg/iscp/types.go

• Added GetAccountID() and GetTableID() methods to DataRetriever interface
• Extended interface to provide access to account and table identifiers

+2/-0     
vector_hnsw.result
Update vector dimension error message format                         

test/distributed/cases/vector/vector_hnsw.result

• Updated error message format for vector dimension mismatch
• Changed from "vector ops between different dimensions" to "expected vector dimension X != actual dimension Y"

+1/-1     
vector_index.result
Improve vector index dimension error messages                       

test/distributed/cases/vector/vector_index.result

• Updated error message format for vector dimension validation
• Improved error message clarity for dimension mismatch scenarios

+1/-1     
array.result
Update array dimension error message format                           

test/distributed/cases/array/array.result

• Updated vector dimension error messages to use clearer format
• Changed error text to "expected vector dimension X != actual dimension Y"

+2/-2     
Tests
15 files
index_consumer_test.go
Add comprehensive test suite for index consumer                   

pkg/iscp/index_consumer_test.go

• Adds comprehensive test suite for index consumer functionality
• Includes mock implementations for data retrieval, SQL execution, and error handling
• Tests both snapshot and tail data processing scenarios for HNSW indexes
• Validates SQL generation and CDC operation handling

+381/-0 
sync_test.go
Add comprehensive HNSW CDC synchronization test suite       

pkg/vectorindex/hnsw/sync_test.go

• Added comprehensive test suite for HNSW CDC synchronization functionality
• Tests cover various scenarios: empty sync, upsert, delete, mixed operations, and multi-file handling
• Includes mock functions for SQL execution and streaming operations
• Tests handle shuffled data and large datasets (up to 1M entries)

+371/-0 
index_sqlwriter_test.go
Add comprehensive index SQL writer test suite                       

pkg/iscp/index_sqlwriter_test.go

• Added test suite for index SQL writers (fulltext, HNSW, IVF-flat)
• Tests cover insert, upsert, delete operations for different index types
• Includes tests for composite primary keys and various data types
• Validates SQL generation for different vector index algorithms

+341/-0 
util_test.go
Add utility tests for data type conversion and SQL generation

pkg/iscp/util_test.go

• Added utility tests for data type conversion and SQL generation
• Tests various data types including JSON, arrays, dates, decimals, UUIDs
• Validates proper SQL formatting for different vector and scalar types
• Includes comprehensive type conversion validation

+204/-0 
search_test.go
Enhance HNSW search tests with multi-file support               

pkg/vectorindex/hnsw/search_test.go

• Enhanced existing search tests with multi-file support
• Added mock functions for 2-file scenarios and catalog operations
• Extended test coverage for metadata and index batch creation
• Added utility functions for creating test batches with different file configurations

+113/-1 
model_test.go
Add comprehensive HNSW model test suite                                   

pkg/vectorindex/hnsw/model_test.go

• Added comprehensive test suite for HNSW model operations
• Tests model loading, searching, adding/removing vectors, and SQL generation
• Includes error handling tests for nil model scenarios
• Validates model state management and persistence operations

+207/-0 
func_hnsw_test.go
Add HNSW CDC update function test suite                                   

pkg/sql/plan/function/func_hnsw_test.go

• Added test suite for HNSW CDC update function
• Tests various error conditions and parameter validation
• Validates null parameter handling and JSON parsing
• Ensures proper error handling for invalid inputs

+129/-0 
build_test.go
Update HNSW build tests for generic type system                   

pkg/vectorindex/hnsw/build_test.go

• Updated build tests to use generic HNSW types
• Modified test functions to work with HnswModel[float32] instead of HnswSearchIndex
• Updated constructor calls to use generic type parameters
• Maintained existing test functionality with new type system

+9/-9     
mysql_sql_test.go
Add ASYNC keyword test cases for index creation                   

pkg/sql/parsers/dialect/mysql/mysql_sql_test.go

• Added test cases for ASYNC keyword in fulltext and vector index creation statements
• Updated expected output to include ASYNC keyword in uppercase format

+9/-1     
types_test.go
Add vector index CDC operations test suite                             

pkg/vectorindex/types_test.go

• New test file for vector index CDC operations
• Tests Insert, Delete, Upsert operations and JSON serialization
• Validates CDC state management and JSON output format

+63/-0   
vector_ivf_async.result
IVF async vector index test results                                           

test/distributed/cases/vector/vector_ivf_async.result

• Test results for IVF vector index with ASYNC keyword functionality
• Validates async index creation, data insertion, and vector similarity queries
• Demonstrates proper async index behavior with sleep delays

+58/-0   
vector_ivf_async.sql
IVF async vector index test cases                                               

test/distributed/cases/vector/vector_ivf_async.sql

• Test cases for IVF vector indexes with ASYNC keyword
• Tests index creation, data loading, and vector similarity searches
• Includes sleep statements to allow async operations to complete

+59/-0   
vector_hnsw_async.result
HNSW async vector index test results                                         

test/distributed/cases/vector/vector_hnsw_async.result

• Test results for HNSW vector index with ASYNC functionality
• Validates async HNSW index creation and vector operations
• Shows proper handling of CDC updates and similarity searches

+66/-0   
fulltext_async.sql
Fulltext async index test cases                                                   

test/distributed/cases/fulltext/fulltext_async.sql

• Test cases for fulltext index with ASYNC keyword support
• Tests async fulltext index creation and search functionality
• Includes multilingual content and null value handling

+21/-0   
fulltext_async.result
Fulltext async index test results                                               

test/distributed/cases/fulltext/fulltext_async.result

• Test results for async fulltext index functionality
• Validates async fulltext search operations and result accuracy
• Shows proper handling of multilingual and null content

+19/-0   
Code refactoring
1 file
build.go
Refactor HNSW build to use generic types and shared model

pkg/vectorindex/hnsw/build.go

• Refactors HNSW build functionality to use generic types with HnswBuild[T types.RealNumbers]
• Removes HnswBuildIndex struct and related methods (moved to model.go)
• Updates function signatures and type definitions to support generic vector types
• Maintains multi-threaded building capabilities with channel-based communication

+30/-202
Configuration changes
1 file
function_id_test.go
Add HNSW CDC update function ID                                                   

pkg/sql/plan/function/function_id_test.go

• Adds HNSW_CDC_UPDATE function ID (349) to predefined function IDs map
• Updates FUNCTION_END_NUMBER from 349 to 350 to accommodate new function

+3/-1     
Bug fix
3 files
watermark_updater.go
Fix ISCP watermark updater null handling bugs                       

pkg/iscp/watermark_updater.go

• Fixed bug in unregisterJobsByDBName to handle empty tableIDs array
• Corrected null check index in queryIndexLog function
• Added conditional execution to prevent SQL errors

+15/-9   
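
The empty-tableIDs fix is the kind of guard sketched below: building an IN list from an empty slice would yield invalid SQL such as "IN ()", so the statement is skipped entirely. The helper buildUnregisterSQL and the table name hypothetical_iscp_log are invented for illustration and do not reflect the real unregisterJobsByDBName.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildUnregisterSQL returns "" when there is nothing to delete, so the
// caller can skip execution instead of sending "IN ()" to the database.
func buildUnregisterSQL(tableIDs []uint64) string {
	if len(tableIDs) == 0 {
		return ""
	}
	ids := make([]string, len(tableIDs))
	for i, id := range tableIDs {
		ids[i] = strconv.FormatUint(id, 10)
	}
	// hypothetical_iscp_log is a placeholder table name.
	return "DELETE FROM hypothetical_iscp_log WHERE table_id IN (" +
		strings.Join(ids, ",") + ")"
}

func main() {
	if sql := buildUnregisterSQL(nil); sql == "" {
		fmt.Println("nothing to unregister, skipping SQL")
	}
	fmt.Println(buildUnregisterSQL([]uint64{7, 42}))
}
```
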
iteration.go
Improve ISCP iteration error handling and context setup   

pkg/iscp/iteration.go

• Added error handling for CollectChanges function call
• Enhanced consumer execution with proper tenant context setup
• Fixed context propagation for system account operations

+5/-1     
build_ddl.go
Fix fulltext index algorithm parameters processing             

pkg/sql/plan/build_ddl.go

• Fixed fulltext index table building to always process IndexAlgoParams
• Removed conditional check that prevented parameter processing

+4/-4     
Additional files
3 files
util.go +2/-2     
mysql_sql.go +8607/-8632
vector_hnsw_async.sql +96/-0   

Labels
kind/feature Review effort 5/5 size/XL Denotes a PR that changes [1000, 1999] lines