IGNOREME: Iscp integration #22519
Closed
User description
What type of PR is this?
Which issue(s) this PR fixes:
issue #21835
What this PR does / why we need it:
index update with ISCP
PR Type
Enhancement, Feature, Tests, Bug fix
Description
• Major Feature: Implements ISCP (Index Synchronization Change Processing) with CDC (Change Data Capture) support for vector indexes
• HNSW Generic Types: Refactors HNSW vector index implementation to use generic types with the types.RealNumbers constraint
• Async Index Support: Adds ASYNC keyword support for fulltext and vector indexes with asynchronous processing capabilities
• CDC Infrastructure: Implements comprehensive CDC synchronization for HNSW, IVF-flat, and fulltext indexes with SQL generation
• Index Consumer: Adds IndexConsumer for processing index synchronization data in both snapshot and tail modes
• DDL Integration: Integrates ISCP job management into DDL operations (CREATE, DROP, ALTER TABLE) with CDC task lifecycle
• SQL Writer Framework: Implements IndexSqlWriter interface with algorithm-specific implementations for different index types
• Test Coverage: Adds comprehensive test suites for all new CDC, ISCP, and async functionality
• Bug Fixes: Fixes null handling in watermark updater and fulltext index parameter processing
• Enhanced Error Messages: Improves vector dimension mismatch error messages for better clarity
File Walkthrough
17 files
index_sqlwriter.go
Add index SQL writer implementations for vector indexes
pkg/iscp/index_sqlwriter.go
• Implements the IndexSqlWriter interface with three concrete implementations: FulltextSqlWriter, IvfflatSqlWriter, and HnswSqlWriter
• Provides SQL generation for different vector index algorithms (fulltext, IVFFLAT, HNSW) with CDC operations (insert, upsert, delete)
• Includes factory function NewIndexSqlWriter to create the appropriate writer based on algorithm type
• Implements generic type support for HNSW with HnswSqlWriter[T types.RealNumbers]
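The factory-plus-generic-writer arrangement described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the interface methods, the lowercase newIndexSqlWriter signature, the RealNumbers type set, the table names, and the generated SQL are all assumptions, and the IVF-flat writer is left out for brevity.

```go
package main

import (
	"fmt"
	"strings"
)

// RealNumbers stands in for the types.RealNumbers constraint named in the PR;
// the set of types listed here is an assumption.
type RealNumbers interface{ ~float32 | ~float64 }

// IndexSqlWriter is a reduced, hypothetical version of the interface.
type IndexSqlWriter interface {
	Upsert(pk int64, value any) error
	ToSql() string
}

// fulltextWriter buffers (pk, text) rows.
type fulltextWriter struct{ rows []string }

func (w *fulltextWriter) Upsert(pk int64, value any) error {
	text, ok := value.(string)
	if !ok {
		return fmt.Errorf("fulltext writer: unexpected value type %T", value)
	}
	// Double single quotes; a real writer would need fuller escaping.
	w.rows = append(w.rows, fmt.Sprintf("(%d, '%s')", pk, strings.ReplaceAll(text, "'", "''")))
	return nil
}

func (w *fulltextWriter) ToSql() string {
	return "REPLACE INTO fulltext_demo VALUES " + strings.Join(w.rows, ", ")
}

// hnswWriter buffers (pk, vector) rows; T is fixed when the writer is built.
type hnswWriter[T RealNumbers] struct{ rows []string }

func (w *hnswWriter[T]) Upsert(pk int64, value any) error {
	vec, ok := value.([]T)
	if !ok {
		return fmt.Errorf("hnsw writer: unexpected value type %T", value)
	}
	parts := make([]string, len(vec))
	for i, x := range vec {
		parts[i] = fmt.Sprint(x)
	}
	w.rows = append(w.rows, fmt.Sprintf("(%d, '[%s]')", pk, strings.Join(parts, ", ")))
	return nil
}

func (w *hnswWriter[T]) ToSql() string {
	return "REPLACE INTO hnsw_demo VALUES " + strings.Join(w.rows, ", ")
}

// newIndexSqlWriter picks a writer by algorithm name (hypothetical signature).
func newIndexSqlWriter(algo string) (IndexSqlWriter, error) {
	switch strings.ToLower(algo) {
	case "fulltext":
		return &fulltextWriter{}, nil
	case "hnsw":
		return &hnswWriter[float32]{}, nil
	default:
		return nil, fmt.Errorf("unsupported index algorithm %q", algo)
	}
}

func main() {
	w, err := newIndexSqlWriter("hnsw")
	if err != nil {
		panic(err)
	}
	if err := w.Upsert(1, []float32{0.5, 2}); err != nil {
		panic(err)
	}
	fmt.Println(w.ToSql())
}
```

Keeping the interface non-generic lets the factory return one type for every algorithm, while the HNSW writer still checks the element type when a vector is handed to it.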
sync.go
Add HNSW vector index CDC synchronization support
pkg/vectorindex/hnsw/sync.go
• Implements CDC synchronization functionality for HNSW vector indexes with the CdcSync function
• Provides the HnswSync struct for managing index updates, insertions, and deletions
• Includes parallel processing support for bulk operations and sequential updates
• Generates SQL statements for metadata and index table updates
model.go
Add HNSW vector index model with generic type support
pkg/vectorindex/hnsw/model.go
• Implements the HnswModel struct for HNSW vector index operations with generic type support
• Provides methods for index building, loading, saving, searching, and CDC operations
• Includes file-based persistence with chunked loading from database
• Supports concurrent operations with atomic counters and proper resource management
index_consumer.go
Implement index consumer for CDC data processing
pkg/iscp/index_consumer.go
• Implemented IndexConsumer for processing index synchronization data
• Handles both snapshot and tail (CDC) data processing modes
• Manages SQL execution through channels and transaction handling
• Supports different index algorithms (HNSW, IVF-flat, fulltext)
ddl.go
Integrate ISCP CDC tasks into DDL operations
pkg/sql/compile/ddl.go
• Integrated ISCP job management into DDL operations
• Added CDC task creation/deletion for index operations
• Updated table operations (create, drop, truncate, alter) to handle index CDC tasks
• Added support for async index updates with PITR integration
cdc_util.go
Add CDC utilities for index synchronization management
pkg/sql/compile/cdc_util.go
• Added utility functions for managing CDC tasks and PITR for indexes
• Implements job registration/unregistration with ISCP system
• Handles creation and deletion of index-specific CDC tasks
• Manages PITR lifecycle for index synchronization
types.go
Add CDC data structures and async parameter support
pkg/vectorindex/types.go
• Added CDC-related data structures and constants
• Implemented VectorIndexCdc and VectorIndexCdcEntry with generic types
• Added CDC operation types (INSERT, UPSERT, DELETE)
• Enhanced parameter structures with async support
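A generic CDC batch that records operations and serializes them to JSON might look like the sketch below. The struct names, field names, JSON keys, and single-letter operation codes are assumptions chosen for illustration; the PR does not show the actual layout of VectorIndexCdc or VectorIndexCdcEntry.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// RealNumbers stands in for types.RealNumbers; the type set is an assumption.
type RealNumbers interface{ ~float32 | ~float64 }

// Hypothetical operation codes for the three CDC operation types.
const (
	opInsert = "I"
	opUpsert = "U"
	opDelete = "D"
)

// cdcEntry is one change: an operation, a primary key, and (except for
// deletes) the vector value.
type cdcEntry[T RealNumbers] struct {
	Type string `json:"t"`
	PKey int64  `json:"pk"`
	Vec  []T    `json:"v,omitempty"`
}

// cdcBatch collects entries in arrival order.
type cdcBatch[T RealNumbers] struct {
	Data []cdcEntry[T] `json:"cdc"`
}

func (b *cdcBatch[T]) Insert(pk int64, v []T) {
	b.Data = append(b.Data, cdcEntry[T]{Type: opInsert, PKey: pk, Vec: v})
}

func (b *cdcBatch[T]) Upsert(pk int64, v []T) {
	b.Data = append(b.Data, cdcEntry[T]{Type: opUpsert, PKey: pk, Vec: v})
}

func (b *cdcBatch[T]) Delete(pk int64) {
	b.Data = append(b.Data, cdcEntry[T]{Type: opDelete, PKey: pk})
}

// ToJSON serializes the whole batch.
func (b *cdcBatch[T]) ToJSON() (string, error) {
	out, err := json.Marshal(b)
	return string(out), err
}

func main() {
	var b cdcBatch[float32]
	b.Upsert(1, []float32{0.5, 1})
	b.Delete(2)
	s, err := b.ToJSON()
	if err != nil {
		panic(err)
	}
	fmt.Println(s)
}
```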
func_hnsw.go
Add HNSW CDC update function implementation
pkg/sql/plan/function/func_hnsw.go
• New function hnswCdcUpdate for handling HNSW CDC (Change Data Capture) updates
• Processes database, table, type, dimension, and CDC JSON parameters
• Calls hnsw.CdcSync for float32 vector types with logging
ddl_index_algo.go
Implement async index support with CDC task creation
pkg/sql/compile/ddl_index_algo.go
• Added async index support for fulltext indexes with CDC task creation
• Enhanced HNSW index handling to register CDC update tasks
• Added logging import and async parameter checking
sqlexec.go
Add transaction execution utility function
pkg/vectorindex/sqlexec/sqlexec.go
• Added new RunTxn function for executing transactions with proper context setup
• Handles account ID extraction and SQL executor configuration
• Provides transaction execution with proper options and error handling
create.go
Add async option support to index creation syntax
pkg/sql/parsers/tree/create.go
• Added Async boolean field to IndexOption struct
• Enhanced Format method to output "ASYNC " when async flag is true
list_builtIn.go
Register HNSW CDC update built-in function
pkg/sql/plan/function/list_builtIn.go
• Added new HNSW_CDC_UPDATE function definition with proper overload
• Function accepts 5 parameters including database, table, type, dimension, and CDC data
• Returns uint64 type and uses hnswCdcUpdate as execution logic
function_id.go
Register HNSW CDC update function identifier
pkg/sql/plan/function/function_id.go
• Added HNSW_CDC_UPDATE function ID constant (349)
• Updated FUNCTION_END_NUMBER to 350
• Registered "hnsw_cdc_update" function name mapping
keywords.go
Add ASYNC keyword to MySQL parser
pkg/sql/parsers/dialect/mysql/keywords.go
• Added "async" keyword mapping to ASYNC token
• Registered new keyword in the MySQL dialect parser
consumer.go
Add index sync consumer type support
pkg/iscp/consumer.go
• Added support for ConsumerType_IndexSync consumer type
• Returns NewIndexConsumer for index synchronization operations
types.go
Add async parameter to fulltext parser configuration
pkg/fulltext/types.go
• Added Async field to FullTextParserParam struct
• Enhanced fulltext parser parameters to support async operations
mysql_sql.y
Add ASYNC syntax support to MySQL grammar
pkg/sql/parsers/dialect/mysql/mysql_sql.y
• Added ASYNC token definition and grammar rules
• Enhanced index option parsing to handle async flag
• Added async keyword to non-reserved keywords list
19 files
util.go
Enable comprehensive data type support in ISCP utilities
pkg/iscp/util.go
• Uncomments and enables support for additional data types in extractRowFromVector and convertColIntoSql functions
• Adds support for JSON, bit, array types, date/time types, decimal types, UUID, and other specialized types
• Includes appendHex function for binary data formatting
• Improves NULL value handling with proper type casting
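One plausible shape for a helper like appendHex is to append binary data to a SQL buffer as a hexadecimal literal. The x'..' literal form, the signature, and the table name below are assumptions; the PR only states that such a function exists for binary data formatting.

```go
package main

import (
	"encoding/hex"
	"fmt"
)

// appendHex appends data to dst as a hexadecimal literal of the form x'..'.
// The literal form is an assumption; the PR does not show the real output.
func appendHex(dst, data []byte) []byte {
	dst = append(dst, "x'"...)
	dst = append(dst, hex.EncodeToString(data)...)
	return append(dst, '\'')
}

func main() {
	sql := []byte("INSERT INTO blob_demo VALUES (")
	sql = appendHex(sql, []byte{0xde, 0xad, 0xbe, 0xef})
	sql = append(sql, ')')
	fmt.Println(string(sql))
}
```

Hex encoding sidesteps quoting and character-set issues that arise when raw bytes are embedded in a string literal.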
alter.go
Add ISCP job management for ALTER TABLE operations
pkg/sql/compile/alter.go
• Adds ISCP job cleanup during ALTER TABLE operations
• Includes DropAllIndexCdcTasks call to remove CDC tasks for temporary tables
• Adds fulltext index handling in the reindex process
• Improves error handling and logging for ALTER TABLE copy operations
search.go
Refactor HNSW search with generic types and model abstraction
pkg/vectorindex/hnsw/search.go
• Refactored HnswSearch to use generic types with the types.RealNumbers constraint
• Replaced HnswSearchIndex with HnswModel[T] for better type safety
• Simplified search implementation by removing file loading logic
• Updated metadata loading to use the generic LoadMetadata function
build_dml_util.go
Add async index support to DML operations
pkg/sql/plan/build_dml_util.go
• Added async index support to skip synchronous index operations
• Updated multi-table index handling to check for async configuration
• Modified fulltext and IVF index processing to respect async settings
• Enhanced MultiTableIndex structure with IndexAlgoParams field
fulltext.go
Enhance fulltext tokenization with composite key support
pkg/sql/plan/fulltext.go
• Enhanced fulltext index tokenization to support both table and values scans
• Added support for composite primary keys in fulltext operations
• Improved parameter handling for different scan types
• Added primary key type extraction for values-based operations
secondary_index_utils.go
Add async parameter support to index configuration
pkg/catalog/secondary_index_utils.go
• Added async parameter support to index configuration
• Implemented IsIndexAsync function to check async settings
• Enhanced parameter parsing to handle async flag
• Updated fulltext and vector index parameter handling
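A check along the lines of IsIndexAsync could parse the index's algorithm parameters and look for an async flag. The sketch assumes the parameters are stored as a JSON object with an "async" key holding either a boolean or the string "true"; the key name, the storage format, and the lowercase function name are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// isIndexAsync reports whether the (assumed) JSON parameter string carries an
// async flag. An empty parameter string means a synchronous index.
func isIndexAsync(algoParams string) (bool, error) {
	if algoParams == "" {
		return false, nil
	}
	var params map[string]any
	if err := json.Unmarshal([]byte(algoParams), &params); err != nil {
		return false, fmt.Errorf("parse index params: %w", err)
	}
	switch v := params["async"].(type) {
	case bool:
		return v, nil
	case string:
		return strings.EqualFold(v, "true"), nil
	default:
		return false, nil // key missing or of an unexpected type
	}
}

func main() {
	async, err := isIndexAsync(`{"lists":"16","async":"true"}`)
	fmt.Println(async, err)
}
```

A DML planner could then skip the synchronous index maintenance step when the function returns true, leaving the update to the CDC path.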
build_show_util.go
Add async parameter support in CREATE TABLE SQL construction
pkg/sql/plan/build_show_util.go
• Enhanced ConstructCreateTableSQL to handle async parameter from index algo params
• Added JSON parsing for async flag and appends "ASYNC" to index string when true
• Improved error handling for JSON parsing operations
func_cast.go
Enhance array casting with dimension validation
pkg/sql/plan/function/func_cast.go
• Enhanced strToArray function with dimension validation
• Added bypass for max dimension check and proper error handling
• Improved array conversion with dimension mismatch detection
hnsw_create.go
Update HNSW creation to use generic types
pkg/sql/colexec/table_function/hnsw_create.go
• Updated hnswCreateState to use generic HnswBuild[float32] type
• Modified NewHnswBuild call to use generic float32 type parameter
hnsw.go
Relax HNSW query builder node type constraints
pkg/sql/plan/hnsw.go
• Commented out TABLE_SCAN node type validation
• Removed strict node type checking for HNSW query building
mock_consumer.go
Use system account constant in mock consumer
pkg/iscp/mock_consumer.go
• Updated context creation to use catalog.System_Account instead of hardcoded uint32(0)
• Improved system account constant usage for consistency
types.go
Add vector array type description formatting
pkg/container/types/types.go
• Added DescString method cases for T_array_float32 and T_array_float64
• Returns formatted vector type descriptions like "VECF32(128)" and "VECF64(128)"
hnsw_search.go
Update HNSW search to use generic types
pkg/sql/colexec/table_function/hnsw_search.go
• Updated newHnswAlgoFn to use generic NewHnswSearch[float32] call
• Modified HNSW search algorithm instantiation with float32 type parameter
data_retriever.go
Add account and table ID getters to data retriever
pkg/iscp/data_retriever.go
• Added GetAccountID() and GetTableID() methods to DataRetrieverImpl
• Provides access to account and table identifiers for data retrieval operations
types.go
Add algorithm parameters to multi-table index structure
pkg/sql/plan/types.go
• Added IndexAlgoParams field to MultiTableIndex struct
• Enhanced multi-table index structure to store algorithm parameters
types.go
Extend DataRetriever interface with ID getters
pkg/iscp/types.go
• Added GetAccountID() and GetTableID() methods to DataRetriever interface
• Extended interface to provide access to account and table identifiers
vector_hnsw.result
Update vector dimension error message format
test/distributed/cases/vector/vector_hnsw.result
• Updated error message format for vector dimension mismatch
• Changed from "vector ops between different dimensions" to "expected vector dimension X != actual dimension Y"
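The new message format can be reproduced with a small validation helper. Only the message text comes from the walkthrough; the checkDimension name and its signature are hypothetical.

```go
package main

import "fmt"

// checkDimension returns an error in the new message format when the vector's
// length does not match the column's declared dimension.
func checkDimension(expected int, vec []float32) error {
	if len(vec) != expected {
		return fmt.Errorf("expected vector dimension %d != actual dimension %d", expected, len(vec))
	}
	return nil
}

func main() {
	fmt.Println(checkDimension(3, []float32{1, 2}))
}
```

Naming both numbers tells the user which side of the comparison is wrong, which the older "vector ops between different dimensions" text did not.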
vector_index.result
Improve vector index dimension error messages
test/distributed/cases/vector/vector_index.result
• Updated error message format for vector dimension validation
• Improved error message clarity for dimension mismatch scenarios
array.result
Update array dimension error message format
test/distributed/cases/array/array.result
• Updated vector dimension error messages to use clearer format
• Changed error text to "expected vector dimension X != actual dimension Y"
15 files
index_consumer_test.go
Add comprehensive test suite for index consumer
pkg/iscp/index_consumer_test.go
• Adds comprehensive test suite for index consumer functionality
• Includes mock implementations for data retrieval, SQL execution, and error handling
• Tests both snapshot and tail data processing scenarios for HNSW indexes
• Validates SQL generation and CDC operation handling
sync_test.go
Add comprehensive HNSW CDC synchronization test suite
pkg/vectorindex/hnsw/sync_test.go
• Added comprehensive test suite for HNSW CDC synchronization functionality
• Tests cover various scenarios: empty sync, upsert, delete, mixed operations, and multi-file handling
• Includes mock functions for SQL execution and streaming operations
• Tests handle shuffled data and large datasets (up to 1M entries)
index_sqlwriter_test.go
Add comprehensive index SQL writer test suite
pkg/iscp/index_sqlwriter_test.go
• Added test suite for index SQL writers (fulltext, HNSW, IVF-flat)
• Tests cover insert, upsert, delete operations for different index types
• Includes tests for composite primary keys and various data types
• Validates SQL generation for different vector index algorithms
util_test.go
Add utility tests for data type conversion and SQL generation
pkg/iscp/util_test.go
• Added utility tests for data type conversion and SQL generation
• Tests various data types including JSON, arrays, dates, decimals, UUIDs
• Validates proper SQL formatting for different vector and scalar types
• Includes comprehensive type conversion validation
search_test.go
Enhance HNSW search tests with multi-file support
pkg/vectorindex/hnsw/search_test.go
• Enhanced existing search tests with multi-file support
• Added mock functions for 2-file scenarios and catalog operations
• Extended test coverage for metadata and index batch creation
• Added utility functions for creating test batches with different file configurations
model_test.go
Add comprehensive HNSW model test suite
pkg/vectorindex/hnsw/model_test.go
• Added comprehensive test suite for HNSW model operations
• Tests model loading, searching, adding/removing vectors, and SQL generation
• Includes error handling tests for nil model scenarios
• Validates model state management and persistence operations
func_hnsw_test.go
Add HNSW CDC update function test suite
pkg/sql/plan/function/func_hnsw_test.go
• Added test suite for HNSW CDC update function
• Tests various error conditions and parameter validation
• Validates null parameter handling and JSON parsing
• Ensures proper error handling for invalid inputs
build_test.go
Update HNSW build tests for generic type system
pkg/vectorindex/hnsw/build_test.go
• Updated build tests to use generic HNSW types
• Modified test functions to work with HnswModel[float32] instead of HnswSearchIndex
• Updated constructor calls to use generic type parameters
• Maintained existing test functionality with new type system
mysql_sql_test.go
Add ASYNC keyword test cases for index creation
pkg/sql/parsers/dialect/mysql/mysql_sql_test.go
• Added test cases for ASYNC keyword in fulltext and vector index creation statements
• Updated expected output to include ASYNC keyword in uppercase format
types_test.go
Add vector index CDC operations test suite
pkg/vectorindex/types_test.go
• New test file for vector index CDC operations
• Tests Insert, Delete, Upsert operations and JSON serialization
• Validates CDC state management and JSON output format
vector_ivf_async.result
IVF async vector index test results
test/distributed/cases/vector/vector_ivf_async.result
• Test results for IVF vector index with ASYNC keyword functionality
• Validates async index creation, data insertion, and vector similarity queries
• Demonstrates proper async index behavior with sleep delays
vector_ivf_async.sql
IVF async vector index test cases
test/distributed/cases/vector/vector_ivf_async.sql
• Test cases for IVF vector indexes with ASYNC keyword
• Tests index creation, data loading, and vector similarity searches
• Includes sleep statements to allow async operations to complete
vector_hnsw_async.result
HNSW async vector index test results
test/distributed/cases/vector/vector_hnsw_async.result
• Test results for HNSW vector index with ASYNC functionality
• Validates async HNSW index creation and vector operations
• Shows proper handling of CDC updates and similarity searches
fulltext_async.sql
Fulltext async index test cases
test/distributed/cases/fulltext/fulltext_async.sql
• Test cases for fulltext index with ASYNC keyword support
• Tests async fulltext index creation and search functionality
• Includes multilingual content and null value handling
fulltext_async.result
Fulltext async index test results
test/distributed/cases/fulltext/fulltext_async.result
• Test results for async fulltext index functionality
• Validates async fulltext search operations and result accuracy
• Shows proper handling of multilingual and null content
1 files
build.go
Refactor HNSW build to use generic types and shared model
pkg/vectorindex/hnsw/build.go
• Refactors HNSW build functionality to use generic types with HnswBuild[T types.RealNumbers]
• Removes HnswBuildIndex struct and related methods (moved to model.go)
• Updates function signatures and type definitions to support generic vector types
• Maintains multi-threaded building capabilities with channel-based communication
1 files
function_id_test.go
Add HNSW CDC update function ID
pkg/sql/plan/function/function_id_test.go
• Adds HNSW_CDC_UPDATE function ID (349) to predefined function IDs map
• Updates FUNCTION_END_NUMBER from 349 to 350 to accommodate new function
3 files
watermark_updater.go
Fix ISCP watermark updater null handling bugs
pkg/iscp/watermark_updater.go
• Fixed bug in unregisterJobsByDBName to handle empty tableIDs array
• Corrected null check index in queryIndexLog function
• Added conditional execution to prevent SQL errors
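The empty-array fix presumably guards against generating a statement with an empty IN list, which is a syntax error in SQL. A minimal sketch of that guard follows; the function name, table name, and statement are hypothetical and not taken from the PR.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// buildUnregisterSQL returns the statement to run and true, or an empty
// string and false when there is nothing to unregister. Skipping execution
// avoids emitting "IN ()", which the SQL parser would reject.
func buildUnregisterSQL(tableIDs []uint64) (string, bool) {
	if len(tableIDs) == 0 {
		return "", false
	}
	ids := make([]string, len(tableIDs))
	for i, id := range tableIDs {
		ids[i] = strconv.FormatUint(id, 10)
	}
	// iscp_log_demo is a placeholder table name.
	return "DELETE FROM iscp_log_demo WHERE table_id IN (" + strings.Join(ids, ",") + ")", true
}

func main() {
	if sql, ok := buildUnregisterSQL([]uint64{101, 102}); ok {
		fmt.Println(sql)
	}
	if _, ok := buildUnregisterSQL(nil); !ok {
		fmt.Println("no tables, nothing to execute")
	}
}
```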
iteration.go
Improve ISCP iteration error handling and context setup
pkg/iscp/iteration.go
• Added error handling for CollectChanges function call
• Enhanced consumer execution with proper tenant context setup
• Fixed context propagation for system account operations
build_ddl.go
Fix fulltext index algorithm parameters processing
pkg/sql/plan/build_ddl.go
• Fixed fulltext index table building to always process IndexAlgoParams
• Removed conditional check that prevented parameter processing
3 files