-
Notifications
You must be signed in to change notification settings - Fork 0
Clp integration s3 cleanup #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
WalkthroughThis pull request introduces several new components and updates across the repository. An exception is added in the Changes
Sequence Diagram(s)sequenceDiagram
participant U as User/System
participant CF as ClpConnectorFactory
participant C as ClpConnector
participant DS as ClpDataSource
U->>CF: newConnector(id, config, executor)
CF->>C: Instantiate ClpConnector with ClpConfig
C->>U: Return new ClpConnector instance
U->>C: Invoke createDataSource(...)
C->>DS: Create and return ClpDataSource instance
sequenceDiagram
participant DS as ClpDataSource
participant Cur as Cursor
participant QR as QueryRunner
DS->>Cur: addSplit(split)
Cur->>Cur: load_archive()
Cur->>Cur: execute_query(query, output_columns)
Cur->>QR: Pass query execution results
QR->>Cur: Return data/results
Cur->>DS: Provide fetched rows
DS->>DS: next() returns RowVector with data
Tip ⚡🧪 Multi-step agentic review comment chat (experimental)
✨ Finishing Touches
🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 9
🧹 Nitpick comments (42)
CMake/FindANTLR.cmake (3)
18-18
: Address the TODO regarding ANTLR cleanup.You have a TODO comment here suggesting cleanup of the ANTLR CMake files. Would you like assistance to refactor or remove outdated parts? Taking action on this can help maintain clarity and reduce technical debt.
37-54
: Consider adding a fallback for ANTLR version detection failures.When the
execute_process(...)
invocation fails to detect the ANTLR version, you currently terminate with an error. It can be beneficial to provide a fallback or more descriptive error handling to guide users in correcting their environment setup.
55-161
: Ensure the generated ANTLR outputs don’t clutter source directories.The macro’s logic to generate source files in
antlr4cpp_generated_src
underCMAKE_CURRENT_BINARY_DIR
is typically good practice. Still, be sure your.gitignore
excludes or manages generated files. This helps avoid accidental staging of these files and keeps the source tree clean.velox/connectors/clp/ClpTableHandle.cpp (2)
4-6
: Consider whether a pass-through implementation is necessary.The
toString()
method is currently just forwarding to the parent class implementation. If no specialized behavior is needed, consider whether this method needs to be explicitly implemented at all.
8-10
: Consider whether a pass-through implementation is necessary.Similarly, the
serialize()
method is just calling the parent class implementation. If no specialized behavior is required, this method may be unnecessary.velox/connectors/clp/ClpColumnHandle.h (2)
27-29
: Consider whether const member variables are appropriate for all fields.Using const member variables ensures immutability, which is good for thread safety. However, it also means these values cannot be modified after construction. Ensure this restriction aligns with the intended usage of the class.
6-30
: Consider adding serialization/deserialization support.For connector components, serialization and deserialization capabilities are often needed. Consider adding methods like
serialize()
and a staticdeserialize()
method if this class needs to be transferred over the network or persisted.velox/connectors/clp/CMakeLists.txt (1)
3-4
: Ensure SOURCE_PATH_SIZE definition is properly scopedThe SOURCE_PATH_SIZE definition is added globally using add_definitions(). Consider using target_compile_definitions() to scope this definition to only the targets that need it, rather than affecting the entire build.
-string(LENGTH "${CMAKE_SOURCE_DIR}/" SOURCE_PATH_SIZE) -add_definitions("-DSOURCE_PATH_SIZE=${SOURCE_PATH_SIZE}") +string(LENGTH "${CMAKE_SOURCE_DIR}/" SOURCE_PATH_SIZE) +target_compile_definitions(velox_clp_connector PRIVATE + "SOURCE_PATH_SIZE=${SOURCE_PATH_SIZE}")velox/connectors/clp/search_lib/CMakeLists.txt (1)
14-152
: Consider organizing source files into logical groups with comments.The source file list is quite extensive and would benefit from organization into logical groups with comments explaining each group's purpose (e.g., readers, writers, parsers, etc.). This would enhance maintainability and make it easier for developers to understand the codebase structure.
CMakeLists.txt (3)
126-126
: CLP connector enabled by default.The CLP connector is enabled by default (
VELOX_ENABLE_CLP_CONNECTOR
is set toON
), which might increase build times and dependencies for users who don't need this connector. Consider setting it toOFF
by default to follow the pattern of other optional connectors like S3, GCS, and ABFS.-option(VELOX_ENABLE_CLP_CONNECTOR "Build CLP connector." ON) +option(VELOX_ENABLE_CLP_CONNECTOR "Build CLP connector." OFF)
489-501
: Inconsistent code formatting in conditional blocks.The conditional blocks for CLP connector dependencies have inconsistent spacing. There's a space after "if" but before the opening parenthesis, which differs from the coding style used in the rest of the file.
-if (${VELOX_ENABLE_CLP_CONNECTOR}) +if(${VELOX_ENABLE_CLP_CONNECTOR})Similar fix should be applied to the second conditional block as well.
489-496
: Consider adding explicit version requirements for all external dependencies.While spdlog has a version requirement (1.12.0), other dependencies like msgpack-cxx don't have explicit version requirements. This could lead to compatibility issues if different versions are used.
velox_set_source(msgpack-cxx) - velox_resolve_dependency(msgpack-cxx) + velox_resolve_dependency(msgpack-cxx 4.1.1)velox/connectors/clp/ClpTableHandle.h (2)
8-14
: Inconsistent parameter handling in constructor.The constructor takes
tableName
as a const reference but stores a copy, whilequery
is taken as a shared_ptr and moved. Consider consistent parameter handling, especially if you're taking ownership of both parameters.ClpTableHandle( std::string connectorId, - const std::string& tableName, + std::string tableName, std::shared_ptr<std::string> query) : ConnectorTableHandle(connectorId), - tableName_(tableName), + tableName_(std::move(tableName)), query_(std::move(query)) {}
16-22
: Unnecessary nodiscard attribute on accessor methods.Using the
[[nodiscard]]
attribute on simple getters is unnecessary as they don't produce results that should be used. This attribute is more appropriate for operations that return a value that should not be ignored.- [[nodiscard]] const std::string& tableName() const { + const std::string& tableName() const { return tableName_; } - [[nodiscard]] const std::shared_ptr<std::string>& query() const { + const std::shared_ptr<std::string>& query() const { return query_; }CMake/ExternalAntlr4Cpp.cmake (1)
135-138
: Consider enabling C++ standard configuration.The file has commented-out options for setting the C++ standard for ANTLR4. Consider uncommenting one of these lines to ensure compatibility with the project's C++ standard (C++17).
-DCMAKE_BUILD_TYPE:STRING=${CMAKE_BUILD_TYPE} -DWITH_STATIC_CRT:BOOL=${ANTLR4_WITH_STATIC_CRT} -DDISABLE_WARNINGS:BOOL=ON # -DCMAKE_CXX_STANDARD:STRING=17 # if desired, compile the runtime with a different C++ standard - # -DCMAKE_CXX_STANDARD:STRING=${CMAKE_CXX_STANDARD} # alternatively, compile the runtime with the same C++ standard as the outer project + -DCMAKE_CXX_STANDARD:STRING=${CMAKE_CXX_STANDARD} # compile the runtime with the same C++ standard as the outer project INSTALL_COMMAND "" EXCLUDE_FROM_ALL 1)velox/connectors/clp/ClpConnector.cpp (3)
5-11
: Ensure executor pointer validity.
The constructor setsexecutor_
but does not verify its validity. Consider adding a check or defensive code in caseexecutor
is null.
13-26
: Consider error handling for invalid column handles.
While this method currently relies onClpDataSource
for validation, a pre-check here could produce more targeted error messages for the caller.
28-34
: Implement or track unimplemented data sink functionality.
createDataSink
is currently unimplemented. Ensure you have a plan or placeholder test to track this missing functionality.Do you want to create a stub that returns a descriptive error to alert calling code?
velox/connectors/clp/ClpDataSource.cpp (2)
15-92
: Use consistent logging instead of std::cout.
Line 128 prints withstd::cout
, which may not align with Velox logging conventions. Consider replacing withVELOX_INFO
,spdlog
, or another consistent logging mechanism.
94-112
: Validate split details further.
The logic branches oninputSource_
but does not safeguardarchiveDir_
orarchiveId
. If they are empty or invalid, the cursor might not initialize properly.velox/connectors/clp/search_lib/OrderedProjection.cpp (2)
32-61
: Confirm column type coverage.
The switch statement handles the knownColumnType
cases. If new types are introduced later, ensure they are also supported in this method to prevent runtime exceptions.
70-128
: Improve diagnostic details for partial matches.
When a column partially matches a path and then fails later tokens, there is no detailed log or user-facing error. Consider logging or throwing a specialized error to aid debugging in complex schemas.velox/connectors/clp/search_lib/Cursor.cpp (4)
16-18
: Discourage usage ofusing namespace
in the implementation file
While convenient, this can introduce name collisions or confusion about symbol origins in larger codebases. Consider using fully qualified names to improve maintainability and clarity.
75-75
: Address theTODO
comment
Line 75 references aTODO: fix it
. Leaving it unresolved might create confusion for future maintainers. Would you like assistance in opening an issue to track this item or proposing an immediate fix?
153-174
: Consider adding dedicated unit tests for query preprocessing
The parsing and logical transformations here are non-trivial. Adding targeted tests will enable early detection of syntax or logical errors and ensure robustness.
265-334
: Refactor duplicatedfetch_next
methods
Both overloads share similar logic for cycling through archives and schemas. Consider introducing a common helper function or reorganizing the flow to reduce code duplication and simplify maintenance.Also applies to: 336-405
velox/connectors/clp/ClpConfig.h (1)
1-1
: Apply required formatting
The pipeline indicates formatting issues at line 1. Kindly run the project’s standard formatting tool (e.g., clang-format) to resolve these warnings.🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/search_lib/OrderedProjection.h (2)
1-1
: Run code formatting on file
The pipeline has flagged format issues. Running the consistent formatting tool will keep code style uniform and resolve any spacing or alignment discrepancies.🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
11-19
: Improve error messages inOperationFailed
Providing clearer error messages (e.g., including a descriptive string) facilitates easier troubleshooting. Consider augmenting the exception with contextual details about the operation that failed.velox/connectors/clp/ClpDataSource.h (3)
31-40
: Clarify concurrency behaviour for split addition.The
addSplit
method is declared, but the concurrency model is unclear. If multiple threads may invokeaddSplit
, consider thread-safety measures (e.g. locking a mutex) to avoid race conditions.
42-53
: Consider atomic or synchronized counters.
getCompletedBytes()
andgetCompletedRows()
return totals from private members that could be updated concurrently if this class is used in a multi-threaded environment. Use atomic operations or appropriate locking for reliable concurrency support.
55-71
: Use of Boost.Filesystem instead of std::filesystem.C++17 and later provide
<filesystem>
in the standard library. Depending on your minimum compiler requirements, consider switching tostd::filesystem
to reduce external dependencies.velox/connectors/clp/search_lib/Cursor.h (3)
1-16
: Harmonize ingestion of standard and library headers.Ensure that your inclusion of headers like
<optional>
,<vector>
, and<string>
is consistent across the codebase. This helps maintain clarity and consistency for future maintainers.
19-28
: Extend error codes for thorough troubleshooting.The
ErrorCode
enum might benefit from additional entries to capture different granular errors (e.g. network failures) if your system can encounter them. This would facilitate more refined error handling in the future.
30-35
: Document theField
structure.Consider including brief documentation for what each field member represents and how to interpret
ColumnType
. This helps clarify usage for new contributors or future references.velox/connectors/clp/ClpConnector.h (3)
1-5
: Apply code formatting to align with project standards.The pipeline has flagged format issues in this file. Please run the recommended code style tool or apply the provided patch to resolve these warnings.
🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
14-18
: Provide rationale for excluding dynamic filters.
canAddDynamicFilter()
always returns false. If dynamic filters are valuable in the future, consider adding references or comments either explaining the limitation or pointing to JIRA tasks for potential implementation.
35-40
: Ensure the data sink is implemented or intentionally left out.
createDataSink
is declared but may not be fully implemented if CLP is read-only. If so, consider clarifying this with either a comment or aVELOX_NYI
.Would you like me to add a placeholder for
createDataSink
with aVELOX_NYI
or a comment indicating that data insertion is not supported?velox/connectors/clp/search_lib/QueryRunner.h (1)
78-80
: Emptyinit
method implementation.
It appears that thisinit
method is intentionally left empty. If it is a placeholder, consider adding a TODO comment or removing it if it is no longer needed.velox/connectors/clp/search_lib/QueryRunner.cpp (3)
108-127
: Suspicious debugging code at line 118.
Inside thefetch_next
method withVectorPtr
s, the statementint a = 1;
does nothing. This might be leftover debug code. Consider removing it to avoid confusion.- int a = 1;
278-352
: Usefalse == ret
instead of!ret
for consistency with coding guidelines.
On line 344, the code uses!ret
. The guidelines recommend preferringfalse == ret
rather than!ret
. Updating this would maintain style consistency.- ret = cur->is_inverted() ? !ret : ret; + ret = cur->is_inverted() ? (false == ret) : ret;
354-412
: Use const reference to avoid copying map entries.
Within the loop on line 377 (for (auto entry : m_datestring_readers)
), you are copying the map entries rather than referencing them. Using a const reference might improve efficiency and clarity.-for (auto entry : m_datestring_readers) { +for (auto const& entry : m_datestring_readers) {
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
third-party/antlr/antlr-4.13.1-complete.jar
is excluded by!**/*.jar
📒 Files selected for processing (27)
.gitignore
(1 hunks).gitmodules
(1 hunks)CMake/ExternalAntlr4Cpp.cmake
(1 hunks)CMake/FindANTLR.cmake
(1 hunks)CMake/resolve_dependency_modules/msgpack-cxx.cmake
(1 hunks)CMake/resolve_dependency_modules/spdlog.cmake
(1 hunks)CMakeLists.txt
(4 hunks)Makefile
(1 hunks)velox/connectors/CMakeLists.txt
(1 hunks)velox/connectors/clp/CMakeLists.txt
(1 hunks)velox/connectors/clp/ClpColumnHandle.h
(1 hunks)velox/connectors/clp/ClpConfig.h
(1 hunks)velox/connectors/clp/ClpConnector.cpp
(1 hunks)velox/connectors/clp/ClpConnector.h
(1 hunks)velox/connectors/clp/ClpConnectorSplit.h
(1 hunks)velox/connectors/clp/ClpDataSource.cpp
(1 hunks)velox/connectors/clp/ClpDataSource.h
(1 hunks)velox/connectors/clp/ClpTableHandle.cpp
(1 hunks)velox/connectors/clp/ClpTableHandle.h
(1 hunks)velox/connectors/clp/search_lib/CMakeLists.txt
(1 hunks)velox/connectors/clp/search_lib/Cursor.cpp
(1 hunks)velox/connectors/clp/search_lib/Cursor.h
(1 hunks)velox/connectors/clp/search_lib/OrderedProjection.cpp
(1 hunks)velox/connectors/clp/search_lib/OrderedProjection.h
(1 hunks)velox/connectors/clp/search_lib/QueryRunner.cpp
(1 hunks)velox/connectors/clp/search_lib/QueryRunner.h
(1 hunks)velox/connectors/clp/search_lib/clp_src
(1 hunks)
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ...
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
velox/connectors/clp/ClpTableHandle.cpp
velox/connectors/clp/ClpDataSource.cpp
velox/connectors/clp/search_lib/OrderedProjection.cpp
velox/connectors/clp/search_lib/Cursor.cpp
velox/connectors/clp/ClpConnector.cpp
velox/connectors/clp/search_lib/QueryRunner.cpp
🪛 GitHub Actions: Run Checks
velox/connectors/CMakeLists.txt
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/CMakeLists.txt
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
CMake/resolve_dependency_modules/spdlog.cmake
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
CMake/resolve_dependency_modules/msgpack-cxx.cmake
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/search_lib/CMakeLists.txt
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
CMake/ExternalAntlr4Cpp.cmake
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
CMakeLists.txt
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/ClpDataSource.cpp
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
CMake/FindANTLR.cmake
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/search_lib/OrderedProjection.h
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/ClpConfig.h
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/ClpDataSource.h
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/ClpConnector.h
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
🪛 Gitleaks (8.21.2)
velox/connectors/clp/ClpConfig.h
17-17: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
🔇 Additional comments (26)
velox/connectors/clp/search_lib/clp_src (1)
1-1
: Subproject Commit Reference CheckThis subproject commit with hash
8fce672d4ec3a45010e3afb0f0f5b0e44d98fe04
has been added as expected. Please ensure that this commit hash accurately reflects the state of the subproject you intend to integrate with the CLP connector, and double-check that all necessary submodule updates have been applied..gitmodules (1)
1-3
: Please ensure submodule commit pinning for predictability.Adding a new submodule referencing an external repository is fine. However, you might consider pinning the submodule to a specific commit or tag instead of tracking a moving branch. This prevents unexpected changes in upstream code and ensures reproducible builds.
Makefile (1)
22-22
: Confirm the rationale for disabling warnings as errors.Changing
TREAT_WARNINGS_AS_ERRORS
to0
suppresses potential build-time awareness of code issues. Be sure this is an intentional decision. Otherwise, you risk overlooking valid compiler warnings that may degrade code health or lead to regressions.CMake/FindANTLR.cmake (1)
21-23
: Validate the fixed OpenJDK installation path on macOS.By force-setting
JAVA_HOME
to/usr/local/opt/openjdk@11/
, you might encounter compatibility issues on developer machines that use a different installation path. Consider searching for the correct Java path or falling back to a default if OpenJDK 11 is not found in this exact directory..gitignore (1)
149-149
: LGTM: The addition of the ANTLR JAR exception is appropriate.This change allows Git to track the ANTLR JAR file while still ignoring all other JAR files, which is necessary for the CLP connector implementation and dependencies.
velox/connectors/CMakeLists.txt (1)
20-22
: LGTM: The conditional inclusion of the CLP connector follows the pattern used for other connectors.This change enables the build system to conditionally include the CLP connector based on the
VELOX_ENABLE_CLP_CONNECTOR
variable, which is consistent with how other connectors are integrated in this file.velox/connectors/clp/ClpColumnHandle.h (2)
8-12
: LGTM: The constructor is well-designed and initializes all member variables.The constructor properly takes all necessary parameters and initializes the member variables in the initialization list, which is good practice.
14-24
: LGTM: The accessor methods are correctly implemented.The get methods are properly defined as const methods returning const references where appropriate.
velox/connectors/clp/ClpConnectorSplit.h (1)
1-25
: Well-structured split implementation with appropriate class designThis new class follows good C++ practices with a clear constructor, proper use of const member variables, and appropriate base class initialization. The toString() method provides useful diagnostic output.
I appreciate the use of [[nodiscard]] which ensures callers don't accidentally ignore return values, and the consistent naming conventions across the class structure.
CMake/resolve_dependency_modules/spdlog.cmake (1)
1-34
:❓ Verification inconclusive
CMake dependency module correctly handles spdlog integration
The spdlog dependency is properly configured with version pinning, SHA256 checksum verification, and correct URL construction. The use of FetchContent with OVERRIDE_FIND_PACKAGE is appropriate for ensuring consistent builds.
I like that you've enabled SPDLOG_FMT_EXTERNAL to avoid potential conflicts with other fmt library versions.
🏁 Script executed:
#!/bin/bash # Verify if the specified spdlog version is still current and appropriate curl -s https://api.github.com/repos/gabime/spdlog/releases/latest | grep -o '"tag_name": "v[0-9.]*"' | cut -d '"' -f 4Length of output: 129
Action Required: Confirm Intentional spdlog Version Pinning
The dependency module is correctly configured with version pinning, SHA256 checksum verification, and proper URL construction. However, our verification shows that the latest release of spdlog is v1.15.1, whereas the module currently pins to v1.12.0. Please confirm whether using v1.12.0 is intentional for stability or compatibility reasons.
- Verify that the chosen version (v1.12.0) is deliberately retained despite the more recent upstream release.
- Ensure that any rationale for not updating to v1.15.1 is documented if necessary.
🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/CMakeLists.txt (2)
6-11
: Clean library definition with proper source filesThe library is correctly defined as an OBJECT library with appropriate source files.
13-23
: Complete and well-structured dependency listThe dependencies are properly specified with a clear PRIVATE designation. The mix of internal (clp-s-search, velox_connector) and external (simdjson, msgpack-cxx, spdlog, etc.) dependencies looks appropriate for this connector.
CMake/resolve_dependency_modules/msgpack-cxx.cmake (1)
1-33
:✅ Verification successful
Well-configured msgpack-cxx dependency module
The msgpack-cxx dependency is properly configured with version pinning, SHA256 checksum verification, and correct URL construction. The use of FetchContent with OVERRIDE_FIND_PACKAGE ensures consistent builds.
🏁 Script executed:
#!/bin/bash # Verify if the specified msgpack-cxx version is still current curl -s https://api.github.com/repos/msgpack/msgpack-c/releases/latest | grep -o '"tag_name": "[^"]*"' | cut -d '"' -f 4Length of output: 132
Dependency Module Configuration Approved
The msgpack-cxx dependency module is correctly configured:
- The version pinning (cpp-7.0.0) is confirmed to be current.
- The SHA256 checksum verification and URL construction are implemented properly.
- The use of FetchContent with OVERRIDE_FIND_PACKAGE ensures consistent builds.
No further modifications are required.
🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
velox/connectors/clp/search_lib/CMakeLists.txt (1)
164-164
:❓ Verification inconclusive
C++20 standard differs from project default.
The project uses C++17 as the standard (as set in the main CMakeLists.txt), but this library is configured to use C++20. Consider documenting the reason for this deviation or aligning with the project standard if there are no specific C++20 features required.
🏁 Script executed:
#!/bin/bash # Check for C++20 specific features in the search library code rg -l "\[\[likely\]\]|\[\[unlikely\]\]|requires|concept|constinit|consteval|using enum|co_await|co_yield|co_return|std::span|std::format" ${CLP_SRC_DIR}/clp_s/Length of output: 195
Action Required: Verify C++20 Usage or Document Necessity
The build configuration in
velox/connectors/clp/search_lib/CMakeLists.txt
sets the compile standard to C++20 viatarget_compile_features(clp-s-search PRIVATE cxx_std_20)while the overall project defaults to C++17. The initial verification attempt using a regex search failed due to an incorrect directory path, so we couldn’t confirm whether any C++20-specific language features (such as
[[likely]]
,[[unlikely]]
, concepts, or coroutines) are actually in use.
- Next Steps:
- Re-run the search within
velox/connectors/clp/search_lib
to verify if any C++20 features are present. For example:#!/bin/bash rg -l "\[\[likely\]\]|\[\[unlikely\]\]|requires|concept|constinit|consteval|using enum|co_await|co_yield|co_return|std::span|std::format" velox/connectors/clp/search_lib/- If no C++20 features are found, consider either aligning this library’s standard with the project default (C++17) or document clearly why C++20 is necessary.
- If there are C++20 features being used, document their usage and rationale in the code or project documentation.
Please review and verify this change to ensure that the library’s use of C++20 is both necessary and well-documented.
Please run this script to verify whether any C++20-specific features are present in the search library code. This will help us decide whether to align the library’s standard with the project default (C++17) or document the need for C++20.
🏁 Script executed:
#!/bin/bash # Re-run the search for C++20-specific features within the correct directory. rg -l "\[\[likely\]\]|\[\[unlikely\]\]|requires|concept|constinit|consteval|using enum|co_await|co_yield|co_return|std::span|std::format" velox/connectors/clp/search_lib/Length of output: 294
Ensure C++20 Standard Usage Is Justified
The file
velox/connectors/clp/search_lib/CMakeLists.txt
configures theclp-s-search
target to compile with C++20, which deviates from the project’s default of C++17. The initial automated search did not return evidence of C++20-specific features in this library; however, this outcome may be inconclusive.
- Action Items:
- Please run the above script to verify if any C++20 features (e.g.
[[likely]]
,concept
s, coroutine keywords, etc.) are present in the library.- If the search remains empty, consider either aligning the library with the project default (C++17) or explicitly document the requirement for C++20 if specific features are needed.
- Conduct a manual review if necessary to confirm that no overlooked C++20-specific code is in use.
🏁 Script executed:
#!/bin/bash # First, verify that the directory exists. if [ -d "velox/connectors/clp/search_lib" ]; then echo "Directory exists: velox/connectors/clp/search_lib" # Now search for C++20-specific features within this directory. rg -l "\[\[likely\]\]|\[\[unlikely\]\]|requires|concept|constinit|consteval|using enum|co_await|co_yield|co_return|std::span|std::format" "velox/connectors/clp/search_lib/" else echo "Directory 'velox/connectors/clp/search_lib' not found." fiLength of output: 448
C++ Standard Discrepancy: Verify and Document Necessity
In
velox/connectors/clp/search_lib/CMakeLists.txt
, the targetclp-s-search
is configured to compile with C++20 via:target_compile_features(clp-s-search PRIVATE cxx_std_20)However, the overall project uses C++17 by default.
A search for C++20-specific features in the
velox/connectors/clp/search_lib/
directory did not return any results.If no C++20 features (e.g.
[[likely]]
, concepts, coroutines, etc.) are being used, either align this library with the project standard (C++17) or document the rationale for using C++20.Please review and confirm the intended standard for this library.
velox/connectors/clp/ClpTableHandle.h (1)
24-26
:✅ Verification successful
Method declarations without implementation.
The
toString()
andserialize()
methods are declared but their implementations are not visible in this file. Ensure that these are properly implemented in a corresponding .cpp file.
🏁 Script executed:
#!/bin/bash # Check for implementations of toString and serialize methods echo "Searching for ClpTableHandle::toString implementation:" rg -A 3 "ClpTableHandle::toString" --type cpp velox/connectors/clp/ echo "Searching for ClpTableHandle::serialize implementation:" rg -A 3 "ClpTableHandle::serialize" --type cpp velox/connectors/clp/Length of output: 933
Revised Review Comment – Implementation Verified
We've verified that the implementations of the declared methods are present in the corresponding .cpp file. Specifically:
ClpTableHandle::toString()
is implemented invelox/connectors/clp/ClpTableHandle.cpp
ClpTableHandle::serialize()
is implemented in the same fileThere are no further corrections needed based on the current state of the code.
velox/connectors/clp/ClpConnector.cpp (1)
36-36
: Connector factory registration looks good.
No issues here.velox/connectors/clp/ClpDataSource.cpp (1)
114-136
: Check for data misalignment.
Vectors are created assuming the cursor returns data in the same order and size. If the cursor returns fewer columns or different ordering, the logic could break silently. Consider validating the vector layout against the fields returned fromcursor_
.Are you sure that field order always matches the
fields_
vector? If not, consider verifying via a test or an assertion.velox/connectors/clp/search_lib/OrderedProjection.cpp (2)
13-30
: Prevent potential double insertion.
The check in lines 21-25 avoids duplicates, which is good. Ensure thatColumnDescriptor::operator==
aligns with your intended semantics (e.g., pointer equality vs. field-by-field comparisons).Could you confirm that the equality operator checks more than just the object pointer?
63-68
: Method design is clear.
The loop overm_selected_columns
effectively resolves each column. No concerns here.velox/connectors/clp/ClpConfig.h (1)
17-17
: False positive potential from static analysis
The string assigned tokS3Bucket
appears to be a configuration key rather than an API credential. You may safely disregard the warning or add a rule to suppress this false positive.🧰 Tools
🪛 Gitleaks (8.21.2)
17-17: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
velox/connectors/clp/ClpDataSource.h (2)
1-11
: Check the portability of POSIX headers.The includes
<unistd.h>
and<sys/wait.h>
are not portable across all platforms (e.g. Windows). If cross-platform builds are required, consider conditionally including these headers or using a more portable abstraction.Would you like me to search for any usage of POSIX-specific functions within the codebase to confirm whether a platform-agnostic approach is necessary?
🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
20-30
: Validate your pointer arguments.Ensure that
clpConfig
,tableHandle
, andcolumnHandles
are valid references to prevent potential null pointer dereferences and runtime errors.Consider verifying their usage in the codebase with a script scan or test coverage to confirm.
velox/connectors/clp/search_lib/Cursor.h (1)
44-81
: Validate input query parameters.Within
execute_query
, ensure that the received query string and output columns undergo robust validation to handle user-supplied or potentially malformed queries.velox/connectors/clp/ClpConnector.h (2)
7-13
: Confirm executor ownership.Storing a raw pointer
folly::Executor* executor
suggests that the connector does not manage the executor lifecycle. Ensure that the executor remains valid for as long as this object can be used.
23-30
: Validate column handles increateDataSource
.Make sure that invalid or missing column handles do not lead to undefined behaviour in dependent classes such as
ClpDataSource
.velox/connectors/clp/search_lib/QueryRunner.h (1)
87-89
:filter
method's always-false return might be a placeholder.
This override currently always returnsfalse
, which could render the filter logic non-operational. Verify whether this is intended or a placeholder that needs further implementation.
@@ -0,0 +1,34 @@ | |||
# Copyright (c) Facebook, Inc. and its affiliates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Address the formatting issue flagged by pipeline
The pipeline check indicates there are formatting issues in this file. Please apply the provided formatting patch to resolve this warning.
🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
@@ -0,0 +1,26 @@ | |||
add_subdirectory(search_lib) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix formatting issues flagged by the pipeline
Please apply the formatting fix provided by the pipeline to resolve the formatting issues.
🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
|
||
velox_include_directories(velox_clp_connector PRIVATE) | ||
|
||
target_compile_features(velox_clp_connector PRIVATE cxx_std_20) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
C++20 standard is specified as required
Requiring C++20 standard features is appropriate, but ensure this is consistent with the rest of the Velox project's requirements.
🏁 Script executed:
#!/bin/bash
# Check if C++20 is used consistently across the project
grep -r "cxx_std_" --include="CMakeLists.txt" . | grep -v "cxx_std_20" | head -n 10
Length of output: 279
C++ Standard Inconsistency: Align Required or Justify Divergence
The velox_clp_connector
now requires C++20 (line 26 in velox/connectors/clp/CMakeLists.txt
), but other modules—evidenced by occurrences in velox/experimental/breeze/CMakeLists.txt
and velox/experimental/breeze/test/CMakeLists.txt
—are still using C++17. Please either update these modules to use C++20 to maintain consistency across the Velox project or provide a clear rationale for this divergence.
- File:
velox/connectors/clp/CMakeLists.txt
(line 26; usingcxx_std_20
) - Files in
velox/experimental/breeze
continue withcxx_std_17
@@ -0,0 +1,33 @@ | |||
# Copyright (c) Facebook, Inc. and its affiliates. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix formatting issues flagged by pipeline
Please apply the formatting fixes provided by the pipeline to resolve the formatting issues.
🧰 Tools
🪛 GitHub Actions: Run Checks
[warning] 1-1: Found format issues. Please apply fix using the provided patch.
${CLP_SRC_DIR}/clp_s/BufferViewReader.hpp | ||
${CLP_SRC_DIR}/clp_s/ColumnReader.cpp | ||
${CLP_SRC_DIR}/clp_s/ColumnReader.hpp | ||
${CLP_SRC_DIR}/clp_s/ColumnReader.cpp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Duplicate source file entry detected.
The source file ${CLP_SRC_DIR}/clp_s/ColumnReader.cpp
is listed twice in the CLP_SRC_FILES
variable (lines 23 and 25), which may cause build issues or warnings.
- ${CLP_SRC_DIR}/clp_s/ColumnReader.cpp
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
${CLP_SRC_DIR}/clp_s/ColumnReader.cpp | |
set(CLP_SRC_FILES | |
${CLP_SRC_DIR}/clp_s/ColumnReader.cpp # original entry on line 23 | |
- ${CLP_SRC_DIR}/clp_s/ColumnReader.cpp # duplicate entry removed from line 25 | |
# ... any other source files ... | |
) |
[[nodiscard]] const std::shared_ptr<std::string>& query() const { | ||
return query_; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
💡 Verification agent
🧩 Analysis chain
Consider using a more appropriate type for query representation.
Using a shared_ptr<std::string>
for the query seems unusual. If the query is optional, consider using std::optional<std::string>
. If ownership sharing is necessary, document the reason.
Also applies to: 30-30
🏁 Script executed:
#!/bin/bash
# Check how the query is used in related files
echo "Searching for query_ usage in ClpTableHandle:"
rg -A 5 "query_" --type cpp velox/connectors/clp/
echo "Searching for query() method calls:"
rg -A 5 "query\(\)" --type cpp velox/connectors/clp/
Length of output: 19091
Action Required: Query Representation in ClpTableHandle
- In
velox/connectors/clp/ClpTableHandle.h
(lines 20–22), thequery()
method returns astd::shared_ptr<std::string>
used to represent an optional query value. - Usage in
velox/connectors/clp/ClpDataSource.cpp
(and related files) checks whether the pointer is non-null before accessing its contents, which suggests that the intent is to model an optional value. - If the query is indeed optional, please consider replacing
std::shared_ptr<std::string>
withstd::optional<std::string>
to more clearly convey this intent. Alternatively, if shared ownership is necessary, add documentation to explain why a shared pointer was chosen over other alternatives.
Also applies to the similar snippet at line 30.
if(NOT DEFINED ANTLR4_TAG) | ||
# Set to branch name to keep library updated at the cost of needing to rebuild after 'clean' | ||
# Set to commit hash to keep the build stable and does not need to rebuild after 'clean' | ||
set(ANTLR4_TAG master) | ||
endif() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Use of master branch as default tag could lead to non-reproducible builds.
Using the master
branch as the default tag for the ANTLR4 repository could lead to non-reproducible builds as the branch can change over time. Consider using a specific version tag or commit hash for better build reproducibility.
if(NOT DEFINED ANTLR4_TAG)
# Set to branch name to keep library updated at the cost of needing to rebuild after 'clean'
# Set to commit hash to keep the build stable and does not need to rebuild after 'clean'
- set(ANTLR4_TAG master)
+ set(ANTLR4_TAG 4.13.1)
endif()
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
if(NOT DEFINED ANTLR4_TAG) | |
# Set to branch name to keep library updated at the cost of needing to rebuild after 'clean' | |
# Set to commit hash to keep the build stable and does not need to rebuild after 'clean' | |
set(ANTLR4_TAG master) | |
endif() | |
if(NOT DEFINED ANTLR4_TAG) | |
# Set to branch name to keep library updated at the cost of needing to rebuild after 'clean' | |
# Set to commit hash to keep the build stable and does not need to rebuild after 'clean' | |
set(ANTLR4_TAG 4.13.1) | |
endif() |
ErrorCode Cursor::execute_query( | ||
std::string& query, | ||
std::vector<Field>& output_columns) { | ||
m_output_columns = output_columns; | ||
m_query = query; | ||
m_completed_archive_cycles = false; | ||
|
||
m_error_code = preprocess_query(); | ||
if (m_error_code != ErrorCode::Success) { | ||
return m_error_code; | ||
} | ||
|
||
m_end_archive_index = m_current_archive_index; | ||
while (false == m_completed_archive_cycles) { | ||
m_error_code = load_archive(); | ||
if (ErrorCode::InternalError == m_error_code) { | ||
return m_error_code; | ||
} | ||
|
||
if (ErrorCode::Success != m_error_code) { | ||
move_to_next_archive(); | ||
continue; | ||
} | ||
|
||
m_query_runner = std::make_shared<QueryRunner>( | ||
m_expr, | ||
m_schema_match, | ||
m_ignore_case, | ||
m_schema_map, | ||
m_schema_tree, | ||
m_projection, | ||
m_var_dict, | ||
m_log_dict); | ||
|
||
// clear the stage from last run | ||
m_query_runner->populate_string_queries(); | ||
|
||
// probably have another class for query evaluation and filter | ||
while (false == m_completed_schema_cycles) { | ||
m_current_schema_id = m_matched_schemas[m_current_schema_index]; | ||
|
||
m_query_runner->set_schema(m_current_schema_id); | ||
|
||
m_query_runner->populate_searched_wildcard_columns(); | ||
|
||
m_expression_value = m_query_runner->constant_propagate(); | ||
|
||
if (m_expression_value != EvaluatedValue::False) { | ||
m_query_runner->add_wildcard_columns_to_searched_columns(); | ||
if (m_archive_read_stage < ArchiveReadStage::TablesInitialized) { | ||
m_archive_reader.open_packed_streams(); | ||
m_archive_read_stage = ArchiveReadStage::TablesInitialized; | ||
} | ||
|
||
auto& reader = m_archive_reader.read_schema_table( | ||
m_current_schema_id, false, false); | ||
reader.initialize_filter_with_column_map(m_query_runner.get()); | ||
m_error_code = ErrorCode::Success; | ||
m_current_schema_table_loaded = true; | ||
break; | ||
} | ||
|
||
m_current_schema_index = | ||
(m_current_schema_index + 1) % m_matched_schemas.size(); | ||
m_completed_schema_cycles = m_current_schema_index == m_end_schema_index; | ||
m_error_code = ErrorCode::DictionaryNotFound; | ||
m_current_schema_table_loaded = false; | ||
} | ||
|
||
if (m_expression_value != EvaluatedValue::False) { | ||
break; | ||
} | ||
|
||
move_to_next_archive(); | ||
} | ||
|
||
return m_error_code; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Reduce the complexity of execute_query
This method covers numerous steps (parsing, loading, iteration, schema matching). Splitting it into smaller private functions can enhance readability and maintainability.
size_t num_rows, | ||
std::vector<facebook::velox::VectorPtr>& column_vectors); | ||
|
||
size_t fetch_next(size_t num_rows, std::vector<ColumnData>& column_vectors); | ||
|
||
private: | ||
/** | ||
* Moves to the next archive in the list of archives. | ||
*/ | ||
void move_to_next_archive(); | ||
|
||
/** | ||
* Preprocesses the query to. | ||
*/ | ||
ErrorCode preprocess_query(); | ||
|
||
/** | ||
* Loads the archive at the current index. | ||
*/ | ||
ErrorCode load_archive(); | ||
|
||
ErrorCode m_error_code; | ||
bool m_ignore_case; | ||
|
||
std::string m_archive_path; | ||
clp_s::InputSource m_input_source; | ||
std::vector<std::string> m_archive_ids; | ||
size_t m_current_archive_index; | ||
size_t m_end_archive_index; | ||
bool m_completed_archive_cycles; | ||
std::vector<int32_t> m_matched_schemas; | ||
size_t m_current_schema_index; | ||
size_t m_end_schema_index; | ||
bool m_completed_schema_cycles; | ||
int32_t m_current_schema_id; | ||
|
||
ArchiveReadStage m_archive_read_stage; | ||
|
||
std::shared_ptr<clp_s::search::Expression> m_expr; | ||
std::shared_ptr<OrderedProjection> m_projection; | ||
std::string m_query; | ||
std::vector<Field> m_output_columns; | ||
|
||
bool m_current_schema_table_loaded; | ||
|
||
std::shared_ptr<clp_s::search::SchemaMatch> m_schema_match; | ||
|
||
std::shared_ptr<QueryRunner> m_query_runner; | ||
|
||
clp_s::EvaluatedValue m_expression_value; | ||
|
||
clp_s::ArchiveReader m_archive_reader; | ||
clp_s::SchemaReader* m_schema_reader; | ||
std::shared_ptr<clp_s::SchemaTree> m_schema_tree; | ||
std::shared_ptr<clp_s::ReaderUtils::SchemaMap> m_schema_map; | ||
std::shared_ptr<clp_s::VariableDictionaryReader> m_var_dict; | ||
std::shared_ptr<clp_s::LogTypeDictionaryReader> m_log_dict; | ||
std::shared_ptr<clp_s::LogTypeDictionaryReader> m_array_dict; | ||
std::shared_ptr<clp_s::TimestampDictionaryReader> m_timestamp_dict; | ||
}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🛠️ Refactor suggestion
Assess concurrency for archival operations.
fetch_next
, move_to_next_archive
, and load_archive
might be called on multiple threads if your design implies parallel reading. Confirm that you are properly synchronizing access to shared state such as m_current_archive_index
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (9)
velox/connectors/clp/search_lib/Cursor.h (2)
50-54
: Constructor parameter matches member variable nameThe parameter
m_ignore_case
has the same name as the member variable, which could cause confusion. Consider renaming the parameter to avoid shadowing the member variable.- explicit Cursor( - clp_s::InputSource input_source, - std::vector<std::string> archive_paths, - bool m_ignore_case); + explicit Cursor( + clp_s::InputSource input_source, + std::vector<std::string> archive_paths, + bool ignore_case);
90-101
: Incomplete documentation for preprocess_query()The documentation comment for this method is incomplete and doesn't describe what the method does, even though it's a crucial step in query processing.
- /** - * Preprocesses the query to. - */ + /** + * Preprocesses the query to standardize and validate it. + * @return The error code indicating success or failure. + */velox/connectors/clp/search_lib/Cursor.cpp (4)
50-50
: Remove unresolved TODO commentThe TODO comment doesn't specify what needs to be fixed, making it difficult for other developers to understand the intention.
Either provide a more descriptive comment explaining what needs to be fixed or remove the TODO if the issue has been resolved.
73-74
: Improve readability of assignment within conditionalUsing assignment within a conditional statement can be error-prone and less readable, as it's easy to confuse with an equality check.
- if (m_expr = m_schema_match->run(m_expr); - std::dynamic_pointer_cast<EmptyExpr>(m_expr)) { + m_expr = m_schema_match->run(m_expr); + if (std::dynamic_pointer_cast<EmptyExpr>(m_expr)) {
137-138
: Similar assignment within conditional issue in multiple placesSimilar to the previous comment, there are other instances of assignments within conditional statements that could be made clearer.
For example, for lines 137-138:
- if (m_expr = standardize_pass.run(m_expr); - std::dynamic_pointer_cast<EmptyExpr>(m_expr)) { + m_expr = standardize_pass.run(m_expr); + if (std::dynamic_pointer_cast<EmptyExpr>(m_expr)) {Apply similar changes to the other instances at lines 144-145 and 151-152.
Also applies to: 144-145, 151-152
239-379
: Reduce code duplication between the twofetch_next
methodsThere's significant code duplication between the two
fetch_next
methods. Consider refactoring common logic into a private helper method.Create a private template method that handles the common logic, and have both public methods call it with their specific parameters:
template <typename T> size_t fetch_next_impl(size_t num_rows, std::vector<T>& column_vectors) { // Common implementation // ... }Then modify the public methods to use this implementation.
velox/connectors/clp/ClpConnector.cpp (1)
26-32
: Provide an implementation or a clear TODO forcreateDataSink
.
This method is declared but not implemented. Adding a TODO clarifies upcoming work and avoids confusion for other contributors.Do you need help drafting an initial implementation or opening a tracking issue?
velox/connectors/clp/ClpConnector.h (2)
14-18
: Check for potential configuration nullability.
Thoughstd::shared_ptr
typically guards against null usage, consider adding explicit safety checks to prevent runtime errors in edge cases.
35-39
: Method scope is declared but remains unimplemented.
Consider adding aVELOX_NYI
or a documented TODO to ensure clarity in future development.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
velox/connectors/clp/CMakeLists.txt
(1 hunks)velox/connectors/clp/ClpConnector.cpp
(1 hunks)velox/connectors/clp/ClpConnector.h
(1 hunks)velox/connectors/clp/ClpDataSource.cpp
(1 hunks)velox/connectors/clp/search_lib/Cursor.cpp
(1 hunks)velox/connectors/clp/search_lib/Cursor.h
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
- velox/connectors/clp/CMakeLists.txt
- velox/connectors/clp/ClpDataSource.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
velox/connectors/clp/ClpConnector.cpp
velox/connectors/clp/search_lib/Cursor.cpp
🧬 Code Definitions (2)
velox/connectors/clp/ClpConnector.cpp (1)
velox/connectors/clp/ClpConnector.h (5) (5)
ClpConnector
(10:12)id
(55:61)id
(55:59)outputType
(23:29)inputType
(35:39)
velox/connectors/clp/search_lib/Cursor.cpp (1)
velox/connectors/clp/search_lib/Cursor.h (4) (4)
Cursor
(58:61)query
(70:72)num_rows
(80:82)num_rows
(84:84)
🔇 Additional comments (11)
velox/connectors/clp/search_lib/Cursor.h (1)
86-101
: Assess concurrency for archive operations
move_to_next_archive
,load_archive
, and other methods might be called on multiple threads if your design implies parallel reading. Confirm that you are properly synchronizing access to shared state such asm_current_archive_index
.velox/connectors/clp/search_lib/Cursor.cpp (1)
160-237
: Reduce the complexity ofexecute_query
This method covers numerous steps (parsing, loading, iteration, schema matching). Splitting it into smaller private functions can enhance readability and maintainability.
velox/connectors/clp/ClpConnector.cpp (2)
5-9
: Consider validating the incoming config pointer.
Even though it's likely non-null, adding a small check or assertion can help prevent accidental null dereferences.
11-24
: Implementation appears correct and straightforward.
The creation ofClpDataSource
with all relevant parameters is well-organized, and returning a unique pointer is well aligned with ownership semantics.velox/connectors/clp/ClpConnector.h (7)
10-12
: Constructor usage is consistent with the base class.
Accepting bothid
andconfig
ensures flexibility in connector instantiation.
19-21
:canAddDynamicFilter
returning false is acceptable.
If the connector doesn't support dynamic filters, confirming this explicitly avoids unexpected usage elsewhere.
23-30
: Method signature matches the corresponding implementation in the .cpp file.
Everything appears consistent for creating data sources.
31-33
:supportsSplitPreload
returning false is appropriate if unavailable.
Clearly communicates that the connector does not optimize with a split preload mechanism.
42-42
: Good practice to store the specialized config object.
Retainingconfig_
as an internal member fosters a cohesive design, as it centralizes connector-specific configuration.
49-53
:ClpConnectorFactory
constructor properly initializes the connector name and pattern.
InvokingTimestampPattern::init()
upon factory construction is a suitable place to guarantee readiness.
55-61
:newConnector
method is concise and matches the factory pattern.
This approach neatly instantiates theClpConnector
, avoiding complex logic in the factory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
🧹 Nitpick comments (8)
velox/connectors/clp/search_lib/OrderedProjection.cpp (2)
13-30
: Validate exception messages for clarity.
The duplicate column and unresolved descriptor checks are well-handled. However, including a descriptive error message inOperationFailed
would make troubleshooting easier.
32-61
: Add a default case for future-proofing.
Theswitch
statement does not handle a default scenario. Consider adding a default branch (even if empty) to safeguard against unhandledColumnType
values that may be introduced later.+ default: + // Handle or ignore new column types gracefully + break;velox/connectors/clp/search_lib/QueryRunner.h (2)
78-82
: Document the empty method body.
The secondinit
overload is empty. If this method is intentionally a stub, consider adding a brief comment to clarify its status or a TODO tag to indicate future implementation plans.
88-90
: Implement or rename the placeholder filter method.
Currently,filter
always returns false. If the method is expected to perform real filtering, consider implementing it or marking it as a temporary placeholder.velox/connectors/clp/search_lib/QueryRunner.cpp (4)
15-16
: Replace macro with a static helper function.
Using a macro foreval(op, a, b)
can reduce clarity and hinder debugging. A small, inlined helper function or lambda would be more robust and type-safe.-#define eval(op, a, b) (((op) == FilterOperation::EQ) ? ((a) == (b)) : ((a) != (b))) +inline bool eval(FilterOperation op, auto a, auto b) { + if (op == FilterOperation::EQ) { + return a == b; + } + return a != b; +}
279-353
: Complex expression evaluation logic.
Manually tracking evaluation state with a stack is valid but can be hard to maintain. Consider a more traditional recursive evaluation to improve readability and reduce the risk of logical errors.
644-713
: Consider partial matching or substring search.
The current array filter logic checks for an exact string match in each element. If partial or flexible matching becomes a requirement, updating this logic to handle more advanced patterns may be advisable.
1270-1285
: Enhance error handling for date parsing.
If parsing fails or a non-integer operand is inadvertently passed, highlight the issue clearly. Currently,as_int
can silently fail, returning false without an exception.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
velox/connectors/clp/search_lib/Cursor.cpp
(1 hunks)velox/connectors/clp/search_lib/OrderedProjection.cpp
(1 hunks)velox/connectors/clp/search_lib/QueryRunner.cpp
(1 hunks)velox/connectors/clp/search_lib/QueryRunner.h
(1 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- velox/connectors/clp/search_lib/Cursor.cpp
🧰 Additional context used
📓 Path-based instructions (1)
`**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}`: - Prefer `false == ` rather than `!`.
**/*.{cpp,hpp,java,js,jsx,tpp,ts,tsx}
: - Preferfalse == <expression>
rather than!<expression>
.
velox/connectors/clp/search_lib/OrderedProjection.cpp
velox/connectors/clp/search_lib/QueryRunner.cpp
🧬 Code Definitions (1)
velox/connectors/clp/search_lib/QueryRunner.h (1)
velox/connectors/clp/search_lib/QueryRunner.cpp (6) (6)
populate_string_queries
(947-1024)populate_string_queries
(947-948)populate_searched_wildcard_columns
(1026-1057)populate_searched_wildcard_columns
(1026-1027)constant_propagate
(1066-1268)constant_propagate
(1066-1067)
🔇 Additional comments (8)
velox/connectors/clp/search_lib/OrderedProjection.cpp (2)
63-68
: Short and clear method implementation.
The loop cleanly resolves each column descriptor. No issues detected.
70-129
: ConfirmschemaTree
validity before resolution.
Iftree
might be null or improperly initialized, adding a check could prevent runtime errors. Otherwise, this resolution logic is well-structured and consistent with your code style.velox/connectors/clp/search_lib/QueryRunner.h (2)
106-117
: Good use of separate method for constant propagation.
The design keeps expression simplification distinct. This separation improves readability.
130-135
: Check vector size assumptions.
When callingfetch_next
, ensurecolumn_vectors.size()
aligns withm_projection
. If usage deviates, validating or documenting assumptions would prevent runtime issues.velox/connectors/clp/search_lib/QueryRunner.cpp (4)
78-82
: Ensure date reader assignments are unique.
Your comment implies that aDateStringColumnReader
for a specific column ID should not repeat. Verify no unintended conflicts can arise if schemas evolve or if multiple columns share the same node ID.
108-127
: Validate message capacity in output vectors.
Ensurecolumn_vectors[0]
(and subsequent columns) have sufficient space to accommodatenum_rows_fetched + 1
entries, especially when projecting numerous rows in a single fetch.
760-843
: Handle nested wildcard arrays with caution.
This method recursively checks arrays and objects. If deeper nesting or new data structures are introduced, ensure the recursion depth is handled gracefully to avoid stack overflows.
913-945
: Efficient boolean filtering.
Returningtrue
as soon as a match is found avoids unnecessary reads or comparisons. This short-circuit approach is optimal for performance.
Summary by CodeRabbit
New Features
Chores