-
Notifications
You must be signed in to change notification settings - Fork 181
Add unified function interface with function discovery API #5039
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add unified function interface with function discovery API #5039
Conversation
Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
… for now Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
📝 WalkthroughWalkthroughIntroduces engine-agnostic function support to the Unified Execution Runtime through a new interface (UnifiedFunction) and repository pattern (UnifiedFunctionRepository). Includes a Calcite-backed adapter implementation, comprehensive test coverage, and updated documentation describing the type system, function discovery, and evaluation flow. Changes
Sequence DiagramssequenceDiagram
participant Client
participant Repository as UnifiedFunctionRepository
participant Descriptor as UnifiedFunctionDescriptor
participant Builder as UnifiedFunctionBuilder
participant Adapter as UnifiedFunctionCalciteAdapter
participant RexBuilder
participant PPLFuncImpTable
Client->>Repository: loadFunction(functionName)
Repository->>PPLFuncImpTable: enumerate available functions
PPLFuncImpTable-->>Repository: list of operators
Repository->>Descriptor: create descriptor with builder
Descriptor-->>Repository: UnifiedFunctionDescriptor
Repository-->>Client: Optional<UnifiedFunctionDescriptor>
Client->>Builder: build(inputTypes)
Builder->>RexBuilder: create(rexBuilder, functionName, inputTypes)
RexBuilder->>PPLFuncImpTable: resolve function
RexBuilder->>RexBuilder: compile to RexExecutable
RexBuilder->>Adapter: create adapter with compiledCode
Adapter-->>Builder: UnifiedFunction
Builder-->>Client: UnifiedFunction instance
sequenceDiagram
participant Client
participant Function as UnifiedFunction
participant Adapter as UnifiedFunctionCalciteAdapter
participant RexExecutor as RexExecutorImpl
participant DataContext
Client->>Function: eval(inputs)
Function->>Adapter: eval(inputs)
Adapter->>DataContext: create with "inputRecord"
DataContext->>DataContext: populate with input values
Adapter->>RexExecutor: execute(compiledCode, DataContext)
RexExecutor->>RexExecutor: evaluate expression
RexExecutor-->>Adapter: result
Adapter-->>Function: return result or null
Function-->>Client: Object (evaluated result)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~50 minutes Possibly related PRs
Suggested reviewers
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Comment |
Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
Signed-off-by: Chen Dai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🤖 Fix all issues with AI agents
In @api/README.md:
- Around line 112-117: Update the README example to use the correct class name
UnifiedFunctionDescriptor everywhere instead of FunctionDescriptor; adjust the
variable types and any comments referencing FunctionDescriptor (e.g., change
List<FunctionDescriptor> allFunctions to List<UnifiedFunctionDescriptor>
allFunctions and for (FunctionDescriptor descriptor : allFunctions) to for
(UnifiedFunctionDescriptor descriptor : allFunctions)) and ensure any calls like
descriptor.getFunctionName() and descriptor.getBuilder() remain valid with
UnifiedFunctionDescriptor.
- Around line 119-121: The README example incorrectly uses FunctionDescriptor
and doesn't handle the Optional return from loadFunction; update the snippet to
use Optional<UnifiedFunctionDescriptor> (or UnifiedFunctionDescriptor) as
returned by repository.loadFunction("UPPER") and demonstrate handling the
Optional (e.g., check isPresent/ifPresent or orElseThrow) before using the
descriptor so the example matches the actual API.
In
@api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java:
- Around line 68-79: The UnifiedFunctionBuilder interface currently extends
Serializable which forces lambdas (e.g., builders capturing a non-serializable
RexBuilder) to be serializable and breaks implementations; either remove
"extends Serializable" from UnifiedFunctionBuilder if builders don't need to be
serialized, or refactor the builder implementations so they do not capture
non-serializable objects (move RexBuilder usage into the build(...) call or
create a serializable factory that supplies RexBuilder), ensuring only
UnifiedFunction (not the builder) remains serializable if that was the
intention.
🧹 Nitpick comments (4)
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.java (1)
32-55: Good test coverage for basic functionality.The tests cover function creation, invalid name handling, and evaluation. Consider adding tests for edge cases per coding guidelines:
- Null input handling:
upperFunc.eval(List.of((Object) null))- Empty string input:
upperFunc.eval(List.of(""))- Functions with multiple parameters (if applicable)
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.java (1)
66-77: Consider verifying function evaluation in builder test.The test validates that the builder creates a function with the correct name, but doesn't verify the function can actually be evaluated. This would strengthen the test.
💡 Suggested enhancement
assertNotNull("Function should be created", jsonFunc); assertTrue("Function name should be JSON", jsonFunc.getFunctionName().equalsIgnoreCase("JSON")); + + // Verify the function can be evaluated + Object result = jsonFunc.eval(List.of("{\"key\": \"value\"}")); + assertNotNull("Function should produce a result", result);api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapter.java (1)
99-117:SqlTypeName.valueOf()can throwIllegalArgumentExceptionfor invalid type names.Both
buildInputRowTypeandmakeInputRefsuseSqlTypeName.valueOf(inputTypes.get(i))which throwsIllegalArgumentExceptionif the type name is invalid. Consider adding validation with a meaningful error message.💡 Suggested improvement
private static RelDataType buildInputRowType(RexBuilder rexBuilder, List<String> inputTypes) { RelDataTypeFactory typeFactory = rexBuilder.getTypeFactory(); RelDataTypeFactory.Builder builder = typeFactory.builder(); for (int i = 0; i < inputTypes.size(); i++) { - RelDataType relType = typeFactory.createSqlType(SqlTypeName.valueOf(inputTypes.get(i))); + SqlTypeName sqlTypeName; + try { + sqlTypeName = SqlTypeName.valueOf(inputTypes.get(i)); + } catch (IllegalArgumentException e) { + throw new IllegalArgumentException( + "Invalid SQL type name: " + inputTypes.get(i) + " at position " + i, e); + } + RelDataType relType = typeFactory.createSqlType(sqlTypeName); builder.add("_" + i, relType); } return builder.build(); }api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java (1)
52-56: Performance concern:loadFunction()iterates all operators on every call.Calling
loadFunctions()for eachloadFunction(name)lookup is inefficient, especially if called repeatedly. Consider caching the result or using a Map for O(1) lookups.💡 Suggested optimization
+ /** Cached function descriptors, keyed by uppercase function name. */ + private Map<String, UnifiedFunctionDescriptor> cachedDescriptors; + public List<UnifiedFunctionDescriptor> loadFunctions() { - RexBuilder rexBuilder = context.getPlanContext().rexBuilder; - return PPLBuiltinOperators.instance().getOperatorList().stream() - .filter(SqlUserDefinedFunction.class::isInstance) - .map( - operator -> { - String functionName = operator.getName(); - UnifiedFunctionBuilder builder = - inputTypes -> - UnifiedFunctionCalciteAdapter.create(rexBuilder, functionName, inputTypes); - return new UnifiedFunctionDescriptor(functionName, builder); - }) - .collect(Collectors.toList()); + return new ArrayList<>(getOrLoadDescriptors().values()); + } + + public Optional<UnifiedFunctionDescriptor> loadFunction(String functionName) { + return Optional.ofNullable(getOrLoadDescriptors().get(functionName.toUpperCase())); + } + + private Map<String, UnifiedFunctionDescriptor> getOrLoadDescriptors() { + if (cachedDescriptors == null) { + RexBuilder rexBuilder = context.getPlanContext().rexBuilder; + cachedDescriptors = PPLBuiltinOperators.instance().getOperatorList().stream() + .filter(SqlUserDefinedFunction.class::isInstance) + .collect(Collectors.toMap( + op -> op.getName().toUpperCase(), + op -> { + String functionName = op.getName(); + UnifiedFunctionBuilder builder = + inputTypes -> + UnifiedFunctionCalciteAdapter.create(rexBuilder, functionName, inputTypes); + return new UnifiedFunctionDescriptor(functionName, builder); + }, + (a, b) -> a)); // Handle duplicates by keeping first + } + return cachedDescriptors; } - - public Optional<UnifiedFunctionDescriptor> loadFunction(String functionName) { - return loadFunctions().stream() - .filter(desc -> desc.getFunctionName().equalsIgnoreCase(functionName)) - .findFirst(); - }
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (6)
api/README.mdapi/src/main/java/org/opensearch/sql/api/function/UnifiedFunction.javaapi/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapter.javaapi/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.java
🧰 Additional context used
📓 Path-based instructions (3)
**/*.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*.java: UsePascalCasefor class names (e.g.,QueryExecutor)
UsecamelCasefor method and variable names (e.g.,executeQuery)
UseUPPER_SNAKE_CASEfor constants (e.g.,MAX_RETRY_COUNT)
Keep methods under 20 lines with single responsibility
All public classes and methods must have proper JavaDoc
Use specific exception types with meaningful messages for error handling
PreferOptional<T>for nullable returns in Java
Avoid unnecessary object creation in loops
UseStringBuilderfor string concatenation in loops
Validate all user inputs, especially queries
Sanitize data before logging to prevent injection attacks
Use try-with-resources for proper resource cleanup in Java
Maintain Java 11 compatibility when possible for OpenSearch 2.x
Document Calcite-specific workarounds in code
Files:
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunction.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.javaapi/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapter.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.javaapi/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java
⚙️ CodeRabbit configuration file
**/*.java: - Flag methods >50 lines as potentially too complex - suggest refactoring
- Flag classes >500 lines as needing organization review
- Check for dead code, unused imports, and unused variables
- Identify code reuse opportunities across similar implementations
- Assess holistic maintainability - is code easy to understand and modify?
- Flag code that appears AI-generated without sufficient human review
- Verify Java naming conventions (PascalCase for classes, camelCase for methods/variables)
- Check for proper JavaDoc on public classes and methods
- Flag redundant comments that restate obvious code
- Ensure proper error handling with specific exception types
- Check for Optional usage instead of null returns
- Validate proper use of try-with-resources for resource management
Files:
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunction.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.javaapi/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapter.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.javaapi/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java
**/*Test.java
📄 CodeRabbit inference engine (.rules/REVIEW_GUIDELINES.md)
**/*Test.java: All new business logic requires unit tests
Name unit tests with*Test.javasuffix in OpenSearch SQL
Files:
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.java
**/test/**/*.java
⚙️ CodeRabbit configuration file
**/test/**/*.java: - Verify NULL input tests for all new functions
- Check boundary condition tests (min/max values, empty inputs)
- Validate error condition tests (invalid inputs, exceptions)
- Ensure multi-document tests for per-document operations
- Flag smoke tests without meaningful assertions
- Check test naming follows pattern: test
- Verify test data is realistic and covers edge cases
- Verify test coverage for new business logic
- Ensure tests are independent and don't rely on execution order
- Validate meaningful test data that reflects real-world scenarios
- Check for proper cleanup of test resources
Files:
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.java
🧠 Learnings (4)
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*Test.java : Name unit tests with `*Test.java` suffix in OpenSearch SQL
Applied to files:
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Applies to **/*IT.java : Name integration tests with `*IT.java` suffix in OpenSearch SQL
Applied to files:
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.java
📚 Learning: 2025-12-29T05:32:03.491Z
Learnt from: LantaoJin
Repo: opensearch-project/sql PR: 4993
File: opensearch/src/main/java/org/opensearch/sql/opensearch/planner/physical/CalciteEnumerableTopK.java:20-20
Timestamp: 2025-12-29T05:32:03.491Z
Learning: For any custom Calcite RelNode class (e.g., ones that extend EnumerableLimitSort or other Calcite RelNode types), always override the copy method. If copy is not overridden, cloning/copy operations may downgrade the instance to the parent class type, losing the custom behavior. In your implementation, ensure copy returns a new instance of the concrete class with all relevant fields and traits preserved, mirroring the current instance state.
Applied to files:
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapter.javaapi/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.java
📚 Learning: 2025-12-02T17:27:55.938Z
Learnt from: CR
Repo: opensearch-project/sql PR: 0
File: .rules/REVIEW_GUIDELINES.md:0-0
Timestamp: 2025-12-02T17:27:55.938Z
Learning: Test SQL generation and optimization paths for Calcite integration changes
Applied to files:
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.javaapi/README.md
🧬 Code graph analysis (1)
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.java (1)
api/src/test/java/org/opensearch/sql/api/UnifiedQueryTestBase.java (1)
UnifiedQueryTestBase(34-131)
🔇 Additional comments (6)
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunction.java (1)
11-52: LGTM! Clean, minimal interface with comprehensive documentation.The interface is well-designed for engine-agnostic function abstraction. The JavaDoc clearly explains the type representation strategy using SQL type name strings, and the contract is straightforward with proper null handling documentation for
eval().api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapterTest.java (1)
57-84: Serialization test is well-structured and validates both metadata and functionality.The round-trip serialization test properly validates that the compiled code survives serialization, which is critical for the cross-engine distribution use case described in the PR objectives.
api/src/test/java/org/opensearch/sql/api/function/UnifiedFunctionRepositoryTest.java (1)
29-38: Good basic coverage for loading all functions.The test validates that at least one function is loaded and that each descriptor has non-null name and builder. This provides good smoke test coverage.
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionCalciteAdapter.java (2)
25-31: Good use of Lombok annotations andserialVersionUID.The
@EqualsAndHashCode(exclude = "compiledCode")is appropriate sincecompiledCodeis derived from the other fields and would make equals/hashCode verbose without adding semantic value.
63-71: The suggested RexExecutable caching optimization is not applicable here.Creating a new
RexExecutableinstance per call is intentional and appropriate. Eacheval()invocation requires a freshDataContextwith different input data—caching and reusing the executor viasetDataContext()would require careful handling of mutable state and add unnecessary synchronization overhead. The cost of instantiatingRexExecutable(compiledCode, functionName)is negligible compared to the actual code execution, so this optimization provides no meaningful performance benefit.The current implementation is sound. If performance becomes a concern, the code comment at lines 76-79 already acknowledges the possibility of changing the internal implementation.
Likely an incorrect or invalid review comment.
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java (1)
58-66: Good design:@Valuefor immutable descriptor with clean API.The
UnifiedFunctionDescriptorusing Lombok's@Valueensures immutability and auto-generates appropriate equals/hashCode/toString. The JavaDoc is clear.
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java
Show resolved
Hide resolved
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java
Show resolved
Hide resolved
api/src/main/java/org/opensearch/sql/api/function/UnifiedFunctionRepository.java
Show resolved
Hide resolved
Signed-off-by: Chen Dai <[email protected]>
* Add initial impl for unified function and repository Signed-off-by: Chen Dai <[email protected]> * Simplify unified function impl by built-in rex executor Signed-off-by: Chen Dai <[email protected]> * Extract unified function builder abstraction Signed-off-by: Chen Dai <[email protected]> * Refactor unified function repository internal and javadoc Signed-off-by: Chen Dai <[email protected]> * Remove type converter and use Calcite's sql type name as unified type for now Signed-off-by: Chen Dai <[email protected]> * Update readme Signed-off-by: Chen Dai <[email protected]> * Refactor unified function repository API and javadoc Signed-off-by: Chen Dai <[email protected]> * Refactor unified function calcite adapter to remove rex executable field Signed-off-by: Chen Dai <[email protected]> * Update readme Signed-off-by: Chen Dai <[email protected]> * Polish all javadoc comments Signed-off-by: Chen Dai <[email protected]> * Refactor calcite adapter create with utility methods Signed-off-by: Chen Dai <[email protected]> * Address comments from CodeRabbit Signed-off-by: Chen Dai <[email protected]> --------- Signed-off-by: Chen Dai <[email protected]> (cherry picked from commit 3f646d5) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Description
This PR introduces a unified function abstraction that lets PPL functions be represented and evaluated in an engine-agnostic way. A function can be implemented once and reused by external execution engines (e.g., Spark) without duplicating engine-specific logic.
Design Highlights
UnifiedType.UnifiedRecord) until required.RexNodeserialization complexity. We will improve this later if any security or performance concern arises.eval()currently executes via the pre-compiled code string internally. In the future, we can introduce an explicitcodegen()API on unified function interface to cleanly separate interpreted vs compiled execution, and to integrate with engine-native codegen pipelines (e.g., Spark WholeStageCodegen).Key Components
UnifiedFunction: Serializable interface representing a function with name, input/output types, and evaluation capability.UnifiedFunctionCalciteAdapter: Calcite-based implementation that adapts existing function implementation ofRexNode.UnifiedFunctionRepository: Discovers and loads PPL functions as function descriptors with lazy instantiation via builders.Future Work
In the future, we can make unified functions fully independent of Calcite by moving function implementations and discovery off Calcite APIs (in both the unified function implementations and repository). At that point, we can also introduce richer abstractions—such as a unified type and record model—by referencing LinkedIn’s Transport framework https://github.com/linkedin/transport.
Related Issues
Resolves partially #4782, opensearch-project/opensearch-spark#1281
Check List
--signoffor-s.By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.