Skip to content

[QA] Add FieldFormatCommandIT for the analytics-engine REST path#21544

Merged
mch2 merged 1 commit intoopensearch-project:mainfrom
ahkcs:feature/mustang-fieldformat-command
May 7, 2026
Merged

[QA] Add FieldFormatCommandIT for the analytics-engine REST path#21544
mch2 merged 1 commit intoopensearch-project:mainfrom
ahkcs:feature/mustang-fieldformat-command

Conversation

@ahkcs
Copy link
Copy Markdown
Contributor

@ahkcs ahkcs commented May 7, 2026

Summary

PPL fieldformat already works on the analytics-engine route with no code changes — it lowers to a plain Eval node whose expressions route through CONCAT and CAST, both already wired into the DataFusion backend's STANDARD_PROJECT_OPS via #21498. This PR is QA-only.

The unique surface vs plain eval is the prefix-. / suffix-. string-concat sugar emitted by AstExpressionBuilder.visitFieldFormatEvalClause:

fieldformat x = "prefix".CAST(y AS STRING)." suffix"

…expands to a chain of CONCAT calls. Both +-form and the dotted form route through Calcite's || operator, which isthmus' default catalog binds natively — no extension lookup, no adapter required.

Pass rate

MultisearchCommandIT is added by this PR; before this PR there was no analytics-engine QA pin for fieldformat. After this PR:

Test Status
testFieldformatPlusConcat
testFieldformatPrefixDotCast
testFieldformatCastDotSuffix
testFieldformatPrefixDotCastDotSuffix
Total 4 / 4

Full QA suite regression check: 132 / 132 across 17 ITs (no regressions in any pre-existing IT after FieldFormatCommandIT joined the suite).

The v2-side CalciteFieldFormatCommandIT itself goes from 0 / 5 → 4 / 5 on the analytics route once the corresponding SQL-repo IT-init fix lands (opensearch-project/sql#5417, modeled on #5407 for CalciteEvalCommandIT). The remaining 5th case uses tostring(age, "commas") — a multi-mode UDF tracked separately as out of scope; estimated effort ~1 day for a native Rust UDF + Substrait extension if pursued.

Tests

Four tests against the in-process QA cluster via test-ppl-frontend:

Test Shape
testFieldformatPlusConcat 'Hello ' + str0 — basic +-concat.
testFieldformatPrefixDotCast 'Code: '.CAST(int0 AS STRING) — StringDotlogicalExpression branch.
testFieldformatCastDotSuffix CAST(int0 AS STRING).' pts' — LogicalExpressionDotString branch.
testFieldformatPrefixDotCastDotSuffix 'Code: '.CAST(int0 AS STRING).' pts' — combined.

Each test filters where isnotnull(int0) before sorting/limiting so the deterministic-row assertions don't flap on the calcs dataset's six null int0 rows (Calcite's default ascending sort puts nulls first).

Test plan

  • ./gradlew :sandbox:qa:analytics-engine-rest:integTest --tests '*FieldFormatCommandIT' -Dsandbox.enabled=true4 / 4 green
  • ./gradlew :sandbox:qa:analytics-engine-rest:integTest -Dsandbox.enabled=true132 / 132 across 17 ITs (no regressions in any existing IT).

Related

PPL `fieldformat` is a Calcite-only command that lowers to a plain Eval node
(see SQL plugin's `AstBuilder.visitFieldformatCommand`). Its expressions go
through Calcite's || (CONCAT) operator and CAST, both already wired in the
DataFusion backend's STANDARD_PROJECT_OPS via opensearch-project#21498. **No code changes
required for the analytics route — this PR is QA-only.**

The unique surface vs plain `eval` is the prefix-{`.`} and suffix-{`.`}
string-concat sugar emitted by `AstExpressionBuilder.visitFieldFormatEvalClause`
for the StringDotlogicalExpression / LogicalExpressionDotString rules:

  fieldformat x = "prefix".CAST(y AS STRING)." suffix"

expands to a chain of CONCAT calls. Both forms route through the existing
CONCAT capability — no extension lookup or adapter needed since isthmus'
default catalog binds the || operator natively.

Four tests against the in-process QA cluster, exercising the analytics path
end-to-end via the test-ppl-frontend plugin:

| Test | Shape |
|---|---|
| `testFieldformatPlusConcat` | `'Hello ' + str0` — basic +-concat. |
| `testFieldformatPrefixDotCast` | `'Code: '.CAST(int0 AS STRING)` — StringDotlogicalExpression branch. |
| `testFieldformatCastDotSuffix` | `CAST(int0 AS STRING).' pts'` — LogicalExpressionDotString branch. |
| `testFieldformatPrefixDotCastDotSuffix` | `'Code: '.CAST(int0 AS STRING).' pts'` — combined. |

Tests filter `where isnotnull(int0)` before sorting/limiting so the
deterministic-row assertions don't flap on the calcs dataset's six null int0
rows (Calcite's default ascending sort puts nulls first).

Out of scope: the v2-side `testFieldFormatStringConcatenationWithNullFieldToString`
uses `tostring(age, "commas")` — a multi-mode UDF (binary / hex / commas /
duration) with substantial Java logic in `ToStringFunction`. Adding it to
the analytics path would need either Calcite-level rewrites or a DataFusion
Rust UDF; tracked separately.

Validates: 4/4 FieldFormatCommandIT pass; full
:sandbox:qa:analytics-engine-rest:integTest suite green
(**132 tests across 17 ITs**, no regressions).

Signed-off-by: Kai Huang <[email protected]>
@ahkcs ahkcs requested a review from a team as a code owner May 7, 2026 18:14
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

Thread Safety

The static field dataProvisioned is not thread-safe. If tests run in parallel, multiple threads could simultaneously read false and each attempt to provision the dataset, potentially causing race conditions or duplicate provisioning. Consider using a @BeforeClass annotated method or a synchronized/atomic approach instead.

private static boolean dataProvisioned = false;

private void ensureDataProvisioned() throws IOException {
    if (dataProvisioned == false) {
        DatasetProvisioner.provision(client(), DATASET);
        dataProvisioned = true;
    }
}
JSON Injection

The executePpl method constructs a JSON body by string interpolation using escapeJson. If escapeJson does not properly escape all special characters (e.g., backslashes, control characters), the constructed JSON could be malformed or allow injection. Verify that escapeJson is robust, or consider using a proper JSON serialization library.

private Map<String, Object> executePpl(String ppl) throws IOException {
    ensureDataProvisioned();
    Request request = new Request("POST", "/_analytics/ppl");
    request.setJsonEntity("{\"query\": \"" + escapeJson(ppl) + "\"}");
    Response response = client().performRequest(request);
    return assertOkAndParse(response, "PPL: " + ppl);
}
Unchecked Cast

The unchecked cast (List<List<Object>>) response.get("rows") is suppressed with @SuppressWarnings("unchecked") but could throw a ClassCastException at runtime if the response structure differs from expectations (e.g., rows contain non-List elements). Consider adding a type-safe check or a more descriptive error message.

@SuppressWarnings("unchecked")
List<List<Object>> actualRows = (List<List<Object>>) response.get("rows");
assertNotNull("Response missing 'rows' for query: " + ppl, actualRows);

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix thread-safety of static provisioning flag

The dataProvisioned flag is a non-volatile static field accessed from instance
methods without synchronization. In a multi-threaded test runner, two threads could
both read false simultaneously and provision the dataset twice, or one thread could
see a stale value. Mark the field volatile or use a synchronized block to ensure
safe publication.

sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/FieldFormatCommandIT.java [45-52]

-private static boolean dataProvisioned = false;
+private static volatile boolean dataProvisioned = false;
 
 private void ensureDataProvisioned() throws IOException {
     if (dataProvisioned == false) {
-        DatasetProvisioner.provision(client(), DATASET);
-        dataProvisioned = true;
+        synchronized (FieldFormatCommandIT.class) {
+            if (dataProvisioned == false) {
+                DatasetProvisioner.provision(client(), DATASET);
+                dataProvisioned = true;
+            }
+        }
     }
 }
Suggestion importance[1-10]: 5

__

Why: The thread-safety concern for dataProvisioned is legitimate in a multi-threaded test runner, and the double-checked locking pattern with volatile is a correct solution. However, integration tests typically run sequentially, reducing the practical impact of this issue.

Low
Use safe JSON serialization for request body

The JSON entity is constructed via string concatenation, which is fragile and can
break if escapeJson doesn't handle all edge cases (e.g., embedded quotes or
backslashes in the PPL string). Use a proper JSON serialization approach or at
minimum ensure the escaping is robust. A malformed JSON body will cause the request
to fail with a confusing error rather than a clear test failure.

sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/FieldFormatCommandIT.java [165-171]

 private Map<String, Object> executePpl(String ppl) throws IOException {
     ensureDataProvisioned();
     Request request = new Request("POST", "/_analytics/ppl");
-    request.setJsonEntity("{\"query\": \"" + escapeJson(ppl) + "\"}");
+    String jsonBody = String.format("{\"query\": %s}", toJsonString(ppl));
+    request.setJsonEntity(jsonBody);
     Response response = client().performRequest(request);
     return assertOkAndParse(response, "PPL: " + ppl);
 }
Suggestion importance[1-10]: 4

__

Why: The concern about fragile string concatenation for JSON is valid, but the suggestion introduces toJsonString which is not defined anywhere in the codebase shown, making the improved_code non-functional as-is. The existing escapeJson method is already used for this purpose, so the improvement is marginal.

Low
General
Use delta-based floating-point comparison

Using Double.compare(e, a) != 0 for numeric equality performs an exact
floating-point comparison, which can cause flaky failures for values that differ
only in floating-point representation. For integer-valued numbers this is fine, but
for any decimal fields this will be fragile. Consider using a delta-based comparison
(e.g., Math.abs(e - a) > 1e-9) to tolerate minor floating-point differences.

sandbox/qa/analytics-engine-rest/src/test/java/org/opensearch/analytics/qa/FieldFormatCommandIT.java [178-185]

 if (expected instanceof Number && actual instanceof Number) {
     double e = ((Number) expected).doubleValue();
     double a = ((Number) actual).doubleValue();
-    if (Double.compare(e, a) != 0) {
+    if (Math.abs(e - a) > 1e-9) {
         fail(message + ": expected <" + expected + "> but was <" + actual + ">");
     }
     return;
 }
Suggestion importance[1-10]: 5

__

Why: Using a delta-based comparison (Math.abs(e - a) > 1e-9) is a good practice for floating-point comparisons to avoid flaky tests. The current tests only use integer values, but the helper is general-purpose, so this improvement adds robustness for future tests with decimal fields.

Low

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

❌ Gradle check result for f0ea4c4: FAILURE

Please examine the workflow log, locate, and copy-paste the failure(s) below, then iterate to green. Is the failure a flaky test unrelated to your change?

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 7, 2026

✅ Gradle check result for f0ea4c4: SUCCESS

@codecov
Copy link
Copy Markdown

codecov Bot commented May 7, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 73.39%. Comparing base (6888345) to head (f0ea4c4).
⚠️ Report is 4 commits behind head on main.

Additional details and impacted files
@@             Coverage Diff              @@
##               main   #21544      +/-   ##
============================================
- Coverage     73.42%   73.39%   -0.03%     
- Complexity    74547    74552       +5     
============================================
  Files          5978     5978              
  Lines        338743   338743              
  Branches      48843    48843              
============================================
- Hits         248707   248621      -86     
- Misses        70229    70287      +58     
- Partials      19807    19835      +28     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mch2 mch2 merged commit 9cada03 into opensearch-project:main May 7, 2026
21 of 23 checks passed
imRishN pushed a commit to imRishN/OpenSearch that referenced this pull request May 8, 2026
…nsearch-project#21544)

PPL `fieldformat` is a Calcite-only command that lowers to a plain Eval node
(see SQL plugin's `AstBuilder.visitFieldformatCommand`). Its expressions go
through Calcite's || (CONCAT) operator and CAST, both already wired in the
DataFusion backend's STANDARD_PROJECT_OPS via opensearch-project#21498. **No code changes
required for the analytics route — this PR is QA-only.**

The unique surface vs plain `eval` is the prefix-{`.`} and suffix-{`.`}
string-concat sugar emitted by `AstExpressionBuilder.visitFieldFormatEvalClause`
for the StringDotlogicalExpression / LogicalExpressionDotString rules:

  fieldformat x = "prefix".CAST(y AS STRING)." suffix"

expands to a chain of CONCAT calls. Both forms route through the existing
CONCAT capability — no extension lookup or adapter needed since isthmus'
default catalog binds the || operator natively.

Four tests against the in-process QA cluster, exercising the analytics path
end-to-end via the test-ppl-frontend plugin:

| Test | Shape |
|---|---|
| `testFieldformatPlusConcat` | `'Hello ' + str0` — basic +-concat. |
| `testFieldformatPrefixDotCast` | `'Code: '.CAST(int0 AS STRING)` — StringDotlogicalExpression branch. |
| `testFieldformatCastDotSuffix` | `CAST(int0 AS STRING).' pts'` — LogicalExpressionDotString branch. |
| `testFieldformatPrefixDotCastDotSuffix` | `'Code: '.CAST(int0 AS STRING).' pts'` — combined. |

Tests filter `where isnotnull(int0)` before sorting/limiting so the
deterministic-row assertions don't flap on the calcs dataset's six null int0
rows (Calcite's default ascending sort puts nulls first).

Out of scope: the v2-side `testFieldFormatStringConcatenationWithNullFieldToString`
uses `tostring(age, "commas")` — a multi-mode UDF (binary / hex / commas /
duration) with substantial Java logic in `ToStringFunction`. Adding it to
the analytics path would need either Calcite-level rewrites or a DataFusion
Rust UDF; tracked separately.

Validates: 4/4 FieldFormatCommandIT pass; full
:sandbox:qa:analytics-engine-rest:integTest suite green
(**132 tests across 17 ITs**, no regressions).

Signed-off-by: Kai Huang <[email protected]>
Bukhtawar pushed a commit to Bukhtawar/OpenSearch that referenced this pull request May 10, 2026
…nsearch-project#21544)

PPL `fieldformat` is a Calcite-only command that lowers to a plain Eval node
(see SQL plugin's `AstBuilder.visitFieldformatCommand`). Its expressions go
through Calcite's || (CONCAT) operator and CAST, both already wired in the
DataFusion backend's STANDARD_PROJECT_OPS via opensearch-project#21498. **No code changes
required for the analytics route — this PR is QA-only.**

The unique surface vs plain `eval` is the prefix-{`.`} and suffix-{`.`}
string-concat sugar emitted by `AstExpressionBuilder.visitFieldFormatEvalClause`
for the StringDotlogicalExpression / LogicalExpressionDotString rules:

  fieldformat x = "prefix".CAST(y AS STRING)." suffix"

expands to a chain of CONCAT calls. Both forms route through the existing
CONCAT capability — no extension lookup or adapter needed since isthmus'
default catalog binds the || operator natively.

Four tests against the in-process QA cluster, exercising the analytics path
end-to-end via the test-ppl-frontend plugin:

| Test | Shape |
|---|---|
| `testFieldformatPlusConcat` | `'Hello ' + str0` — basic +-concat. |
| `testFieldformatPrefixDotCast` | `'Code: '.CAST(int0 AS STRING)` — StringDotlogicalExpression branch. |
| `testFieldformatCastDotSuffix` | `CAST(int0 AS STRING).' pts'` — LogicalExpressionDotString branch. |
| `testFieldformatPrefixDotCastDotSuffix` | `'Code: '.CAST(int0 AS STRING).' pts'` — combined. |

Tests filter `where isnotnull(int0)` before sorting/limiting so the
deterministic-row assertions don't flap on the calcs dataset's six null int0
rows (Calcite's default ascending sort puts nulls first).

Out of scope: the v2-side `testFieldFormatStringConcatenationWithNullFieldToString`
uses `tostring(age, "commas")` — a multi-mode UDF (binary / hex / commas /
duration) with substantial Java logic in `ToStringFunction`. Adding it to
the analytics path would need either Calcite-level rewrites or a DataFusion
Rust UDF; tracked separately.

Validates: 4/4 FieldFormatCommandIT pass; full
:sandbox:qa:analytics-engine-rest:integTest suite green
(**132 tests across 17 ITs**, no regressions).

Signed-off-by: Kai Huang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants