Skip to content

Suggest fields for 'field not found' errors#5402

Open
Swiddis wants to merge 4 commits intoopensearch-project:mainfrom
Swiddis:feat/field-errors-ppl
Open

Suggest fields for 'field not found' errors#5402
Swiddis wants to merge 4 commits intoopensearch-project:mainfrom
Swiddis:feat/field-errors-ppl

Conversation

@Swiddis
Copy link
Copy Markdown
Collaborator

@Swiddis Swiddis commented May 1, 2026

Description

// echo 'source=big5 | fields hostname' | ppl
{
  "error": {
    "reason": "Field [hostname] not found.",
    "code": "FIELD_NOT_FOUND",
    "suggestion": "Did you mean: host.name",
    "context": {
      "stage_description": "Parsing and validating the query",
      "stage": "analyzing",
      "requested_field": "hostname",
      "available_fields": [
        "agent",
        "agent.ephemeral_id",
        ...
      ],
      "query_pos": {
        "column": 21,
        "line": 1
      }
    },
    "details": "Field [hostname] not found.",
    "location": [
      "while preparing and validating the query plan"
    ],
    "type": "IllegalArgumentException"
  },
  "status": 400
}

Related Issues

N/A

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Swiddis added 3 commits May 1, 2026 19:54
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@Swiddis Swiddis changed the title tweak: Available fields reporting in field not found errors Available fields reporting in field not found errors May 1, 2026
@Swiddis Swiddis changed the title Available fields reporting in field not found errors Suggest fields for 'field not found' errors May 1, 2026
@Swiddis Swiddis added enhancement New feature or request error-experience Issues related to how we handle failure cases in the plugin. labels May 1, 2026
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

PR Reviewer Guide 🔍

(Review updated until commit 1950fac)

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 Security concerns

Sensitive information exposure:
The error response now includes available_fields, which lists all field names in the current query context. In environments with field-level security or multi-tenant indices, this could expose schema information (field names) to unauthorized users who intentionally trigger a field-not-found error. Consider restricting this information based on user permissions or making it opt-in via configuration.

✅ No TODO sections
🔀 Multiple PR themes

Sub-PR theme: Add Levenshtein distance and closest match utilities

Relevant files:

  • common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java
  • core/src/test/java/org/opensearch/sql/common/utils/StringUtilsTest.java

Sub-PR theme: Attach source position to QualifiedName from PPL parser

Relevant files:

  • core/src/main/java/org/opensearch/sql/ast/expression/QualifiedName.java
  • ppl/src/main/java/org/opensearch/sql/ppl/parser/AstExpressionBuilder.java

Sub-PR theme: Enrich field-not-found errors with suggestions and context

Relevant files:

  • core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java
  • integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java
  • docs/user/ppl/cmd/mvcombine.md

⚡ Recommended focus areas for review

Threshold Logic

The threshold in findClosestMatch uses Math.max(4, target.length() / 2). For short field names (e.g., length 2-4), this allows a distance of up to 4, which may produce poor suggestions. For example, a 2-character target could match almost anything. Consider tightening the threshold for short strings or using a relative ratio instead.

if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) {
  return Optional.of(bestMatch);
}
Possible Exception

In getNotFoundException, context.relBuilder.peek() is called without guarding against an empty relBuilder stack. If the relBuilder has no relations pushed, this will throw an EmptyStackException or similar, masking the original field-not-found error.

List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames();
Available Fields Exposure

The error response now includes the full list of available_fields from the current row type. In multi-tenant or security-sensitive deployments, exposing all field names to the end user could leak schema information. Consider whether this should be gated behind a configuration flag or privilege check.

List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames();

ErrorReport.Builder builder =
    ErrorReport.wrap(
            new IllegalArgumentException(
                String.format("Field [%s] not found.", node.toString())))
        .code(ErrorCode.FIELD_NOT_FOUND)
        .context("requested_field", node.toString())
        .context("available_fields", availableFields);
Position Accuracy

The source position is captured only from the first identifier token (ctx.get(0)). For multi-part qualified names (e.g., table.field), the reported column will point to the start of the qualifier rather than the specific part that caused the error, which may confuse users.

if (!ctx.isEmpty()) {
  ParserRuleContext first = ctx.get(0);
  int line = first.getStart().getLine();
  int column = first.getStart().getCharPositionInLine();
  return new QualifiedName(parts, line, column);
Case-Insensitive Matching

findClosestMatch compares lowercased versions of both target and candidate to compute distance, but returns the original-cased candidate. This is correct for display, but the distance is computed on lowercased strings while the original field names may differ in case. Ensure this is the intended behavior and that it doesn't cause misleading suggestions when field names differ only in case.

int distance = levenshteinDistance(target.toLowerCase(), candidate.toLowerCase());
if (distance < bestDistance) {
  bestDistance = distance;
  bestMatch = candidate;
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

PR Code Suggestions ✨

Latest suggestions up to 1950fac
Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against empty builder stack exception

Calling context.relBuilder.peek() can throw an EmptyStackException if the builder's
stack is empty (e.g., in early resolution stages). This would produce a confusing
error that masks the original "field not found" issue. Add a null/empty check or
wrap in a try-catch before accessing the row type.

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334]

-List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+List<String> availableFields;
+try {
+  availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+} catch (Exception e) {
+  availableFields = List.of();
+}
Suggestion importance[1-10]: 6

__

Why: Calling context.relBuilder.peek() on an empty stack would throw an EmptyStackException, masking the original field-not-found error. The try-catch guard is a reasonable defensive measure, though the likelihood depends on how this method is called.

Low
General
Tighten fuzzy match threshold for short strings

The threshold Math.max(4, target.length() / 2) is very permissive for short strings.
For example, a 2-character target like "id" would accept matches with distance up to
4, meaning almost any 2–6 character field would be suggested. Consider using a
stricter threshold, such as Math.max(2, target.length() / 3), to avoid misleading
suggestions.

common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174]

-if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) {
+if (bestMatch != null && bestDistance <= Math.max(2, target.length() / 3)) {
Suggestion importance[1-10]: 5

__

Why: The threshold Math.max(4, target.length() / 2) is indeed very permissive for short strings like "id", potentially producing misleading suggestions. However, the optimal threshold is subjective and the suggested Math.max(2, target.length() / 3) may be too strict for longer strings.

Low
Remove unused variable in test

The variable stack is assigned but never used in the test, which is dead code and
may indicate the intent was to verify the suggestion in the stack trace rather than
the exception message. Either use stack in the assertions or remove the unused
variable.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java [107-111]

-String stack = org.apache.commons.lang3.exception.ExceptionUtils.getStackTrace(e);
 verifyErrorMessageContains(e, "Field [nam] not found.");
 // Verify suggestion based on Levenshtein distance
 verifyErrorMessageContains(e, "Did you mean: name");
 // Verify available fields are listed
 verifyErrorMessageContains(e, "available_fields");
Suggestion importance[1-10]: 4

__

Why: The variable stack is assigned but never used, which is dead code. Removing it improves code clarity, though it's a minor style issue with no functional impact.

Low

Previous suggestions

Suggestions up to commit 8d6f36d
CategorySuggestion                                                                                                                                    Impact
Possible issue
Guard against empty relBuilder stack

context.relBuilder.peek() can throw an EmptyStackException if the builder's stack is
empty, which could happen in certain edge cases. You should guard against this to
avoid masking the original "field not found" error with an unrelated exception.

core/src/main/java/org/opensearch/sql/calcite/QualifiedNameResolver.java [334]

-List<String> availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+List<String> availableFields;
+try {
+  availableFields = context.relBuilder.peek().getRowType().getFieldNames();
+} catch (Exception e) {
+  availableFields = List.of();
+}
Suggestion importance[1-10]: 5

__

Why: If context.relBuilder stack is empty, peek() would throw an unrelated exception masking the original "field not found" error. The guard is a reasonable defensive measure, though this edge case may be unlikely in practice.

Low
General
Fix overly permissive suggestion threshold for short strings

When target is a very short string (e.g., length 1 or 2), target.length() / 2
evaluates to 0 or 1 due to integer division, but Math.max(4, ...) will always return
4, which may be too permissive for short strings. Consider using a relative
threshold (e.g., a fraction of the target length) combined with a small absolute cap
to avoid suggesting unrelated fields for short inputs.

common/src/main/java/org/opensearch/sql/common/utils/StringUtils.java [174]

-if (bestMatch != null && bestDistance <= Math.max(4, target.length() / 2)) {
+int threshold = Math.min(4, Math.max(1, target.length() / 2));
+if (bestMatch != null && bestDistance <= threshold) {
Suggestion importance[1-10]: 5

__

Why: The current threshold Math.max(4, target.length() / 2) is always at least 4, which is too permissive for short strings (e.g., a 2-character target could match almost anything). The suggested Math.min(4, Math.max(1, target.length() / 2)) is more conservative and avoids false suggestions for short inputs.

Low
Remove unused variable in test

The variable stack is assigned but never used in the test, which is dead code.
Either use it in an assertion or remove it to keep the test clean and avoid
confusion.

integ-test/src/test/java/org/opensearch/sql/calcite/remote/CalcitePPLBasicIT.java [107]

-String stack = org.apache.commons.lang3.exception.ExceptionUtils.getStackTrace(e);
 verifyErrorMessageContains(e, "Field [nam] not found.");
 // Verify suggestion based on Levenshtein distance
 verifyErrorMessageContains(e, "Did you mean: name");
 // Verify available fields are listed
 verifyErrorMessageContains(e, "available_fields");
Suggestion importance[1-10]: 3

__

Why: The variable stack is assigned but never used, making it dead code. Removing it improves test clarity, though it has no functional impact.

Low

Signed-off-by: Simeon Widdis <sawiddis@amazon.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

Persistent review updated to latest commit 1950fac

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request error-experience Issues related to how we handle failure cases in the plugin.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant