Skip to content

Add ARRAY_TO_CSV helper function #4587#5404

Open
ishag4 wants to merge 1 commit intoopensearch-project:mainfrom
ishag4:issue-4587
Open

Add ARRAY_TO_CSV helper function #4587#5404
ishag4 wants to merge 1 commit intoopensearch-project:mainfrom
ishag4:issue-4587

Conversation

@ishag4
Copy link
Copy Markdown

@ishag4 ishag4 commented May 4, 2026

Description

[Describe what this change achieves]

Related Issues

Resolves #4587

Check List

  • New functionality includes testing.
  • New functionality has been documented.
  • New functionality has javadoc added.
  • New functionality has a user manual doc added.
  • New PPL command checklist all confirmed.
  • API changes companion pull request created.
  • Commits are signed per the DCO using --signoff or -s.
  • Public documentation issue/PR created.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Isha Gupta <igupta24@apple.com>
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR Reviewer Guide 🔍

Here are some key observations to aid the review process:

🧪 PR contains tests
🔒 No security concerns identified
✅ No TODO sections
🔀 No multiple PR themes
⚡ Recommended focus areas for review

CSV Escaping

The arrayToCsv method does not handle CSV special characters. If an element contains the delimiter, a double-quote, or a newline, the output will not be valid CSV. For example, an element like "hello,world" with the default comma delimiter would produce malformed CSV output. Proper CSV escaping (quoting fields that contain the delimiter or quotes) should be considered.

public static String arrayToCsv(List<Object> array, String delimiter) {
  if (array == null) {
    return null;
  }

  if (delimiter == null) {
    delimiter = ",";
  }

  if (array.isEmpty()) {
    return "";
  }

  StringBuilder result = new StringBuilder();
  for (int i = 0; i < array.size(); i++) {
    if (i > 0) {
      result.append(delimiter);
    }
    Object element = array.get(i);
    if (element != null) {
      result.append(element.toString());
    }
  }

  return result.toString();
}
Null Element Handling

Null elements in the array are silently converted to empty strings (no output between delimiters). This behavior may be surprising to users and is not documented. Consider whether null elements should produce a specific representation (e.g., "NULL", empty string, or raise an error), and document the chosen behavior.

Object element = array.get(i);
if (element != null) {
  result.append(element.toString());
}
Missing Integration Test

There are only unit tests for the static arrayToCsv method. There are no integration or end-to-end tests verifying that the function works correctly when invoked via PPL queries (e.g., eval result = array_to_csv(array_field) or eval result = array_to_csv(array_field, "|")). Integration tests should be added to validate the full pipeline.

/*
 * Copyright OpenSearch Contributors
 * SPDX-License-Identifier: Apache-2.0
 */

package org.opensearch.sql.expression.function.CollectionUDF;

import java.util.List;
import org.apache.calcite.adapter.enumerable.NotNullImplementor;
import org.apache.calcite.adapter.enumerable.NullPolicy;
import org.apache.calcite.adapter.enumerable.RexToLixTranslator;
import org.apache.calcite.linq4j.tree.Expression;
import org.apache.calcite.linq4j.tree.Expressions;
import org.apache.calcite.linq4j.tree.Types;
import org.apache.calcite.rel.type.RelDataTypeFactory;
import org.apache.calcite.rex.RexCall;
import org.apache.calcite.sql.type.CompositeOperandTypeChecker;
import org.apache.calcite.sql.type.OperandTypes;
import org.apache.calcite.sql.type.SqlReturnTypeInference;
import org.apache.calcite.sql.type.SqlTypeFamily;
import org.apache.calcite.sql.type.SqlTypeName;
import org.opensearch.sql.expression.function.ImplementorUDF;
import org.opensearch.sql.expression.function.UDFOperandMetadata;

/**
 * ARRAY_TO_CSV function implementation that converts an array to a CSV string.
 */
public class ArrayToCsvFunctionImpl extends ImplementorUDF {

  public ArrayToCsvFunctionImpl() {
    super(new ArrayToCsvImplementor(), NullPolicy.ARG0);
  }

  @Override
  public SqlReturnTypeInference getReturnTypeInference() {
    return sqlOperatorBinding -> {
      RelDataTypeFactory typeFactory = sqlOperatorBinding.getTypeFactory();
      return typeFactory.createTypeWithNullability(
          typeFactory.createSqlType(SqlTypeName.VARCHAR), true);
    };
  }

  @Override
  public UDFOperandMetadata getOperandMetadata() {
    // Accept ARRAY as first argument, optional STRING as second argument (delimiter)
    return UDFOperandMetadata.wrap(
        (CompositeOperandTypeChecker)
            OperandTypes.family(SqlTypeFamily.ARRAY)
                .or(OperandTypes.family(SqlTypeFamily.ARRAY, SqlTypeFamily.CHARACTER)));
  }

  public static class ArrayToCsvImplementor implements NotNullImplementor {
    @Override
    public Expression implement(
        RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
      // Handle both 1-argument (with default delimiter) and 2-argument cases
      if (translatedOperands.size() == 1) {
        // ARRAY_TO_CSV(array) - use default delimiter ","
        return Expressions.call(
            Types.lookupMethod(
                ArrayToCsvFunctionImpl.class, "arrayToCsv", List.class, String.class),
            translatedOperands.get(0),
            Expressions.constant(","));
      } else if (translatedOperands.size() == 2) {
        // ARRAY_TO_CSV(array, delimiter)
        return Expressions.call(
            Types.lookupMethod(
                ArrayToCsvFunctionImpl.class, "arrayToCsv", List.class, String.class),
            translatedOperands.get(0),
            translatedOperands.get(1));
      } else {
        throw new IllegalArgumentException(
            "ARRAY_TO_CSV expects 1 or 2 arguments, got " + translatedOperands.size());
      }
    }
  }

  /**
   * Converts an array to a CSV string.
   *
   * @param array The array to convert
   * @param delimiter The delimiter to use for joining values
   * @return CSV string representation of the array
   */
  public static String arrayToCsv(List<Object> array, String delimiter) {
    if (array == null) {
      return null;
    }

    if (delimiter == null) {
      delimiter = ",";
    }

    if (array.isEmpty()) {
      return "";
    }

    StringBuilder result = new StringBuilder();
    for (int i = 0; i < array.size(); i++) {
      if (i > 0) {
        result.append(delimiter);
      }
      Object element = array.get(i);
      if (element != null) {
        result.append(element.toString());
      }
    }

    return result.toString();
  }
}

@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 4, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

CategorySuggestion                                                                                                                                    Impact
Possible issue
Fix null policy to cover all arguments

The class uses NullPolicy.ARG0, which means the implementor is only called when the
first argument (the array) is non-null. However, the second argument (delimiter)
could still be null at runtime, and the arrayToCsv method handles that by defaulting
to ",". This is fine, but the NotNullImplementor contract assumes non-null inputs —
if the delimiter expression evaluates to null at the Calcite expression level, it
may not be passed correctly. Consider using NullPolicy.ANY or NullPolicy.NONE to
ensure consistent null handling for both arguments.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [30-32]

-public static class ArrayToCsvImplementor implements NotNullImplementor {
-  @Override
-  public Expression implement(
-      RexToLixTranslator translator, RexCall call, List<Expression> translatedOperands) {
-    // Handle both 1-argument (with default delimiter) and 2-argument cases
-    if (translatedOperands.size() == 1) {
-      ...
-    } else if (translatedOperands.size() == 2) {
-      ...
-    } else {
-      throw new IllegalArgumentException(
-          "ARRAY_TO_CSV expects 1 or 2 arguments, got " + translatedOperands.size());
-    }
-  }
+public ArrayToCsvFunctionImpl() {
+  super(new ArrayToCsvImplementor(), NullPolicy.ANY);
 }
Suggestion importance[1-10]: 5

__

Why: Using NullPolicy.ARG0 means only the first argument's nullability is checked, but the second argument (delimiter) could be null at the Calcite expression level. Changing to NullPolicy.ANY would ensure consistent null handling for both arguments, though the arrayToCsv method already handles null delimiters gracefully.

Low
General
Use safer null-tolerant string conversion

When an element is null, the delimiter is still appended but no value is added,
which produces correct output (e.g., "a,,b"). However, the leading-null case (e.g.,
[null, "a"]) produces ",a" — a leading delimiter — which may be unexpected. The
current behavior is tested and accepted, but consider using String.join with
null-to-empty-string mapping via streams for clarity and correctness consistency.
More critically, using String.valueOf(element) instead of element.toString() avoids
a potential NullPointerException if the null check is ever removed or refactored.

core/src/main/java/org/opensearch/sql/expression/function/CollectionUDF/ArrayToCsvFunctionImpl.java [99-107]

 for (int i = 0; i < array.size(); i++) {
   if (i > 0) {
     result.append(delimiter);
   }
   Object element = array.get(i);
   if (element != null) {
-    result.append(element.toString());
+    result.append(String.valueOf(element));
   }
 }
Suggestion importance[1-10]: 2

__

Why: The change from element.toString() to String.valueOf(element) is functionally equivalent here since there's already a null check (if (element != null)) guarding the call. The suggestion provides minimal safety improvement since the null check already prevents NPE.

Low

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add ARRAY_TO_CSV helper function

1 participant