What is the bug?
With the automatic type conversion introduced in #4599, PPL queries that contain certain type mismatches are rewritten to use the SAFE_CAST function.
When these queries are translated to Spark SQL via SparkSqlDialect, SAFE_CAST is emitted as-is in the generated SQL. However, Spark SQL does not provide a SAFE_CAST function, so the resulting SQL is invalid and is rejected by Spark (with a PARSE_SYNTAX_ERROR in the reproduction below).
How can one reproduce the bug?
This issue was first observed in the PPL unification PoC: opensearch-project/opensearch-spark#1281 (comment)
```sql
spark-sql (default)> search source=test_events;
@timestamp           host     packets  message
2025-09-08 10:00:00  server1  60       {"category":1, "resource":"A"}
2025-09-08 10:01:00  server1  120      {"category":2, "resource":"B"}
2025-09-08 10:02:00  server1  60       {"category":3, "resource":"C"}
2025-09-08 10:02:30  server2  180      {"category":4, "resource":"D"}

spark-sql (default)> search source=test_events | spath input=message category | eval cat = abs(category);
[PARSE_SYNTAX_ERROR] Syntax error at or near 'AS'.(line 1, pos 153)

== SQL ==
SELECT `@timestamp`, `host`, `packets`, `message`, `JSON_EXTRACT`(`message`, 'category') `category`, ABS(SAFE_CAST(`JSON_EXTRACT`(`message`, 'category') AS DOUBLE)) `cat`
---------------------------------------------------------------------------------------------------------------------------------------------------------^^^
FROM `spark_catalog`.`default`.`test_events`
```
What is the expected behavior?
SAFE_CAST should be translated to Spark SQL's equivalent TRY_CAST function, so that the generated SQL is valid and failed conversions return NULL instead of erroring.
```sql
spark-sql (default)> SELECT TRY_CAST('123' AS INT);
TRY_CAST(123 AS INT)
123

spark-sql (default)> SELECT TRY_CAST('123abc' AS INT);
TRY_CAST(123abc AS INT)
NULL
```
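Applied to the reproduction above, the generated SQL would then look like the following sketch, assuming only the SAFE_CAST call is substituted and the rest of the projection (including the JSON_EXTRACT calls from the original translation) is left unchanged:

```sql
SELECT `@timestamp`, `host`, `packets`, `message`,
       `JSON_EXTRACT`(`message`, 'category') `category`,
       ABS(TRY_CAST(`JSON_EXTRACT`(`message`, 'category') AS DOUBLE)) `cat`
FROM `spark_catalog`.`default`.`test_events`
```

With this translation, rows whose category value cannot be converted to DOUBLE would produce NULL for cat rather than failing the whole query.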
What is your host/environment?
Do you have any screenshots?
N/A
Do you have any additional context?
- In Spark SQL, CAST behaves somewhat like TRY_CAST only when ANSI mode is disabled; relying on this is not safe, so CAST should not be used to emulate SAFE_CAST semantics.
- Beyond this specific bug, there may be semantic discrepancies in which type conversions are considered valid between OpenSearch PPL and Spark, so we may need additional alignment around type conversion behavior.
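To illustrate the ANSI-mode point above, a minimal Spark SQL session (assuming Spark 3.x, where ANSI mode is controlled by spark.sql.ansi.enabled; the exact error class name may vary by version):

```sql
SET spark.sql.ansi.enabled=true;
SELECT CAST('123abc' AS INT);      -- fails (e.g. CAST_INVALID_INPUT in recent Spark versions)
SELECT TRY_CAST('123abc' AS INT);  -- NULL, regardless of ANSI mode

SET spark.sql.ansi.enabled=false;
SELECT CAST('123abc' AS INT);      -- NULL (legacy, non-ANSI behavior)
```

Because CAST's behavior flips with this session/cluster setting, only TRY_CAST gives the null-on-failure semantics that SAFE_CAST promises.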