
[SPARK-48665][PYTHON][CONNECT] Support providing a dict in pyspark lit to create a map. #49318


Closed

Conversation


@skanderboudawara skanderboudawara commented Dec 27, 2024

What changes were proposed in this pull request?

Reopening the PR originally authored by Ronserruya and adding small changes.

Added the option to pass a dict to pyspark.sql.functions.lit in order to create a map.

Why are the changes needed?

To make it easier to create a map in PySpark.
Currently, this is only possible via create_map, which requires a flat sequence of alternating keys and values (key1, value1, key2, value2, ...).
Scala already supports this functionality via typedLit.
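For reference, a sketch of the current create_map workaround with illustrative values:

from pyspark.sql import functions as F
# Today's workaround: spell out alternating key and value columns explicitly.
F.create_map(F.lit("a"), F.lit(1), F.lit("b"), F.lit(2))
# Column<'map(a, 1, b, 2)'>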

A similar PR was opened in the past to add comparable functionality for creating an array from a list, so I followed the changes made there as well.

Does this PR introduce any user-facing change?

Yes: the docstring of lit was edited, and new functionality was added.

Before:

from pyspark.sql import functions as F
F.lit({"a":1})
# pyspark.errors.exceptions.captured.SparkRuntimeException: [UNSUPPORTED_FEATURE.LITERAL_TYPE] The feature is not supported: Literal for '{asd=2}' of class java.util.HashMap.

After:

from pyspark.sql import functions as F
F.lit({"a":1, "b": 2})
# Column<'map(a, 1, b, 2)'>

How was this patch tested?

Manual tests + unittest in CI

Was this patch authored or co-authored using generative AI tooling?

No


with self.sql_conf(
    {
        "spark.sql.ansi.enabled": False,
    }
)

Contributor:

How does ANSI affect this feature?

Author:

Without ANSI the result is {"a": 1, "b": 2, "c": None}; with ANSI it is {"a": "1", "b": "2", "c": None}, i.e. all the ints become strings.
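To illustrate the behavior the author describes (values taken from the comment above, not independently verified):

from pyspark.sql import functions as F
# With spark.sql.ansi.enabled=false, values keep their types:
F.lit({"a": 1, "b": 2, "c": None})   # -> {"a": 1, "b": 2, "c": None}
# With spark.sql.ansi.enabled=true, the ints are coerced to strings:
# -> {"a": "1", "b": "2", "c": None}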
from pyspark.sql import SparkSession

spark = SparkSession.getActiveSession()
Contributor:

Do we require an active session here? @HyukjinKwon

Member:

Let's use the default value of spark.sql.pyspark.inferNestedDictAsStruct.enabled when SparkSession.getActiveSession returns None. For Connect too.


spark = SparkSession.getActiveSession()
dict_as_struct = (
    spark.conf.get("spark.sql.pyspark.inferNestedDictAsStruct.enabled")
Contributor:

This triggers a Config RPC; for nested cases, it will re-trigger multiple times. We should cache the config and make sure there is at most one invocation.

Author:

Shall I create a new cached function in utils?

from functools import lru_cache
from pyspark.sql import SparkSession

@lru_cache()
def __get_conf_nested():
    # Read the conf from the active session; fall back to "true"
    # when no session is active.
    spark = SparkSession.getActiveSession()
    dict_as_struct = (
        spark.conf.get("spark.sql.pyspark.inferNestedDictAsStruct.enabled")
        if spark
        else "true"
    )
    return dict_as_struct
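(A note on the sketch above: functools.lru_cache memoizes the result for the lifetime of the process, so a conf value changed after the first call, or a different active session, would not be picked up. That achieves the reviewer's goal of at most one Config RPC, at the cost of possible staleness.)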

Author:

@zhengruifeng, @HyukjinKwon I have made the necessary changes.

github-actions (bot):

We're closing this PR because it hasn't been updated in a while. This isn't a judgement on the merit of the PR in any way. It's just a way of keeping the PR queue manageable.
If you'd like to revive this PR, please reopen it and ask a committer to remove the Stale tag!

@github-actions github-actions bot added the Stale label Apr 25, 2025
@github-actions github-actions bot closed this Apr 26, 2025