At the moment, dask-expr struggles to deal with imbalanced (coiled/benchmarks#1367 (comment)) or very small (coiled/benchmarks#1381) partitions. We should improve this, which likely requires overhauling the Parquet reading to collect better statistics.
If the path is a directory, it might be faster and more scalable to ask the filesystem (POSIX or object store) for the size of the directory.
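As a rough illustration of the idea for the POSIX case only, here is a minimal sketch that totals file sizes under a directory using nothing but the standard library. The function name `directory_size` is hypothetical and not part of dask-expr; an object store would need its own listing call instead.

```python
import os


def directory_size(path):
    """Return the total size in bytes of all regular files under `path`.

    Hypothetical helper for illustration; walks the tree recursively
    and does not follow symlinks.
    """
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                # On-disk size of the file, taken from the directory entry
                total += entry.stat(follow_symlinks=False).st_size
            elif entry.is_dir(follow_symlinks=False):
                total += directory_size(entry.path)
    return total
```

Note that this reports on-disk (compressed) bytes, which is only a proxy for the in-memory size of the resulting partitions.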
Right now we have to perform a list operation on every store unless a metadata file is provided. This list operation already returns file sizes, without accessing the Parquet metadata.
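To make the point concrete, here is a minimal sketch of how the sizes returned by a listing could be used to combine very small files into more balanced groups. It assumes the listing has already been reduced to `(path, size_in_bytes)` pairs; `group_files_by_size` and `target_bytes` are hypothetical names, and this is not how dask-expr currently plans partitions.

```python
def group_files_by_size(files, target_bytes):
    """Greedily pack (path, size) pairs into groups of at most `target_bytes`.

    Hypothetical helper for illustration. A file larger than
    `target_bytes` ends up in a group of its own.
    """
    groups = []
    current, current_size = [], 0
    for path, size in files:
        # Start a new group once adding this file would exceed the target
        if current and current_size + size > target_bytes:
            groups.append(current)
            current, current_size = [], 0
        current.append(path)
        current_size += size
    if current:
        groups.append(current)
    return groups
```

For example, with a target of 100 bytes, the files `[("a", 10), ("b", 20), ("c", 80), ("d", 5), ("e", 5)]` are packed into `[["a", "b"], ["c", "d", "e"]]`. Sizes from a listing are compressed on-disk sizes, so Parquet statistics would still be needed for an accurate in-memory estimate.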