Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix small typo in docs/details/error-shuffle.md #32

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion docs/details/error-shuffle.md
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@

FetchFailed exceptions are mainly due to misconfiguration of ```spark.sql.shuffle.partitions```:

1. Too few shuffle partitions: Having too few shuffle partitions means you could have a shuffle block that is larger than the limit(Integer.MaxValue=~2GB) or OOM(Exit code 143). The symptom for this can also be long-running tasks where the blocks are large but not reached the limit. A quick fix is to increase the shuffle/reducer parallelism by increasing ```spark.sqlshuffle.partitions```(default is 500).
1. Too few shuffle partitions: Having too few shuffle partitions means you could have a shuffle block that is larger than the limit(Integer.MaxValue=~2GB) or OOM(Exit code 143). The symptom for this can also be long-running tasks where the blocks are large but not reached the limit. A quick fix is to increase the shuffle/reducer parallelism by increasing ```spark.sql.shuffle.partitions```(default is 500).
2. Too many shuffle partitions: Too many shuffle partitions could put a stress on the shuffle service and could run into errors like ```network timeout``` ```. Note that the shuffle service is a shared service for all the jobs running on the cluster so it is possible that someone else's job with high shuffle activity could cause errors for your job. It is worth checking to see if there is a pattern of these failures for your job to confirm if it is an issue with your job or not. Also note that the higher the shuffle partitions, the more likely that you would see this issue.


Expand Down