
Commit 0263f92

Ming Lei authored and akpm00 committed
lib/group_cpus.c: avoid acquiring cpu hotplug lock in group_cpus_evenly
group_cpus_evenly() can be called from a storage driver's error handler, such as the nvme driver's, which may run during CPU hotplug: the storage queue has to drain its pending IOs because all CPUs associated with the queue are offline and the queue is becoming inactive, and completing that IO needs the error handler to make forward progress.

A deadlock is then caused:

1) inside the CPU hotplug handler, the CPU hotplug lock is held, and blk-mq's hotplug handler is waiting for inflight IO;

2) the error handler is waiting for the CPU hotplug lock;

3) the inflight IO can't be completed in blk-mq's CPU hotplug handler because error handling can't make forward progress.

Solve the deadlock by not holding the CPU hotplug lock in group_cpus_evenly(), which spreads CPUs in two stages: 1) the 1st stage is over all present CPUs; 2) the 2nd stage is over all other CPUs.

It turns out the two-stage spread only needs a consistent view of 'cpu_present_mask', so drop the CPU hotplug lock and store the mask in a local cache instead. This does not change correctness, because all CPUs are still covered.

Link: https://lkml.kernel.org/r/[email protected]
Signed-off-by: Ming Lei <[email protected]>
Reported-by: Yi Zhang <[email protected]>
Reported-by: Guangwu Zhang <[email protected]>
Tested-by: Guangwu Zhang <[email protected]>
Reviewed-by: Chengming Zhou <[email protected]>
Reviewed-by: Jens Axboe <[email protected]>
Cc: Keith Busch <[email protected]>
Cc: <[email protected]>
Signed-off-by: Andrew Morton <[email protected]>
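To make the correctness argument concrete, here is a small illustrative sketch (not part of the patch; the function name check_two_stage_coverage() and the caller-provided scratch masks are hypothetical). Because both spread stages are derived from the same local snapshot, their union is always cpu_possible_mask, no matter how cpu_present_mask changes concurrently:

#include <linux/bug.h>
#include <linux/compiler.h>
#include <linux/cpumask.h>

/* Illustrative only: the two stage masks together always cover all CPUs. */
static void check_two_stage_coverage(struct cpumask *snap,
				     struct cpumask *others,
				     struct cpumask *uni)
{
	/* stage 1 input: one consistent snapshot of the present CPUs */
	cpumask_copy(snap, data_race(cpu_present_mask));

	/* stage 2 input: every possible CPU not in that snapshot */
	cpumask_andnot(others, cpu_possible_mask, snap);

	/* the union of the two stages is always cpu_possible_mask */
	cpumask_or(uni, snap, others);
	WARN_ON(!cpumask_equal(uni, cpu_possible_mask));
}

A CPU that is hotplugged while the snapshot is taken simply lands in one stage or the other; either placement is acceptable, since the two-stage split is only an optimization.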
1 parent ee34db3 commit 0263f92

File tree

1 file changed (+16, -6 lines)


lib/group_cpus.c

Lines changed: 16 additions & 6 deletions
@@ -366,13 +366,25 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
 	if (!masks)
 		goto fail_node_to_cpumask;
 
-	/* Stabilize the cpumasks */
-	cpus_read_lock();
 	build_node_to_cpumask(node_to_cpumask);
 
+	/*
+	 * Make a local cache of 'cpu_present_mask', so the two stages
+	 * spread can observe consistent 'cpu_present_mask' without holding
+	 * cpu hotplug lock, then we can reduce deadlock risk with cpu
+	 * hotplug code.
+	 *
+	 * Here CPU hotplug may happen when reading `cpu_present_mask`, and
+	 * we can live with the case because it only affects that hotplug
+	 * CPU is handled in the 1st or 2nd stage, and either way is correct
+	 * from API user viewpoint since 2-stage spread is sort of
+	 * optimization.
+	 */
+	cpumask_copy(npresmsk, data_race(cpu_present_mask));
+
 	/* grouping present CPUs first */
 	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
-				  cpu_present_mask, nmsk, masks);
+				  npresmsk, nmsk, masks);
 	if (ret < 0)
 		goto fail_build_affinity;
 	nr_present = ret;
@@ -387,15 +399,13 @@ struct cpumask *group_cpus_evenly(unsigned int numgrps)
 		curgrp = 0;
 	else
 		curgrp = nr_present;
-	cpumask_andnot(npresmsk, cpu_possible_mask, cpu_present_mask);
+	cpumask_andnot(npresmsk, cpu_possible_mask, npresmsk);
 	ret = __group_cpus_evenly(curgrp, numgrps, node_to_cpumask,
 				  npresmsk, nmsk, masks);
 	if (ret >= 0)
 		nr_others = ret;
 
 fail_build_affinity:
-	cpus_read_unlock();
-
 	if (ret >= 0)
 		WARN_ON(nr_present + nr_others < numgrps);
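For context, a minimal caller sketch (an assumption loosely modeled on how blk-mq's queue mapping and the irq affinity code consume this helper; example_map_queues() and the pr_debug() output are illustrative, not from this commit). group_cpus_evenly() returns a kcalloc()'d array of numgrps cpumasks, or NULL on failure, and the caller releases it with kfree():

#include <linux/cpumask.h>
#include <linux/errno.h>
#include <linux/group_cpus.h>
#include <linux/printk.h>
#include <linux/slab.h>

/* Illustrative only: spread nr_queues groups over the CPUs and log the map. */
static int example_map_queues(unsigned int nr_queues)
{
	struct cpumask *masks;
	unsigned int q, cpu;

	masks = group_cpus_evenly(nr_queues);
	if (!masks)
		return -ENOMEM;

	for (q = 0; q < nr_queues; q++)
		for_each_cpu(cpu, &masks[q])
			pr_debug("cpu %u -> queue %u\n", cpu, q);

	kfree(masks);
	return 0;
}

With this patch the call above no longer takes the CPU hotplug lock internally, so it is safe to reach it from an error handler that may run concurrently with CPU hotplug.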