Commit 97c7039

fdmanana authored and heftig committed
btrfs: only run the extent map shrinker from kswapd tasks
Currently the extent map shrinker can be run by any task when attempting to allocate memory and there's enough memory pressure to trigger it.

To avoid too much latency we stop iterating over extent maps and removing them once the task needs to reschedule. This logic was introduced in commit b3ebb9b ("btrfs: stop extent map shrinker if reschedule is needed").

While that solved high latency problems for some use cases, it's still not enough because with a very high number of tasks entering the extent map shrinker code, either due to memory allocations or because they are kswapd tasks, we end up having a very high level of contention on some spin locks, namely:

1) The fs_info->fs_roots_radix_lock spin lock, which we need to find roots to iterate over their inodes;

2) The spin lock of the xarray used to track open inodes for a root (struct btrfs_root::inodes) - on 6.10 kernels and below, it used to be a red black tree and the spin lock was root->inode_lock;

3) The fs_info->delayed_iput_lock spin lock, since the shrinker adds delayed iputs (calls btrfs_add_delayed_iput()).

Instead of allowing the extent map shrinker to be run by any task, make it run only by kswapd tasks. This still solves the problem of running into OOM situations due to unbounded extent map creation, which is simple to trigger by direct IO writes, as described in the changelog of commit 956a17d ("btrfs: add a shrinker for extent maps"), and by a similar case when doing buffered IO on files with a very large number of holes (keeping the file open and creating many holes, whose extent maps are only released when the file is closed).

Reported-by: kzd <[email protected]>
Link: https://bugzilla.kernel.org/show_bug.cgi?id=219121
Reported-by: Octavia Togami <[email protected]>
Link: https://lore.kernel.org/linux-btrfs/CAHPNGSSt-a4ZZWrtJdVyYnJFscFjP9S7rMcvEMaNSpR556DdLA@mail.gmail.com/
Fixes: 956a17d ("btrfs: add a shrinker for extent maps")
CC: [email protected] # 6.10+
Tested-by: kzd <[email protected]>
Tested-by: Octavia Togami <[email protected]>
Signed-off-by: Filipe Manana <[email protected]>
Cherry-picked-for: https://gitlab.archlinux.org/archlinux/packaging/packages/linux/-/issues/70
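For context, the gating relies on current_is_kswapd(), which in mainline kernels is a trivial helper defined in include/linux/swap.h that checks the PF_KSWAPD flag on the current task. A paraphrased sketch, not part of this commit:

	/* Paraphrased from include/linux/swap.h: kswapd threads run with PF_KSWAPD set. */
	static inline int current_is_kswapd(void)
	{
		return current->flags & PF_KSWAPD;
	}

Because there is one kswapd thread per NUMA node, this caps the number of tasks that can enter the extent map shrinker concurrently, which is what bounds the spin lock contention described above.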
1 parent 6dd7406 commit 97c7039

2 files changed: +16 -16 lines changed


fs/btrfs/extent_map.c (+6 -16)

@@ -1065,8 +1065,7 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, struct btrfs_em_shrink_c
 		return 0;
 
 	/*
-	 * We want to be fast because we can be called from any path trying to
-	 * allocate memory, so if the lock is busy we don't want to spend time
+	 * We want to be fast so if the lock is busy we don't want to spend time
 	 * waiting for it - either some task is about to do IO for the inode or
 	 * we may have another task shrinking extent maps, here in this code, so
 	 * skip this inode.
@@ -1109,9 +1108,7 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, struct btrfs_em_shrink_c
 		/*
 		 * Stop if we need to reschedule or there's contention on the
 		 * lock. This is to avoid slowing other tasks trying to take the
-		 * lock and because the shrinker might be called during a memory
-		 * allocation path and we want to avoid taking a very long time
-		 * and slowing down all sorts of tasks.
+		 * lock.
 		 */
 		if (need_resched() || rwlock_needbreak(&tree->lock))
 			break;
@@ -1139,12 +1136,7 @@ static long btrfs_scan_root(struct btrfs_root *root, struct btrfs_em_shrink_ctx
 		if (ctx->scanned >= ctx->nr_to_scan)
 			break;
 
-		/*
-		 * We may be called from memory allocation paths, so we don't
-		 * want to take too much time and slowdown tasks.
-		 */
-		if (need_resched())
-			break;
+		cond_resched();
 
 		inode = btrfs_find_first_inode(root, min_ino);
 	}
@@ -1202,14 +1194,12 @@ long btrfs_free_extent_maps(struct btrfs_fs_info *fs_info, long nr_to_scan)
 			ctx.last_ino);
 	}
 
-	/*
-	 * We may be called from memory allocation paths, so we don't want to
-	 * take too much time and slowdown tasks, so stop if we need reschedule.
-	 */
-	while (ctx.scanned < ctx.nr_to_scan && !need_resched()) {
+	while (ctx.scanned < ctx.nr_to_scan) {
 		struct btrfs_root *root;
 		unsigned long count;
 
+		cond_resched();
+
 		spin_lock(&fs_info->fs_roots_radix_lock);
 		count = radix_tree_gang_lookup(&fs_info->fs_roots_radix,
 					       (void **)&root,
fs/btrfs/super.c (+10 -0)

@@ -28,6 +28,7 @@
 #include <linux/btrfs.h>
 #include <linux/security.h>
 #include <linux/fs_parser.h>
+#include <linux/swap.h>
 #include "messages.h"
 #include "delayed-inode.h"
 #include "ctree.h"
@@ -2394,6 +2395,15 @@ static long btrfs_free_cached_objects(struct super_block *sb, struct shrink_cont
 	const long nr_to_scan = min_t(unsigned long, LONG_MAX, sc->nr_to_scan);
 	struct btrfs_fs_info *fs_info = btrfs_sb(sb);
 
+	/*
+	 * We may be called from any task trying to allocate memory and we don't
+	 * want to slow it down with scanning and dropping extent maps. It would
+	 * also cause heavy lock contention if many tasks concurrently enter
+	 * here. Therefore only allow kswapd tasks to scan and drop extent maps.
+	 */
+	if (!current_is_kswapd())
+		return 0;
+
 	return btrfs_free_extent_maps(fs_info, nr_to_scan);
 }
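For reference, btrfs_free_cached_objects() is not called by btrfs itself: it is the ->free_cached_objects hook of btrfs's super_operations, invoked by the VFS superblock shrinker under memory pressure. A sketch of the pre-existing wiring in fs/btrfs/super.c (abbreviated, other members omitted):

	static const struct super_operations btrfs_super_ops = {
		/* ... */
		.nr_cached_objects	= btrfs_nr_cached_objects,
		.free_cached_objects	= btrfs_free_cached_objects,
		/* ... */
	};

Returning 0 for non-kswapd callers simply reports that nothing was freed, so direct reclaim skips the expensive extent map scan while kswapd still performs it.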
