fs: Deferred inode reclaim #1300
Open
vfsci-bot[bot] wants to merge 4 commits into vfs.base.ci from
When an inode has dirtied timestamps, we currently call sync_lazytime() on the last iput(). This is done because an inode with any dirty bit set is not inserted into the LRU, and dirty timestamps expire only after many (12 by default) hours, so these inodes would sit outside of LRU aging for a really long time. However, this can result in doing IO, and consequently GFP_NOFAIL allocations, from dentry reclaim, which makes MM complain. A sample trace for ext4 is:

  prune_dcache_sb
  shrink_dentry_list
  __dentry_kill
  iput
  sync_lazytime
  __mark_inode_dirty
  ext4_dirty_inode
  __ext4_mark_inode_dirty
  ext4_reserve_inode_write
  ext4_get_inode_loc
  bdev_getblk
  __filemap_get_folio_mpol

Avoid this dirtying on the last iput() by reshuffling unused inodes to the beginning of the b_dirty_time list and clobbering dirtied_time_when instead, so that they get written during the next periodic writeback.

Signed-off-by: Jan Kara <jack@suse.cz>
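The reshuffling idea can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code from the patch: `toy_inode`, `toy_dirty_time_list` and all `toy_*` functions are hypothetical names, the list is assumed to be scanned from its beginning with the scan stopping at the first unexpired entry, and "clobbering" is modelled as back-dating `dirtied_time_when` by one expiry interval.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for an inode with lazily dirtied timestamps. */
struct toy_inode {
    unsigned long dirtied_time_when;   /* tick at which timestamps were dirtied */
    struct toy_inode *prev, *next;     /* both NULL while not on the list */
};

/* Circular list with a sentinel; head.next is the "beginning" of the list. */
struct toy_dirty_time_list {
    struct toy_inode head;
};

static void toy_list_init(struct toy_dirty_time_list *l)
{
    l->head.prev = l->head.next = &l->head;
}

static void toy_unlink(struct toy_inode *i)
{
    i->prev->next = i->next;
    i->next->prev = i->prev;
    i->prev = i->next = NULL;
}

static void toy_insert_after(struct toy_inode *pos, struct toy_inode *i)
{
    i->prev = pos;
    i->next = pos->next;
    pos->next->prev = i;
    pos->next = i;
}

/* Timestamps got dirtied: stamp the inode and append it at the end. */
static void toy_mark_dirty_time(struct toy_dirty_time_list *l,
                                struct toy_inode *i, unsigned long now)
{
    assert(i->next == NULL);           /* must not be on the list yet */
    i->dirtied_time_when = now;
    toy_insert_after(l->head.prev, i);
}

/*
 * Last reference dropped: do no IO here. Move the inode to the beginning
 * of the list and back-date its stamp so that it already counts as expired.
 */
static void toy_last_iput(struct toy_dirty_time_list *l, struct toy_inode *i,
                          unsigned long now, unsigned long expire_interval)
{
    assert(i->next != NULL);           /* must be on the list */
    toy_unlink(i);
    toy_insert_after(&l->head, i);
    i->dirtied_time_when = now - expire_interval;
}

/* Periodic writeback: take expired inodes from the beginning of the list. */
static size_t toy_writeback_expired(struct toy_dirty_time_list *l,
                                    unsigned long now,
                                    unsigned long expire_interval)
{
    size_t written = 0;

    while (l->head.next != &l->head) {
        struct toy_inode *i = l->head.next;

        if (now - i->dirtied_time_when < expire_interval)
            break;                     /* everything further on is newer */
        toy_unlink(i);                 /* timestamps would be written here */
        written++;
    }
    return written;
}
```

In this model the only work done at the last reference drop is a list move and a store, so the reclaim path itself never starts IO; the write happens later in the periodic writeback pass.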
Reclaim of some inodes is rather complex, requiring running transactions or doing other IO. Consequently, filesystems end up doing GFP_NOFAIL allocations from kswapd or even direct reclaim, which is problematic because forward progress of these allocations isn't guaranteed. Add infrastructure for marking inodes whose reclaim is difficult, and offload reclaim of such inodes to a workqueue so that kswapd is not blocked by difficult inode reclaim.

Signed-off-by: Jan Kara <jack@suse.cz>
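The split between inline and offloaded reclaim can be sketched in userspace as follows. This is a simplified model and not the infrastructure added by the patch: the `reclaim_deferred` flag, `toy_deferred_queue`, `toy_prune()` and `toy_deferred_worker()` are hypothetical names, and the worker is an ordinary function standing in for a workqueue item.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for an inode sitting on the LRU. */
struct toy_inode {
    bool reclaim_deferred;             /* marked as "difficult to reclaim" */
    bool queued;                       /* already on the deferred queue */
    bool evicted;
    struct toy_inode *next_deferred;
};

/* FIFO of inodes waiting for the worker. */
struct toy_deferred_queue {
    struct toy_inode *head, *tail;
    size_t len;
};

static void toy_defer(struct toy_deferred_queue *q, struct toy_inode *i)
{
    assert(!i->queued);
    i->queued = true;
    i->next_deferred = NULL;
    if (q->tail)
        q->tail->next_deferred = i;
    else
        q->head = i;
    q->tail = i;
    q->len++;
}

/*
 * Reclaim pass (the part that would run in kswapd or direct reclaim):
 * easy inodes are evicted inline, marked inodes are only queued.
 * Returns the number of inodes evicted inline.
 */
static size_t toy_prune(struct toy_inode **lru, size_t n,
                        struct toy_deferred_queue *q)
{
    size_t evicted = 0;

    for (size_t k = 0; k < n; k++) {
        struct toy_inode *i = lru[k];

        if (i->evicted || i->queued)
            continue;
        if (i->reclaim_deferred) {
            toy_defer(q, i);           /* no IO in the reclaim context */
            continue;
        }
        i->evicted = true;
        evicted++;
    }
    return evicted;
}

/* Worker: drains the queue in a context where IO and blocking are fine. */
static size_t toy_deferred_worker(struct toy_deferred_queue *q)
{
    size_t evicted = 0;

    while (q->head) {
        struct toy_inode *i = q->head;

        q->head = i->next_deferred;
        if (!q->head)
            q->tail = NULL;
        i->next_deferred = NULL;
        i->queued = false;
        i->evicted = true;             /* expensive eviction would happen here */
        q->len--;
        evicted++;
    }
    return evicted;
}
```

The point of the model is that `toy_prune()` never performs the expensive step itself, so the caller cannot get stuck on allocations made during a difficult eviction.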
Deferring difficult inode reclaim from prune_icache_sb() to a workqueue removes the natural feedback loop of blocking tasks in direct reclaim until they make space for new allocations. This can let the list of deferred inodes grow without bound and possibly push the machine into a reclaim storm or OOM. Add a throttling mechanism that slows down tasks in mark_inode_reclaim_deferred() if the list of deferred inodes to reclaim grows over a limit. We measure the average time it takes to reclaim an inode on the deferred list and block tasks proportionally to that.

Signed-off-by: Jan Kara <jack@suse.cz>
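One way to picture such throttling is the userspace sketch below. It is an illustration under assumptions, not the mechanism from the patch: the `reclaim_throttle` structure and its functions are hypothetical, the average is assumed to be an exponentially weighted moving average with weight 1/8, and the delay for a task that pushes the list over the limit is assumed to be exactly one average reclaim time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical throttling state for the deferred list. */
struct reclaim_throttle {
    uint64_t avg_reclaim_ns;   /* moving average of per-inode reclaim time */
    size_t nr_deferred;        /* inodes currently waiting for the worker */
    size_t limit;              /* backlog above which producers are slowed */
};

static void throttle_init(struct reclaim_throttle *t, size_t limit)
{
    t->avg_reclaim_ns = 0;
    t->nr_deferred = 0;
    t->limit = limit;
}

/* Worker side: one deferred inode was reclaimed and it took took_ns. */
static void throttle_note_reclaim(struct reclaim_throttle *t, uint64_t took_ns)
{
    assert(t->nr_deferred > 0);
    t->nr_deferred--;
    if (t->avg_reclaim_ns == 0)
        t->avg_reclaim_ns = took_ns;   /* first sample seeds the average */
    else
        t->avg_reclaim_ns = (7 * t->avg_reclaim_ns + took_ns) / 8;
}

/*
 * Producer side: an inode is added to the deferred list. Returns how long
 * the caller should sleep, in nanoseconds: 0 while the backlog is within
 * the limit, one average reclaim time once it is above the limit.
 */
static uint64_t throttle_mark_deferred(struct reclaim_throttle *t)
{
    t->nr_deferred++;
    if (t->nr_deferred <= t->limit)
        return 0;
    return t->avg_reclaim_ns;
}
```

With this choice a producer over the limit waits roughly as long as the worker needs to retire one inode, which restores a feedback loop between the rate of deferring and the rate of reclaim. Other scalings, for example growing the delay with the size of the excess, would fit the description equally well.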
When we have to free preallocations during inode eviction, we need to load block bitmaps and run a transaction to modify them. This takes time and also requires GFP_NOFAIL allocations. Mark inodes with preallocated blocks as needing their reclaim offloaded to a workqueue so that we don't block reclaim for long and potentially deadlock the MM subsystem.

Signed-off-by: Jan Kara <jack@suse.cz>
Series: https://patchwork.kernel.org/project/linux-fsdevel/list/?series=1087657
Submitter: Jan Kara
Version: 1
Patches: 4/4
Message-ID: <20260429174850.18223-1-jack@suse.cz>
Base: vfs.base.ci
Lore: https://lore.kernel.org/linux-fsdevel/20260429174850.18223-1-jack@suse.cz
Automated by ml2pr