* [LSF/MM/BPF TOPIC] Direct Reclaim and Filesystems
From: Boris Burkov @ 2026-04-09 21:09 UTC
To: linux-fsdevel; +Cc: lsf-pc, linux-mm, linux-btrfs
Hello,
A theme that we (Shakeel, JP, I, and others) have observed in the fleet at Meta
is a tension between btrfs and direct reclaim. This has manifested in a variety
of ways. Each situation must also be considered with respect to both memcg
reclaim and global reclaim. No overall "assignment of blame" is intended, just a
desire to build a deeper understanding of best practices and paths forward for
all the components involved. I work on btrfs and have minimal direct experience
with how other filesystems handle such challenges, but I imagine there must be
some overlap.
I think this is probably too large a topic for a single session, but I am
curious whether any of these categories of issues are broadly interesting. I
personally think the one that cuts across the most groups is the question of
reclaim CPU usage.
- The filesystem triggering direct reclaim [2]
Especially when the filesystem is holding a lock like the inode rwsem or a
filesystem-internal lock (like the btrfs btree locks), this results in
unexpectedly high latency for the filesystem user. In the case of memcg reclaim
under held locks, it will also unfairly affect the latency of other cgroups not
under reclaim. We are working on categorizing and reducing these case by case,
but a clearer statement of valid allocation contexts and GFP flags could be
broadly useful.
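As a point of reference for that discussion, the scoped allocation context API
is one existing tool here. A minimal kernel-style sketch follows;
memalloc_nofs_save()/memalloc_nofs_restore() are the real kernel APIs, but the
surrounding lock calls are only illustrative of a lock-held critical section,
and the fragment is not compilable on its own:

```c
/* While an fs-internal lock is held, forbid allocations in this task
 * from recursing into filesystem reclaim, without having to plumb
 * GFP_NOFS through every callee.
 */
unsigned int nofs_flags;

btrfs_tree_lock(eb);                    /* illustrative lock-held section */
nofs_flags = memalloc_nofs_save();
/* ... allocations here implicitly behave as if GFP_NOFS ... */
memalloc_nofs_restore(nofs_flags);
btrfs_tree_unlock(eb);
```

This bounds which reclaim paths an allocation can enter, though it does nothing
for the CPU usage questions below.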
- Reclaim freeing metadata and/or forcing metadata writeback [1][3][4]
In btrfs, this results in redundant work fetching and writing btree nodes when
it hits hot nodes in the btree. Should we be trying to lock some of these nodes
down from reclaim? If so, how many is appropriate/safe?
- High reclaim CPU usage [1][4][6]
It is possible to rapidly generate a very large amount of direct reclaim, for
example by doing parallel page cache reads larger than the cgroup limit from
many tasks in a memory.[high|max]-constrained cgroup. This then burns a great
deal of CPU attempting the direct reclaim. The CPU usage can become so extreme
(and can be exacerbated with cpuset cgroups) that we end up unable to schedule
tasks holding important shared locks, massively tanking the throughput of the
system. I have been able to reproduce conditions where even killing the
offending cgroup can take minutes. Some crude early experiments have shown that
throttling the reclaim CPU usage reduces the intensity of some of these
problems. Can this also be attacked via cgroup CPU throttling? Proxy execution?
What about the same issues under significant global direct reclaim?
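For reference, the parallel-read reproducer described above can be sketched
roughly as follows. This assumes a cgroup v2 mount at /sys/fs/cgroup and root
privileges; the cgroup name, limit, task count, and file paths are all
illustrative, not the exact reproducer:

```shell
# Create a memory-constrained cgroup (cgroup v2 assumed).
mkdir /sys/fs/cgroup/reclaim-test
echo $((1 << 29)) > /sys/fs/cgroup/reclaim-test/memory.max   # 512M limit

# From many tasks inside the cgroup, stream page cache reads whose
# total working set far exceeds the limit, forcing sustained direct
# reclaim on every task.
for i in $(seq 1 32); do
    (
        echo $BASHPID > /sys/fs/cgroup/reclaim-test/cgroup.procs
        dd if=/path/to/large-file-$i of=/dev/null bs=1M      # illustrative path
    ) &
done
wait
```

With enough tasks, per-task reclaim CPU time dominates, which is where the
scheduling and lock-holder starvation problems show up.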
- Filesystem doing expensive work while in direct reclaim [5]
In btrfs, compression can result in relatively expensive work precisely when
writeback is urgent. Jan Kara has already brought up issues around synchronous
expensive work in inode reclaim as an LSF/MM/BPF topic.
Thanks for reading and thanks in advance for any feedback and thoughts,
Boris
Links:
[1] btrfs memcg accounting separation (AS_KERNEL_FILE)
https://lore.kernel.org/linux-btrfs/f09c4e2c90351d4cb30a1969f7a863b9238bd291.1755812945.git.boris@bur.io/
[2] btrfs readahead direct reclaim reduction
https://lore.kernel.org/linux-btrfs/9fd974c2-00aa-4906-8cab-ec0d85750c4b@gmx.com/
[3] btrfs re-cowing inhibition
https://lore.kernel.org/linux-btrfs/cover.1772097864.git.loemra.dev@gmail.com/
[4] btrfs csum tree write locking reduction
https://lore.kernel.org/linux-btrfs/aa5a3d849cb093a767e08616258c03c7eec8fe26.1753806780.git.boris@bur.io/#r
[5] Jan Kara's proposal to discuss complex cleanup in reclaim
https://lore.kernel.org/linux-fsdevel/c18f8189b755c13064f51d93bfcaddb15300f9f8.camel@kernel.org/T/#m319eb6245485bb7c71171a55bf700cc1409a144d
[6] LPC previous discussion of CPU hogging and locks (unrelated to reclaim)
https://www.youtube.com/watch?v=_N-nXJHiDNo