From: Qu Wenruo <wqu@suse.com>
To: linux-btrfs@vger.kernel.org, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: [PATCH v3 0/3] btrfs: only use bdev's page cache for super block writeback
Date: Sat, 10 Jan 2026 14:26:18 +1030
Message-ID: <cover.1768017091.git.wqu@suse.com>
[CHANGELOG]
v3:
- Rebased to the latest for-next
  There are minor conflicts with the recent fix to
  read_cache_page_gfp().
- Still use the folio locked flag to track writeback
  A patch has reduced the size of btrfs_device to exactly 512 bytes,
  so adding new wait and atomic members is no longer worth it.
- Delete the read_cache_page_gfp() function completely
  Btrfs was the last user of that function.
v2:
- Still use page cache for super block writes
  This is to ensure user space won't see a half-baked super block
  caused by the race between bio writes and buffered reads on the bdev.
  This was exposed by generic/492, where the user space command blkid
  may fail to see the updated super block.
  This also brings a slight imbalance: our super block reads are
  always uncached, but super block writes are always cached.
RFC->v1:
- Make sb_write_pointer() use bdev_rw_virt()
  That was the missing location that still used the bdev's page cache,
  thanks Johannes for exposing this one.
- Replace btrfs_release_disk_super() with kfree()
  There is no need to keep that helper, and the replacement helps
  expose locations that still use the old page cache, like the
  above case.
- Only scratch the magic number of a super block in
  btrfs_scratch_superblock()
  To keep the behavior the same.
- Use GFP_NOFS when allocating memory
  This is also to keep the old behavior.
  Although I'd say the btrfs_read_disk_super() call sites are safe, as
  they are either scanning a device or running at mount time, and thus
  outside the write path.
  The sb_write_pointer() one still needs the old GFP_NOFS flag as it
  can be called when writing the super block.

Btrfs has a long history of using the bdev's page cache for super block IO.
It looked even weirder in the old days, when we manually set different
page flags without going through the regular dirty -> lock -> writeback
-> clear writeback sequence.

Thankfully we're moving away from unnecessary bdev page flag
modification. Starting with commit bc00965dbff7 ("btrfs: count super
block write errors in device instead of tracking folio error state"),
we no longer rely on the page cache to detect super block IO errors.

But we're still using the bdev's page cache for:

- Reading super blocks
  Reading a whole folio just to grab a 4KiB super block can be
  overkill.
  This is the easiest one to kill: kmalloc() and bdev_rw_virt()
  handle it well.

- Scratching super blocks
  We can use bdev_rw_virt() to write a super block with its magic
  zeroed.
  However we also need to invalidate the cache to ensure user space
  won't see the out-of-date cached super block.

- Writing super blocks
  Here we use the bdev's page cache for a different purpose.
  We want to ensure user space scanning tools like blkid see
  consistent content.
  If we just went the bdev_rw_virt() path, a user space read could race
  with our bio write, resulting in inconsistent content.
  So here we still need to use the bdev's page cache, but with
  comments explaining why.
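
For readers who want a concrete picture of the "read" and "scratch" cases above, here is a rough user space analogue, not the kernel code from this series. It only mirrors the shape of the approach: a dedicated 4KiB heap buffer per super block copy, and a scratch that zeroes only the magic. The helper names (read_super_copy(), super_magic_ok(), scratch_super_copy()) are made up for illustration. Note that a plain pread()/pwrite() on a block device still goes through the bdev's page cache, so this does not reproduce the uncached behavior that kmalloc() plus bdev_rw_virt() gives in the kernel.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define SB_OFFSET	65536	/* primary btrfs super block lives at 64KiB */
#define SB_SIZE		4096	/* size of one super block copy */
#define SB_MAGIC_OFFSET	64	/* after csum[32], fsid[16], bytenr, flags */
#define SB_MAGIC	"_BHRfS_M"
#define SB_MAGIC_LEN	8

/*
 * Read one super block copy into a freshly allocated buffer.
 * Returns NULL on allocation failure, IO error or short read.
 * The caller releases the buffer with free().
 */
static void *read_super_copy(int fd, off_t offset)
{
	char *buf = malloc(SB_SIZE);
	size_t done = 0;

	if (!buf)
		return NULL;
	while (done < SB_SIZE) {
		ssize_t ret = pread(fd, buf + done, SB_SIZE - done,
				    offset + (off_t)done);

		if (ret <= 0) {
			free(buf);
			return NULL;
		}
		done += (size_t)ret;
	}
	return buf;
}

/* Return non-zero if the buffer carries the btrfs magic. */
static int super_magic_ok(const void *sb)
{
	assert(sb != NULL);
	return memcmp((const char *)sb + SB_MAGIC_OFFSET, SB_MAGIC,
		      SB_MAGIC_LEN) == 0;
}

/*
 * Scratch one super block copy by zeroing only its magic and writing the
 * 4KiB block back. Returns 0 on success, -1 on failure.
 */
static int scratch_super_copy(int fd, off_t offset)
{
	char *sb = read_super_copy(fd, offset);
	ssize_t ret;

	if (!sb)
		return -1;
	memset(sb + SB_MAGIC_OFFSET, 0, SB_MAGIC_LEN);
	ret = pwrite(fd, sb, SB_SIZE, offset);
	free(sb);
	return ret == SB_SIZE ? 0 : -1;
}
```

A caller would open the device (or an image file), call read_super_copy(fd, SB_OFFSET), check the result with super_magic_ok() and free() the buffer afterwards. Only try scratch_super_copy() on a throwaway image, as it makes the filesystem unrecognizable.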

However this brings one small change:

- Device scan is no longer cached
  For mount time it's totally fine, but every time a btrfs device is
  touched, we will submit a 4K sync read to the disk.
  The cost may not be that high though.

Qu Wenruo (3):
btrfs: use bdev_rw_virt() to read and scratch the disk super block
btrfs: minor improvement on super block writeback
mm/filemap: remove read_cache_page_gfp()
fs/btrfs/disk-io.c | 45 +++++++++++++++----------
fs/btrfs/super.c | 4 +--
fs/btrfs/volumes.c | 74 ++++++++++++++++-------------------------
fs/btrfs/volumes.h | 4 +--
fs/btrfs/zoned.c | 26 +++++++++------
include/linux/pagemap.h | 2 --
mm/filemap.c | 23 -------------
7 files changed, 74 insertions(+), 104 deletions(-)
--
2.52.0