From: Tal Zussman <tz2294@columbia.edu>
To: "Matthew Wilcox (Oracle)" <willy@infradead.org>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>, Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
Brendan Jackman <jackmanb@google.com>,
Johannes Weiner <hannes@cmpxchg.org>, Zi Yan <ziy@nvidia.com>,
Jens Axboe <axboe@kernel.dk>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>
Cc: Christoph Hellwig <hch@infradead.org>,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
Tal Zussman <tz2294@columbia.edu>
Subject: [PATCH RFC v3 0/2] block: enable RWF_DONTCACHE for block devices
Date: Fri, 27 Feb 2026 11:41:06 -0500
Message-ID: <20260227-blk-dontcache-v3-0-cd309ccd5868@columbia.edu>
Add support for using RWF_DONTCACHE with block devices and other
buffer_head-based I/O.
Dropbehind pruning needs to be done in non-IRQ context, but block
devices complete writeback in IRQ context. To fix this, we first defer
dropbehind completion initiated from IRQ context by scheduling a work
item to process a per-CPU batch of folios.
Then, add a block_write_begin_iocb() variant that threads the kiocb
through for RWF_DONTCACHE I/Os.
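As a rough illustration of the deferral idea only, here is a toy userspace model in C. It is not the code from patch 1/2: the names deferred_batch, batch_defer() and batch_drain(), the capacity of 8, and the use of integer ids in place of folio pointers are all hypothetical, and locking, per-CPU placement and the real workqueue API are left out. The completion side only records an entry and marks work as scheduled, and the expensive step is left to a later drain in worker context.

```c
#include <stdbool.h>
#include <stddef.h>

#define BATCH_CAPACITY 8 /* hypothetical size, chosen only for this model */

/* Toy stand-in for one CPU's batch of folios awaiting dropbehind. */
struct deferred_batch {
	size_t nr;                /* number of queued entries */
	int ids[BATCH_CAPACITY];  /* stand-ins for folio pointers */
	bool work_scheduled;      /* models "a work item is queued" */
};

/*
 * Models the completion (IRQ) side: record the entry and make sure the
 * worker will run. Returns false when the batch is full; how the real
 * patch handles a full batch is not described in this cover letter.
 */
static bool batch_defer(struct deferred_batch *b, int id)
{
	if (b->nr == BATCH_CAPACITY)
		return false;
	b->ids[b->nr++] = id;
	b->work_scheduled = true;
	return true;
}

/*
 * Models the work item (process context): move up to cap queued ids
 * into out so the expensive step can run on them outside the
 * completion path. Returns the number of entries drained.
 */
static size_t batch_drain(struct deferred_batch *b, int *out, size_t cap)
{
	size_t n = b->nr < cap ? b->nr : cap;
	size_t i;

	for (i = 0; i < n; i++)
		out[i] = b->ids[i];
	/* Keep any entries that did not fit in out at the front. */
	for (i = n; i < b->nr; i++)
		b->ids[i - n] = b->ids[i];
	b->nr -= n;
	b->work_scheduled = b->nr != 0;
	return n;
}
```

In this model, a caller would invoke batch_defer() from the completion path and batch_drain() from whatever context is allowed to do the actual invalidation.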
This support is useful for databases that operate on raw block devices,
among other userspace applications.
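For readers unfamiliar with the flag, a minimal userspace sketch of how an application might request it is below. It assumes glibc's pwritev2() wrapper, and the helper name write_maybe_dontcache() is hypothetical. The fallback to pwrite() on EOPNOTSUPP reflects how unsupported RWF_* flags are normally rejected; treating EINVAL and ENOSYS the same way is a defensive assumption, not something this series specifies.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* for pwritev2() in <sys/uio.h> */
#endif
#include <errno.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef RWF_DONTCACHE
/* Value from the Linux uapi headers; older libc headers may lack it. */
#define RWF_DONTCACHE 0x00000080
#endif

/*
 * Write len bytes from buf at offset off, asking the kernel to drop the
 * written pages from the page cache once writeback completes. If the
 * kernel or the file does not accept the flag, fall back to an ordinary
 * buffered pwrite(). Returns the number of bytes written or -1 with
 * errno set; short writes are possible, as with pwrite().
 */
static ssize_t write_maybe_dontcache(int fd, const void *buf, size_t len,
				     off_t off)
{
	struct iovec iov = { .iov_base = (void *)buf, .iov_len = len };
	ssize_t ret = pwritev2(fd, &iov, 1, off, RWF_DONTCACHE);

	if (ret < 0 && (errno == EOPNOTSUPP || errno == EINVAL ||
			errno == ENOSYS))
		ret = pwrite(fd, buf, len, off);
	return ret;
}
```

With this series applied, fd could refer to a block device opened for buffered (non-O_DIRECT) I/O; without it, the same call on a block device should take the fallback path.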
I tested this (with CONFIG_BUFFER_HEAD=y) for reads and writes on a
single block device on a VM, so results may be noisy.
Reads were tested on the root partition with a 45GB range (~2x RAM).
Writes were tested on a disabled swap partition (~1GB) in a memcg of size
244MB to force reclaim pressure.
Results:
===== READS (/dev/nvme0n1p2) =====
 sec   normal MB/s  dontcache MB/s
----  ------------  --------------
   1         993.9          1799.6
   2         992.8          1693.8
   3         923.4          2565.9
   4        1013.5          3917.3
   5        1557.9          2438.2
   6        2363.4          1844.3
   7        1447.9          2048.6
   8         899.4          1951.7
   9        1246.8          1756.1
  10        1139.0          1665.6
  11        1089.7          1707.7
  12        1270.4          1736.5
  13        1244.0          1756.3
  14        1389.7          1566.2
----  ------------  --------------
 avg        1258.0          2005.4 (+59%)
===== WRITES (/dev/nvme0n1p3) =====
 sec   normal MB/s  dontcache MB/s
----  ------------  --------------
   1        2396.1          9670.6
   2        8444.8          9391.5
   3         770.8          9400.8
   4          61.5          9565.9
   5        7701.0          8832.6
   6        8634.3          9912.9
   7         469.2          9835.4
   8        8588.5          9587.2
   9        8602.2          9334.8
  10         591.1          8678.8
  11        8528.7          3847.0
----  ------------  --------------
 avg        4981.7          8914.3 (+79%)
---
Changes in v3:
- 1/2: Convert dropbehind deferral to per-CPU folio_batches protected by
  local_lock using per-CPU work items, to reduce contention, per Jens.
- 1/2: Call folio_end_dropbehind_irq() directly from
  folio_end_writeback(), per Jens.
- 1/2: Add CPU hotplug dead callback to drain the departing CPU's folio
  batch.
- 2/2: Introduce block_write_begin_iocb(), per Christoph.
- 2/2: Dropped R-b due to changes.
- Link to v2: https://lore.kernel.org/r/20260225-blk-dontcache-v2-0-70e7ac4f7108@columbia.edu

Changes in v2:
- Add R-b from Jan Kara for 2/2.
- Add patch to defer dropbehind completion from IRQ context via a work
  item (1/2).
- Add initial performance numbers to cover letter.
- Link to v1: https://lore.kernel.org/r/20260218-blk-dontcache-v1-1-fad6675ef71f@columbia.edu
---
Tal Zussman (2):
      filemap: defer dropbehind invalidation from IRQ context
      block: enable RWF_DONTCACHE for block devices

 block/fops.c                |   5 +-
 fs/buffer.c                 |  19 ++++++-
 include/linux/buffer_head.h |   3 +
 include/linux/pagemap.h     |   1 +
 mm/filemap.c                | 130 +++++++++++++++++++++++++++++++++++++++++---
 mm/page_alloc.c             |   1 +
 6 files changed, 145 insertions(+), 14 deletions(-)
---
base-commit: 05f7e89ab9731565d8a62e3b5d1ec206485eeb0b
change-id: 20260218-blk-dontcache-338133dd045e
Best regards,
--
Tal Zussman <tz2294@columbia.edu>