Re: O_DIRECT vs BLK_FEAT_STABLE_WRITES, was Re: [PATCH] btrfs: never trust the bio from direct IO


From: Jan Kara <jack@suse.cz>
To: Christoph Hellwig <hch@infradead.org>
Cc: Jan Kara <jack@suse.cz>, Qu Wenruo <wqu@suse.com>,
	 linux-btrfs@vger.kernel.org, djwong@kernel.org,
	linux-xfs@vger.kernel.org,  linux-fsdevel@vger.kernel.org,
	linux-block@vger.kernel.org, linux-mm@kvack.org,
	 martin.petersen@oracle.com, jack@suse.com
Subject: Re: O_DIRECT vs BLK_FEAT_STABLE_WRITES, was Re: [PATCH] btrfs: never trust the bio from direct IO
Date: Mon, 20 Oct 2025 15:59:33 +0200
Message-ID: <mciqzktudhier5d2wvjmh4odwqdszvbtcixbthiuuwrufrw3cj@5s2ffnffu4gc>
In-Reply-To: <aPYgm3ey4eiFB4_o@infradead.org>

On Mon 20-10-25 04:44:27, Christoph Hellwig wrote:
> On Mon, Oct 20, 2025 at 01:16:39PM +0200, Jan Kara wrote:
> > Hmm, this is an interesting twist in the problems with pinned pages - so
> > far I was thinking about problems where pinned page cache page gets
> > modified (e.g. through DIO or RDMA) and this causes checksum failures if
> > it races with writeback. If I understand you right, now you are concerned
> > about a situation where some page is used as a buffer for direct IO write
> > / RDMA and it gets modified while the DMA is running which causes checksum
> > mismatch?
> 
> Really all of the above.  Even worse this can also happen for reads,
> e.g. when the parity or checksum is calculated in the user buffer.

OK.

> > Writeprotecting the buffer before the DIO starts isn't that hard
> > to do (although it has a non-trivial cost) but we don't have a mechanism to
> > make sure the page cannot be writeably mapped while it is pinned (and
> > avoiding that without introducing deadlocks would be *fun*).
> 
> Well, this goes back to the old idea of maybe bounce buffering in that
> case?

The idea was to bounce buffer the page we are writing back in case we spot
a long-term pin we cannot just wait for - hence bouncing should be rare.
But in this more general setting it is challenging to not bounce buffer for
every IO (in which case you'd be basically at the performance of
RWF_DONTCACHE IO or perhaps worse, so why bother?). Essentially, if you hand
out the real page underlying the buffer for the IO, all other attempts to do
IO to that page have to block - bouncing is no longer an option because even
if we bounce the second IO, we could still corrupt the data of the first IO
once we copy to the final buffer. And if we block waiting for the first IO to
complete, userspace could construct deadlock cycles - like racing IO to
pages A, B with IO to pages B, A. So far I'm not sure about a sane way out
of this...
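
To make the A, B vs. B, A cycle concrete, here is a toy userspace model of
"block until the page's current IO completes". It is only a sketch: the
function name simulate(), the IO names and the one-page-per-step round-robin
scheduling are all made up for illustration and do not correspond to any
kernel interface.

```python
def simulate(ios):
    """Toy model: each IO claims its pages in order, one page per turn.

    ios maps a (hypothetical) IO name to the ordered list of pages it needs.
    A page already claimed by another in-flight IO makes the claimant wait.
    An IO releases all of its pages once it has claimed the last one.
    Returns the sorted names of IOs that can never complete.
    """
    owner = {}                       # page -> IO currently holding it
    pos = {name: 0 for name in ios}  # index of the next page each IO needs
    progress = True
    while progress:
        progress = False
        for name, pages in ios.items():
            if pos[name] == len(pages):
                continue             # this IO has already completed
            page = pages[pos[name]]
            holder = owner.get(page)
            if holder is not None and holder != name:
                continue             # blocked behind another IO's page
            owner[page] = name
            pos[name] += 1
            progress = True
            if pos[name] == len(pages):
                # IO completed: hand all of its pages back.
                for p in pages:
                    if owner.get(p) == name:
                        del owner[p]
    return sorted(n for n in ios if pos[n] < len(ios[n]))


# Opposite page order: each IO holds one page and waits for the other.
print(simulate({"io1": ["A", "B"], "io2": ["B", "A"]}))
# Same page order: io2 merely waits until io1 is done.
print(simulate({"io1": ["A", "B"], "io2": ["A", "B"]}))
```

In this model the same-order case only serializes the two IOs, while the
opposite-order case leaves both stuck, and userspace fully controls the
order in which buffers are submitted.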

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR
