From: Matthew Wilcox <willy@infradead.org>
To: John Hubbard <jhubbard@nvidia.com>
Cc: Christoph Hellwig <hch@infradead.org>,
David Hildenbrand <david@redhat.com>,
David Howells <dhowells@redhat.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
linux-mm@kvack.org, Jens Axboe <axboe@kernel.dk>,
linux-block@vger.kernel.org
Subject: Re: Does GUP page unpinning have to be done in the pinning context?
Date: Thu, 10 Apr 2025 20:14:47 +0100
Message-ID: <Z_gYpwn5TvvYap6N@casper.infradead.org>
In-Reply-To: <21dfcbfc-5295-4493-8ae1-eaa82f018472@nvidia.com>

On Thu, Apr 10, 2025 at 12:11:42PM -0700, John Hubbard wrote:
> On 4/10/25 12:28 AM, Christoph Hellwig wrote:
> > On Wed, Apr 09, 2025 at 07:56:07PM -0700, John Hubbard wrote:
> >> This topic always worries me, because the original problem with
> >> dirty pages is still unfixed: setting pages dirty upon unpinning
> >> is both widely done (last time I checked), and yet broken, because
> >> it doesn't do a mkdirty() call to set up writeback buffers.
> >>
> >> The solution always seemed to point toward "get a file lease on that
> >> range, before pinning", but it's a contentious design area to say
> >> the least.
> >
> > For the bio based direct I/O implementations we do set the pages
> > dirty before starting I/O using bio_set_pages_dirty, which uses
> > folio_mark_dirty and thus calls into the file systems using
> > ->dirty_folio. But we also do a second pass on I/O completion
> > before the buffers are unpinned. Which I think now that we pin
> > the folios is superfluous.
> >
>
> Oh actually I think I was wrong in my earlier reply about clearing
> the dirty bit. Because in Jan Kara's original bug report, what
> happened was that periodic writeback came in while the pages
> were pinned, and cleared the dirty bit--and also deleted the
> page buffers (file system specific behavior) that are required
> for writeback.
>
> So then later when the pages are unpinned and marked dirty,
> that causes the next writeback to fail in an unexpected way
> (it used to cause ext4 BUG checks, in fact).
>
> So the problem here is that these pinned pages can get cleaned
> while they are pinned, and then dirtied again by DMA (invisible
> to the filesystem).

Did we fix that already? Because it's relatively easy to write back
pinned pages and _not_ clear the dirty flag. That handles both
problems: falsely thinking that a heavily-mapped order-0 page is
pinned (we write it back anyway, so we don't lose data on a crash),
and stripping the bufferheads.
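
The difference between the two writeback behaviours discussed above can be sketched with a small user-space model. This is only an illustration under stated assumptions: `struct toy_page`, `writeback_clean_always()`, `writeback_keep_dirty_if_pinned()`, `unpin_and_dirty()` and `run()` are hypothetical names invented here, not kernel structures or APIs, and the model ignores locking, folios and real filesystem behaviour.

```c
#include <stdbool.h>

/* Toy model of a page-cache page. NOT a kernel structure. */
struct toy_page {
    bool dirty;        /* filesystem believes the page needs writeback */
    bool has_buffers;  /* per-page writeback state (e.g. buffer heads) */
    int  pin_count;    /* outstanding DMA pins (may be a false positive) */
    int  writes;       /* how many times the data reached the disk */
};

/* Behaviour from the bug report: writeback cleans the page and
 * strips its buffers even while it is pinned. */
static void writeback_clean_always(struct toy_page *p)
{
    if (!p->dirty)
        return;
    p->writes++;
    p->dirty = false;
    p->has_buffers = false;
}

/* Suggested behaviour: still write the data out, but leave the dirty
 * flag and the buffers alone while the page looks pinned. */
static void writeback_keep_dirty_if_pinned(struct toy_page *p)
{
    if (!p->dirty)
        return;
    p->writes++;               /* data is written either way */
    if (p->pin_count > 0)
        return;                /* DMA may still modify the page */
    p->dirty = false;
    p->has_buffers = false;
}

/* Unpin after DMA and mark the page dirty without telling the
 * filesystem, as the unpin paths discussed above do. */
static void unpin_and_dirty(struct toy_page *p)
{
    p->pin_count--;
    p->dirty = true;
}

/* Pinned dirty page -> periodic writeback -> unpin. Returns final state. */
static struct toy_page run(void (*writeback)(struct toy_page *))
{
    struct toy_page p = { .dirty = true, .has_buffers = true,
                          .pin_count = 1, .writes = 0 };
    writeback(&p);
    unpin_and_dirty(&p);
    return p;
}
```

In this model, `run(writeback_clean_always)` ends with a page that is dirty but has no buffers, which is the inconsistent state described in the quoted report. `run(writeback_keep_dirty_if_pinned)` ends with a dirty page that still has its buffers, and the data was written once in both cases, so a page wrongly believed to be pinned costs an extra writeback rather than lost data.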