From: Matthew Wilcox <willy@infradead.org>
To: Hugh Dickins <hughd@google.com>
Cc: David Hildenbrand <david@redhat.com>,
Andrew Morton <akpm@linux-foundation.org>,
Will Deacon <will@kernel.org>, Shivank Garg <shivankg@amd.com>,
Christoph Hellwig <hch@infradead.org>,
Keir Fraser <keirf@google.com>, Jason Gunthorpe <jgg@ziepe.ca>,
John Hubbard <jhubbard@nvidia.com>,
Frederick Mayle <fmayle@google.com>, Peter Xu <peterx@redhat.com>,
"Aneesh Kumar K.V" <aneesh.kumar@kernel.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Vlastimil Babka <vbabka@suse.cz>,
Alexander Krabler <Alexander.Krabler@kuka.com>,
Ge Yang <yangge1116@126.com>, Li Zhe <lizhe.67@bytedance.com>,
Chris Li <chrisl@kernel.org>, Yu Zhao <yuzhao@google.com>,
Axel Rasmussen <axelrasmussen@google.com>,
Yuanchu Xie <yuanchu@google.com>, Wei Xu <weixugc@google.com>,
Konstantin Khlebnikov <koct9i@gmail.com>,
David Howells <dhowells@redhat.com>,
ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org
Subject: Re: [PATCH 1/7] mm: fix folio_expected_ref_count() when PG_private_2
Date: Mon, 8 Sep 2025 16:06:10 +0100 [thread overview]
Message-ID: <aL7w4qrJtvKE1cu5@casper.infradead.org> (raw)
In-Reply-To: <3973ecd7-d99c-6d38-7b53-2f3fca57b48d@google.com>
On Mon, Sep 08, 2025 at 03:27:47AM -0700, Hugh Dickins wrote:
> On Mon, 1 Sep 2025, David Hildenbrand wrote:
> > On 01.09.25 09:52, David Hildenbrand wrote:
> > > On 01.09.25 03:17, Hugh Dickins wrote:
> > >> On Mon, 1 Sep 2025, Matthew Wilcox wrote:
> > >>> On Sun, Aug 31, 2025 at 02:01:16AM -0700, Hugh Dickins wrote:
> > >>>> 6.16's folio_expected_ref_count() is forgetting the PG_private_2 flag,
> > >>>> which (like PG_private, but not in addition to PG_private) counts for
> > >>>> 1 more reference: it needs to be using folio_has_private() in place of
> > >>>> folio_test_private().
> > >>>
> > >>> No, it doesn't. I know it used to, but no filesystem was actually doing
> > >>> that. So I changed mm to match how filesystems actually worked.
>
> I think Matthew may be remembering how he wanted it to behave (? but he
> wanted it to go away completely) rather than how it ended up behaving:
> we've both found that PG_private_2 always goes with refcount increment.
Let me explain that better. No filesystem followed the documented rule
that the refcount must be incremented by one if either PG_private or
PG_private_2 was set. And no surprise; that's a very complicated rule
for filesystems to follow. Many of them weren't even following the rule
to increment the refcount by one when PG_private was set.
So some were incrementing the refcount by one if PG_private were set, but
not bumping the refcount by one if PG_private_2 were set (I think this is
how btrfs worked, and you seem to believe the same thing). Others were
bumping the refcount by two if both PG_private and PG_private_2 were set
(I think this is how netfs works today).
> > > Now, one problem would be if migration / splitting / ... code where we
> > > use folio_expected_ref_count() cannot deal with that additional
> > > reference properly, in which case this patch would indeed cause harm.
>
> Yes, that appears to be why Matthew said NAK and "dangerously wrong".
>
> So far as I could tell, there is no problem with nfs, it has, and has
> all along had, the appropriate release_folio and migrate_folio methods.
>
> ceph used to have what's needed, but 6.0's changes from page_has_private()
> to folio_test_private() (the change from "has" either bit to "test" just
> the one bit really should have been highlighted) broke the migration of
> ceph's PG_private_2 folios.
>
> (I think it may have got re-enabled in intervening releases: David
> Howells reinstated folio_has_private() inside fallback_migrate_folio()'s
> filemap_release_folio(), which may have been enough to get ceph's
> PG_private_2s migratable again; but then 6.15's ceph .migrate_folio =
> filemap_migrate_folio will have broken it again.)
>
> Folio migration does not and never has copied over PG_private_2 from
> src to dst; so my 1/7 patch would have permitted migration of a ceph
> PG_private_2 src folio to a dst folio left with refcount 1 more than
> it should be (plus whatever the consequences of migrating such a
> folio which should have waited for the flag to be cleared first).
But that's another problem. The current meaning of PG_fscache (and also
that has changed over the years!) is that the data in the folio is being
written to the fscache. So we _shouldn't_ migrate the folio as some
piece of storage hardware is busy reading from the old folio. And if
somebody else starts writing to the old folio, we'll have a corrupted
fscache.
So the current behaviour where we set private_2 and bump the refcount,
but don't take the private_2 status into account is the safe one,
because the elevated refcount means we'll skip the PG_fscache folio.
Maybe it'd be better to wait for it to clear. But since Dave Howells
is busy killing it off, I'm just inclined to wait for that to happen.
> I'm just going to drop this 1/7, and add a (briefer than this!)
> paragraph to 2/7 == 1/6's commit message in v2 later today.
Thank you!
next prev parent reply other threads:[~2025-09-08 15:06 UTC|newest]
Thread overview: 30+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-08-31 8:57 [PATCH 0/7] mm: better GUP pin lru_add_drain_all() Hugh Dickins
2025-08-31 9:01 ` [PATCH 1/7] mm: fix folio_expected_ref_count() when PG_private_2 Hugh Dickins
2025-08-31 23:37 ` Matthew Wilcox
2025-09-01 1:17 ` Hugh Dickins
2025-09-01 7:52 ` David Hildenbrand
2025-09-01 8:04 ` David Hildenbrand
2025-09-08 10:27 ` Hugh Dickins
2025-09-08 15:06 ` Matthew Wilcox [this message]
2025-09-08 19:47 ` Hugh Dickins
2025-08-31 9:05 ` [PATCH 2/7] mm/gup: check ref_count instead of lru before migration Hugh Dickins
2025-09-01 8:00 ` David Hildenbrand
2025-09-08 10:40 ` Hugh Dickins
2025-09-08 14:08 ` David Hildenbrand
2025-09-08 19:57 ` Hugh Dickins
2025-09-08 20:17 ` David Hildenbrand
2025-08-31 9:08 ` [PATCH 3/7] mm/gup: local lru_add_drain() to avoid lru_add_drain_all() Hugh Dickins
2025-09-01 8:05 ` David Hildenbrand
2025-09-08 10:53 ` Hugh Dickins
2025-08-31 9:11 ` [PATCH 4/7] mm: Revert "mm/gup: clear the LRU flag of a page before adding to LRU batch" Hugh Dickins
2025-09-01 8:06 ` David Hildenbrand
2025-08-31 9:13 ` [PATCH 5/7] mm: Revert "mm: vmscan.c: fix OOM on swap stress test" Hugh Dickins
2025-09-01 8:07 ` David Hildenbrand
2025-08-31 9:16 ` [PATCH 6/7] mm: folio_may_be_cached() unless folio_test_large() Hugh Dickins
2025-09-01 8:13 ` David Hildenbrand
2025-09-08 11:19 ` Hugh Dickins
2025-09-08 14:35 ` David Hildenbrand
2025-09-08 20:04 ` Hugh Dickins
2025-09-08 20:11 ` David Hildenbrand
2025-08-31 9:18 ` [PATCH 7/7] mm: lru_add_drain_all() do local lru_add_drain() first Hugh Dickins
2025-09-01 8:14 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=aL7w4qrJtvKE1cu5@casper.infradead.org \
--to=willy@infradead.org \
--cc=Alexander.Krabler@kuka.com \
--cc=akpm@linux-foundation.org \
--cc=aneesh.kumar@kernel.org \
--cc=axelrasmussen@google.com \
--cc=ceph-devel@vger.kernel.org \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=fmayle@google.com \
--cc=hannes@cmpxchg.org \
--cc=hch@infradead.org \
--cc=hughd@google.com \
--cc=jgg@ziepe.ca \
--cc=jhubbard@nvidia.com \
--cc=keirf@google.com \
--cc=koct9i@gmail.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizhe.67@bytedance.com \
--cc=peterx@redhat.com \
--cc=shivankg@amd.com \
--cc=vbabka@suse.cz \
--cc=weixugc@google.com \
--cc=will@kernel.org \
--cc=yangge1116@126.com \
--cc=yuanchu@google.com \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox