From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>, Jane Chu <jane.chu@oracle.com>
Cc: linux-mm@kvack.org, John Hubbard <jhubbard@nvidia.com>
Subject: Re: Splitting pinned folios
Date: Wed, 13 Mar 2024 10:20:46 +0100
Message-ID: <57c9e228-9aca-4da6-a714-f175f053ff50@redhat.com>
In-Reply-To: <ZfEaetrM3P_nR41X@casper.infradead.org>
On 13.03.24 04:16, Matthew Wilcox wrote:
> On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
>> I noticed this recently
>
> OK, this is entirely different, so I'm going to start a new thread ;-)
>
>> * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>> * they are not mapped.
>> *
>> * Returns 0 if the hugepage is split successfully.
>> * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>> * us.
>> */
>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>> {
>>
>> I have a test case with poisoned shmem THP page that was mlocked and
>>
>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>
> I'm going to blame John for this!
The description is wrong. Whoever calls split_huge_page_to_list() must
hold a folio reference.
That folio reference will be transferred to @page (not the head page)
once split. So @page can be used by the caller after the split succeeded.
> There's no reference to pincount
> anywhere in huge_memory.c, so I have no clue how this comment is even
Each pincount increment/decrement must be paired with a folio refcount
increment/decrement. Therefore, no pincount checks are required.
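To illustrate why a refcount check alone is enough, here is a minimal userspace sketch. It assumes the pairing described above; struct model_folio and the model_* helpers are hypothetical names for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; this is not the kernel's struct folio. */
struct model_folio {
	int refcount;	/* all references, including those taken by pins */
	int pincount;	/* GUP pins only */
};

/* Assumption from the text: taking a pin always takes a reference too. */
static void model_pin(struct model_folio *f)
{
	f->refcount++;
	f->pincount++;
}

/* Dropping a pin drops the paired reference as well. */
static void model_unpin(struct model_folio *f)
{
	assert(f->pincount > 0 && f->refcount > 0);
	f->pincount--;
	f->refcount--;
}

/*
 * Refcount-only check: true if the folio holds more references than the
 * caller expects. A pin shows up here without looking at pincount at all.
 */
static bool model_has_extra_refs(const struct model_folio *f, int expected)
{
	return f->refcount > expected;
}
```

In this model, a folio with one expected reference reports extra references as soon as model_pin() is called, and stops doing so after the matching model_unpin().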
> close to true, nor do I understand how it could be done, since we don't
> know which pages in a folio are pinned.
As the description correctly says: "Returns -EBUSY if the page is pinned".
If that is not true, we'd have a real issue.
>
> I think we have to prohibit splits of folios that are GUP pinned.
>
In split_huge_page_to_list(), we make sure there are no additional folio
references of any kind (GUP pin, whatsoever).
can_split_folio() is racy but catches most of that. Then, we do the
folio_ref_freeze().
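The freeze step can be pictured as a compare-and-swap on the refcount: it only succeeds if the count is exactly what we expect, and it then replaces the count with 0. The following is a userspace sketch with C11 atomics under that assumption; model_ref_freeze() and model_ref_unfreeze() are hypothetical stand-ins, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Hypothetical model of a refcount freeze: atomically replace the count
 * with 0, but only if it currently equals `expected`. Any extra reference
 * (for example one taken by a GUP pin) makes the exchange fail and leaves
 * the count untouched.
 */
static bool model_ref_freeze(atomic_int *refcount, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(refcount, &old, 0);
}

/* Undo a successful freeze by restoring the given count. */
static void model_ref_unfreeze(atomic_int *refcount, int count)
{
	assert(atomic_load(refcount) == 0);
	atomic_store(refcount, count);
}
```

An earlier racy check can miss a reference that is taken right after it runs. The atomic exchange cannot, because the comparison and the update happen as one step.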
So I don't see how that could ever work with additional folio references
(including GUP pins), unless there is a serious BUG somewhere else.
In essence, after completely unmapping a folio, we expect:
* 1 reference from the caller of split_huge_page_to_list()
* pagecache: 1 reference per subpage from the pagecache
* anon: 1 reference per subpage from the swapcache if in the swapcache
Any additional reference would lead to a split failure.
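As a rough sketch, the accounting above can be written as a small function. This is a userspace model that only encodes the list above; the enum and the model_* names are hypothetical, and the real kernel code may compute this differently.

```c
#include <assert.h>
#include <stdbool.h>

enum model_kind { MODEL_ANON, MODEL_PAGECACHE };

/*
 * Expected refcount of a completely unmapped folio, per the list above:
 * one reference from the caller, plus one per subpage from the pagecache,
 * or one per subpage from the swapcache for an anon folio that is in it.
 */
static int model_expected_refs(enum model_kind kind, int nr_subpages,
			       bool in_swapcache)
{
	int cache_refs = 0;

	assert(nr_subpages > 0);
	if (kind == MODEL_PAGECACHE || in_swapcache)
		cache_refs = nr_subpages;
	return 1 + cache_refs;
}

/* Any reference beyond the expected count makes the split fail. */
static bool model_split_allowed(int refcount, enum model_kind kind,
				int nr_subpages, bool in_swapcache)
{
	return refcount == model_expected_refs(kind, nr_subpages, in_swapcache);
}
```

For a pagecache folio with 512 subpages this model expects 513 references, so a single additional reference, such as one held by a GUP pin, gives 514 and the split is refused.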
We're holding the folio lock, so an anon folio cannot be removed from
the swapcache concurrently.
For pagecache folios ... dunno :) I expect some folio-lock magic as well.
Reading "I have a test case with poisoned shmem THP page that was
mlocked and GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded."
If that is indeed true, I assume that page poisoning might have done
something very wrong with the large folio: for example, partially unmap
it from the pagecache (if that's even possible?) or accidentally drop a
folio reference. Then, we'd fail to detect the GUP pin when
freezing the refcount.
... any chance we can get the reproducer? [reading this mail from Willy
only]
--
Cheers,
David / dhildenb