Re: Splitting pinned folios - David Hildenbrand

From: David Hildenbrand <david@redhat.com>
To: Matthew Wilcox <willy@infradead.org>, Jane Chu <jane.chu@oracle.com>
Cc: linux-mm@kvack.org, John Hubbard <jhubbard@nvidia.com>
Subject: Re: Splitting pinned folios
Date: Wed, 13 Mar 2024 10:20:46 +0100
Message-ID: <57c9e228-9aca-4da6-a714-f175f053ff50@redhat.com>
In-Reply-To: <ZfEaetrM3P_nR41X@casper.infradead.org>

On 13.03.24 04:16, Matthew Wilcox wrote:
> On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
>> I noticed this recently
> 
> OK, this is entirely different, so I'm going to start a new thread ;-)
> 
>>   * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>   * they are not mapped.
>>   *
>>   * Returns 0 if the hugepage is split successfully.
>>   * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>   * us.
>>   */
>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>> {
>>
>> I have a test case with poisoned shmem THP page that was mlocked and
>>
>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
> 
> I'm going to blame John for this!  

The description is wrong. Whoever calls split_huge_page_to_list() must 
hold a folio reference.

That folio reference will be transferred to @page (not the head page) 
once split. So @page can be used by the caller after the split succeeded.

> There's no reference to pincount
> anywhere in huge_memory.c, so I have no clue how this comment is even

Each pincount increment/decrement must be paired with a folio refcount 
increment/decrement. Therefore, checking the refcount is sufficient and 
no separate pincount checks are required.
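
To illustrate why the pairing makes a separate pincount check redundant, 
here is a minimal userspace sketch. It is a toy model, not kernel code: 
struct pin_model and the toy_*() helpers are hypothetical names made up 
for this example.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of a large folio's counters; NOT struct folio. */
struct pin_model {
	int refcount;	/* all references, pins included */
	int pincount;	/* the subset of references taken as pins */
};

/* Taking a pin always takes a reference as well. */
static void toy_pin(struct pin_model *f)
{
	f->refcount++;
	f->pincount++;
}

/* Dropping a pin always drops the paired reference. */
static void toy_unpin(struct pin_model *f)
{
	assert(f->pincount > 0);
	f->pincount--;
	f->refcount--;
}

/*
 * Because every pin is also a reference, comparing the refcount with
 * the expected value is enough: any pin makes the refcount too large.
 */
static bool toy_refs_allow_split(const struct pin_model *f, int expected_refs)
{
	return f->refcount == expected_refs;
}
```

In this model a folio holding only the caller's reference passes the 
check with expected_refs == 1, and fails it as soon as toy_pin() has 
been called once.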

> close to true, nor do I understand how it could be done, since we don't
> know which pages in a folio are pinned.

As the description correctly says: "Returns -EBUSY if the page is pinned".

If that is not true, we'd have a real issue.

> 
> I think we have to prohibit splits of folios that are GUP pinned.
> 

In split_huge_page_to_list(), we make sure there are no additional folio 
references of any kind (GUP pins or anything else).

can_split_folio() is racy but catches most of that. Then, we do the 
folio_ref_freeze().
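
As a rough userspace sketch of the "freeze" idea, assuming it boils down 
to a compare-and-exchange of the refcount from the expected value to 
zero: toy_ref_freeze() and toy_ref_unfreeze() are hypothetical helpers 
written with C11 atomics, not the kernel implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/*
 * Atomically replace the refcount with 0, but only if it currently
 * equals the expected value. An unexpected extra reference (such as a
 * GUP pin) makes the exchange fail and leaves the refcount untouched.
 */
static bool toy_ref_freeze(atomic_int *refcount, int expected)
{
	int old = expected;

	return atomic_compare_exchange_strong(refcount, &old, 0);
}

/* Restore a refcount on a previously frozen counter. */
static void toy_ref_unfreeze(atomic_int *refcount, int count)
{
	assert(atomic_load(refcount) == 0);
	atomic_store(refcount, count);
}
```

The earlier racy check only serves to bail out early; in this sketch it 
is the atomic exchange that decides whether the split may proceed.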

So I don't see how that could ever work with additional folio references 
(including GUP pins), unless there is a serious bug somewhere else.

In essence, after completely unmapping a folio we expect exactly these 
references:
* 1 reference from the caller of split_huge_page_to_list()
* pagecache: 1 reference per subpage from the pagecache
* anon: 1 reference per subpage from the swapcache if in the swapcache

Any additional reference would lead to a split failure.
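
The accounting above can be written down as a small userspace sketch. 
struct split_model and the toy_*() functions are hypothetical names for 
illustration only, and the model assumes the folio is already fully 
unmapped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy description of a fully unmapped large folio; NOT struct folio. */
struct split_model {
	int nr_pages;		/* number of subpages */
	int refcount;		/* current reference count */
	bool anon;		/* anonymous (true) or pagecache (false) */
	bool in_swapcache;	/* only meaningful for anon folios */
};

/* References we expect to see right before freezing the refcount. */
static int toy_expected_refs(const struct split_model *f)
{
	int cache_refs;

	assert(f->nr_pages > 0);
	if (f->anon)
		cache_refs = f->in_swapcache ? f->nr_pages : 0;
	else
		cache_refs = f->nr_pages;

	/* +1 for the reference held by the caller of the split. */
	return 1 + cache_refs;
}

/* Any reference beyond the expected ones makes the split fail. */
static bool toy_can_split(const struct split_model *f)
{
	return f->refcount == toy_expected_refs(f);
}
```

For example, a 512-page pagecache folio is expected to have 513 
references in this model; a single extra reference, for instance from a 
GUP pin, gives 514 and the check fails.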

We're holding the folio lock, so for anon folios we cannot remove it 
from the swapcache concurrently.

For pagecache folios ... dunno :) I expect some folio-lock magic as well.

Reading "I have a test case with poisoned shmem THP page that was 
mlocked and GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded."

If that is indeed true, I assume that page poisoning might have done 
something very wrong with the large folio: for example, partially removed 
it from the pagecache (if that's even possible?) or accidentally dropped 
a folio reference. Then, we'd fail to detect the GUP pin when freezing 
the refcount.

... any chance we can get the reproducer? [reading this mail from Willy 
only]

-- 
Cheers,

David / dhildenb
