* Splitting pinned folios
@ 2024-03-13 3:16 Matthew Wilcox
2024-03-13 9:20 ` David Hildenbrand
0 siblings, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2024-03-13 3:16 UTC (permalink / raw)
To: Jane Chu; +Cc: linux-mm, John Hubbard, David Hildenbrand
On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
> I noticed this recently
OK, this is entirely different, so I'm going to start a new thread ;-)
> * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
> * they are not mapped.
> *
> * Returns 0 if the hugepage is split successfully.
> * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
> * us.
> */
> int split_huge_page_to_list(struct page *page, struct list_head *list)
> {
>
> I have a test case with poisoned shmem THP page that was mlocked and
>
> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
I'm going to blame John for this! There's no reference to pincount
anywhere in huge_memory.c, so I have no clue how this comment is even
close to true, nor do I understand how it could be done, since we don't
know which pages in a folio are pinned.
I think we have to prohibit splits of folios that are GUP pinned.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Splitting pinned folios
2024-03-13 3:16 Splitting pinned folios Matthew Wilcox
@ 2024-03-13 9:20 ` David Hildenbrand
2024-03-13 15:27 ` Matthew Wilcox
2024-03-14 2:46 ` updated documentation: " John Hubbard
0 siblings, 2 replies; 11+ messages in thread
From: David Hildenbrand @ 2024-03-13 9:20 UTC (permalink / raw)
To: Matthew Wilcox, Jane Chu; +Cc: linux-mm, John Hubbard
On 13.03.24 04:16, Matthew Wilcox wrote:
> On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
>> I noticed this recently
>
> OK, this is entirely different, so I'm going to start a new thread ;-)
>
>> * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>> * they are not mapped.
>> *
>> * Returns 0 if the hugepage is split successfully.
>> * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>> * us.
>> */
>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>> {
>>
>> I have a test case with poisoned shmem THP page that was mlocked and
>>
>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>
> I'm going to blame John for this!
The description is wrong. Whoever calls split_huge_page_to_list() must
hold a folio reference.
That folio reference will be transferred to @page (not the head page)
once split. So @page can be used by the caller after the split succeeded.
> There's no reference to pincount
> anywhere in huge_memory.c, so I have no clue how this comment is even
Each pincount increment/decrement must be paired with a folio refcount
increment. Therefore, no pincount checks are required.
> close to true, nor do I understand how it could be done, since we don't
> know which pages in a folio are pinned.
As the description correctly says: "Returns -EBUSY if the page is pinned".
If that is not true, we'd have a real issue.
>
> I think we have to prohibit splits of folios that are GUP pinned.
>
In split_huge_page_to_list(), we make sure there are no additional folio
references of any kind (GUP pin, whatsoever).
can_split_folio() is racy but catches most of that. Then, we do the
folio_ref_freeze().
So I don't see how that could ever work with additional folio references
(including GUP pins). Unless serious BUG somewhere else.
In essence: we expect on a folio after completely unmapping it:
* 1 reference from the caller of split_huge_page_to_list()
* pagecache: 1 reference per subpage from the pagecache
* anon: 1 reference per subage from the swapcache if in the swapcache
Any additional reference would lead to a split failure.
We're holding the folio lock, so for anon folios we cannot remove it
from the swapcache concurrently.
For pagecache folios ... dunno :) I expect some folio-lock magic as well.
Reading "I have a test case with poisoned shmem THP page that was
mlocked and GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded."
If that is indeed true, I assume that page poisoning might have done
something very wrong with the large folio: for example, partially unmap
it from the pagecache (if that's even possible?) or accidentally drop a
folio reference. Then, we'd be missing to detecting the GUP pin when
freezing the refcount.
... any chance we can get the reproducer? [reading this mail from Willy
only]
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Splitting pinned folios
2024-03-13 9:20 ` David Hildenbrand
@ 2024-03-13 15:27 ` Matthew Wilcox
2024-03-13 16:53 ` David Hildenbrand
2024-03-13 22:25 ` John Hubbard
2024-03-14 2:46 ` updated documentation: " John Hubbard
1 sibling, 2 replies; 11+ messages in thread
From: Matthew Wilcox @ 2024-03-13 15:27 UTC (permalink / raw)
To: David Hildenbrand; +Cc: Jane Chu, linux-mm, John Hubbard
On Wed, Mar 13, 2024 at 10:20:46AM +0100, David Hildenbrand wrote:
> On 13.03.24 04:16, Matthew Wilcox wrote:
> > On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
> > > I noticed this recently
> >
> > OK, this is entirely different, so I'm going to start a new thread ;-)
> >
> > > * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
> > > * they are not mapped.
> > > *
> > > * Returns 0 if the hugepage is split successfully.
> > > * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
> > > * us.
> > > */
> > > int split_huge_page_to_list(struct page *page, struct list_head *list)
> > > {
> > >
> > > I have a test case with poisoned shmem THP page that was mlocked and
> > >
> > > GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
> >
> > I'm going to blame John for this!
>
> The description is wrong. Whoever calls split_huge_page_to_list() must hold
> a folio reference.
>
> That folio reference will be transferred to @page (not the head page) once
> split. So @page can be used by the caller after the split succeeded.
>
>
> > There's no reference to pincount
> > anywhere in huge_memory.c, so I have no clue how this comment is even
>
> Each pincount increment/decrement must be paired with a folio refcount
> increment. Therefore, no pincount checks are required.
I'd forgotten that. Any GUP pin will force failure of split. So I
don't know what Jane is seeing. memory_failure() tries to split THPs
and outputs an error if it can't. It doesn't try to handle them.
It probably should, but that's a subject for a different thread.
> In essence: we expect on a folio after completely unmapping it:
> * 1 reference from the caller of split_huge_page_to_list()
> * pagecache: 1 reference per subpage from the pagecache
* 1 reference if the folio has private data (thank you, bufferheads)
Oh. Is that the bug? can_split_folio() doesn't know that. This
usually isn't a problem because, eg, truncate_inode_partial_folio()
will remove the private data before calling split_folio(). But
memory-failure doesn't know about that rule ...
No, that's not it. shmem doesn't use the folio private flag. It's
still a bug, but it's not Jane's bug.
Jane, can we have the results of calling dump_page() on the page?
> Reading "I have a test case with poisoned shmem THP page that was mlocked
> and GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded."
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Splitting pinned folios
2024-03-13 15:27 ` Matthew Wilcox
@ 2024-03-13 16:53 ` David Hildenbrand
2024-03-13 18:52 ` Jane Chu
2024-03-13 22:25 ` John Hubbard
1 sibling, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-03-13 16:53 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Jane Chu, linux-mm, John Hubbard
> * 1 reference if the folio has private data (thank you, bufferheads)
>
> Oh. Is that the bug? can_split_folio() doesn't know that. This
> usually isn't a problem because, eg, truncate_inode_partial_folio()
> will remove the private data before calling split_folio(). But
> memory-failure doesn't know about that rule ...
The worst thing that could happen is that splitting the folio would fail
(because one more unexpected reference), not that we would split where
we shouldn't, right?
>
> No, that's not it. shmem doesn't use the folio private flag. It's
> still a bug, but it's not Jane's bug.
Maybe *something* really accidentally dropped a page reference :/
Reproducer + details would be great.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Splitting pinned folios
2024-03-13 16:53 ` David Hildenbrand
@ 2024-03-13 18:52 ` Jane Chu
0 siblings, 0 replies; 11+ messages in thread
From: Jane Chu @ 2024-03-13 18:52 UTC (permalink / raw)
To: David Hildenbrand, Matthew Wilcox; +Cc: linux-mm, John Hubbard, Jane Chu
On 3/13/2024 9:53 AM, David Hildenbrand wrote:
>> * 1 reference if the folio has private data (thank you, bufferheads)
>>
>> Oh. Is that the bug? can_split_folio() doesn't know that. This
>> usually isn't a problem because, eg, truncate_inode_partial_folio()
>> will remove the private data before calling split_folio(). But
>> memory-failure doesn't know about that rule ...
>
> The worst thing that could happen is that splitting the folio would
> fail (because one more unexpected reference), not that we would split
> where we shouldn't, right?
>
>>
>> No, that's not it. shmem doesn't use the folio private flag. It's
>> still a bug, but it's not Jane's bug.
>
> Maybe *something* really accidentally dropped a page reference :/
> Reproducer + details would be great.
>
Thank you Matthew and David, let me take some time digesting points you
raised, and rerun the test, hopefully not only on my hacked up kernel
for emulating GUP pin, but with real mr_register(), and report my findings.
thanks!
-jane
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Splitting pinned folios
2024-03-13 15:27 ` Matthew Wilcox
2024-03-13 16:53 ` David Hildenbrand
@ 2024-03-13 22:25 ` John Hubbard
1 sibling, 0 replies; 11+ messages in thread
From: John Hubbard @ 2024-03-13 22:25 UTC (permalink / raw)
To: Matthew Wilcox, David Hildenbrand; +Cc: Jane Chu, linux-mm
On 3/13/24 08:27, Matthew Wilcox wrote:
> On Wed, Mar 13, 2024 at 10:20:46AM +0100, David Hildenbrand wrote:
>> On 13.03.24 04:16, Matthew Wilcox wrote:
>>> On Tue, Mar 12, 2024 at 06:23:43PM -0700, Jane Chu wrote:
>>>> I noticed this recently
>>>
>>> OK, this is entirely different, so I'm going to start a new thread ;-)
>>>
>>>> * GUP pin and PG_locked transferred to @page. Rest subpages can be freed if
>>>> * they are not mapped.
>>>> *
>>>> * Returns 0 if the hugepage is split successfully.
>>>> * Returns -EBUSY if the page is pinned or if anon_vma disappeared from under
>>>> * us.
>>>> */
>>>> int split_huge_page_to_list(struct page *page, struct list_head *list)
>>>> {
>>>>
>>>> I have a test case with poisoned shmem THP page that was mlocked and
>>>>
>>>> GUP pinned (FOLL_LONGTERM|FOLL_WRITE), but the split succeeded.
>>>
>>> I'm going to blame John for this!
That's a reasonable initial step, yes. haha :)
>>
>> The description is wrong. Whoever calls split_huge_page_to_list() must hold
>> a folio reference.
>>
>> That folio reference will be transferred to @page (not the head page) once
>> split. So @page can be used by the caller after the split succeeded.
>>
Thanks David for responding so quickly with these gup/pup answers. Perhaps I
can fix up that documentation comment a bit.
>>
>>> There's no reference to pincount
>>> anywhere in huge_memory.c, so I have no clue how this comment is even
>>
>> Each pincount increment/decrement must be paired with a folio refcount
>> increment. Therefore, no pincount checks are required.
Yes! In try_grab_folio(), that behavior is documented, although due to
code movement and refactoring, the comment refers to behavior a few lines
above it:
/*
* When pinning a large folio, use an exact count to track it.
*
* However, be sure to *also* increment the normal folio
* refcount field at least once, so that the folio really
* is pinned. That's why the refcount from the earlier
* try_get_folio() is left intact.
*/
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply [flat|nested] 11+ messages in thread
* updated documentation: Splitting pinned folios
2024-03-13 9:20 ` David Hildenbrand
2024-03-13 15:27 ` Matthew Wilcox
@ 2024-03-14 2:46 ` John Hubbard
2024-03-14 16:22 ` David Hildenbrand
2024-03-14 17:45 ` Matthew Wilcox
1 sibling, 2 replies; 11+ messages in thread
From: John Hubbard @ 2024-03-14 2:46 UTC (permalink / raw)
To: David Hildenbrand, Matthew Wilcox, Jane Chu; +Cc: linux-mm
On 3/13/24 2:20 AM, David Hildenbrand wrote:
...
> The description is wrong. Whoever calls split_huge_page_to_list() must hold a folio reference.
>
> That folio reference will be transferred to @page (not the head page) once split. So @page can be used by the caller after the split succeeded.
>
David and all, does this updated draft comment look accurate?
/*
* This function splits a huge page into normal pages. @page can point to any
* subpage of the huge page to split. The split operation does not change the
* position of @page.
*
* Prerequisites:
*
* 1) The caller must hold a reference on the @page's owning folio, also known as
* the huge page.
*
* 2) The huge page must be locked.
*
* 3) The folio must not be pinned. Pinned folios will not be split; instead,
* the caller will receive an -EBUSY.
*
* After splitting, the folio's refcount is transfered to @page (not the head
* page, unless @page is actually the head page). The other subpages may be
* freed if they are not mapped.
*
* If @list is null, tail pages will be added to LRU list, otherwise, to @list.
*
* Both head page and tail pages will inherit mapping, flags, and so on from the
* hugepage.
*
* Returns 0 if the hugepage was split successfully.
*
* Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared
* from under us.
*/
int split_huge_page_to_list(struct page *page, struct list_head *list)
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: updated documentation: Splitting pinned folios
2024-03-14 2:46 ` updated documentation: " John Hubbard
@ 2024-03-14 16:22 ` David Hildenbrand
2024-03-14 17:51 ` John Hubbard
2024-03-14 17:45 ` Matthew Wilcox
1 sibling, 1 reply; 11+ messages in thread
From: David Hildenbrand @ 2024-03-14 16:22 UTC (permalink / raw)
To: John Hubbard, Matthew Wilcox, Jane Chu; +Cc: linux-mm
On 14.03.24 03:46, John Hubbard wrote:
> On 3/13/24 2:20 AM, David Hildenbrand wrote:
> ...
>> The description is wrong. Whoever calls split_huge_page_to_list() must hold a folio reference.
>>
>> That folio reference will be transferred to @page (not the head page) once split. So @page can be used by the caller after the split succeeded.
>>
>
> David and all, does this updated draft comment look accurate?
>
Note that in mm-unstable (maybe even mm-stable already), the function is
now called split_huge_page_to_list_to_order().
> /*
> * This function splits a huge page into normal pages. @page can point to any
> * subpage of the huge page to split. The split operation does not change the
> * position of @page.
> *
> * Prerequisites:
> *
> * 1) The caller must hold a reference on the @page's owning folio, also known as
> * the huge page.
> *
> * 2) The huge page must be locked.
> *
> * 3) The folio must not be pinned. Pinned folios will not be split; instead,
> * the caller will receive an -EBUSY.
Maybe focus on unexpected folio references:
3) The folio must not be pinned. Any unexpected folio references,
including GUP pins, will result in the folio not getting split; instead ...
> * > * After splitting, the folio's refcount is transfered to @page
(not the head
s/transfered/transferred/
> * page, unless @page is actually the head page). The other subpages may be
> * freed if they are not mapped.
"folio's refcount" might be a bit misleading. It's more like
"The caller's folio reference will be transferred to @page, resulting in
a raised refcount of @page after this call ... "
> *
> * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
> *
> * Both head page and tail pages will inherit mapping, flags, and so on from the
> * hugepage.
> *
> * Returns 0 if the hugepage was split successfully.
> *
> * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared
> * from under us.
> */
> int split_huge_page_to_list(struct page *page, struct list_head *list)
>
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: updated documentation: Splitting pinned folios
2024-03-14 2:46 ` updated documentation: " John Hubbard
2024-03-14 16:22 ` David Hildenbrand
@ 2024-03-14 17:45 ` Matthew Wilcox
2024-03-14 17:57 ` John Hubbard
1 sibling, 1 reply; 11+ messages in thread
From: Matthew Wilcox @ 2024-03-14 17:45 UTC (permalink / raw)
To: John Hubbard; +Cc: David Hildenbrand, Jane Chu, linux-mm
On Wed, Mar 13, 2024 at 07:46:54PM -0700, John Hubbard wrote:
> /*
> * This function splits a huge page into normal pages. @page can point to any
s/huge page/large folio/ and s/normal/smaller/ ?
> * subpage of the huge page to split. The split operation does not change the
Trying to get away from "subpage", maybe "can point to any page within
the folio"?
I think following these suggestions throughout the text below will make
it clearer ...
> * position of @page.
> *
> * Prerequisites:
> *
> * 1) The caller must hold a reference on the @page's owning folio, also known as
> * the huge page.
> *
> * 2) The huge page must be locked.
> *
> * 3) The folio must not be pinned. Pinned folios will not be split; instead,
> * the caller will receive an -EBUSY.
> *
> * After splitting, the folio's refcount is transfered to @page (not the head
> * page, unless @page is actually the head page). The other subpages may be
> * freed if they are not mapped.
> *
> * If @list is null, tail pages will be added to LRU list, otherwise, to @list.
> *
> * Both head page and tail pages will inherit mapping, flags, and so on from the
> * hugepage.
> *
> * Returns 0 if the hugepage was split successfully.
> *
> * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared
> * from under us.
> */
> int split_huge_page_to_list(struct page *page, struct list_head *list)
>
> thanks,
> --
> John Hubbard
> NVIDIA
>
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: updated documentation: Splitting pinned folios
2024-03-14 16:22 ` David Hildenbrand
@ 2024-03-14 17:51 ` John Hubbard
0 siblings, 0 replies; 11+ messages in thread
From: John Hubbard @ 2024-03-14 17:51 UTC (permalink / raw)
To: David Hildenbrand, Matthew Wilcox, Jane Chu; +Cc: linux-mm
On 3/14/24 09:22, David Hildenbrand wrote:
> On 14.03.24 03:46, John Hubbard wrote:
>> On 3/13/24 2:20 AM, David Hildenbrand wrote:
>> ...
>>> The description is wrong. Whoever calls split_huge_page_to_list()
>>> must hold a folio reference.
>>>
>>> That folio reference will be transferred to @page (not the head page)
>>> once split. So @page can be used by the caller after the split
>>> succeeded.
>>>
>>
>> David and all, does this updated draft comment look accurate?
>>
>
> Note that in mm-unstable (maybe even mm-stable already), the function is
> now called split_huge_page_to_list_to_order().
Aha, yes already in mm-stable, and the comment is quite heavily modified
there, too. But it's straightforward to merge in relevant parts of what
we're trying to say here.
I can send out an actual patch after the merge window finishes.
>
>> /*
>> * This function splits a huge page into normal pages. @page can
>> point to any
>> * subpage of the huge page to split. The split operation does not
>> change the
>> * position of @page.
>> *
>> * Prerequisites:
>> *
>> * 1) The caller must hold a reference on the @page's owning folio,
>> also known as
>> * the huge page.
>> *
>> * 2) The huge page must be locked.
>> *
>> * 3) The folio must not be pinned. Pinned folios will not be split;
>> instead,
>> * the caller will receive an -EBUSY.
>
> Maybe focus on unexpected folio references:
>
> 3) The folio must not be pinned. Any unexpected folio references,
> including GUP pins, will result in the folio not getting split; instead ...
Yes, saying "unexpected" does make it clearer, agreed.
>
>> * > * After splitting, the folio's refcount is transfered to @page
> (not the head
>
> s/transfered/transferred/
>
>> * page, unless @page is actually the head page). The other subpages
>> may be
>> * freed if they are not mapped.
>
> "folio's refcount" might be a bit misleading. It's more like
>
> "The caller's folio reference will be transferred to @page, resulting in
> a raised refcount of @page after this call ... "
heh, my wording was actually wrong! Thanks for correcting it. I didn't
see where the @page picked up the folio's reference, so I thought I
must have misread your original explanation.
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: updated documentation: Splitting pinned folios
2024-03-14 17:45 ` Matthew Wilcox
@ 2024-03-14 17:57 ` John Hubbard
0 siblings, 0 replies; 11+ messages in thread
From: John Hubbard @ 2024-03-14 17:57 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: David Hildenbrand, Jane Chu, linux-mm
On 3/14/24 10:45, Matthew Wilcox wrote:
> On Wed, Mar 13, 2024 at 07:46:54PM -0700, John Hubbard wrote:
>> /*
>> * This function splits a huge page into normal pages. @page can point to any
>
> s/huge page/large folio/ and s/normal/smaller/ ?
OK. And yes, I did find that it's getting harder to figure out whether
to write "large folio" or "huge page" lately.
>
>> * subpage of the huge page to split. The split operation does not change the
>
> Trying to get away from "subpage", maybe "can point to any page within
> the folio"?
Sure, that works.
>
> I think following these suggestions throughout the text below will make
> it clearer ...
Will do. I'll post an actual patch against -rc1 with these and David's
corrections.
thanks,
--
John Hubbard
NVIDIA
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-03-14 17:58 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-13 3:16 Splitting pinned folios Matthew Wilcox
2024-03-13 9:20 ` David Hildenbrand
2024-03-13 15:27 ` Matthew Wilcox
2024-03-13 16:53 ` David Hildenbrand
2024-03-13 18:52 ` Jane Chu
2024-03-13 22:25 ` John Hubbard
2024-03-14 2:46 ` updated documentation: " John Hubbard
2024-03-14 16:22 ` David Hildenbrand
2024-03-14 17:51 ` John Hubbard
2024-03-14 17:45 ` Matthew Wilcox
2024-03-14 17:57 ` John Hubbard
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox