* [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation @ 2024-04-18 15:18 David Hildenbrand 2024-04-19 0:15 ` John Hubbard 2024-04-22 14:21 ` Zi Yan 0 siblings, 2 replies; 5+ messages in thread From: David Hildenbrand @ 2024-04-18 15:18 UTC (permalink / raw) To: linux-kernel Cc: linux-mm, David Hildenbrand, John Hubbard, Zi Yan, Matthew Wilcox, Andrew Morton The documentation is wrong and relying on it almost resulted in BUGs in new callers: we return -EAGAIN on unexpected folio references, not -EBUSY. Let's fix that and also document which other return values we can currently see and why they could happen. Cc: John Hubbard <jhubbard@nvidia.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/huge_memory.c | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/mm/huge_memory.c b/mm/huge_memory.c index ee12726291f1b..824eff9211db8 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2956,7 +2956,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) * * 3) The folio must not be pinned. Any unexpected folio references, including * GUP pins, will result in the folio not getting split; instead, the caller - * will receive an -EBUSY. + * will receive an -EAGAIN. * * 4) @new_order > 1, usually. Splitting to order-1 anonymous folios is not * supported for non-file-backed folios, because folio->_deferred_list, which @@ -2975,8 +2975,15 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) * * Returns 0 if the huge page was split successfully. * - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared - * from under us. + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). + * + * Returns -EBUSY when trying to split the huge zeropage, if the folio is + * under writeback, if fs-specific folio metadata cannot currently be + * released, or if some unexpected race happened (e.g., anon VMA disappeared, + * truncation). + * + * Returns -EINVAL when trying to split to an order that is incompatible + * with the folio. Splitting to order 0 is compatible with all folios. */ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, unsigned int new_order) -- 2.44.0 ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation 2024-04-18 15:18 [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation David Hildenbrand @ 2024-04-19 0:15 ` John Hubbard 2024-04-22 19:31 ` David Hildenbrand 2024-04-22 14:21 ` Zi Yan 1 sibling, 1 reply; 5+ messages in thread From: John Hubbard @ 2024-04-19 0:15 UTC (permalink / raw) To: David Hildenbrand, linux-kernel Cc: linux-mm, Zi Yan, Matthew Wilcox, Andrew Morton On 4/18/24 8:18 AM, David Hildenbrand wrote: > The documentation is wrong and relying on it almost resulted in BUGs > in new callers: we return -EAGAIN on unexpected folio references, not > -EBUSY. > > Let's fix that and also document which other return values we can > currently see and why they could happen. > > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/huge_memory.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index ee12726291f1b..824eff9211db8 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2956,7 +2956,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) > * > * 3) The folio must not be pinned. Any unexpected folio references, including > * GUP pins, will result in the folio not getting split; instead, the caller > - * will receive an -EBUSY. > + * will receive an -EAGAIN. > * > * 4) @new_order > 1, usually. Splitting to order-1 anonymous folios is not > * supported for non-file-backed folios, because folio->_deferred_list, which > @@ -2975,8 +2975,15 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) As an aside, the use of unconditional local_irq_disable() / local_irq_enable() calls in this routine almost makes me believe that we should have: 5) Local IRQs should be enabled. Because this routine may enable them. ...but I can't imagine a way to end up calling this with interrupts disabled, so it seems like documentation overkill. Just thought I'd mention it, though. > * > * Returns 0 if the huge page was split successfully. > * > - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared > - * from under us. > + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). ...or if the folio was removed from the page cache before this routine got a chance to lock it, right? (See the "fail:" path.) > + * > + * Returns -EBUSY when trying to split the huge zeropage, if the folio is > + * under writeback, if fs-specific folio metadata cannot currently be > + * released, or if some unexpected race happened (e.g., anon VMA disappeared, > + * truncation). > + * > + * Returns -EINVAL when trying to split to an order that is incompatible > + * with the folio. Splitting to order 0 is compatible with all folios. > */ > int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, > unsigned int new_order) Otherwise, looks good. thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation 2024-04-19 0:15 ` John Hubbard @ 2024-04-22 19:31 ` David Hildenbrand 2024-04-22 19:36 ` John Hubbard 0 siblings, 1 reply; 5+ messages in thread From: David Hildenbrand @ 2024-04-22 19:31 UTC (permalink / raw) To: John Hubbard, linux-kernel Cc: linux-mm, Zi Yan, Matthew Wilcox, Andrew Morton On 19.04.24 02:15, John Hubbard wrote: > On 4/18/24 8:18 AM, David Hildenbrand wrote: >> The documentation is wrong and relying on it almost resulted in BUGs >> in new callers: we return -EAGAIN on unexpected folio references, not >> -EBUSY. >> >> Let's fix that and also document which other return values we can >> currently see and why they could happen. >> >> Cc: John Hubbard <jhubbard@nvidia.com> >> Cc: Zi Yan <ziy@nvidia.com> >> Cc: Matthew Wilcox <willy@infradead.org> >> Cc: Andrew Morton <akpm@linux-foundation.org> >> Signed-off-by: David Hildenbrand <david@redhat.com> >> --- >> mm/huge_memory.c | 13 ++++++++++--- >> 1 file changed, 10 insertions(+), 3 deletions(-) >> >> diff --git a/mm/huge_memory.c b/mm/huge_memory.c >> index ee12726291f1b..824eff9211db8 100644 >> --- a/mm/huge_memory.c >> +++ b/mm/huge_memory.c >> @@ -2956,7 +2956,7 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) >> * >> * 3) The folio must not be pinned. Any unexpected folio references, including >> * GUP pins, will result in the folio not getting split; instead, the caller >> - * will receive an -EBUSY. >> + * will receive an -EAGAIN. >> * >> * 4) @new_order > 1, usually. Splitting to order-1 anonymous folios is not >> * supported for non-file-backed folios, because folio->_deferred_list, which >> @@ -2975,8 +2975,15 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) > > As an aside, the use of unconditional local_irq_disable() / local_irq_enable() > calls in this routine almost makes me believe that we should have: > > 5) Local IRQs should be enabled. Because this routine may enable them. > > ...but I can't imagine a way to end up calling this with interrupts > disabled, so it seems like documentation overkill. Just thought I'd mention > it, though. Yes, I think there might be more issues lurking with disabled interrupts. anon_vma_lock_write() and i_mmap_lock_read() might even sleep ... so we must not be in any atomic context. that's why relevant page table walkers drop the PTL. > > >> * >> * Returns 0 if the huge page was split successfully. >> * >> - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared >> - * from under us. >> + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). > > ...or if the folio was removed from the page cache before this routine > got a chance to lock it, right? (See the "fail:" path.) Right, that is sneaky. Let me extend to cover that case as well. diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 824eff9211db8..a7406267323ed 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -2975,7 +2975,8 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) * * Returns 0 if the huge page was split successfully. * - * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP) or if + * the folio was concurrently removed from the page cache. * * Returns -EBUSY when trying to split the huge zeropage, if the folio is * under writeback, if fs-specific folio metadata cannot currently be Naive me would assume that this happens rarely ... but not an expert :) > >> + * >> + * Returns -EBUSY when trying to split the huge zeropage, if the folio is >> + * under writeback, if fs-specific folio metadata cannot currently be >> + * released, or if some unexpected race happened (e.g., anon VMA disappeared, >> + * truncation). >> + * >> + * Returns -EINVAL when trying to split to an order that is incompatible >> + * with the folio. Splitting to order 0 is compatible with all folios. >> */ >> int split_huge_page_to_list_to_order(struct page *page, struct list_head *list, >> unsigned int new_order) > > Otherwise, looks good. Thanks! -- Cheers, David / dhildenb ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation 2024-04-22 19:31 ` David Hildenbrand @ 2024-04-22 19:36 ` John Hubbard 0 siblings, 0 replies; 5+ messages in thread From: John Hubbard @ 2024-04-22 19:36 UTC (permalink / raw) To: David Hildenbrand, linux-kernel Cc: linux-mm, Zi Yan, Matthew Wilcox, Andrew Morton On 4/22/24 12:31 PM, David Hildenbrand wrote: > On 19.04.24 02:15, John Hubbard wrote: >> On 4/18/24 8:18 AM, David Hildenbrand wrote: ... >>> * >>> * Returns 0 if the huge page was split successfully. >>> * >>> - * Returns -EBUSY if @page's folio is pinned, or if the anon_vma disappeared >>> - * from under us. >>> + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). >> >> ...or if the folio was removed from the page cache before this routine >> got a chance to lock it, right? (See the "fail:" path.) > > Right, that is sneaky. Let me extend to cover that case as well. > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > index 824eff9211db8..a7406267323ed 100644 > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -2975,7 +2975,8 @@ bool can_split_folio(struct folio *folio, int *pextra_pins) > * > * Returns 0 if the huge page was split successfully. > * > - * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP). > + * Returns -EAGAIN if the folio has unexpected reference (e.g., GUP) or if > + * the folio was concurrently removed from the page cache. Looks good, Reviewed-by: John Hubbard <jhubbard@nvidia.com> thanks, -- John Hubbard NVIDIA ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation 2024-04-18 15:18 [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation David Hildenbrand 2024-04-19 0:15 ` John Hubbard @ 2024-04-22 14:21 ` Zi Yan 1 sibling, 0 replies; 5+ messages in thread From: Zi Yan @ 2024-04-22 14:21 UTC (permalink / raw) To: David Hildenbrand Cc: linux-kernel, linux-mm, John Hubbard, Matthew Wilcox, Andrew Morton, Baolin Wang [-- Attachment #1: Type: text/plain, Size: 892 bytes --] On 18 Apr 2024, at 11:18, David Hildenbrand wrote: > The documentation is wrong and relying on it almost resulted in BUGs > in new callers: we return -EAGAIN on unexpected folio references, not > -EBUSY. +Baolin The code was changed at the commit fd4a7ac32918 ("mm: migrate: try again if THP split is failed due to page refcnt") without changing the comment. > > Let's fix that and also document which other return values we can > currently see and why they could happen. > > Cc: John Hubbard <jhubbard@nvidia.com> > Cc: Zi Yan <ziy@nvidia.com> > Cc: Matthew Wilcox <willy@infradead.org> > Cc: Andrew Morton <akpm@linux-foundation.org> > Signed-off-by: David Hildenbrand <david@redhat.com> > --- > mm/huge_memory.c | 13 ++++++++++--- > 1 file changed, 10 insertions(+), 3 deletions(-) The changes look good to me. Thanks. Reviewed-by: Zi Yan <ziy@nvidia.com> -- Best Regards, Yan, Zi [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 854 bytes --] ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2024-04-22 19:37 UTC | newest] Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2024-04-18 15:18 [PATCH v1] mm/huge_memory: improve split_huge_page_to_list_to_order() return value documentation David Hildenbrand 2024-04-19 0:15 ` John Hubbard 2024-04-22 19:31 ` David Hildenbrand 2024-04-22 19:36 ` John Hubbard 2024-04-22 14:21 ` Zi Yan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox