* [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
@ 2024-04-02 20:06 Matthew Wilcox (Oracle)
2024-04-02 21:19 ` Sidhartha Kumar
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Matthew Wilcox (Oracle) @ 2024-04-02 20:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Matthew Wilcox (Oracle), linux-mm, Muchun Song
While this function returned a folio, it was still using __alloc_pages()
and __free_pages(). Use __folio_alloc() and put_folio() instead. This
actually removes a call to compound_head(), but more importantly, it
prepares us for the move to memdescs.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
mm/hugetlb.c | 33 ++++++++++++++++-----------------
1 file changed, 16 insertions(+), 17 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a3bffa8debde..5f1e0b1a0d57 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
nodemask_t *node_alloc_noretry)
{
int order = huge_page_order(h);
- struct page *page;
+ struct folio *folio;
bool alloc_try_hard = true;
bool retry = true;
/*
- * By default we always try hard to allocate the page with
- * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
+ * By default we always try hard to allocate the folio with
+ * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
* a loop (to adjust global huge page counts) and previous allocation
* failed, do not continue to try hard on the same node. Use the
* node_alloc_noretry bitmap to manage this state information.
@@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
if (nid == NUMA_NO_NODE)
nid = numa_mem_id();
retry:
- page = __alloc_pages(gfp_mask, order, nid, nmask);
+ folio = __folio_alloc(gfp_mask, order, nid, nmask);
- /* Freeze head page */
- if (page && !page_ref_freeze(page, 1)) {
- __free_pages(page, order);
+ if (folio && !folio_ref_freeze(folio, 1)) {
+ folio_put(folio);
if (retry) { /* retry once */
retry = false;
goto retry;
}
/* WOW! twice in a row. */
- pr_warn("HugeTLB head page unexpected inflated ref count\n");
- page = NULL;
+ pr_warn("HugeTLB unexpected inflated folio ref count\n");
+ folio = NULL;
}
/*
- * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
- * indicates an overall state change. Clear bit so that we resume
- * normal 'try hard' allocations.
+ * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
+ * folio this indicates an overall state change. Clear bit so
+ * that we resume normal 'try hard' allocations.
*/
- if (node_alloc_noretry && page && !alloc_try_hard)
+ if (node_alloc_noretry && folio && !alloc_try_hard)
node_clear(nid, *node_alloc_noretry);
/*
- * If we tried hard to get a page but failed, set bit so that
+ * If we tried hard to get a folio but failed, set bit so that
* subsequent attempts will not try as hard until there is an
* overall state change.
*/
- if (node_alloc_noretry && !page && alloc_try_hard)
+ if (node_alloc_noretry && !folio && alloc_try_hard)
node_set(nid, *node_alloc_noretry);
- if (!page) {
+ if (!folio) {
__count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
return NULL;
}
__count_vm_event(HTLB_BUDDY_PGALLOC);
- return page_folio(page);
+ return folio;
}
static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
--
2.43.0
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-02 20:06 [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio Matthew Wilcox (Oracle)
@ 2024-04-02 21:19 ` Sidhartha Kumar
2024-04-03 4:55 ` Oscar Salvador
2024-04-03 6:00 ` Muchun Song
2 siblings, 0 replies; 9+ messages in thread
From: Sidhartha Kumar @ 2024-04-02 21:19 UTC (permalink / raw)
To: Matthew Wilcox (Oracle), Andrew Morton; +Cc: linux-mm, Muchun Song
On 4/2/24 1:06 PM, Matthew Wilcox (Oracle) wrote:
> While this function returned a folio, it was still using __alloc_pages()
> and __free_pages(). Use __folio_alloc() and put_folio() instead. This
> actually removes a call to compound_head(), but more importantly, it
> prepares us for the move to memdescs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> ---
> mm/hugetlb.c | 33 ++++++++++++++++-----------------
> 1 file changed, 16 insertions(+), 17 deletions(-)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index a3bffa8debde..5f1e0b1a0d57 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2177,13 +2177,13 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
> nodemask_t *node_alloc_noretry)
> {
> int order = huge_page_order(h);
> - struct page *page;
> + struct folio *folio;
> bool alloc_try_hard = true;
> bool retry = true;
>
> /*
> - * By default we always try hard to allocate the page with
> - * __GFP_RETRY_MAYFAIL flag. However, if we are allocating pages in
> + * By default we always try hard to allocate the folio with
> + * __GFP_RETRY_MAYFAIL flag. However, if we are allocating folios in
> * a loop (to adjust global huge page counts) and previous allocation
> * failed, do not continue to try hard on the same node. Use the
> * node_alloc_noretry bitmap to manage this state information.
> @@ -2196,43 +2196,42 @@ static struct folio *alloc_buddy_hugetlb_folio(struct hstate *h,
> if (nid == NUMA_NO_NODE)
> nid = numa_mem_id();
> retry:
> - page = __alloc_pages(gfp_mask, order, nid, nmask);
> + folio = __folio_alloc(gfp_mask, order, nid, nmask);
>
> - /* Freeze head page */
> - if (page && !page_ref_freeze(page, 1)) {
> - __free_pages(page, order);
> + if (folio && !folio_ref_freeze(folio, 1)) {
> + folio_put(folio);
> if (retry) { /* retry once */
> retry = false;
> goto retry;
> }
> /* WOW! twice in a row. */
> - pr_warn("HugeTLB head page unexpected inflated ref count\n");
> - page = NULL;
> + pr_warn("HugeTLB unexpected inflated folio ref count\n");
> + folio = NULL;
> }
>
> /*
> - * If we did not specify __GFP_RETRY_MAYFAIL, but still got a page this
> - * indicates an overall state change. Clear bit so that we resume
> - * normal 'try hard' allocations.
> + * If we did not specify __GFP_RETRY_MAYFAIL, but still got a
> + * folio this indicates an overall state change. Clear bit so
> + * that we resume normal 'try hard' allocations.
> */
> - if (node_alloc_noretry && page && !alloc_try_hard)
> + if (node_alloc_noretry && folio && !alloc_try_hard)
> node_clear(nid, *node_alloc_noretry);
>
> /*
> - * If we tried hard to get a page but failed, set bit so that
> + * If we tried hard to get a folio but failed, set bit so that
> * subsequent attempts will not try as hard until there is an
> * overall state change.
> */
> - if (node_alloc_noretry && !page && alloc_try_hard)
> + if (node_alloc_noretry && !folio && alloc_try_hard)
> node_set(nid, *node_alloc_noretry);
>
> - if (!page) {
> + if (!folio) {
> __count_vm_event(HTLB_BUDDY_PGALLOC_FAIL);
> return NULL;
> }
>
> __count_vm_event(HTLB_BUDDY_PGALLOC);
> - return page_folio(page);
> + return folio;
> }
>
> static struct folio *__alloc_fresh_hugetlb_folio(struct hstate *h,
Reviewed-by: Sidhartha Kumar <sidhartha.kumar@oracle.com>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-02 20:06 [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio Matthew Wilcox (Oracle)
2024-04-02 21:19 ` Sidhartha Kumar
@ 2024-04-03 4:55 ` Oscar Salvador
2024-04-03 6:19 ` Muchun Song
2024-04-03 6:00 ` Muchun Song
2 siblings, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2024-04-03 4:55 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm, Muchun Song
On Tue, Apr 02, 2024 at 09:06:54PM +0100, Matthew Wilcox (Oracle) wrote:
> While this function returned a folio, it was still using __alloc_pages()
> and __free_pages(). Use __folio_alloc() and put_folio() instead. This
> actually removes a call to compound_head(), but more importantly, it
> prepares us for the move to memdescs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
> - page = __alloc_pages(gfp_mask, order, nid, nmask);
> + folio = __folio_alloc(gfp_mask, order, nid, nmask);
>
> - /* Freeze head page */
> - if (page && !page_ref_freeze(page, 1)) {
> - __free_pages(page, order);
> + if (folio && !folio_ref_freeze(folio, 1)) {
> + folio_put(folio);
This made me look again at the problem we had in the past with
speculative refcount vs hugetlb pages, and made me think whether there
are any more users trying to defeat speculative refcounts this way.
It was discussed some time ago that maybe all pages returned from the buddy
allocator should have its refcount frozen, to avoid this.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-02 20:06 [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio Matthew Wilcox (Oracle)
2024-04-02 21:19 ` Sidhartha Kumar
2024-04-03 4:55 ` Oscar Salvador
@ 2024-04-03 6:00 ` Muchun Song
2 siblings, 0 replies; 9+ messages in thread
From: Muchun Song @ 2024-04-03 6:00 UTC (permalink / raw)
To: Matthew Wilcox (Oracle); +Cc: Andrew Morton, linux-mm
> On Apr 3, 2024, at 04:06, Matthew Wilcox (Oracle) <willy@infradead.org> wrote:
>
> While this function returned a folio, it was still using __alloc_pages()
> and __free_pages(). Use __folio_alloc() and put_folio() instead. This
> actually removes a call to compound_head(), but more importantly, it
> prepares us for the move to memdescs.
>
> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Muchun Song <muchun.song@linux.dev>
Thanks.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-03 4:55 ` Oscar Salvador
@ 2024-04-03 6:19 ` Muchun Song
2024-04-03 7:25 ` Oscar Salvador
0 siblings, 1 reply; 9+ messages in thread
From: Muchun Song @ 2024-04-03 6:19 UTC (permalink / raw)
To: Oscar Salvador; +Cc: Matthew Wilcox (Oracle), Andrew Morton, linux-mm
> On Apr 3, 2024, at 12:55, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Tue, Apr 02, 2024 at 09:06:54PM +0100, Matthew Wilcox (Oracle) wrote:
>> While this function returned a folio, it was still using __alloc_pages()
>> and __free_pages(). Use __folio_alloc() and put_folio() instead. This
>> actually removes a call to compound_head(), but more importantly, it
>> prepares us for the move to memdescs.
>>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
>
> Reviewed-by: Oscar Salvador <osalvador@suse.de>
>
>> - page = __alloc_pages(gfp_mask, order, nid, nmask);
>> + folio = __folio_alloc(gfp_mask, order, nid, nmask);
>>
>> - /* Freeze head page */
>> - if (page && !page_ref_freeze(page, 1)) {
>> - __free_pages(page, order);
>> + if (folio && !folio_ref_freeze(folio, 1)) {
>> + folio_put(folio);
>
> This made me look again at the problem we had in the past with
> speculative refcount vs hugetlb pages, and made me think whether there
> are any more users trying to defeat speculative refcounts this way.
> It was discussed some time ago that maybe all pages returned from the buddy
> allocator should have its refcount frozen, to avoid this.
I think you mean this patch [1], right? With alloc_frozen_pages()
introduced, we could get rid of the trick from HugeTLB code.
[1] https://lore.kernel.org/linux-mm/B4889CC0-5D36-44E2-B901-CDC5226995A2@oracle.com/T/#m20457752e757cdafc99c7f5e6a3d8cbbb65fcd3e
>
>
> --
> Oscar Salvador
> SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-03 6:19 ` Muchun Song
@ 2024-04-03 7:25 ` Oscar Salvador
2024-04-03 21:17 ` Matthew Wilcox
0 siblings, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2024-04-03 7:25 UTC (permalink / raw)
To: Muchun Song; +Cc: Matthew Wilcox (Oracle), Andrew Morton, linux-mm
On Wed, Apr 03, 2024 at 02:19:19PM +0800, Muchun Song wrote:
> I think you mean this patch [1], right? With alloc_frozen_pages()
> introduced, we could get rid of the trick from HugeTLB code.
Ah yes, that one, thanks.
It would be nice, but having read the discussion I am kind of skeptical.
But maybe some to revisit.
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-03 7:25 ` Oscar Salvador
@ 2024-04-03 21:17 ` Matthew Wilcox
2024-04-04 11:13 ` Oscar Salvador
0 siblings, 1 reply; 9+ messages in thread
From: Matthew Wilcox @ 2024-04-03 21:17 UTC (permalink / raw)
To: Oscar Salvador; +Cc: Muchun Song, Andrew Morton, linux-mm
On Wed, Apr 03, 2024 at 09:25:41AM +0200, Oscar Salvador wrote:
> On Wed, Apr 03, 2024 at 02:19:19PM +0800, Muchun Song wrote:
> > I think you mean this patch [1], right? With alloc_frozen_pages()
> > introduced, we could get rid of the trick from HugeTLB code.
>
> Ah yes, that one, thanks.
> It would be nice, but having read the discussion I am kind of skeptical.
>
> But maybe some to revisit.
I haven't given up on it. It's just currently parked, awaiting more
cleanups, some of which I have scheduled for the next merge window.
Part of the memdesc project will involve not having refcounts for some
memdescs. Slab, percpu and pagetable don't need them, for example.
I think hugetlb is being unnecessarily paranoid here, tbh. Or maybe
this part is just badly structured; if we're allocating a hugetlb folio,
it should be fine for its refcount to be temporarily elevated by someone
else. Not sure I can figure out what's going on in
alloc_and_dissolve_hugetlb_folio() though.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-03 21:17 ` Matthew Wilcox
@ 2024-04-04 11:13 ` Oscar Salvador
2024-04-07 7:14 ` Muchun Song
0 siblings, 1 reply; 9+ messages in thread
From: Oscar Salvador @ 2024-04-04 11:13 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Muchun Song, Andrew Morton, linux-mm
On Wed, Apr 03, 2024 at 10:17:45PM +0100, Matthew Wilcox wrote:
> I think hugetlb is being unnecessarily paranoid here, tbh. Or maybe
> this part is just badly structured; if we're allocating a hugetlb folio,
> it should be fine for its refcount to be temporarily elevated by someone
> else. Not sure I can figure out what's going on in
> alloc_and_dissolve_hugetlb_folio() though.
AFAICR, the problem comes when we need to remap the pages for vmemmap
optimization [1].
So, IIUC:
1) if someone comes around and grabs a refcount (say something doing
speculative stuff)
2) we do the remapping
3) that someone who took the refcount, now does a put_page()
4) vmemmap no longer points to the old page but the new one, meaning
that that 'put_page()' is done on the wrong page.
@Munchun: Did I get this right?
[1] https://lore.kernel.org/linux-mm/YupRjWRiz4lPo+y7@FVFYT0MHHV2J/
--
Oscar Salvador
SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio
2024-04-04 11:13 ` Oscar Salvador
@ 2024-04-07 7:14 ` Muchun Song
0 siblings, 0 replies; 9+ messages in thread
From: Muchun Song @ 2024-04-07 7:14 UTC (permalink / raw)
To: Oscar Salvador; +Cc: Matthew Wilcox, Andrew Morton, linux-mm
> On Apr 4, 2024, at 19:13, Oscar Salvador <osalvador@suse.de> wrote:
>
> On Wed, Apr 03, 2024 at 10:17:45PM +0100, Matthew Wilcox wrote:
>> I think hugetlb is being unnecessarily paranoid here, tbh. Or maybe
>> this part is just badly structured; if we're allocating a hugetlb folio,
>> it should be fine for its refcount to be temporarily elevated by someone
>> else. Not sure I can figure out what's going on in
>> alloc_and_dissolve_hugetlb_folio() though.
>
> AFAICR, the problem comes when we need to remap the pages for vmemmap
> optimization [1].
> So, IIUC:
>
> 1) if someone comes around and grabs a refcount (say something doing
> speculative stuff)
> 2) we do the remapping
> 3) that someone who took the refcount, now does a put_page()
> 4) vmemmap no longer points to the old page but the new one, meaning
> that that 'put_page()' is done on the wrong page.
>
> @Munchun: Did I get this right?
Right. But let me clarify this again. We need to keep the content of the
physical page constant throughout the processing of HVO (i.g. between
the copying of head vmemmap page and remapping of it). Zero-referenced page
could prevent others from updating the content of the page structs.
>
> [1] https://lore.kernel.org/linux-mm/YupRjWRiz4lPo+y7@FVFYT0MHHV2J/
Yes, I've also pointed it out here.
Thanks.
>
>
> --
> Oscar Salvador
> SUSE Labs
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-04-07 7:15 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-04-02 20:06 [PATCH] hugetlb: Convert alloc_buddy_hugetlb_folio to use a folio Matthew Wilcox (Oracle)
2024-04-02 21:19 ` Sidhartha Kumar
2024-04-03 4:55 ` Oscar Salvador
2024-04-03 6:19 ` Muchun Song
2024-04-03 7:25 ` Oscar Salvador
2024-04-03 21:17 ` Matthew Wilcox
2024-04-04 11:13 ` Oscar Salvador
2024-04-07 7:14 ` Muchun Song
2024-04-03 6:00 ` Muchun Song
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox