Re: [PATCH 4/4] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio()

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Kefeng Wang <wangkefeng.wang@huawei.com>
To: David Hildenbrand <david@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Oscar Salvador <osalvador@suse.de>,
	Muchun Song <muchun.song@linux.dev>, Zi Yan <ziy@nvidia.com>,
	Matthew Wilcox <willy@infradead.org>
Cc: <sidhartha.kumar@oracle.com>, <jane.chu@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Brendan Jackman <jackmanb@google.com>,
	Johannes Weiner <hannes@cmpxchg.org>, <linux-mm@kvack.org>
Subject: Re: [PATCH 4/4] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio()
Date: Sat, 13 Sep 2025 12:13:36 +0800	[thread overview]
Message-ID: <39ea6d31-ec9c-4053-a875-8e86a8676a62@huawei.com> (raw)
In-Reply-To: <5ede233c-c8c6-4067-afb3-df94b4222cda@redhat.com>



On 2025/9/13 2:07, David Hildenbrand wrote:
> 
>>>>
>>>> Would be interesting trying to see how much overhead would remain when
>>>> just dealing
>>
>> Patch2 skip atomic update in split_page() with 
>> alloc_contig_frozen_pages(),
>> I could test performance about old alloc_contig_pages() with/without
>> GFP_COMP and new alloc_contig_frozen_pages() when allocate same size.
> 
> Yes, that would be very interesting.


TEST1： alloc_contig_frozen_pages（split without atomic） + GFP_KERNEL
TEST2： alloc_contig_pages（split with atomic） + GFP_KERNEL
TEST3： alloc_contig_frozen_pages（split without atomic） + 
GFP_KERNEL|__GFP_COMP
TEST4： alloc_contig_pages（split with atomic） + GFP_KERNEL|__GFP_COMP

Alloc/free 10 * 1G, the result shows as usec

        TEST1  TEST2   TEST3   TEST4
alloc
       301024  296947  286681  288125
       297099  297194  286810  289906
       297134  300445  289950  288393
				
free				
       215014  215332  16730   16613
       214730  214948  16516   16588
       215727  214945  16589   16507

The perf profile show below,

split_free_frozen_pages()'s proportion is lower than split_free_pages，
but the total time is not reduced.

TEST1 alloc:
- sysctl_test_sysctl_handler.part.0
    - 96.92% alloc_contig_frozen_pages_noprof
       - 82.69% pfn_range_valid_contig
            pfn_to_online_page
       - 13.97% alloc_contig_range_frozen_noprof
            4.39% replace_free_hugepage_folios
          - 4.34% split_free_frozen_pages
               __list_add_valid_or_report
          - 3.22% undo_isolate_page_range
             - 3.04% unset_migratetype_isolate
                - __move_freepages_block_isolate
                     2.96% __move_freepages_block
          - 0.86% __drain_all_pages
               drain_pages_zone
               free_pcppages_bulk
          - 0.59% start_isolate_page_range
             - 0.51% set_migratetype_isolate
                  __move_freepages_block_isolate

TEST2 alloc:
  - sysctl_test_sysctl_handler.part.0
    - 99.82% alloc_contig_pages_old_noprof
       - 81.59% pfn_range_valid_contig
            pfn_to_online_page
       - 18.16% alloc_contig_range_old_noprof.constprop.0
          - 8.10% split_free_pages
               4.72% __split_page
               1.38% __list_add_valid_or_report
            5.83% replace_free_hugepage_folios
          - 3.07% undo_isolate_page_range
             - 3.01% unset_migratetype_isolate
                - __move_freepages_block_isolate
                     2.88% __move_freepages_block

TEST3 alloc:
- sysctl_test_sysctl_handler.part.0
    - 99.85% alloc_contig_frozen_pages_noprof
       - 86.04% pfn_range_valid_contig
            pfn_to_online_page
       - 13.52% alloc_contig_range_frozen_noprof
            4.23% replace_free_hugepage_folios
          - 3.99% prep_new_page
               prep_compound_page
          - 3.86% undo_isolate_page_range
             - 3.43% unset_migratetype_isolate
                - __move_freepages_block_isolate
                     3.35% __move_freepages_block
            0.54% __alloc_contig_migrate_range 



> 
>>
>>>>
>>>> OTOH, maybe we can leave GFP_COMPOUND support in but make the function
>>>> more generic, not limited to folios (I suspect many users will not want
>>>> folios, except hugetlb).
>>>>
>>
>> Maybe just only add cma_alloc_frozen(), and
>> cma_alloc()/hugetlb_cma_alloc_folio()
>> is the wrapper that calls it and set page refcount, like what we did in
>> other frozen allocation.
>>
>> struct page *cma_alloc_frozen(struct cma *cma, unsigned long count,
>>                 unsigned int align, gfp_t gfp);
> 
> I think it's weird that cma_alloc_frozen() consumes gfp_t when 
> cma_alloc() doesn't. So I would wish that we can clean that up (-> 
> remove gfp if possible).
> 
> So maybe we really want a
> 
> cma_alloc_frozen
> 
> and
> 
> cma_alloc_frozen_compound
> 
> whereby the latter consumes an "order" instead of "count + align".
> 
>>
>>>> Maybe just a
>>>>
>>>> struct page * cma_alloc_compound(struct cma *cma, unsigned int order,
>>>> unsigned int align, bool no_warn);
>>>
>>> ^ no need for the align as I realized, just like
>>> cma_alloc_folio() doesn't have.
>>
>> since cma_alloc_frozen is more generic, we need keep align,
>>
>>>
>>> I do wonder why we decided to allow cma_alloc_folio() to consume gfp_t
>>> flags when we don't do the same for cma_alloc().
>>>
>>
>> cma_alloc_folio() now is called by hugetlb allocation, the gfp could be
>> htlb_alloc_mask(), with/without __GFP_THISNODE/__GFP_RETRY_MAYFAIL...
>> and this could be used in alloc_contig_frozen_pages(),(eg,
>> gfp_zone).
> 
> Take a look at alloc_contig_range_noprof() where I added
> 
>      __alloc_contig_verify_gfp_mask()
> 
> Essentially, we ignore
> 
>      GFP_ZONEMASK | __GFP_RECLAIMABLE | __GFP_WRITE | __GFP_HARDWALL
>         | __GFP_THISNODE | __GFP_MOVABLE
> 
> And we *always* set __GFP_RETRY_MAYFAIL
> 
> I'd argue that hugetlb passing in random parameters that don't really 
> make sense for contig allcoations is rather an issue and makes you 
> believe that they would have any effect.
> 
> If cma_alloc doesn't provide them a cma_alloc_frozen or cma_alloc_folio 
> shouldn't provide them.

Oh, I make a mistake, the cma_alloc_folio calls  alloc_contig_range
(not alloc_contig_pages) to  alloc page from special cma ranges, and
__GFP_THISNODE only needed by hugetlb_cma_alloc_folio(), so the gfp
flags is not useful for cma_alloc_frozen or cma_alloc_folio, let's
remove it.



> 
>>
>> For cma_alloc() unconditional use GFP_KERNEL from commit
>> 6518202970c1 "mm/cma: remove unsupported gfp_mask parameter from
>> cma_alloc()", but this is another story.
>>
>> Back to this patchset, just add a new cma_alloc_frozen() shown
>> above and directly call it in cma_alloc()/hugetlb_cma_alloc_folio(),
>> get rid of folio_alloc_frozen_gigantic() and cma_alloc_folio(), and
>> we could do more optimization in the next step.
> 
> Yeah, that's probably a good first step, but while at it I would hope we 
> could just cleanup the gfp stuff and provide a more consistent interface.
> 

So
struct page *cma_alloc_frozen(struct cma *cma, unsigned long count, 
unsigned int align, bool no_warn) in include/linux/cma.h

and
struct page *cma_alloc_frozen_compound(struct cma *cma, unsigned int 
order) in mm/internal.h since maybe only hugetlb use it.

or keep cma_alloc_folio() but remove gfp,
struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp)

Any comments?

Thanks.

     prev parent reply	other threads:[~2025-09-13  4:13 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-09-11  6:56 [PATCH 0/4] mm: hugetlb: allocate frozen gigantic folio Kefeng Wang
2025-09-11  6:56 ` [PATCH 1/4] mm: debug_vm_pgtable: add debug_vm_pgtable_free_huge_page() Kefeng Wang
2025-09-12  6:58   ` Kefeng Wang
2025-09-11  6:56 ` [PATCH 2/4] mm: page_alloc: add alloc_contig_{range_frozen,frozen_pages}() Kefeng Wang
2025-09-11  6:56 ` [PATCH 3/4] mm: cma: add __cma_release() Kefeng Wang
2025-09-11  6:56 ` [PATCH 4/4] mm: hugetlb: allocate frozen pages in alloc_gigantic_folio() Kefeng Wang
2025-09-11  8:25   ` David Hildenbrand
2025-09-11  9:11     ` Kefeng Wang
2025-09-11 18:56       ` David Hildenbrand
2025-09-12  6:57         ` Kefeng Wang
2025-09-12  7:18           ` David Hildenbrand
2025-09-12  7:23             ` David Hildenbrand
2025-09-12  9:12               ` Kefeng Wang
2025-09-12 18:07                 ` David Hildenbrand
2025-09-13  4:13                   ` Kefeng Wang [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=39ea6d31-ec9c-4053-a875-8e86a8676a62@huawei.com \
    --to=wangkefeng.wang@huawei.com \
    --cc=akpm@linux-foundation.org \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jackmanb@google.com \
    --cc=jane.chu@oracle.com \
    --cc=linux-mm@kvack.org \
    --cc=muchun.song@linux.dev \
    --cc=osalvador@suse.de \
    --cc=sidhartha.kumar@oracle.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox