From: Usama Arif <usama.arif@bytedance.com>
To: Mike Rapoport <rppt@kernel.org>
Cc: linux-mm@kvack.org, muchun.song@linux.dev,
	mike.kravetz@oracle.com, linux-kernel@vger.kernel.org,
	fam.zheng@bytedance.com, liangma@liangbit.com,
	simon.evans@bytedance.com, punit.agrawal@bytedance.com
Subject: Re: [External] Re: [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region
Date: Wed, 26 Jul 2023 16:02:21 +0100
Message-ID: <440d4a0e-c1ea-864b-54cb-aab74858319a@bytedance.com>
In-Reply-To: <20230726110113.GT1901145@kernel.org>



On 26/07/2023 12:01, Mike Rapoport wrote:
> On Mon, Jul 24, 2023 at 02:46:42PM +0100, Usama Arif wrote:
>> This propagates the hugepage size from the memblock APIs
>> (memblock_alloc_try_nid_raw and memblock_alloc_range_nid)
>> so that it can be stored in struct memblock_region. This does not
>> introduce any functional change and hugepage_size is not used in
>> this commit. It is just a setup for the next commit where hugepage_size
>> is used to skip initialization of struct pages that will be freed later
>> when HVO is enabled.
>>
>> Signed-off-by: Usama Arif <usama.arif@bytedance.com>
>> ---
>>   arch/arm64/mm/kasan_init.c                   |  2 +-
>>   arch/powerpc/platforms/pasemi/iommu.c        |  2 +-
>>   arch/powerpc/platforms/pseries/setup.c       |  4 +-
>>   arch/powerpc/sysdev/dart_iommu.c             |  2 +-
>>   include/linux/memblock.h                     |  8 ++-
>>   mm/cma.c                                     |  4 +-
>>   mm/hugetlb.c                                 |  6 +-
>>   mm/memblock.c                                | 60 ++++++++++++--------
>>   mm/mm_init.c                                 |  2 +-
>>   mm/sparse-vmemmap.c                          |  2 +-
>>   tools/testing/memblock/tests/alloc_nid_api.c |  2 +-
>>   11 files changed, 56 insertions(+), 38 deletions(-)
>>
> 
> [ snip ]
> 
>> diff --git a/include/linux/memblock.h b/include/linux/memblock.h
>> index f71ff9f0ec81..bb8019540d73 100644
>> --- a/include/linux/memblock.h
>> +++ b/include/linux/memblock.h
>> @@ -63,6 +63,7 @@ struct memblock_region {
>>   #ifdef CONFIG_NUMA
>>   	int nid;
>>   #endif
>> +	phys_addr_t hugepage_size;
>>   };
>>   
>>   /**
>> @@ -400,7 +401,8 @@ phys_addr_t memblock_phys_alloc_range(phys_addr_t size, phys_addr_t align,
>>   				      phys_addr_t start, phys_addr_t end);
>>   phys_addr_t memblock_alloc_range_nid(phys_addr_t size,
>>   				      phys_addr_t align, phys_addr_t start,
>> -				      phys_addr_t end, int nid, bool exact_nid);
>> +				      phys_addr_t end, int nid, bool exact_nid,
>> +				      phys_addr_t hugepage_size);
> 
> Rather than adding yet another parameter to memblock_phys_alloc_range() we
> can have an API that sets a flag on the reserved regions.
> With this the hugetlb reservation code can set a flag when HVO is
> enabled and memmap_init_reserved_pages() will skip regions with this flag
> set.
> 

Hi,

Thanks for the review.

I think you meant memblock_alloc_range_nid/memblock_alloc_try_nid_raw 
and not memblock_phys_alloc_range?

My initial approach was to use flags, but I think it looks worse than 
what I have done in this RFC. I have pushed the flags prototype to 
https://github.com/uarif1/linux/commits/flags_skip_prep_init_gigantic_HVO 
(top 4 commits) for reference; the main differences are in patches 2 
and 4 of the RFC. The major points are below, with the bigger issue 
being in patch 4:

- (RFC vs flags, patch 2 comparison) In the RFC, hugepage_size is 
propagated from memblock_alloc_try_nid_raw through the function calls. 
With flags, a "no_init" boolean is propagated from 
memblock_alloc_try_nid_raw through the same function calls until the 
region flags become available in memblock_add_range, where the new 
MEMBLOCK_NOINIT flag is set. It is tricky to introduce a new function 
that sets the flag on the region AFTER memblock_alloc_try_nid_raw has 
returned, because the memblock_region cannot be found at that point.
So either way, something (the hugepage_size/flag information) has to be 
propagated through the function calls and a new argument has to be 
added.
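To make the comparison concrete, here is a minimal sketch of the flags 
variant (the flag value and the exact signature are illustrative, taken 
loosely from my prototype branch rather than final code):

enum memblock_flags {
	/* ... existing MEMBLOCK_* flags elided ... */
	MEMBLOCK_NOINIT	= 0x10,	/* skip struct page init; value for illustration */
};

/* the allocator entry point grows an argument either way: */
void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
				 phys_addr_t min_addr, phys_addr_t max_addr,
				 int nid, bool no_init);

The no_init boolean then rides the whole call chain down to 
memblock_add_range, because only there does the region exist and can 
MEMBLOCK_NOINIT be set on it.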

- (RFC vs flags, patch 4 comparison) We can't skip initialization of 
the whole region, only of the tail pages: we still need to initialize 
HUGETLB_VMEMMAP_RESERVE_SIZE (PAGE_SIZE) worth of struct pages for each 
gigantic page.
In the RFC, the hugepage_size from patch 2 is used in the loop in 
memmap_init_reserved_pages in patch 4 to reserve 
HUGETLB_VMEMMAP_RESERVE_SIZE struct pages for every hugepage_size. This 
is simple and not hacky.
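Roughly, the patch 4 loop looks like this (a simplified sketch of the 
RFC code; the conversion from retained vmemmap bytes to the covered 
physical range is my reading and worth double-checking):

static void __init memmap_init_reserved_pages(void)
{
	struct memblock_region *region;
	phys_addr_t start, end;
	int nid;

	for_each_reserved_mem_region(region) {
		nid = memblock_get_region_node(region);
		start = region->base;
		end = start + region->size;

		if (region->hugepage_size) {
			/* one retained vmemmap page (HUGETLB_VMEMMAP_RESERVE_SIZE
			 * bytes) holds PAGE_SIZE / sizeof(struct page) struct
			 * pages, i.e. describes that many base pages of memory */
			phys_addr_t head = HUGETLB_VMEMMAP_RESERVE_SIZE /
					   sizeof(struct page) * PAGE_SIZE;

			for (; start < end; start += region->hugepage_size)
				reserve_bootmem_region(start, start + head, nid);
		} else {
			reserve_bootmem_region(start, end, nid);
		}
	}
}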
If we use a flag, there are two ways to initialize the 
HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage:

1. (implemented in patch 4 of the github branch) memmap_init_reserved_pages 
skips the region during initialization, as you suggested, and the 
HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per hugepage are initialized 
somewhere later (I did it in gather_bootmem_prealloc). When calling 
reserve_bootmem_region from gather_bootmem_prealloc, we need to skip 
early_page_uninitialised, which makes it look a bit hacky.

2. We initialize the HUGETLB_VMEMMAP_RESERVE_SIZE struct pages per 
hugepage in memmap_init_reserved_pages itself. As we have only used a 
flag and have not passed hugepage_size, we need to determine the 
gigantic page size somehow, and there does not seem to be a nice way to 
do that in this function, since it is architecture dependent. I think 
the gigantic page size is given by PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), 
but I am not sure whether that holds on all architectures. If we can use 
PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT), this option looks much better 
than option 1.
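That is, something like the following hypothetical helper (with 4K 
pages, PUD_SHIFT - PAGE_SHIFT = 30 - 12 = 18, so PAGE_SIZE << 18 = 1G; 
whether that is valid on every architecture is exactly the open 
question above):

static inline phys_addr_t gigantic_page_size(void)
{
	return PAGE_SIZE << (PUD_SHIFT - PAGE_SHIFT);
}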

Both the RFC patches and the github flags implementation work, but I 
think the RFC patches are much cleaner. If there is a strong preference 
for the github patches, I can send them to the mailing list.

Thanks,
Usama


>>   phys_addr_t memblock_phys_alloc_try_nid(phys_addr_t size, phys_addr_t align, int nid);
>>   
>>   static __always_inline phys_addr_t memblock_phys_alloc(phys_addr_t size,
>> @@ -415,7 +417,7 @@ void *memblock_alloc_exact_nid_raw(phys_addr_t size, phys_addr_t align,
>>   				 int nid);
>>   void *memblock_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
>>   				 phys_addr_t min_addr, phys_addr_t max_addr,
>> -				 int nid);
>> +				 int nid, phys_addr_t hugepage_size);
>>   void *memblock_alloc_try_nid(phys_addr_t size, phys_addr_t align,
>>   			     phys_addr_t min_addr, phys_addr_t max_addr,
>>   			     int nid);
>> @@ -431,7 +433,7 @@ static inline void *memblock_alloc_raw(phys_addr_t size,
>>   {
>>   	return memblock_alloc_try_nid_raw(size, align, MEMBLOCK_LOW_LIMIT,
>>   					  MEMBLOCK_ALLOC_ACCESSIBLE,
>> -					  NUMA_NO_NODE);
>> +					  NUMA_NO_NODE, 0);
>>   }
>>   
>>   static inline void *memblock_alloc_from(phys_addr_t size,
> 


Thread overview: 10+ messages
2023-07-24 13:46 [RFC 0/4] mm/memblock: Skip prep and initialization of struct pages freed later by HVO Usama Arif
2023-07-24 13:46 ` [RFC 1/4] mm/hugetlb: Skip prep of tail pages when HVO is enabled Usama Arif
2023-07-24 13:46 ` [RFC 2/4] mm/memblock: Add hugepage_size member to struct memblock_region Usama Arif
2023-07-26 11:01   ` Mike Rapoport
2023-07-26 15:02     ` Usama Arif [this message]
2023-07-27  4:30       ` [External] " Mike Rapoport
2023-07-27 20:56         ` Usama Arif
2023-07-24 13:46 ` [RFC 3/4] mm/hugetlb_vmemmap: Use nid of the head page to reallocate it Usama Arif
2023-07-24 13:46 ` [RFC 4/4] mm/memblock: Skip initialization of struct pages freed later by HVO Usama Arif
2023-07-26 10:34 ` [RFC 0/4] mm/memblock: Skip prep and " Usama Arif
