From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Mike Kravetz <mike.kravetz@oracle.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Cc: "x86@kernel.org" <x86@kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Linuxarm <linuxarm@huawei.com>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	Roman Gushchin <guro@fb.com>,
	Catalin Marinas <catalin.marinas@arm.com>,
	Will Deacon <will@kernel.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Mike Rapoport" <rppt@linux.ibm.com>,
	Anshuman Khandual <anshuman.khandual@arm.com>,
	Jonathan Cameron <jonathan.cameron@huawei.com>
Subject: RE: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory
Date: Wed, 15 Jul 2020 11:14:32 +0000	[thread overview]
Message-ID: <B926444035E5E2439431908E3842AFD258764F@DGGEMI525-MBS.china.huawei.com> (raw)
In-Reply-To: <359ea1d0-b1fd-d09f-d28a-a44655834277@oracle.com>



> -----Original Message-----
> From: Mike Kravetz [mailto:mike.kravetz@oracle.com]
> Sent: Wednesday, July 15, 2020 11:21 AM
> To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>;
> akpm@linux-foundation.org
> Cc: x86@kernel.org; linux-mm@kvack.org; linux-kernel@vger.kernel.org;
> Linuxarm <linuxarm@huawei.com>; linux-arm-kernel@lists.infradead.org;
> Roman Gushchin <guro@fb.com>; Catalin Marinas
> <catalin.marinas@arm.com>; Will Deacon <will@kernel.org>; Thomas Gleixner
> <tglx@linutronix.de>; Ingo Molnar <mingo@redhat.com>; Borislav Petkov
> <bp@alien8.de>; H.Peter Anvin <hpa@zytor.com>; Mike Rapoport
> <rppt@linux.ibm.com>; Anshuman Khandual
> <anshuman.khandual@arm.com>; Jonathan Cameron
> <jonathan.cameron@huawei.com>
> Subject: Re: [PATCH v3] mm/hugetlb: split hugetlb_cma in nodes with memory
> 
> On 7/10/20 5:09 AM, Barry Song wrote:
> > Online nodes are not necessarily nodes that contain memory. Splitting
> > hugetlb_cma across online nodes can lead to a hugetlb_cma size that
> > is inconsistent with the user's setting. For example, on a system
> > with 4 NUMA nodes where only one of them has memory, if users set
> > hugetlb_cma to 4GB, it will be split into four 1GB pieces, so only
> > the node with memory will get 1GB of CMA. The other three nodes get
> > nothing. That means the whole system gets only 1GB of CMA while users
> > asked for 4GB.
> >
> > Thus, it is more sensible to split hugetlb_cma across nodes with
> > memory. For the above case, the only node with memory will reserve
> > 4GB of CMA, which matches the user setting in bootargs. In order to
> > split CMA across nodes with memory, hugetlb_cma_reserve() should
> > scan over nodes with the N_MEMORY state rather than the N_ONLINE
> > state. That means the function must be called only after arch code
> > has finished setting the N_MEMORY state of nodes.
> >
> > The problem exists whenever N_ONLINE != N_MEMORY, so it is a general
> > problem on all platforms, though there are trivial differences among
> > architectures. For example, on ARM64 all nodes have the N_ONLINE
> > state before hugetlb_cma_reserve() is called, so hugetlb gets an
> > inconsistent CMA size when some online nodes have no memory. On x86
> > the problem is hidden because x86 happens to set N_ONLINE only on
> > nodes with memory by the time hugetlb_cma_reserve() is called.
> >
> > Anyway, this patch switches hugetlb_cma_reserve() to scanning nodes
> > with N_MEMORY and makes both x86 and ARM64 call the function after
> > the N_MEMORY state is ready. It also documents the requirement in
> > the definition of hugetlb_cma_reserve().
> >
> > Cc: Roman Gushchin <guro@fb.com>
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Thomas Gleixner <tglx@linutronix.de>
> > Cc: Ingo Molnar <mingo@redhat.com>
> > Cc: Borislav Petkov <bp@alien8.de>
> > Cc: H. Peter Anvin <hpa@zytor.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Mike Rapoport <rppt@linux.ibm.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > Cc: Jonathan Cameron <jonathan.cameron@huawei.com>
> > Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
> 
> I agree we should only be concerned with N_MEMORY nodes for the CMA
> reservations.  However, this patch got me thinking:
> - Do we really have to initiate the CMA reservations from arch specific code?
> - Can we move the call to reserve CMA a little later into hugetlb arch
>   independent code?
> 
> I know the cma_declare_contiguous_nid() routine says it should be called
> from arch specific code.  However, unless I am missing something, that
> seems mostly about timing.
> 
> What about a change like this on top of this patch?
> 
> From 72b5b9a623f8711ad7f79f1a8f910906245f5d07 Mon Sep 17 00:00:00 2001
> From: Mike Kravetz <mike.kravetz@oracle.com>
> Date: Tue, 14 Jul 2020 15:54:46 -0700
> Subject: [PATCH] hugetlb: move cma allocation call to arch independent code
> 
> Instead of calling hugetlb_cma_reserve() from arch specific code,
> call it from arch independent code when a gigantic page hstate is
> created.  This is late enough in the init process that all NUMA
> memory information should be initialized, and it is early enough
> to still use the early memory allocator.
> 
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
> ---
>  arch/arm64/mm/init.c    | 10 ----------
>  arch/x86/kernel/setup.c |  9 ---------
>  mm/hugetlb.c            |  8 +++++++-
>  3 files changed, 7 insertions(+), 20 deletions(-)
> 
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 79806732f4b4..ff0ff584dde9 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -427,16 +427,6 @@ void __init bootmem_init(void)
>  	sparse_init();
>  	zone_sizes_init(min, max);
> 
> -	/*
> -	 * must be done after zone_sizes_init() which calls free_area_init()
> -	 * that calls node_set_state() to initialize node_states[N_MEMORY]
> -	 * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> -	 * state
> -	 */
> -#ifdef CONFIG_ARM64_4K_PAGES
> -	hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -#endif
> -
>  	memblock_dump_all();
>  }
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index a1a9712090ae..111c8467fafa 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -1177,15 +1177,6 @@ void __init setup_arch(char **cmdline_p)
> 
>  	x86_init.paging.pagetable_init();
> 
> -	/*
> -	 * must be done after zone_sizes_init() which calls free_area_init()
> -	 * that calls node_set_state() to initialize node_states[N_MEMORY]
> -	 * because hugetlb_cma_reserve() will scan over nodes with N_MEMORY
> -	 * state
> -	 */
> -	if (boot_cpu_has(X86_FEATURE_GBPAGES))
> -		hugetlb_cma_reserve(PUD_SHIFT - PAGE_SHIFT);
> -
>  	kasan_init();
> 
>  	/*
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index f24acb3af741..a0007d1d12d2 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -3273,6 +3273,9 @@ void __init hugetlb_add_hstate(unsigned int order)
>  	snprintf(h->name, HSTATE_NAME_LEN, "hugepages-%lukB",
>  					huge_page_size(h)/1024);
> 
> +	if (order >= MAX_ORDER && hugetlb_cma_size)
> +		hugetlb_cma_reserve(order);

Hello Mike,

I am not sure this is necessarily correct, since the order used for
gigantic pages is arch-dependent:
https://lkml.org/lkml/2020/7/1/14
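
To illustrate, here is a rough sketch of the concern, assuming ARM64
with 4K pages (where MAX_ORDER is 11; the orders below are
illustrative, see the linked thread for the actual cases):

	/*
	 * hugetlb_add_hstate() runs for more than one gigantic size,
	 * and each call satisfies "order >= MAX_ORDER":
	 */
	hugetlb_add_hstate(CONT_PMD_SHIFT - PAGE_SHIFT);	/* order 13, 32MB */
	hugetlb_add_hstate(PUD_SHIFT - PAGE_SHIFT);		/* order 18, 1GB  */

So the order passed to hugetlb_cma_reserve() depends on which gigantic
hstate happens to be added first, and is not necessarily the
PUD_SHIFT - PAGE_SHIFT that the arch code used to pass.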

> +
>  	parsed_hstate = h;
>  }
> 
> @@ -5647,7 +5650,10 @@ void __init hugetlb_cma_reserve(int order)
>  	unsigned long size, reserved, per_node;
>  	int nid;
> 
> -	cma_reserve_called = true;
> +	if (cma_reserve_called)
> +		return;
> +	else
> +		cma_reserve_called = true;
> 
>  	if (!hugetlb_cma_size)
>  		return;
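
And with the early-return guard above, only the first call to
hugetlb_cma_reserve() takes effect, so the reservation would silently
use whichever gigantic order is registered first. A sketch of the
resulting call sequence, with the orders assumed from the 4K-page
example above:

	hugetlb_cma_reserve(13);	/* first gigantic hstate: reserves CMA at order 13 */
	hugetlb_cma_reserve(18);	/* later hstate: returns early, no effect */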

Thanks
Barry

