linux-mm.kvack.org archive mirror
* [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
@ 2023-09-05  3:13 Yuan Can
  2023-09-05  3:13 ` [PATCH 2/2] mm: hugetlb_vmemmap: allow alloc_vmemmap_page_list() ignore watermarks Yuan Can
  2023-09-05  9:06 ` [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Muchun Song
  0 siblings, 2 replies; 10+ messages in thread
From: Yuan Can @ 2023-09-05  3:13 UTC (permalink / raw)
  To: mike.kravetz, muchun.song, akpm, linux-mm; +Cc: wangkefeng.wang, yuancan

Decreasing the number of hugetlb pages failed with the following message:

 sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
 CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
 Hardware name: linux,dummy-virt (DT)
 Call trace:
  dump_backtrace.part.6+0x84/0xe4
  show_stack+0x18/0x24
  dump_stack_lvl+0x48/0x60
  dump_stack+0x18/0x24
  warn_alloc+0x100/0x1bc
  __alloc_pages_slowpath.constprop.107+0xa40/0xad8
  __alloc_pages+0x244/0x2d0
  hugetlb_vmemmap_restore+0x104/0x1e4
  __update_and_free_hugetlb_folio+0x44/0x1f4
  update_and_free_hugetlb_folio+0x20/0x68
  update_and_free_pages_bulk+0x4c/0xac
  set_max_huge_pages+0x198/0x334
  nr_hugepages_store_common+0x118/0x178
  nr_hugepages_store+0x18/0x24
  kobj_attr_store+0x18/0x2c
  sysfs_kf_write+0x40/0x54
  kernfs_fop_write_iter+0x164/0x1dc
  vfs_write+0x3a8/0x460
  ksys_write+0x6c/0x100
  __arm64_sys_write+0x1c/0x28
  invoke_syscall+0x44/0x100
  el0_svc_common.constprop.1+0x6c/0xe4
  do_el0_svc+0x38/0x94
  el0_svc+0x28/0x74
  el0t_64_sync_handler+0xa0/0xc4
  el0t_64_sync+0x174/0x178
 Mem-Info:
  ...

The reason is that the hugetlb pages being released were allocated from
movable nodes, and with hugetlb_optimize_vmemmap enabled, the vmemmap
pages must be reallocated from the same node while the hugetlb pages are
being released. With both GFP_KERNEL and __GFP_THISNODE set, allocating
from a movable node always fails. Fix this problem by removing
__GFP_THISNODE.

Signed-off-by: Yuan Can <yuancan@huawei.com>
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index c2007ef5e9b0..0485e471d224 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
 				   struct list_head *list)
 {
-	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
+	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
 	int nid = page_to_nid((struct page *)start);
 	struct page *page, *next;
-- 
2.17.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 2/2] mm: hugetlb_vmemmap: allow alloc_vmemmap_page_list() ignore watermarks
  2023-09-05  3:13 [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Yuan Can
@ 2023-09-05  3:13 ` Yuan Can
  2023-09-05  6:59   ` Muchun Song
  2023-09-05  9:06 ` [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Muchun Song
  1 sibling, 1 reply; 10+ messages in thread
From: Yuan Can @ 2023-09-05  3:13 UTC (permalink / raw)
  To: mike.kravetz, muchun.song, akpm, linux-mm; +Cc: wangkefeng.wang, yuancan

alloc_vmemmap_page_list() is called when hugetlb pages are freed, and more
memory is returned to the buddy allocator once it succeeds, so pass
__GFP_MEMALLOC to allow it to ignore watermarks.

Signed-off-by: Yuan Can <yuancan@huawei.com>
---
 mm/hugetlb_vmemmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 0485e471d224..dc0b9247a1f9 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
 static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
 				   struct list_head *list)
 {
-	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
+	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_MEMALLOC;
 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
 	int nid = page_to_nid((struct page *)start);
 	struct page *page, *next;
-- 
2.17.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 2/2] mm: hugetlb_vmemmap: allow alloc_vmemmap_page_list() ignore watermarks
  2023-09-05  3:13 ` [PATCH 2/2] mm: hugetlb_vmemmap: allow alloc_vmemmap_page_list() ignore watermarks Yuan Can
@ 2023-09-05  6:59   ` Muchun Song
  0 siblings, 0 replies; 10+ messages in thread
From: Muchun Song @ 2023-09-05  6:59 UTC (permalink / raw)
  To: Yuan Can; +Cc: Mike Kravetz, Andrew Morton, Linux-MM, Kefeng Wang



> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
> 
> The alloc_vmemmap_page_list() is called when hugetlb get freed, more memory
> will be returned to buddy after it succeed, thus work with __GFP_MEMALLOC
> to allow it ignore watermarks.

The kernel documentation for __GFP_MEMALLOC says:

* %__GFP_MEMALLOC allows access to all memory. This should only be used when
* the caller guarantees the allocation will allow more memory to be freed
* very shortly e.g. process exiting or swapping. Users either should
* be the MM or co-ordinating closely with the VM (e.g. swap over NFS).
* Users of this flag have to be extremely careful to not deplete the reserve

I think we may deplete the reserve memory if a 1GB page is freed. It'll
be even worse if the recent patchset[1] is merged, because the vmemmap
pages will be freed in batches, meaning that memory will not be freed in
a very short time (the cover letter has some numbers). So NACK.

* completely and implement a throttling mechanism which controls the
* consumption of the reserve based on the amount of freed memory.
* Usage of a pre-allocated pool (e.g. mempool) should be always considered
* before using this flag.

[1] https://lore.kernel.org/linux-mm/20230825190436.55045-1-mike.kravetz@oracle.com/

> 
> Signed-off-by: Yuan Can <yuancan@huawei.com>
> ---
> mm/hugetlb_vmemmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index 0485e471d224..dc0b9247a1f9 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> 				   struct list_head *list)
> {
> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_MEMALLOC;
> 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
> 	int nid = page_to_nid((struct page *)start);
> 	struct page *page, *next;
> -- 
> 2.17.1
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
  2023-09-05  3:13 [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Yuan Can
  2023-09-05  3:13 ` [PATCH 2/2] mm: hugetlb_vmemmap: allow alloc_vmemmap_page_list() ignore watermarks Yuan Can
@ 2023-09-05  9:06 ` Muchun Song
  2023-09-05 10:43   ` Kefeng Wang
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Muchun Song @ 2023-09-05  9:06 UTC (permalink / raw)
  To: Yuan Can; +Cc: Mike Kravetz, Andrew Morton, Linux-MM, wangkefeng.wang



> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
> 
> The decreasing of hugetlb pages number failed with the following message
> given:
> 
> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
> CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
> Hardware name: linux,dummy-virt (DT)
> Call trace:
>  dump_backtrace.part.6+0x84/0xe4
>  show_stack+0x18/0x24
>  dump_stack_lvl+0x48/0x60
>  dump_stack+0x18/0x24
>  warn_alloc+0x100/0x1bc
>  __alloc_pages_slowpath.constprop.107+0xa40/0xad8
>  __alloc_pages+0x244/0x2d0
>  hugetlb_vmemmap_restore+0x104/0x1e4
>  __update_and_free_hugetlb_folio+0x44/0x1f4
>  update_and_free_hugetlb_folio+0x20/0x68
>  update_and_free_pages_bulk+0x4c/0xac
>  set_max_huge_pages+0x198/0x334
>  nr_hugepages_store_common+0x118/0x178
>  nr_hugepages_store+0x18/0x24
>  kobj_attr_store+0x18/0x2c
>  sysfs_kf_write+0x40/0x54
>  kernfs_fop_write_iter+0x164/0x1dc
>  vfs_write+0x3a8/0x460
>  ksys_write+0x6c/0x100
>  __arm64_sys_write+0x1c/0x28
>  invoke_syscall+0x44/0x100
>  el0_svc_common.constprop.1+0x6c/0xe4
>  do_el0_svc+0x38/0x94
>  el0_svc+0x28/0x74
>  el0t_64_sync_handler+0xa0/0xc4
>  el0t_64_sync+0x174/0x178
> Mem-Info:
>  ...
> 
> The reason is that the hugetlb pages being released are allocated from
> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
> need to be allocated from the same node during the hugetlb pages

Thanks for your fix. I think it is a real-world issue; it's better
to add a Fixes tag so the fix can be backported. Thanks.

> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
> node is always failed. Fix this problem by removing __GFP_THISNODE.
> 
> Signed-off-by: Yuan Can <yuancan@huawei.com>
> ---
> mm/hugetlb_vmemmap.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> index c2007ef5e9b0..0485e471d224 100644
> --- a/mm/hugetlb_vmemmap.c
> +++ b/mm/hugetlb_vmemmap.c
> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>   				   struct list_head *list)
> {
> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;

There is a small behavioral change for the non-movable case after this patch:
we first try to allocate memory from the preferred node (same as before), and
if that fails, we now fall back to other nodes. To me, that makes sense. At
least those huge pages can be freed once other nodes can satisfy the
allocation of vmemmap pages.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

> 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
> 	int nid = page_to_nid((struct page *)start);
> 	struct page *page, *next;
> -- 
> 2.17.1
> 
> 



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
  2023-09-05  9:06 ` [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Muchun Song
@ 2023-09-05 10:43   ` Kefeng Wang
  2023-09-05 12:41   ` Yuan Can
  2023-09-06  0:28   ` Mike Kravetz
  2 siblings, 0 replies; 10+ messages in thread
From: Kefeng Wang @ 2023-09-05 10:43 UTC (permalink / raw)
  To: Muchun Song, Yuan Can; +Cc: Mike Kravetz, Andrew Morton, Linux-MM



On 2023/9/5 17:06, Muchun Song wrote:
> 
> 
>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>
>> The decreasing of hugetlb pages number failed with the following message
>> given:
>>
>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>> CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
>> Hardware name: linux,dummy-virt (DT)
>> Call trace:
>>   dump_backtrace.part.6+0x84/0xe4
>>   show_stack+0x18/0x24
>>   dump_stack_lvl+0x48/0x60
>>   dump_stack+0x18/0x24
>>   warn_alloc+0x100/0x1bc
>>   __alloc_pages_slowpath.constprop.107+0xa40/0xad8
>>   __alloc_pages+0x244/0x2d0
>>   hugetlb_vmemmap_restore+0x104/0x1e4
>>   __update_and_free_hugetlb_folio+0x44/0x1f4
>>   update_and_free_hugetlb_folio+0x20/0x68
>>   update_and_free_pages_bulk+0x4c/0xac
>>   set_max_huge_pages+0x198/0x334
>>   nr_hugepages_store_common+0x118/0x178
>>   nr_hugepages_store+0x18/0x24
>>   kobj_attr_store+0x18/0x2c
>>   sysfs_kf_write+0x40/0x54
>>   kernfs_fop_write_iter+0x164/0x1dc
>>   vfs_write+0x3a8/0x460
>>   ksys_write+0x6c/0x100
>>   __arm64_sys_write+0x1c/0x28
>>   invoke_syscall+0x44/0x100
>>   el0_svc_common.constprop.1+0x6c/0xe4
>>   do_el0_svc+0x38/0x94
>>   el0_svc+0x28/0x74
>>   el0t_64_sync_handler+0xa0/0xc4
>>   el0t_64_sync+0x174/0x178
>> Mem-Info:
>>   ...
>>
>> The reason is that the hugetlb pages being released are allocated from
>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>> need to be allocated from the same node during the hugetlb pages
> 
> Thanks for your fix, I think it should be a real word issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.
> 
>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>> node is always failed. Fix this problem by removing __GFP_THISNODE.

The Fixes tag should be ad2fa3717b74 ("mm: hugetlb: alloc the vmemmap
pages associated with each HugeTLB page").

>>
>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>> ---
>> mm/hugetlb_vmemmap.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index c2007ef5e9b0..0485e471d224 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>    				   struct list_head *list)
>> {
>> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
> 
> There is a little change for non-movable case after this change, we fist try
> to allocate memory from the preferred node (it is same as original), if it
> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
> huge pages could be freed once other nodes could satisfy the allocation of
> vmemmap pages.
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> 
> Thanks.
> 
>> 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
>> 	int nid = page_to_nid((struct page *)start);
>> 	struct page *page, *next;
>> -- 
>> 2.17.1
>>
>>
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
  2023-09-05  9:06 ` [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Muchun Song
  2023-09-05 10:43   ` Kefeng Wang
@ 2023-09-05 12:41   ` Yuan Can
  2023-09-06  0:28   ` Mike Kravetz
  2 siblings, 0 replies; 10+ messages in thread
From: Yuan Can @ 2023-09-05 12:41 UTC (permalink / raw)
  To: Muchun Song; +Cc: Mike Kravetz, Andrew Morton, Linux-MM, wangkefeng.wang


在 2023/9/5 17:06, Muchun Song 写道:
>
>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>
>> The decreasing of hugetlb pages number failed with the following message
>> given:
>>
>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>> CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
>> Hardware name: linux,dummy-virt (DT)
>> Call trace:
>>   dump_backtrace.part.6+0x84/0xe4
>>   show_stack+0x18/0x24
>>   dump_stack_lvl+0x48/0x60
>>   dump_stack+0x18/0x24
>>   warn_alloc+0x100/0x1bc
>>   __alloc_pages_slowpath.constprop.107+0xa40/0xad8
>>   __alloc_pages+0x244/0x2d0
>>   hugetlb_vmemmap_restore+0x104/0x1e4
>>   __update_and_free_hugetlb_folio+0x44/0x1f4
>>   update_and_free_hugetlb_folio+0x20/0x68
>>   update_and_free_pages_bulk+0x4c/0xac
>>   set_max_huge_pages+0x198/0x334
>>   nr_hugepages_store_common+0x118/0x178
>>   nr_hugepages_store+0x18/0x24
>>   kobj_attr_store+0x18/0x2c
>>   sysfs_kf_write+0x40/0x54
>>   kernfs_fop_write_iter+0x164/0x1dc
>>   vfs_write+0x3a8/0x460
>>   ksys_write+0x6c/0x100
>>   __arm64_sys_write+0x1c/0x28
>>   invoke_syscall+0x44/0x100
>>   el0_svc_common.constprop.1+0x6c/0xe4
>>   do_el0_svc+0x38/0x94
>>   el0_svc+0x28/0x74
>>   el0t_64_sync_handler+0xa0/0xc4
>>   el0t_64_sync+0x174/0x178
>> Mem-Info:
>>   ...
>>
>> The reason is that the hugetlb pages being released are allocated from
>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>> need to be allocated from the same node during the hugetlb pages
> Thanks for your fix, I think it should be a real word issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.
>
>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>> node is always failed. Fix this problem by removing __GFP_THISNODE.
>>
>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>> ---
>> mm/hugetlb_vmemmap.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>> index c2007ef5e9b0..0485e471d224 100644
>> --- a/mm/hugetlb_vmemmap.c
>> +++ b/mm/hugetlb_vmemmap.c
>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>    				   struct list_head *list)
>> {
>> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
> There is a little change for non-movable case after this change, we fist try
> to allocate memory from the preferred node (it is same as original), if it
> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
> huge pages could be freed once other nodes could satisfy the allocation of
> vmemmap pages.
>
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>
> Thanks.
Thanks for the review, I will send the v2 patch with Fixes tag and your 
Reviewed-by soon.

-- 
Best regards,
Yuan Can



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
  2023-09-05  9:06 ` [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Muchun Song
  2023-09-05 10:43   ` Kefeng Wang
  2023-09-05 12:41   ` Yuan Can
@ 2023-09-06  0:28   ` Mike Kravetz
  2023-09-06  2:32     ` Muchun Song
  2023-09-06  7:25     ` David Hildenbrand
  2 siblings, 2 replies; 10+ messages in thread
From: Mike Kravetz @ 2023-09-06  0:28 UTC (permalink / raw)
  To: Muchun Song
  Cc: Yuan Can, Andrew Morton, Linux-MM, wangkefeng.wang,
	David Hildenbrand, Michal Hocko

On 09/05/23 17:06, Muchun Song wrote:
> 
> 
> > On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
> > 
> > The decreasing of hugetlb pages number failed with the following message
> > given:
> > 
> > sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
> > CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
> > Hardware name: linux,dummy-virt (DT)
> > Call trace:
> >  dump_backtrace.part.6+0x84/0xe4
> >  show_stack+0x18/0x24
> >  dump_stack_lvl+0x48/0x60
> >  dump_stack+0x18/0x24
> >  warn_alloc+0x100/0x1bc
> >  __alloc_pages_slowpath.constprop.107+0xa40/0xad8
> >  __alloc_pages+0x244/0x2d0
> >  hugetlb_vmemmap_restore+0x104/0x1e4
> >  __update_and_free_hugetlb_folio+0x44/0x1f4
> >  update_and_free_hugetlb_folio+0x20/0x68
> >  update_and_free_pages_bulk+0x4c/0xac
> >  set_max_huge_pages+0x198/0x334
> >  nr_hugepages_store_common+0x118/0x178
> >  nr_hugepages_store+0x18/0x24
> >  kobj_attr_store+0x18/0x2c
> >  sysfs_kf_write+0x40/0x54
> >  kernfs_fop_write_iter+0x164/0x1dc
> >  vfs_write+0x3a8/0x460
> >  ksys_write+0x6c/0x100
> >  __arm64_sys_write+0x1c/0x28
> >  invoke_syscall+0x44/0x100
> >  el0_svc_common.constprop.1+0x6c/0xe4
> >  do_el0_svc+0x38/0x94
> >  el0_svc+0x28/0x74
> >  el0t_64_sync_handler+0xa0/0xc4
> >  el0t_64_sync+0x174/0x178
> > Mem-Info:
> >  ...
> > 
> > The reason is that the hugetlb pages being released are allocated from
> > movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
> > need to be allocated from the same node during the hugetlb pages
> 
> Thanks for your fix, I think it should be a real word issue, it's better
> to add a Fixes tag to indicate backporting. Thanks.
> 

I thought we might get the same error (unable to allocate on a movable
node) when creating a hugetlb page.  Why?  Because we replace the head
vmemmap page.  However, I see that an allocation failure there is not a
fatal error and we fall back to the currently mapped page.  We also pass
__GFP_NOWARN to that allocation attempt, so there will be no report of the
failure.

We might want to change this as well?

> > releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
> > node is always failed. Fix this problem by removing __GFP_THISNODE.
> > 
> > Signed-off-by: Yuan Can <yuancan@huawei.com>
> > ---
> > mm/hugetlb_vmemmap.c | 2 +-
> > 1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
> > index c2007ef5e9b0..0485e471d224 100644
> > --- a/mm/hugetlb_vmemmap.c
> > +++ b/mm/hugetlb_vmemmap.c
> > @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
> > static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
> >   				   struct list_head *list)
> > {
> > - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
> > + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
> 
> There is a little change for non-movable case after this change, we fist try
> to allocate memory from the preferred node (it is same as original), if it
> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
> huge pages could be freed once other nodes could satisfy the allocation of
> vmemmap pages.
> 
> Reviewed-by: Muchun Song <songmuchun@bytedance.com>

This looks reasonable to me as well.

Cc'ing David and Michal as they are expert in hotplug.
-- 
Mike Kravetz

> 
> Thanks.
> 
> > 	unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
> > 	int nid = page_to_nid((struct page *)start);
> > 	struct page *page, *next;
> > -- 
> > 2.17.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
  2023-09-06  0:28   ` Mike Kravetz
@ 2023-09-06  2:32     ` Muchun Song
  2023-09-06  2:59       ` Yuan Can
  2023-09-06  7:25     ` David Hildenbrand
  1 sibling, 1 reply; 10+ messages in thread
From: Muchun Song @ 2023-09-06  2:32 UTC (permalink / raw)
  To: Mike Kravetz
  Cc: Yuan Can, Andrew Morton, Linux-MM, Kefeng Wang,
	David Hildenbrand, Michal Hocko



> On Sep 6, 2023, at 08:28, Mike Kravetz <mike.kravetz@oracle.com> wrote:
> 
> On 09/05/23 17:06, Muchun Song wrote:
>> 
>> 
>>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>> 
>>> The decreasing of hugetlb pages number failed with the following message
>>> given:
>>> 
>>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>>> CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
>>> Hardware name: linux,dummy-virt (DT)
>>> Call trace:
>>> dump_backtrace.part.6+0x84/0xe4
>>> show_stack+0x18/0x24
>>> dump_stack_lvl+0x48/0x60
>>> dump_stack+0x18/0x24
>>> warn_alloc+0x100/0x1bc
>>> __alloc_pages_slowpath.constprop.107+0xa40/0xad8
>>> __alloc_pages+0x244/0x2d0
>>> hugetlb_vmemmap_restore+0x104/0x1e4
>>> __update_and_free_hugetlb_folio+0x44/0x1f4
>>> update_and_free_hugetlb_folio+0x20/0x68
>>> update_and_free_pages_bulk+0x4c/0xac
>>> set_max_huge_pages+0x198/0x334
>>> nr_hugepages_store_common+0x118/0x178
>>> nr_hugepages_store+0x18/0x24
>>> kobj_attr_store+0x18/0x2c
>>> sysfs_kf_write+0x40/0x54
>>> kernfs_fop_write_iter+0x164/0x1dc
>>> vfs_write+0x3a8/0x460
>>> ksys_write+0x6c/0x100
>>> __arm64_sys_write+0x1c/0x28
>>> invoke_syscall+0x44/0x100
>>> el0_svc_common.constprop.1+0x6c/0xe4
>>> do_el0_svc+0x38/0x94
>>> el0_svc+0x28/0x74
>>> el0t_64_sync_handler+0xa0/0xc4
>>> el0t_64_sync+0x174/0x178
>>> Mem-Info:
>>> ...
>>> 
>>> The reason is that the hugetlb pages being released are allocated from
>>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>>> need to be allocated from the same node during the hugetlb pages
>> 
>> Thanks for your fix, I think it should be a real word issue, it's better
>> to add a Fixes tag to indicate backporting. Thanks.
>> 
> 
> I thought we might get get the same error (Unable to allocate on movable
> node) when creating the hugetlb page.  Why?  Because we replace the head
> vmemmap page.  However, I see that failure to allocate there is not a
> fatal error and we fallback to the currently mapped page.  We also pass
> __GFP_NOWARN to that allocation attempt so there will be no report of the
> failure.
> 
> We might want to change this as well?

I think yes. I also thought about this yesterday, but since this one is
not a fatal error, it should be an improvement patch. So it is better not
to fold that change into this patch (a bug fix).

Thanks.

> 
>>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>>> node is always failed. Fix this problem by removing __GFP_THISNODE.
>>> 
>>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>>> ---
>>> mm/hugetlb_vmemmap.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index c2007ef5e9b0..0485e471d224 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>>      struct list_head *list)
>>> {
>>> -  gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>>> +  gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
>> 
>> There is a little change for non-movable case after this change, we fist try
>> to allocate memory from the preferred node (it is same as original), if it
>> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
>> huge pages could be freed once other nodes could satisfy the allocation of
>> vmemmap pages.
>> 
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> 
> This looks reasonable to me as well.
> 
> Cc'ing David and Michal as they are expert in hotplug.
> -- 
> Mike Kravetz
> 
>> 
>> Thanks.
>> 
>>> unsigned long nr_pages = (end - start) >> PAGE_SHIFT;
>>> int nid = page_to_nid((struct page *)start);
>>> struct page *page, *next;
>>> -- 
>>> 2.17.1




^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
  2023-09-06  2:32     ` Muchun Song
@ 2023-09-06  2:59       ` Yuan Can
  0 siblings, 0 replies; 10+ messages in thread
From: Yuan Can @ 2023-09-06  2:59 UTC (permalink / raw)
  To: Muchun Song, Mike Kravetz
  Cc: Andrew Morton, Linux-MM, Kefeng Wang, David Hildenbrand, Michal Hocko


在 2023/9/6 10:32, Muchun Song 写道:
>
>> On Sep 6, 2023, at 08:28, Mike Kravetz <mike.kravetz@oracle.com> wrote:
>>
>> On 09/05/23 17:06, Muchun Song wrote:
>>>
>>>> On Sep 5, 2023, at 11:13, Yuan Can <yuancan@huawei.com> wrote:
>>>>
>>>> The decreasing of hugetlb pages number failed with the following message
>>>> given:
>>>>
>>>> sh: page allocation failure: order:0, mode:0x204cc0(GFP_KERNEL|__GFP_RETRY_MAYFAIL|__GFP_THISNODE)
>>>> CPU: 1 PID: 112 Comm: sh Not tainted 6.5.0-rc7-... #45
>>>> Hardware name: linux,dummy-virt (DT)
>>>> Call trace:
>>>> dump_backtrace.part.6+0x84/0xe4
>>>> show_stack+0x18/0x24
>>>> dump_stack_lvl+0x48/0x60
>>>> dump_stack+0x18/0x24
>>>> warn_alloc+0x100/0x1bc
>>>> __alloc_pages_slowpath.constprop.107+0xa40/0xad8
>>>> __alloc_pages+0x244/0x2d0
>>>> hugetlb_vmemmap_restore+0x104/0x1e4
>>>> __update_and_free_hugetlb_folio+0x44/0x1f4
>>>> update_and_free_hugetlb_folio+0x20/0x68
>>>> update_and_free_pages_bulk+0x4c/0xac
>>>> set_max_huge_pages+0x198/0x334
>>>> nr_hugepages_store_common+0x118/0x178
>>>> nr_hugepages_store+0x18/0x24
>>>> kobj_attr_store+0x18/0x2c
>>>> sysfs_kf_write+0x40/0x54
>>>> kernfs_fop_write_iter+0x164/0x1dc
>>>> vfs_write+0x3a8/0x460
>>>> ksys_write+0x6c/0x100
>>>> __arm64_sys_write+0x1c/0x28
>>>> invoke_syscall+0x44/0x100
>>>> el0_svc_common.constprop.1+0x6c/0xe4
>>>> do_el0_svc+0x38/0x94
>>>> el0_svc+0x28/0x74
>>>> el0t_64_sync_handler+0xa0/0xc4
>>>> el0t_64_sync+0x174/0x178
>>>> Mem-Info:
>>>> ...
>>>>
>>>> The reason is that the hugetlb pages being released are allocated from
>>>> movable nodes, and with hugetlb_optimize_vmemmap enabled, vmemmap pages
>>>> need to be allocated from the same node during the hugetlb pages
>>> Thanks for your fix, I think it should be a real word issue, it's better
>>> to add a Fixes tag to indicate backporting. Thanks.
>>>
>> I thought we might get get the same error (Unable to allocate on movable
>> node) when creating the hugetlb page.  Why?  Because we replace the head
>> vmemmap page.  However, I see that failure to allocate there is not a
>> fatal error and we fallback to the currently mapped page.  We also pass
>> __GFP_NOWARN to that allocation attempt so there will be no report of the
>> failure.
>>
>> We might want to change this as well?
> I think yes. I also thought about this yesterday, but I think
> this one is not a fetal error, it should be an improvement patch.
> So it is better not to fold this change into this patch (a bug fix one).
>
> Thanks.
Sure, let me send another patch passing __GFP_NOWARN.

-- 
Best regards,
Yuan Can



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes
  2023-09-06  0:28   ` Mike Kravetz
  2023-09-06  2:32     ` Muchun Song
@ 2023-09-06  7:25     ` David Hildenbrand
  1 sibling, 0 replies; 10+ messages in thread
From: David Hildenbrand @ 2023-09-06  7:25 UTC (permalink / raw)
  To: Mike Kravetz, Muchun Song
  Cc: Yuan Can, Andrew Morton, Linux-MM, wangkefeng.wang, Michal Hocko

>>> releasing. With GFP_KERNEL and __GFP_THISNODE set, allocating from movable
>>> node is always failed. Fix this problem by removing __GFP_THISNODE.
>>>
>>> Signed-off-by: Yuan Can <yuancan@huawei.com>
>>> ---
>>> mm/hugetlb_vmemmap.c | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
>>> index c2007ef5e9b0..0485e471d224 100644
>>> --- a/mm/hugetlb_vmemmap.c
>>> +++ b/mm/hugetlb_vmemmap.c
>>> @@ -386,7 +386,7 @@ static int vmemmap_remap_free(unsigned long start, unsigned long end,
>>> static int alloc_vmemmap_page_list(unsigned long start, unsigned long end,
>>>    				   struct list_head *list)
>>> {
>>> - 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_THISNODE;
>>> + 	gfp_t gfp_mask = GFP_KERNEL | __GFP_RETRY_MAYFAIL;
>>
>> There is a little change for non-movable case after this change, we fist try
>> to allocate memory from the preferred node (it is same as original), if it
>> fails, it fallbacks to other nodes now. For me, it makes sense. At least, those
>> huge pages could be freed once other nodes could satisfy the allocation of
>> vmemmap pages.
>>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
> 
> This looks reasonable to me as well.
> 
> Cc'ing David and Michal as they are expert in hotplug.

IIUC, we still won't allocate from ZONE_MOVABLE / MIGRATE_CMA (due to 
GFP_KERNEL), so it should be fine.

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-09-06  7:25 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-09-05  3:13 [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Yuan Can
2023-09-05  3:13 ` [PATCH 2/2] mm: hugetlb_vmemmap: allow alloc_vmemmap_page_list() ignore watermarks Yuan Can
2023-09-05  6:59   ` Muchun Song
2023-09-05  9:06 ` [PATCH 1/2] mm: hugetlb_vmemmap: fix hugetlb page number decrease failed on movable nodes Muchun Song
2023-09-05 10:43   ` Kefeng Wang
2023-09-05 12:41   ` Yuan Can
2023-09-06  0:28   ` Mike Kravetz
2023-09-06  2:32     ` Muchun Song
2023-09-06  2:59       ` Yuan Can
2023-09-06  7:25     ` David Hildenbrand
