linux-mm.kvack.org archive mirror
* [PATCH] memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory
@ 2025-11-04  0:39 Akinobu Mita
  2025-11-04 17:18 ` Mike Rapoport
  0 siblings, 1 reply; 5+ messages in thread
From: Akinobu Mita @ 2025-11-04  0:39 UTC (permalink / raw)
  To: akinobu.mita; +Cc: linux-cxl, linux-kernel, linux-mm, akpm, rppt

memblock_estimated_nr_free_pages() returns the difference between the total
size of the "memory" memblock type and the "reserved" memblock type.

The "soft-reserved" memory regions are added to the "reserved" memblock
type, but not to the "memory" memblock type. Therefore,
memblock_estimated_nr_free_pages() may return a smaller value than
expected, or if it underflows, an extremely large value.
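The wraparound described above can be reproduced with a small userspace sketch. It is not kernel code: struct region, total_size() and estimated_free_pages_old() are hypothetical stand-ins, and 4 KiB pages are assumed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-ins for kernel definitions; 4 KiB pages are assumed. */
typedef uint64_t phys_addr_t;
#define PAGE_SHIFT 12
#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))

/* Hypothetical simplified region; the real memblock region has more fields. */
struct region {
	phys_addr_t base;
	phys_addr_t size;
};

/* Sum the sizes of all regions in one list. */
static phys_addr_t total_size(const struct region *r, size_t n)
{
	phys_addr_t total = 0;
	size_t i;

	assert(r != NULL || n == 0);
	for (i = 0; i < n; i++)
		total += r[i].size;
	return total;
}

/*
 * Model of the pre-patch estimate: all "memory" bytes minus all "reserved"
 * bytes, without checking that the reserved bytes lie inside "memory".
 */
static unsigned long estimated_free_pages_old(const struct region *mem, size_t nmem,
					      const struct region *rsv, size_t nrsv)
{
	/* Unsigned subtraction: wraps around when reserved exceeds memory. */
	return PHYS_PFN(total_size(mem, nmem) - total_size(rsv, nrsv));
}
```

With 16 GiB in the "memory" list and only a 1 GiB reservation, the model returns 3932160 pages (15 GiB). Adding a 64 GiB reserved range that is absent from the "memory" list makes the subtraction wrap, and the result becomes a very large page count.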

/proc/sys/kernel/threads-max is determined by the value of
memblock_estimated_nr_free_pages().  This issue was discovered on machines
with CXL memory because kernel.threads-max was either smaller than expected
or extremely large for the installed DRAM size.
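To illustrate how the page estimate feeds into kernel.threads-max, here is a simplified userspace model of the calculation as I recall it from kernel/fork.c. The function name, the clamp constants (20 and 0x3fffffff) and the overflow check are assumptions made for this sketch; check the kernel source for the authoritative version.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed clamp limits for this model; verify against kernel/fork.c. */
#define MODEL_MIN_THREADS 20ULL
#define MODEL_MAX_THREADS 0x3fffffffULL

/*
 * Model: thread stacks may consume at most 1/8 of the estimated free
 * memory, so threads = (nr_pages * page_size) / (thread_size * 8),
 * clamped to [MODEL_MIN_THREADS, MODEL_MAX_THREADS].
 */
static uint64_t model_max_threads(uint64_t nr_pages, uint64_t page_size,
				  uint64_t thread_size)
{
	uint64_t threads;

	assert(page_size != 0 && thread_size != 0);

	if (nr_pages > UINT64_MAX / page_size)
		threads = MODEL_MAX_THREADS;	/* byte count would overflow */
	else
		threads = (nr_pages * page_size) / (thread_size * 8);

	if (threads < MODEL_MIN_THREADS)
		threads = MODEL_MIN_THREADS;
	if (threads > MODEL_MAX_THREADS)
		threads = MODEL_MAX_THREADS;
	return threads;
}
```

Assuming 4 KiB pages and 16 KiB thread stacks, 3932160 free pages (15 GiB) give 122880 threads in this model, while a wrapped-around page count is clamped to the upper limit. That matches the "smaller than expected or extremely large" symptom.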

Fix this by subtracting only the size of the overlap between regions of
the "memory" and "reserved" memblock types, which makes
memblock_estimated_nr_free_pages() more accurate.
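The overlap-based accounting can be sketched in userspace as follows. This is a model, not the patch itself: struct region, overlap_size() and free_bytes() are hypothetical names, and the sketch assumes that regions within one list do not overlap each other and that base + size does not wrap.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

/* Hypothetical simplified region descriptor. */
struct region {
	phys_addr_t base;
	phys_addr_t size;
};

/* Bytes shared by [base1, base1 + size1) and [base2, base2 + size2). */
static phys_addr_t overlap_size(phys_addr_t base1, phys_addr_t size1,
				phys_addr_t base2, phys_addr_t size2)
{
	phys_addr_t start = base1 > base2 ? base1 : base2;
	phys_addr_t end1 = base1 + size1;
	phys_addr_t end2 = base2 + size2;
	phys_addr_t end = end1 < end2 ? end1 : end2;

	return end > start ? end - start : 0;
}

/* "memory" bytes minus only those reserved bytes that lie inside "memory". */
static phys_addr_t free_bytes(const struct region *mem, size_t nmem,
			      const struct region *rsv, size_t nrsv)
{
	phys_addr_t total = 0;
	size_t i, j;

	assert(mem != NULL || nmem == 0);
	assert(rsv != NULL || nrsv == 0);

	for (i = 0; i < nmem; i++) {
		total += mem[i].size;
		for (j = 0; j < nrsv; j++)
			total -= overlap_size(mem[i].base, mem[i].size,
					      rsv[j].base, rsv[j].size);
	}
	return total;
}
```

A reserved range that lies entirely outside every "memory" region contributes an overlap of zero, so it can no longer push the result below zero. The nested loop costs O(memory regions x reserved regions).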

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
 mm/memblock.c | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index e23e16618e9b..af014fa10a44 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1812,6 +1812,22 @@ phys_addr_t __init_memblock memblock_reserved_kern_size(phys_addr_t limit, int n
 	return total;
 }
 
+static phys_addr_t __init memblock_addrs_overlap_size(phys_addr_t base1, phys_addr_t size1,
+		phys_addr_t base2, phys_addr_t size2)
+{
+	phys_addr_t start, end;
+
+	if (!memblock_addrs_overlap(base1, size1, base2, size2))
+		return 0;
+
+	memblock_cap_size(base1, &size1);
+	memblock_cap_size(base2, &size2);
+	start = max(base1, base2);
+	end = min(base1 + size1, base2 + size2);
+
+	return end - start;
+}
+
 /**
  * memblock_estimated_nr_free_pages - return estimated number of free pages
  * from memblock point of view
@@ -1826,7 +1842,22 @@ phys_addr_t __init_memblock memblock_reserved_kern_size(phys_addr_t limit, int n
  */
 unsigned long __init memblock_estimated_nr_free_pages(void)
 {
-	return PHYS_PFN(memblock_phys_mem_size() - memblock_reserved_size());
+	int memory_idx, reserved_idx;
+	struct memblock_type *memory_type = &memblock.memory;
+	struct memblock_type *reserved_type = &memblock.reserved;
+	struct memblock_region *memory_region, *reserved_region;
+	phys_addr_t phys_mem_size = 0;
+
+	for_each_memblock_type(memory_idx, memory_type, memory_region) {
+		phys_mem_size += memory_region->size;
+		for_each_memblock_type(reserved_idx, reserved_type, reserved_region) {
+			phys_mem_size -= memblock_addrs_overlap_size(memory_region->base,
+					memory_region->size, reserved_region->base,
+					reserved_region->size);
+		}
+	}
+
+	return PHYS_PFN(phys_mem_size);
 }
 
 /* lowest address */
-- 
2.43.0




* Re: [PATCH] memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory
  2025-11-04  0:39 [PATCH] memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory Akinobu Mita
@ 2025-11-04 17:18 ` Mike Rapoport
  2025-11-05 13:23   ` Akinobu Mita
  2025-11-11  1:00   ` [PATCH v2] " Akinobu Mita
  0 siblings, 2 replies; 5+ messages in thread
From: Mike Rapoport @ 2025-11-04 17:18 UTC (permalink / raw)
  To: Akinobu Mita, Dan Williams; +Cc: linux-cxl, linux-kernel, linux-mm, akpm

(added Dan Williams)

Hi,

On Tue, Nov 04, 2025 at 09:39:21AM +0900, Akinobu Mita wrote:
> memblock_estimated_nr_free_pages() returns the difference between the total
> size of the "memory" memblock type and the "reserved" memblock type.
> 
> The "soft-reserved" memory regions are added to the "reserved" memblock
> type, but not to the "memory" memblock type. Therefore,

@Dan, do we really need to memblock_reserve() the E820_TYPE_SOFT_RESERVED
ranges?
A quick scan didn't show anything that requires this, but I could easily
have missed something.

> memblock_estimated_nr_free_pages() may return a smaller value than
> expected, or if it underflows, an extremely large value.
> 
> /proc/sys/kernel/threads-max is determined by the value of
> memblock_estimated_nr_free_pages().  This issue was discovered on machines
> with CXL memory because kernel.threads-max was either smaller than expected
> or extremely large for the installed DRAM size.
> 
> This fixes the issue by improving the accuracy of
> memblock_estimated_nr_free_pages() by subtracting only the overlapping size
> of regions with "memory" and "reserved" memblock types.
> 
> Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> ---
>  mm/memblock.c | 33 ++++++++++++++++++++++++++++++++-
>  1 file changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memblock.c b/mm/memblock.c
> index e23e16618e9b..af014fa10a44 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c

...

> @@ -1826,7 +1842,22 @@ phys_addr_t __init_memblock memblock_reserved_kern_size(phys_addr_t limit, int n
>   */
>  unsigned long __init memblock_estimated_nr_free_pages(void)
>  {
> -	return PHYS_PFN(memblock_phys_mem_size() - memblock_reserved_size());

We have memblock_reserved_kern_size() that tells how much memory was
reserved from the actual RAM. Replacing memblock_reserved_size() with
memblock_reserved_kern_size() will omit "soft-reserved" ranges.
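A rough userspace model of that idea: count only the reserved regions that are marked as kernel reservations of real RAM. The RSV_KERN flag, struct rsv_region and both function names are hypothetical stand-ins, and the limit and nid arguments of the real helper are left out. The sketch assumes soft-reserved ranges are reserved without that mark.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;
#define PAGE_SHIFT 12	/* assumes 4 KiB pages */
#define PHYS_PFN(x) ((unsigned long)((x) >> PAGE_SHIFT))

/* Hypothetical flag marking a reservation made from actual RAM. */
#define RSV_KERN 0x1u

/* Hypothetical simplified reserved-region descriptor. */
struct rsv_region {
	phys_addr_t base;
	phys_addr_t size;
	unsigned int flags;
};

/* Sum only the regions flagged as kernel reservations. */
static phys_addr_t reserved_kern_size_model(const struct rsv_region *r, size_t n)
{
	phys_addr_t total = 0;
	size_t i;

	assert(r != NULL || n == 0);
	for (i = 0; i < n; i++)
		if (r[i].flags & RSV_KERN)
			total += r[i].size;
	return total;
}

/* Estimate free pages using only the flagged reservations. */
static unsigned long estimated_free_pages_new(phys_addr_t memory_size,
					      const struct rsv_region *r, size_t n)
{
	return PHYS_PFN(memory_size - reserved_kern_size_model(r, n));
}
```

With 16 GiB of memory, a flagged 1 GiB reservation and an unflagged 64 GiB soft-reserved range, the model yields 3932160 pages (15 GiB), since the unflagged range is not subtracted.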

> +	int memory_idx, reserved_idx;
> +	struct memblock_type *memory_type = &memblock.memory;
> +	struct memblock_type *reserved_type = &memblock.reserved;
> +	struct memblock_region *memory_region, *reserved_region;
> +	phys_addr_t phys_mem_size = 0;
> +
> +	for_each_memblock_type(memory_idx, memory_type, memory_region) {
> +		phys_mem_size += memory_region->size;
> +		for_each_memblock_type(reserved_idx, reserved_type, reserved_region) {
> +			phys_mem_size -= memblock_addrs_overlap_size(memory_region->base,
> +					memory_region->size, reserved_region->base,
> +					reserved_region->size);
> +		}
> +	}
> +
> +	return PHYS_PFN(phys_mem_size);
>  }
>  
>  /* lowest address */
> -- 
> 2.43.0
> 

-- 
Sincerely yours,
Mike.



* Re: [PATCH] memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory
  2025-11-04 17:18 ` Mike Rapoport
@ 2025-11-05 13:23   ` Akinobu Mita
  2025-11-11  1:00   ` [PATCH v2] " Akinobu Mita
  1 sibling, 0 replies; 5+ messages in thread
From: Akinobu Mita @ 2025-11-05 13:23 UTC (permalink / raw)
  To: Mike Rapoport; +Cc: Dan Williams, linux-cxl, linux-kernel, linux-mm, akpm

On Wed, Nov 5, 2025 at 2:18 Mike Rapoport <rppt@kernel.org> wrote:
>
> (added Dan Williams)
>
> Hi,
>
> On Tue, Nov 04, 2025 at 09:39:21AM +0900, Akinobu Mita wrote:
> > memblock_estimated_nr_free_pages() returns the difference between the total
> > size of the "memory" memblock type and the "reserved" memblock type.
> >
> > The "soft-reserved" memory regions are added to the "reserved" memblock
> > type, but not to the "memory" memblock type. Therefore,
>
> @Dan, do we really need to memblock_reserve() the E820_TYPE_SOFT_RESERVED
> ranges?
> Quick scan didn't show anything that requires this, but I could easily miss
> something.
>
> > memblock_estimated_nr_free_pages() may return a smaller value than
> > expected, or if it underflows, an extremely large value.
> >
> > /proc/sys/kernel/threads-max is determined by the value of
> > memblock_estimated_nr_free_pages().  This issue was discovered on machines
> > with CXL memory because kernel.threads-max was either smaller than expected
> > or extremely large for the installed DRAM size.
> >
> > This fixes the issue by improving the accuracy of
> > memblock_estimated_nr_free_pages() by subtracting only the overlapping size
> > of regions with "memory" and "reserved" memblock types.
> >
> > Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
> > ---
> >  mm/memblock.c | 33 ++++++++++++++++++++++++++++++++-
> >  1 file changed, 32 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/memblock.c b/mm/memblock.c
> > index e23e16618e9b..af014fa10a44 100644
> > --- a/mm/memblock.c
> > +++ b/mm/memblock.c
>
> ...
>
> > @@ -1826,7 +1842,22 @@ phys_addr_t __init_memblock memblock_reserved_kern_size(phys_addr_t limit, int n
> >   */
> >  unsigned long __init memblock_estimated_nr_free_pages(void)
> >  {
> > -     return PHYS_PFN(memblock_phys_mem_size() - memblock_reserved_size());
>
> We have memblock_reserved_kern_size() that tells how much memory was
> reserved from the actual RAM. Replacing memblock_reserved_size() with
> memblock_reserved_kern_size() will omit "soft-reserved" ranges.

Replacing memblock_reserved_size() with memblock_reserved_kern_size(
MEMBLOCK_ALLOC_ANYWHERE, NUMA_NO_NODE) also fixed the problem. Thank you.



* [PATCH v2] memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory
  2025-11-04 17:18 ` Mike Rapoport
  2025-11-05 13:23   ` Akinobu Mita
@ 2025-11-11  1:00   ` Akinobu Mita
  2025-11-11 16:19     ` Mike Rapoport
  1 sibling, 1 reply; 5+ messages in thread
From: Akinobu Mita @ 2025-11-11  1:00 UTC (permalink / raw)
  To: rppt
  Cc: akinobu.mita, linux-cxl, linux-kernel, linux-mm, akpm, dan.j.williams

memblock_estimated_nr_free_pages() returns the difference between the total
size of the "memory" memblock type and the "reserved" memblock type.

The "soft-reserved" memory regions are added to the "reserved" memblock
type, but not to the "memory" memblock type. Therefore,
memblock_estimated_nr_free_pages() may return a smaller value than
expected, or if it underflows, an extremely large value.

/proc/sys/kernel/threads-max is determined by the value of
memblock_estimated_nr_free_pages().  This issue was discovered on machines
with CXL memory because kernel.threads-max was either smaller than expected
or extremely large for the installed DRAM size.

Fix the issue by replacing memblock_reserved_size() with
memblock_reserved_kern_size(), which reports how much memory was
reserved from actual RAM.

Suggested-by: Mike Rapoport <rppt@kernel.org>
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
---
v2: instead of subtracting only the overlapping size,
    replace memblock_reserved_size() with memblock_reserved_kern_size()

 mm/memblock.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/memblock.c b/mm/memblock.c
index c7869860e659..905d06b16348 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1826,7 +1826,8 @@ phys_addr_t __init_memblock memblock_reserved_kern_size(phys_addr_t limit, int n
  */
 unsigned long __init memblock_estimated_nr_free_pages(void)
 {
-	return PHYS_PFN(memblock_phys_mem_size() - memblock_reserved_size());
+	return PHYS_PFN(memblock_phys_mem_size() -
+			memblock_reserved_kern_size(MEMBLOCK_ALLOC_ANYWHERE, NUMA_NO_NODE));
 }
 
 /* lowest address */
-- 
2.43.0




* Re: [PATCH v2] memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory
  2025-11-11  1:00   ` [PATCH v2] " Akinobu Mita
@ 2025-11-11 16:19     ` Mike Rapoport
  0 siblings, 0 replies; 5+ messages in thread
From: Mike Rapoport @ 2025-11-11 16:19 UTC (permalink / raw)
  To: Akinobu Mita
  Cc: Mike Rapoport, linux-cxl, linux-kernel, linux-mm, akpm, dan.j.williams

From: Mike Rapoport (Microsoft) <rppt@kernel.org>

On Tue, 11 Nov 2025 10:00:10 +0900, Akinobu Mita wrote:
> memblock_estimated_nr_free_pages() returns the difference between the total
> size of the "memory" memblock type and the "reserved" memblock type.
> 
> The "soft-reserved" memory regions are added to the "reserved" memblock
> type, but not to the "memory" memblock type. Therefore,
> memblock_estimated_nr_free_pages() may return a smaller value than
> expected, or if it underflows, an extremely large value.
> 
> [...]

Applied to fixes branch of memblock.git tree, thanks!

[1/1] memblock: fix memblock_estimated_nr_free_pages() for soft-reserved memory
      commit: c42af83c59b65d01c0f7a074e450bbbb43b22f0d

tree: https://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock
branch: fixes

In the future please start a new thread when sending the next version of a
patch.

--
Sincerely yours,
Mike.



