linux-mm.kvack.org archive mirror
* [PATCH V2] mm: compaction: skip memory compaction when there are not enough migratable pages
@ 2025-01-08  8:37 yangge1116
  2025-01-13  8:11 ` Baolin Wang
  0 siblings, 1 reply; 3+ messages in thread
From: yangge1116 @ 2025-01-08  8:37 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, 21cnbao, david, baolin.wang, hannes,
	liuzixing, yangge

From: yangge <yangge1116@126.com>

There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
of memory. I have configured 16GB of CMA memory on each NUMA node,
and starting a 32GB virtual machine with device passthrough is
extremely slow, taking almost an hour.

During start-up, the virtual machine calls
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
Long-term GUP cannot allocate memory from the CMA area, so at most
16GB of non-CMA memory on a NUMA node can be used as virtual machine
memory. The 16GB of free CMA memory on a NUMA node is enough to pass
the order-0 watermark check, so __compaction_suitable() consistently
returns true. However, if there are not enough migratable pages,
memory compaction is pointless. Besides checking whether the order-0
watermark is met, __compaction_suitable() also needs to determine
whether there are enough migratable pages available for memory
compaction.

For costly allocations, because __compaction_suitable() always
returns true, __alloc_pages_slowpath() can't exit at the appropriate
place, resulting in excessively long virtual machine startup times.
Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // should exit __alloc_pages_slowpath() from here

When the 16GB of non-CMA memory on a single node is exhausted, we
fall back to allocating memory on other nodes. To fall back to remote
nodes quickly, we should skip memory compaction when there are not
enough migratable pages. After this fix, it takes only a few tens of
seconds to start a 32GB virtual machine with device passthrough.

Signed-off-by: yangge <yangge1116@126.com>
---

V2:
- consider unevictable folios

 mm/compaction.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/mm/compaction.c b/mm/compaction.c
index 07bd227..1630abd 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
 				  int highest_zoneidx,
 				  unsigned long wmark_target)
 {
+	struct pglist_data *pgdat = zone->zone_pgdat;
+	unsigned long sum, nr_pinned;
 	unsigned long watermark;
+
+	sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
+		node_page_state(pgdat, NR_INACTIVE_ANON) +
+		node_page_state(pgdat, NR_ACTIVE_FILE) +
+		node_page_state(pgdat, NR_ACTIVE_ANON) +
+		node_page_state(pgdat, NR_UNEVICTABLE);
+
+	nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
+		node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
+
+	/*
+	 * Gup-pinned pages are non-migratable. After subtracting these pages,
+	 * we need to check if the remaining pages are sufficient for memory
+	 * compaction.
+	 */
+	if ((sum - nr_pinned) < (1 << order))
+		return false;
+
 	/*
 	 * Watermarks for order-0 must be met for compaction to be able to
 	 * isolate free pages for migration targets. This means that the
-- 
2.7.4
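
The check added by the patch can be modelled in userspace as follows.
This is only a sketch under stated assumptions, not the kernel code:
struct node_counters and the helper names are hypothetical stand-ins
for the node_page_state() reads in the hunk above. Unlike the hunk, it
clamps both subtractions at zero, on the assumption that the counters
may be momentarily inconsistent, and it widens the shift so that
1 << order is not evaluated as an int.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the per-node counters the patch reads. */
struct node_counters {
	uint64_t inactive_file;
	uint64_t inactive_anon;
	uint64_t active_file;
	uint64_t active_anon;
	uint64_t unevictable;
	uint64_t pin_acquired;	/* models NR_FOLL_PIN_ACQUIRED */
	uint64_t pin_released;	/* models NR_FOLL_PIN_RELEASED */
};

/* Pages on the LRU lists, i.e. the candidates for migration. */
static uint64_t lru_pages(const struct node_counters *c)
{
	return c->inactive_file + c->inactive_anon +
	       c->active_file + c->active_anon + c->unevictable;
}

/* Outstanding pins, clamped at zero instead of wrapping around. */
static uint64_t pinned_pages(const struct node_counters *c)
{
	return c->pin_acquired > c->pin_released ?
	       c->pin_acquired - c->pin_released : 0;
}

/*
 * True if, after discounting pinned pages, at least 2^order pages
 * remain that compaction could migrate. Assumes order < 64.
 */
static bool enough_migratable_pages(const struct node_counters *c,
				    unsigned int order)
{
	uint64_t lru = lru_pages(c);
	uint64_t pinned = pinned_pages(c);
	uint64_t movable = lru > pinned ? lru - pinned : 0;

	return movable >= ((uint64_t)1 << order);
}
```

With 100 LRU pages and 50 outstanding pins, the model accepts an
order-5 request (32 pages) and rejects an order-6 request (64 pages).
If the pins exceed the LRU pages, the clamped result is zero and every
order is rejected, where an unclamped unsigned subtraction would wrap
to a huge value and pass the check.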




* Re: [PATCH V2] mm: compaction: skip memory compaction when there are not enough migratable pages
  2025-01-08  8:37 [PATCH V2] mm: compaction: skip memory compaction when there are not enough migratable pages yangge1116
@ 2025-01-13  8:11 ` Baolin Wang
  2025-01-13 10:16   ` David Hildenbrand
  0 siblings, 1 reply; 3+ messages in thread
From: Baolin Wang @ 2025-01-13  8:11 UTC (permalink / raw)
  To: yangge1116, akpm
  Cc: linux-mm, linux-kernel, 21cnbao, david, hannes, liuzixing,
	Vlastimil Babka

Cc Vlastimil.

On 2025/1/8 16:37, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
> 
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
> 
> During start-up, the virtual machine calls
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long-term GUP cannot allocate memory from the CMA area, so at most
> 16GB of non-CMA memory on a NUMA node can be used as virtual machine
> memory. The 16GB of free CMA memory on a NUMA node is enough to pass
> the order-0 watermark check, so __compaction_suitable() consistently
> returns true. However, if there are not enough migratable pages,
> memory compaction is pointless. Besides checking whether the order-0
> watermark is met, __compaction_suitable() also needs to determine
> whether there are enough migratable pages available for memory
> compaction.
> 
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> place, resulting in excessively long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>      if (compact_result == COMPACT_SKIPPED ||
>          compact_result == COMPACT_DEFERRED)
>          goto nopage; // should exit __alloc_pages_slowpath() from here
> 
> When the 16GB of non-CMA memory on a single node is exhausted, we
> fall back to allocating memory on other nodes. To fall back to remote
> nodes quickly, we should skip memory compaction when there are not
> enough migratable pages. After this fix, it takes only a few tens of
> seconds to start a 32GB virtual machine with device passthrough.
> 
> Signed-off-by: yangge <yangge1116@126.com>
> ---
> 
> V2:
> - consider unevictable folios
> 
>   mm/compaction.c | 20 ++++++++++++++++++++
>   1 file changed, 20 insertions(+)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..1630abd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
>   				  int highest_zoneidx,
>   				  unsigned long wmark_target)
>   {
> +	struct pglist_data *pgdat = zone->zone_pgdat;
> +	unsigned long sum, nr_pinned;
>   	unsigned long watermark;
> +
> +	sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
> +		node_page_state(pgdat, NR_INACTIVE_ANON) +
> +		node_page_state(pgdat, NR_ACTIVE_FILE) +
> +		node_page_state(pgdat, NR_ACTIVE_ANON) +
> +		node_page_state(pgdat, NR_UNEVICTABLE);
> +
> +	nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
> +		node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
> +
> +	/*
> +	 * Gup-pinned pages are non-migratable. After subtracting these pages,
> +	 * we need to check if the remaining pages are sufficient for memory
> +	 * compaction.
> +	 */
> +	if ((sum - nr_pinned) < (1 << order))
> +		return false;
> +

Looks reasonable to me, but let's see if other people have any comments.



* Re: [PATCH V2] mm: compaction: skip memory compaction when there are not enough migratable pages
  2025-01-13  8:11 ` Baolin Wang
@ 2025-01-13 10:16   ` David Hildenbrand
  0 siblings, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2025-01-13 10:16 UTC (permalink / raw)
  To: Baolin Wang, yangge1116, akpm
  Cc: linux-mm, linux-kernel, 21cnbao, hannes, liuzixing, Vlastimil Babka

>> +	/*
>> +	 * Gup-pinned pages are non-migratable. After subtracting these pages,
>> +	 * we need to check if the remaining pages are sufficient for memory
>> +	 * compaction.
>> +	 */
>> +	if ((sum - nr_pinned) < (1 << order))
>> +		return false;
>> +
> 
> Looks reasonable to me, but let's see if other people have any comments.
> 

Note that Barry raised some concerns in reply to v3.

-- 
Cheers,

David / dhildenb


