* [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
@ 2025-01-04 8:58 yangge1116
2025-01-04 11:28 ` kernel test robot
2025-01-06 8:12 ` Baolin Wang
0 siblings, 2 replies; 6+ messages in thread
From: yangge1116 @ 2025-01-04 8:58 UTC (permalink / raw)
To: akpm
Cc: linux-mm, linux-kernel, 21cnbao, david, baolin.wang, hannes,
liuzixing, yangge
From: yangge <yangge1116@126.com>
There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
of memory. I have configured 16GB of CMA memory on each NUMA node,
and starting a 32GB virtual machine with device passthrough is
extremely slow, taking almost an hour.
During the start-up of the virtual machine, it will call
pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
Long-term GUP cannot allocate memory from the CMA area, so at most
16GB of non-CMA memory on a NUMA node can be used as virtual machine
memory. There is 16GB of free CMA memory on a NUMA node, which is
sufficient to pass the order-0 watermark check, causing the
__compaction_suitable() function to consistently return true.
However, if there are not enough migratable pages available, performing
memory compaction is pointless. Besides checking whether the order-0
watermark is met, __compaction_suitable() also needs to determine
whether there are sufficient migratable pages available for memory
compaction.
For costly allocations, because __compaction_suitable() always
returns true, __alloc_pages_slowpath() can't exit at the appropriate
place, resulting in excessively long virtual machine startup times.
Call trace:
__alloc_pages_slowpath
if (compact_result == COMPACT_SKIPPED ||
compact_result == COMPACT_DEFERRED)
goto nopage; // should exit __alloc_pages_slowpath() from here
When the 16GB of non-CMA memory on a single node is exhausted, we
fall back to allocating memory on other nodes. To fall back to remote
nodes quickly, we should skip memory compaction when migratable pages
are insufficient. After this fix, it only takes a few tens of seconds
to start a 32GB virtual machine with device passthrough.
Signed-off-by: yangge <yangge1116@126.com>
---
mm/compaction.c | 19 +++++++++++++++++++
1 file changed, 19 insertions(+)
diff --git a/mm/compaction.c b/mm/compaction.c
index 07bd227..1c469b3 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2383,7 +2383,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
int highest_zoneidx,
unsigned long wmark_target)
{
+ pg_data_t *pgdat = zone->zone_pgdat;
+ unsigned long sum, nr_pinned;
unsigned long watermark;
+
+ sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
+ node_page_state(pgdat, NR_INACTIVE_ANON) +
+ node_page_state(pgdat, NR_ACTIVE_FILE) +
+ node_page_state(pgdat, NR_ACTIVE_ANON);
+
+ nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
+ node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
+
+ /*
+ * Gup-pinned pages are non-migratable. After subtracting these pages,
+ * we need to check if the remaining pages are sufficient for memory
+ * compaction.
+ */
+ if ((sum - nr_pinned) < (1 << order))
+ return false;
+
/*
* Watermarks for order-0 must be met for compaction to be able to
* isolate free pages for migration targets. This means that the
--
2.7.4
* Re: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
2025-01-04 8:58 [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages yangge1116
@ 2025-01-04 11:28 ` kernel test robot
2025-01-06 8:12 ` Baolin Wang
1 sibling, 0 replies; 6+ messages in thread
From: kernel test robot @ 2025-01-04 11:28 UTC (permalink / raw)
To: yangge1116, akpm
Cc: llvm, oe-kbuild-all, linux-mm, linux-kernel, 21cnbao, david,
baolin.wang, hannes, liuzixing, yangge
Hi,
kernel test robot noticed the following build warnings:
[auto build test WARNING on akpm-mm/mm-everything]
url: https://github.com/intel-lab-lkp/linux/commits/yangge1116-126-com/mm-compaction-skip-memory-compaction-when-there-are-not-enough-migratable-pages/20250104-170112
base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/r/1735981122-2085-1-git-send-email-yangge1116%40126.com
patch subject: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
config: i386-buildonly-randconfig-001-20250104 (https://download.01.org/0day-ci/archive/20250104/202501041908.jDpLhAgL-lkp@intel.com/config)
compiler: clang version 19.1.3 (https://github.com/llvm/llvm-project ab51eccf88f5321e7c60591c5546b254b6afab99)
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20250104/202501041908.jDpLhAgL-lkp@intel.com/reproduce)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202501041908.jDpLhAgL-lkp@intel.com/
All warnings (new ones prefixed by >>):
In file included from mm/compaction.c:15:
include/linux/mm_inline.h:47:41: warning: arithmetic between different enumeration types ('enum node_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
47 | __mod_lruvec_state(lruvec, NR_LRU_BASE + lru, nr_pages);
| ~~~~~~~~~~~ ^ ~~~
include/linux/mm_inline.h:49:22: warning: arithmetic between different enumeration types ('enum zone_stat_item' and 'enum lru_list') [-Wenum-enum-conversion]
49 | NR_ZONE_LRU_BASE + lru, nr_pages);
| ~~~~~~~~~~~~~~~~ ^ ~~~
>> mm/compaction.c:2386:13: warning: unused variable 'pgdat' [-Wunused-variable]
2386 | pg_data_t *pgdat = zone->zone_pgdat;
| ^~~~~
3 warnings generated.
vim +/pgdat +2386 mm/compaction.c
2381
2382 static bool __compaction_suitable(struct zone *zone, int order,
2383 int highest_zoneidx,
2384 unsigned long wmark_target)
2385 {
> 2386 pg_data_t *pgdat = zone->zone_pgdat;
2387 unsigned long sum, nr_pinned;
2388 unsigned long watermark;
2389
2390 sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
2391 node_page_state(pgdat, NR_INACTIVE_ANON) +
2392 node_page_state(pgdat, NR_ACTIVE_FILE) +
2393 node_page_state(pgdat, NR_ACTIVE_ANON);
2394
2395 nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
2396 node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
2397
2398 /*
2399 * Gup-pinned pages are non-migratable. After subtracting these pages,
2400 * we need to check if the remaining pages are sufficient for memory
2401 * compaction.
2402 */
2403 if ((sum - nr_pinned) < (1 << order))
2404 return false;
2405
2406 /*
2407 * Watermarks for order-0 must be met for compaction to be able to
2408 * isolate free pages for migration targets. This means that the
2409 * watermark and alloc_flags have to match, or be more pessimistic than
2410 * the check in __isolate_free_page(). We don't use the direct
2411 * compactor's alloc_flags, as they are not relevant for freepage
2412 * isolation. We however do use the direct compactor's highest_zoneidx
2413 * to skip over zones where lowmem reserves would prevent allocation
2414 * even if compaction succeeds.
2415 * For costly orders, we require low watermark instead of min for
2416 * compaction to proceed to increase its chances.
2417 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
2418 * suitable migration targets
2419 */
2420 watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
2421 low_wmark_pages(zone) : min_wmark_pages(zone);
2422 watermark += compact_gap(order);
2423 return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
2424 ALLOC_CMA, wmark_target);
2425 }
2426
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
2025-01-04 8:58 [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages yangge1116
2025-01-04 11:28 ` kernel test robot
@ 2025-01-06 8:12 ` Baolin Wang
2025-01-06 8:49 ` Ge Yang
1 sibling, 1 reply; 6+ messages in thread
From: Baolin Wang @ 2025-01-06 8:12 UTC (permalink / raw)
To: yangge1116, akpm
Cc: linux-mm, linux-kernel, 21cnbao, david, hannes, liuzixing
On 2025/1/4 16:58, yangge1116@126.com wrote:
> [...]
> @@ -2383,7 +2383,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
> int highest_zoneidx,
> unsigned long wmark_target)
> {
> + pg_data_t *pgdat = zone->zone_pgdat;
> + unsigned long sum, nr_pinned;
> unsigned long watermark;
> +
> + sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
> + node_page_state(pgdat, NR_INACTIVE_ANON) +
> + node_page_state(pgdat, NR_ACTIVE_FILE) +
> + node_page_state(pgdat, NR_ACTIVE_ANON);
> +
> + nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
> + node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
> +
> + /*
> + * Gup-pinned pages are non-migratable. After subtracting these pages,
> + * we need to check if the remaining pages are sufficient for memory
> + * compaction.
> + */
> + if ((sum - nr_pinned) < (1 << order))
> + return false;
> +
IMO, using the node's statistics to determine whether the zone is
suitable for compaction doesn't make sense. It is possible that even
though the normal zone has long-term pinned pages, the movable zone can
still be suitable for compaction.
* Re: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
2025-01-06 8:12 ` Baolin Wang
@ 2025-01-06 8:49 ` Ge Yang
2025-01-08 2:50 ` Baolin Wang
0 siblings, 1 reply; 6+ messages in thread
From: Ge Yang @ 2025-01-06 8:49 UTC (permalink / raw)
To: Baolin Wang, akpm
Cc: linux-mm, linux-kernel, 21cnbao, david, hannes, liuzixing
On 2025/1/6 16:12, Baolin Wang wrote:
>> [...]
>
> IMO, using the node's statistics to determine whether the zone is
> suitable for compaction doesn't make sense. It is possible that even
> though the normal zone has long-term pinned pages, the movable zone can
> still be suitable for compaction.
If all the memory used on a node is pinned, then this memory cannot be
migrated anymore, and memory compaction operations would not succeed.
I haven't used movable zone before, can you explain why memory
compaction is still necessary? Thank you.
* Re: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
2025-01-06 8:49 ` Ge Yang
@ 2025-01-08 2:50 ` Baolin Wang
2025-01-08 8:35 ` Ge Yang
0 siblings, 1 reply; 6+ messages in thread
From: Baolin Wang @ 2025-01-08 2:50 UTC (permalink / raw)
To: Ge Yang, akpm; +Cc: linux-mm, linux-kernel, 21cnbao, david, hannes, liuzixing
On 2025/1/6 16:49, Ge Yang wrote:
> [...]
>> IMO, using the node's statistics to determine whether the zone is
>> suitable for compaction doesn't make sense. It is possible that even
>> though the normal zone has long-term pinned pages, the movable zone
>> can still be suitable for compaction.
> If all the memory used on a node is pinned, then this memory cannot be
> migrated anymore, and memory compaction operations would not succeed.
> I haven't used movable zone before, can you explain why memory
> compaction is still necessary? Thank you.
Please consider unevictable folios that are not in the active/inactive
file/anon LRU lists, yet can still be migrated.
* Re: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
2025-01-08 2:50 ` Baolin Wang
@ 2025-01-08 8:35 ` Ge Yang
0 siblings, 0 replies; 6+ messages in thread
From: Ge Yang @ 2025-01-08 8:35 UTC (permalink / raw)
To: Baolin Wang, akpm
Cc: linux-mm, linux-kernel, 21cnbao, david, hannes, liuzixing
On 2025/1/8 10:50, Baolin Wang wrote:
>>> [...]
>>> IMO, using the node's statistics to determine whether the zone is
>>> suitable for compaction doesn't make sense. It is possible that even
>>> though the normal zone has long-term pinned pages, the movable zone
>>> can still be suitable for compaction.
>> If all the memory used on a node is pinned, then this memory cannot be
>> migrated anymore, and memory compaction operations would not succeed.
>> I haven't used movable zone before, can you explain why memory
>> compaction is still necessary? Thank you.
>
> Please consider unevictable folios that are not in the active/inactive
> file/anon LRU lists, yet can still be migrated.
Ok, thanks.