From: Ge Yang <yangge1116@126.com>
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, david@redhat.com,
baolin.wang@linux.alibaba.com, hannes@cmpxchg.org,
liuzixing@hygon.cn
Subject: Re: [PATCH V3] mm: compaction: skip memory compaction when there are not enough migratable pages
Date: Mon, 13 Jan 2025 17:02:24 +0800
Message-ID: <a6cda21c-9324-4f27-9c8b-31e6b7ff3bae@126.com>
In-Reply-To: <CAGsJ_4yA04vEOTm3CLJ6EEY65Wbpa-YAnwd2t5mi7wq75P_P4Q@mail.gmail.com>
On 2025/1/13 16:47, Barry Song wrote:
> On Thu, Jan 9, 2025 at 12:31 AM <yangge1116@126.com> wrote:
>>
>> From: yangge <yangge1116@126.com>
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>> and starting a 32GB virtual machine with device passthrough is
>> extremely slow, taking almost an hour.
>>
>> During the start-up of the virtual machine, it will call
>> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
>> Long term GUP cannot allocate memory from CMA area, so a maximum of
>> 16 GB of no-CMA memory on a NUMA node can be used as virtual machine
>> memory. There is 16GB of free CMA memory on a NUMA node, which is
>> sufficient to pass the order-0 watermark check, causing the
>> __compaction_suitable() function to consistently return true.
>> However, if there aren't enough migratable pages available, performing
>> memory compaction is also meaningless. Besides checking whether
>> the order-0 watermark is met, __compaction_suitable() also needs
>> to determine whether there are sufficient migratable pages available
>> for memory compaction.
>>
>> For costly allocations, because __compaction_suitable() always
>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>> place, resulting in excessively long virtual machine startup times.
>> Call trace:
>> __alloc_pages_slowpath
>>     if (compact_result == COMPACT_SKIPPED ||
>>         compact_result == COMPACT_DEFERRED)
>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> When the 16G of non-CMA memory on a single node is exhausted, we will
>> fallback to allocating memory on other nodes. In order to quickly
>> fallback to remote nodes, we should skip memory compaction when
>> migratable pages are insufficient. After this fix, it only takes a
>> few tens of seconds to start a 32GB virtual machine with device
>> passthrough functionality.
>>
>> Signed-off-by: yangge <yangge1116@126.com>
>> ---
>>
>> V3:
>> - fix build error
>>
>> V2:
>> - consider unevictable folios
>>
>> mm/compaction.c | 20 ++++++++++++++++++++
>> 1 file changed, 20 insertions(+)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 07bd227..a9f1261 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>                                   int highest_zoneidx,
>>                                   unsigned long wmark_target)
>>  {
>> +       pg_data_t __maybe_unused *pgdat = zone->zone_pgdat;
>> +       unsigned long sum, nr_pinned;
>>         unsigned long watermark;
>> +
>> +       sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
>> +               node_page_state(pgdat, NR_INACTIVE_ANON) +
>> +               node_page_state(pgdat, NR_ACTIVE_FILE) +
>> +               node_page_state(pgdat, NR_ACTIVE_ANON) +
>> +               node_page_state(pgdat, NR_UNEVICTABLE);
>> +
>> +       nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
>> +               node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
>> +
>
> Does the sum of all LRU pages equal non-CMA memory?
> I'm quite confused for two reasons:
> 1. CMA pages can be LRU pages.
> 2. Free pages might not belong to any LRUs.
No, it does not.

If all the pages on the LRU lists are pinned, memory compaction is
pointless, because migrating pinned pages is unlikely to succeed.
So besides checking whether the order-0 watermark is met,
__compaction_suitable() also needs to determine whether there are
enough migratable pages for compaction to work with.
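
To make the intent of the check concrete, here is a minimal userspace
sketch of the same arithmetic. It is only a model, not kernel code:
struct node_counters and enough_migratable_pages() are hypothetical names
that stand in for the node_page_state() reads in the patch, and the guard
against nr_pinned exceeding sum is an addition of the sketch so that the
unsigned subtraction cannot wrap.

```c
#include <stdbool.h>

/* Hypothetical userspace stand-in for the per-node counters the patch reads. */
struct node_counters {
	unsigned long inactive_file;
	unsigned long inactive_anon;
	unsigned long active_file;
	unsigned long active_anon;
	unsigned long unevictable;
	unsigned long pin_acquired;	/* models NR_FOLL_PIN_ACQUIRED */
	unsigned long pin_released;	/* models NR_FOLL_PIN_RELEASED */
};

/*
 * Returns true when the node appears to have at least 1 << order LRU pages
 * that are not pinned. The comparison before the subtraction is an addition
 * of this sketch (not part of the patch) to avoid unsigned wrap-around.
 */
static bool enough_migratable_pages(const struct node_counters *nc, int order)
{
	unsigned long sum = nc->inactive_file + nc->inactive_anon +
			    nc->active_file + nc->active_anon +
			    nc->unevictable;
	unsigned long nr_pinned = nc->pin_acquired - nc->pin_released;

	if (nr_pinned >= sum)
		return false;

	return (sum - nr_pinned) >= (1UL << order);
}
```

With 4096 LRU pages that are all pinned, the function returns false for
any order, which is the case where compaction should be skipped. With 1024
unpinned pages left, it returns true for order 9 and false for order 11.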
>
>
>> +       /*
>> +        * Gup-pinned pages are non-migratable. After subtracting these pages,
>> +        * we need to check if the remaining pages are sufficient for memory
>> +        * compaction.
>> +        */
>> +       if ((sum - nr_pinned) < (1 << order))
>> +               return false;
>> +
>>         /*
>>          * Watermarks for order-0 must be met for compaction to be able to
>>          * isolate free pages for migration targets. This means that the
>> --
>> 2.7.4
>>
>>
>
> Thanks
> barry
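
For reference, the early exit quoted in the call trace above can be
modeled roughly as follows. This is a simplified sketch under stated
assumptions: the enum and should_goto_nopage() are hypothetical names, the
threshold of 3 mirrors PAGE_ALLOC_COSTLY_ORDER, and any other conditions
the real slow path applies before reaching this test are assumed to hold.

```c
#include <stdbool.h>

/* Simplified, hypothetical stand-ins for the kernel's compaction results. */
enum sketch_compact_result {
	SKETCH_COMPACT_SKIPPED,
	SKETCH_COMPACT_DEFERRED,
	SKETCH_COMPACT_OTHER,
};

/* Orders above this value are treated as costly (PAGE_ALLOC_COSTLY_ORDER). */
#define SKETCH_COSTLY_ORDER 3

/*
 * Returns true when a costly allocation should stop retrying on this node
 * because compaction was skipped or deferred.
 */
static bool should_goto_nopage(int order, enum sketch_compact_result result)
{
	if (order <= SKETCH_COSTLY_ORDER)
		return false;

	return result == SKETCH_COMPACT_SKIPPED ||
	       result == SKETCH_COMPACT_DEFERRED;
}
```

If __compaction_suitable() always returns true, the result is never
"skipped", so this exit is never taken for the node whose non-CMA memory
is exhausted.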