From: Ge Yang
Date: Mon, 6 Jan 2025 16:49:58 +0800
Subject: Re: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
To: Baolin Wang, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, hannes@cmpxchg.org, liuzixing@hygon.cn
Message-ID: <2889f0bf-b0ae-4f1a-b91c-fb4b59eb2d97@126.com>
On 2025/1/6 16:12, Baolin Wang wrote:
>
>
> On 2025/1/4 16:58, yangge1116@126.com wrote:
>> From: yangge
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>> and starting a 32GB virtual machine with device passthrough is
>> extremely slow, taking almost an hour.
>>
>> During start-up, the virtual machine calls
>> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
>> Long-term GUP cannot allocate memory from the CMA area, so at most
>> 16GB of non-CMA memory on a NUMA node can be used as virtual machine
>> memory. The 16GB of free CMA memory on a NUMA node is enough to pass
>> the order-0 watermark check, so __compaction_suitable() consistently
>> returns true. However, if there aren't enough migratable pages
>> available, memory compaction is pointless. Besides checking whether
>> the order-0 watermark is met, __compaction_suitable() also needs
>> to determine whether there are enough migratable pages available
>> for memory compaction.
>>
>> For costly allocations, because __compaction_suitable() always
>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>> place, resulting in excessively long virtual machine startup times.
>> Call trace:
>> __alloc_pages_slowpath
>>     if (compact_result == COMPACT_SKIPPED ||
>>         compact_result == COMPACT_DEFERRED)
>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> When the 16GB of non-CMA memory on a single node is exhausted, we
>> fall back to allocating memory on other nodes. To fall back to remote
>> nodes quickly, we should skip memory compaction when there are not
>> enough migratable pages.
>> After this fix, it takes only a few tens of seconds to start a 32GB
>> virtual machine with device passthrough.
>>
>> Signed-off-by: yangge
>> ---
>>  mm/compaction.c | 19 +++++++++++++++++++
>>  1 file changed, 19 insertions(+)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 07bd227..1c469b3 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2383,7 +2383,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>                    int highest_zoneidx,
>>                    unsigned long wmark_target)
>>  {
>> +    pg_data_t *pgdat = zone->zone_pgdat;
>> +    unsigned long sum, nr_pinned;
>>      unsigned long watermark;
>> +
>> +    sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
>> +        node_page_state(pgdat, NR_INACTIVE_ANON) +
>> +        node_page_state(pgdat, NR_ACTIVE_FILE) +
>> +        node_page_state(pgdat, NR_ACTIVE_ANON);
>> +
>> +    nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
>> +        node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
>> +
>> +    /*
>> +     * Gup-pinned pages are non-migratable. After subtracting these pages,
>> +     * we need to check if the remaining pages are sufficient for memory
>> +     * compaction.
>> +     */
>> +    if ((sum - nr_pinned) < (1 << order))
>> +        return false;
>> +
>
> IMO, using the node's statistics to determine whether the zone is
> suitable for compaction doesn't make sense. It is possible that even
> though the normal zone has long-term pinned pages, the movable zone can
> still be suitable for compaction.

If all the memory in use on a node is pinned, that memory can no longer
be migrated, so memory compaction cannot succeed. I haven't used the
movable zone before; could you explain why memory compaction would still
be necessary there? Thank you.