From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <324bf85f-442a-4388-a8f4-55d60e57b914@linux.alibaba.com>
Date: Mon, 13 Jan 2025 16:11:35 +0800
From: Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH V2] mm: compaction: skip memory compaction when there are not enough migratable pages
To: yangge1116@126.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, hannes@cmpxchg.org, liuzixing@hygon.cn, Vlastimil Babka
References: <1736325440-30857-1-git-send-email-yangge1116@126.com>
In-Reply-To: <1736325440-30857-1-git-send-email-yangge1116@126.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Cc Vlastimil.

On 2025/1/8 16:37, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory.
> I have configured 16GB of CMA memory on each NUMA node, and
> starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
>
> During the start-up of the virtual machine, it calls
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long-term GUP cannot allocate memory from the CMA area, so at most
> 16GB of non-CMA memory on a NUMA node can be used as virtual machine
> memory. Meanwhile, there is 16GB of free CMA memory on the node,
> which is sufficient to pass the order-0 watermark check, causing
> __compaction_suitable() to consistently return true. However, if
> there are not enough migratable pages available, performing memory
> compaction is meaningless. Besides checking whether the order-0
> watermark is met, __compaction_suitable() also needs to determine
> whether there are sufficient migratable pages available for memory
> compaction.
>
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() cannot exit at the
> appropriate place, resulting in excessively long virtual machine
> startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> When the 16GB of non-CMA memory on a single node is exhausted, we
> fall back to allocating memory on other nodes. In order to fall back
> to remote nodes quickly, we should skip memory compaction when
> migratable pages are insufficient. After this fix, it only takes a
> few tens of seconds to start a 32GB virtual machine with device
> passthrough functionality.
>
> Signed-off-by: yangge <yangge1116@126.com>
> ---
>
> V2:
> - consider unevictable folios
>
>  mm/compaction.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..1630abd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
>  					       int highest_zoneidx,
>  					       unsigned long wmark_target)
>  {
> +	struct pglist_data *pgdat = zone->zone_pgdat;
> +	unsigned long sum, nr_pinned;
>  	unsigned long watermark;
> +
> +	sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
> +	      node_page_state(pgdat, NR_INACTIVE_ANON) +
> +	      node_page_state(pgdat, NR_ACTIVE_FILE) +
> +	      node_page_state(pgdat, NR_ACTIVE_ANON) +
> +	      node_page_state(pgdat, NR_UNEVICTABLE);
> +
> +	nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
> +		    node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
> +
> +	/*
> +	 * Gup-pinned pages are non-migratable. After subtracting these pages,
> +	 * we need to check if the remaining pages are sufficient for memory
> +	 * compaction.
> +	 */
> +	if ((sum - nr_pinned) < (1 << order))
> +		return false;
> +

Looks reasonable to me, but let's see if other people have any comments.