From mboxrd@z Thu Jan  1 00:00:00 1970
Message-ID: <324bf85f-442a-4388-a8f4-55d60e57b914@linux.alibaba.com>
Date: Mon, 13 Jan 2025 16:11:35 +0800
From: Baolin Wang <baolin.wang@linux.alibaba.com>
Subject: Re: [PATCH V2] mm: compaction: skip memory compaction when there are not enough migratable pages
To: yangge1116@126.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, hannes@cmpxchg.org, liuzixing@hygon.cn, Vlastimil Babka
References: <1736325440-30857-1-git-send-email-yangge1116@126.com>
In-Reply-To: <1736325440-30857-1-git-send-email-yangge1116@126.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

Cc Vlastimil.

On 2025/1/8 16:37, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory.
> I have configured 16GB of CMA memory on each NUMA node, and
> starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
>
> During the start-up of the virtual machine, it calls
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
> Long-term GUP cannot allocate memory from the CMA area, so at most
> 16GB of non-CMA memory on a NUMA node can be used as virtual machine
> memory. Meanwhile, there is 16GB of free CMA memory on the node,
> which is sufficient to pass the order-0 watermark check, causing
> __compaction_suitable() to consistently return true. However, if
> there are not enough migratable pages available, performing memory
> compaction is meaningless. Besides checking whether the order-0
> watermark is met, __compaction_suitable() also needs to determine
> whether there are sufficient migratable pages available for memory
> compaction.
>
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() cannot exit at the
> appropriate place, resulting in excessively long virtual machine
> startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> When the 16GB of non-CMA memory on a single node is exhausted, we
> fall back to allocating memory on other nodes. In order to fall back
> to remote nodes quickly, we should skip memory compaction when
> migratable pages are insufficient. After this fix, it only takes a
> few tens of seconds to start a 32GB virtual machine with device
> passthrough functionality.
>
> Signed-off-by: yangge <yangge1116@126.com>
> ---
>
> V2:
> - consider unevictable folios
>
>  mm/compaction.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..1630abd 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2383,7 +2383,27 @@ static bool __compaction_suitable(struct zone *zone, int order,
>  					       int highest_zoneidx,
>  					       unsigned long wmark_target)
>  {
> +	struct pglist_data *pgdat = zone->zone_pgdat;
> +	unsigned long sum, nr_pinned;
>  	unsigned long watermark;
> +
> +	sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
> +	      node_page_state(pgdat, NR_INACTIVE_ANON) +
> +	      node_page_state(pgdat, NR_ACTIVE_FILE) +
> +	      node_page_state(pgdat, NR_ACTIVE_ANON) +
> +	      node_page_state(pgdat, NR_UNEVICTABLE);
> +
> +	nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
> +		    node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
> +
> +	/*
> +	 * Gup-pinned pages are non-migratable. After subtracting these pages,
> +	 * we need to check if the remaining pages are sufficient for memory
> +	 * compaction.
> +	 */
> +	if ((sum - nr_pinned) < (1 << order))
> +		return false;
> +

Looks reasonable to me, but let's see if other people have any comments.