From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <4ad51644-92de-47ca-af2a-bcb1866059d2@126.com>
Date: Tue, 21 Jan 2025 18:01:42 +0800
Subject: Re: [PATCH V2] mm: compaction: use the actual allocation context to determine the watermarks for costly order during async memory compaction
To: akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, baolin.wang@linux.alibaba.com, hannes@cmpxchg.org, vbabka@suse.cz, liuzixing@hygon.cn
References: <1736991214-29069-1-git-send-email-yangge1116@126.com>
From: Ge Yang
In-Reply-To: <1736991214-29069-1-git-send-email-yangge1116@126.com>
Content-Type: text/plain; charset=UTF-8; format=flowed

This patch has been revised based on Vlastimil's suggestions. Please
continue to review it. Thank you.

On 2025/1/16 9:33, yangge1116@126.com wrote:
> From: yangge
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
>
> Long-term GUP cannot allocate memory from the CMA area, so a maximum
> of 16GB of non-CMA memory on a NUMA node can be used as virtual machine
> memory. There is 16GB of free CMA memory on a NUMA node, which is
> sufficient to pass the order-0 watermark check, causing the
> __compaction_suitable() function to consistently return true.
>
> For costly allocations, if the __compaction_suitable() function always
> returns true, it causes the __alloc_pages_slowpath() function to fail
> to exit at the appropriate point. This prevents timely fallback to
> allocating memory on other nodes, ultimately resulting in excessively
> long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> We could use the real unmovable allocation context to have
> __zone_watermark_unusable_free() subtract CMA pages, and thus we won't
> pass the order-0 check anymore once the non-CMA part is exhausted. There
> is some risk that in some different scenario the compaction could in
> fact migrate pages from the exhausted non-CMA part of the zone to the
> CMA part and succeed, and we'll skip it instead. But only __GFP_NORETRY
> allocations should be affected in the immediate "goto nopage" when
> compaction is skipped, others will attempt with DEF_COMPACT_PRIORITY
> anyway and won't fail without trying to compact-migrate the non-CMA
> pageblocks into CMA pageblocks first, so it should be fine.
>
> After this fix, it only takes a few tens of seconds to start a 32GB
> virtual machine with device passthrough functionality.
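
For readers following the reasoning quoted above, here is a rough userspace sketch of the early-exit decision in the call trace. It is not kernel code: the enum names, the function name model_after_first_compaction(), and the "noretry" flag (standing in for __GFP_NORETRY) are hypothetical simplifications, and the only value taken from the kernel is the costly-order threshold of 3 (PAGE_ALLOC_COSTLY_ORDER).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the kernel's enum compact_result values. */
enum compact_result_model {
	MODEL_COMPACT_SKIPPED,
	MODEL_COMPACT_DEFERRED,
	MODEL_COMPACT_CONTINUE,
};

/* What the modelled slow path does after the first compaction attempt. */
enum slowpath_action {
	ACTION_NOPAGE,	/* stop here so the caller can fall back elsewhere */
	ACTION_RETRY,	/* keep going with further reclaim/compaction */
};

/*
 * Toy model of the check quoted in the call trace. An order above 3
 * (PAGE_ALLOC_COSTLY_ORDER) counts as costly; everything else in this
 * function is an assumption made for illustration.
 */
static enum slowpath_action
model_after_first_compaction(unsigned int order, bool noretry,
			     enum compact_result_model result)
{
	bool costly = order > 3;

	assert(order < 32);

	/* Only costly, no-retry requests take the immediate exit. */
	if (costly && noretry &&
	    (result == MODEL_COMPACT_SKIPPED ||
	     result == MODEL_COMPACT_DEFERRED))
		return ACTION_NOPAGE;

	return ACTION_RETRY;
}
```

In this model, an order-9 no-retry request exits early only when compaction reports SKIPPED or DEFERRED. If the suitability check always says compaction can proceed, that exit is never taken, which matches the slow startup described in the commit message.
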
>
> Link: https://lore.kernel.org/lkml/1736335854-548-1-git-send-email-yangge1116@126.com/
> Signed-off-by: yangge
> Acked-by: Vlastimil Babka
> ---
>
> V2:
> - update code and message suggested by Vlastimil
>
>  mm/compaction.c | 29 +++++++++++++++++++++++++----
>  1 file changed, 25 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..3de7b67 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2490,7 +2490,8 @@ bool compaction_zonelist_suitable(struct alloc_context *ac, int order,
>   */
>  static enum compact_result
>  compaction_suit_allocation_order(struct zone *zone, unsigned int order,
> -				 int highest_zoneidx, unsigned int alloc_flags)
> +				 int highest_zoneidx, unsigned int alloc_flags,
> +				 bool async)
>  {
>  	unsigned long watermark;
>
> @@ -2499,6 +2500,23 @@ compaction_suit_allocation_order(struct zone *zone, unsigned int order,
>  			      alloc_flags))
>  		return COMPACT_SUCCESS;
>
> +	/*
> +	 * For unmovable allocations (without ALLOC_CMA), check if there is enough
> +	 * free memory in the non-CMA pageblocks. Otherwise compaction could form
> +	 * the high-order page in CMA pageblocks, which would not help the
> +	 * allocation to succeed. However, limit the check to costly order async
> +	 * compaction (such as opportunistic THP attempts) because there is the
> +	 * possibility that compaction would migrate pages from non-CMA to CMA
> +	 * pageblock.
> +	 */
> +	if (order > PAGE_ALLOC_COSTLY_ORDER && async &&
> +	    !(alloc_flags & ALLOC_CMA)) {
> +		watermark = low_wmark_pages(zone) + compact_gap(order);
> +		if (!__zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
> +					   0, zone_page_state(zone, NR_FREE_PAGES)))
> +			return COMPACT_SKIPPED;
> +	}
> +
>  	if (!compaction_suitable(zone, order, highest_zoneidx))
>  		return COMPACT_SKIPPED;
>
> @@ -2534,7 +2552,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
>  	if (!is_via_compact_memory(cc->order)) {
>  		ret = compaction_suit_allocation_order(cc->zone, cc->order,
>  						       cc->highest_zoneidx,
> -						       cc->alloc_flags);
> +						       cc->alloc_flags,
> +						       cc->mode == MIGRATE_ASYNC);
>  		if (ret != COMPACT_CONTINUE)
>  			return ret;
>  	}
> @@ -3037,7 +3056,8 @@ static bool kcompactd_node_suitable(pg_data_t *pgdat)
>
>  		ret = compaction_suit_allocation_order(zone,
>  				pgdat->kcompactd_max_order,
> -				highest_zoneidx, ALLOC_WMARK_MIN);
> +				highest_zoneidx, ALLOC_WMARK_MIN,
> +				false);
>  		if (ret == COMPACT_CONTINUE)
>  			return true;
>  	}
> @@ -3078,7 +3098,8 @@ static void kcompactd_do_work(pg_data_t *pgdat)
>  			continue;
>
>  		ret = compaction_suit_allocation_order(zone,
> -					cc.order, zoneid, ALLOC_WMARK_MIN);
> +					cc.order, zoneid, ALLOC_WMARK_MIN,
> +					false);
>  		if (ret != COMPACT_CONTINUE)
>  			continue;
>
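
As a rough illustration of why the added check changes the outcome, the arithmetic can be modelled in userspace as below. This is only a sketch under stated assumptions: struct zone_model, model_compact_gap(), and model_order0_ok() are hypothetical names, lowmem reserves and the other terms handled by __zone_watermark_ok() are left out, and the only formula borrowed from the kernel is compact_gap(order) = 2UL << order.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily reduced view of a zone's free-page counters. */
struct zone_model {
	unsigned long free_pages;	/* all free pages, CMA included */
	unsigned long free_cma_pages;	/* free pages inside CMA pageblocks */
	unsigned long low_wmark;	/* stand-in for low_wmark_pages(zone) */
};

/* Same formula as the kernel's compact_gap(): twice the request size. */
static unsigned long model_compact_gap(unsigned int order)
{
	assert(order < 32);
	return 2UL << order;
}

/*
 * Order-0 check against low_wmark + compact_gap(order). When the caller
 * may not use CMA, free CMA pages are subtracted before comparing.
 */
static bool model_order0_ok(const struct zone_model *z, unsigned int order,
			    bool can_use_cma)
{
	unsigned long usable = z->free_pages;
	unsigned long mark = z->low_wmark + model_compact_gap(order);

	assert(z->free_cma_pages <= z->free_pages);

	if (!can_use_cma)
		usable -= z->free_cma_pages;

	return usable > mark;
}
```

Take 4 KiB pages, 16GB of free CMA (4194304 pages), 512 free non-CMA pages, and a made-up low watermark of 16384 pages. For an order-9 request the model passes when CMA may be used and fails when it may not, which is the case where the patch returns COMPACT_SKIPPED.
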