From: Ge Yang
To: Johannes Weiner
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, baolin.wang@linux.alibaba.com, vbabka@suse.cz, liuzixing@hygon.cn
Date: Wed, 18 Dec 2024 10:15:06 +0800
Subject: Re: [PATCH V7] mm, compaction: don't use ALLOC_CMA for unmovable allocations
Message-ID: <93cf1aee-70df-426f-a3c0-1db8068bd59a@126.com>
In-Reply-To: <20241217155551.GA37530@cmpxchg.org>
References: <1734436004-1212-1-git-send-email-yangge1116@126.com> <20241217155551.GA37530@cmpxchg.org>
On 2024/12/17 23:55, Johannes Weiner wrote:
> Hello Yangge,
>
> On Tue, Dec 17, 2024 at 07:46:44PM +0800, yangge1116@126.com wrote:
>> From: yangge
>>
>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
>> in __compaction_suitable()") allowed compaction to proceed when the
>> free pages required for compaction reside in CMA pageblocks,
>> __compaction_suitable() may always return true, which is not
>> acceptable in some cases.
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>> and starting a 32GB virtual machine with device passthrough is
>> extremely slow, taking almost an hour.
>>
>> During virtual machine startup, pin_user_pages_remote(...,
>> FOLL_LONGTERM, ...) is called to allocate memory. Long-term GUP
>> cannot allocate memory from the CMA area, so at most 16GB of
>> non-CMA memory on a NUMA node can be used as virtual machine memory.
>> Since there is 16GB of free CMA memory on the NUMA node, the order-0
>> watermark check for compaction is always met, so
>> __compaction_suitable() always returns true, even when the node
>> cannot allocate non-CMA memory for the virtual machine.
>>
>> For costly allocations, because __compaction_suitable() always
>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>> place, resulting in excessively long virtual machine startup times.
>> Call trace:
>> __alloc_pages_slowpath
>>     if (compact_result == COMPACT_SKIPPED ||
>>         compact_result == COMPACT_DEFERRED)
>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> Other unmovable allocations, like dma_buf, which can be large in a
>> Linux system, are also unable to allocate memory from CMA, and these
>> allocations suffer from the same problems described above. In order
>> to quickly fall back to a remote node, we should remove ALLOC_CMA
>> both in __compaction_suitable() and __isolate_free_page() for
>> unmovable allocations. After this fix, starting a 32GB virtual
>> machine with device passthrough takes only a few seconds.
>
> The symptom is obviously bad, but I don't understand this fix.
>
> The reason we do ALLOC_CMA is that, even for unmovable allocations,
> you can create space in non-CMA space by moving migratable pages over
> to CMA space. This is not a property we want to lose. But I also don't
> see how it would interfere with your scenario.

In this case, __alloc_pages_slowpath() is expected to exit at place 1.
However, because __compaction_suitable() always returns true, the first
compaction attempt does not return COMPACT_SKIPPED, so
__alloc_pages_slowpath() goes on to direct reclaim and exits at place 2
instead. This makes __alloc_pages_slowpath() take significantly longer.

Call trace:
__alloc_pages_slowpath
    if (compact_result == COMPACT_SKIPPED ||
        compact_result == COMPACT_DEFERRED)
        goto nopage; // place 1
    __alloc_pages_direct_reclaim() // reclaim is very expensive
    __alloc_pages_direct_compact()
    if (gfp_mask & __GFP_NORETRY)
        goto nopage; // place 2

Every allocation that goes through this slower path pays the cost of
direct reclaim, which adds up to significantly longer virtual machine
startup times.

>
> There is the compaction_suitable() check in should_compact_retry(),
> but that only applies when COMPACT_SKIPPED. IOW, it should only happen
> when compaction_suitable() just now returned false. IOW, a race
> condition. Which is why it's also not subject to limited retries.
>
> What's the exact condition that traps the allocator inside the loop?

should_compact_retry() is not reached in this case; the slowdown comes
mainly from __alloc_pages_direct_reclaim().