Message-ID: <32d4b46a-cf7c-4fd2-a86f-7e82b9b1a5e3@126.com>
Date: Wed, 8 Jan 2025 16:35:27 +0800
Subject: Re: [PATCH] mm: compaction: skip memory compaction when there are not enough migratable pages
From: Ge Yang
To: Baolin Wang, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, hannes@cmpxchg.org, liuzixing@hygon.cn
In-Reply-To: <180269be-f344-49e8-86da-23dda0bb31a0@linux.alibaba.com>
References: <1735981122-2085-1-git-send-email-yangge1116@126.com> <2889f0bf-b0ae-4f1a-b91c-fb4b59eb2d97@126.com> <180269be-f344-49e8-86da-23dda0bb31a0@linux.alibaba.com>

On 2025/1/8 10:50, Baolin Wang wrote:
>
>
> On 2025/1/6 16:49, Ge Yang wrote:
>>
>>
>> On 2025/1/6 16:12, Baolin Wang wrote:
>>>
>>>
>>> On 2025/1/4 16:58, yangge1116@126.com wrote:
>>>> From: yangge
>>>>
>>>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>>>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>>>> and starting a 32GB virtual machine with device passthrough is
>>>> extremely slow, taking almost an hour.
>>>>
>>>> During the start-up of the virtual machine, it will call
>>>> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
>>>> Long-term GUP cannot allocate memory from the CMA area, so a maximum
>>>> of 16GB of non-CMA memory on a NUMA node can be used as virtual
>>>> machine memory. There is 16GB of free CMA memory on a NUMA node,
>>>> which is sufficient to pass the order-0 watermark check, causing the
>>>> __compaction_suitable() function to consistently return true.
>>>> However, if there aren't enough migratable pages available,
>>>> performing memory compaction is pointless. Besides checking whether
>>>> the order-0 watermark is met, __compaction_suitable() also needs
>>>> to determine whether there are sufficient migratable pages available
>>>> for memory compaction.
>>>>
>>>> For costly allocations, because __compaction_suitable() always
>>>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>>>> place, resulting in excessively long virtual machine startup times.
>>>> Call trace:
>>>> __alloc_pages_slowpath
>>>>      if (compact_result == COMPACT_SKIPPED ||
>>>>          compact_result == COMPACT_DEFERRED)
>>>>          goto nopage; // should exit __alloc_pages_slowpath() from here
>>>>
>>>> When the 16GB of non-CMA memory on a single node is exhausted, we
>>>> will fall back to allocating memory on other nodes. In order to
>>>> quickly fall back to remote nodes, we should skip memory compaction
>>>> when migratable pages are insufficient. After this fix, it only
>>>> takes a few tens of seconds to start a 32GB virtual machine with
>>>> device passthrough functionality.
>>>>
>>>> Signed-off-by: yangge
>>>> ---
>>>>   mm/compaction.c | 19 +++++++++++++++++++
>>>>   1 file changed, 19 insertions(+)
>>>>
>>>> diff --git a/mm/compaction.c b/mm/compaction.c
>>>> index 07bd227..1c469b3 100644
>>>> --- a/mm/compaction.c
>>>> +++ b/mm/compaction.c
>>>> @@ -2383,7 +2383,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>>>                     int highest_zoneidx,
>>>>                     unsigned long wmark_target)
>>>>   {
>>>> +    pg_data_t *pgdat = zone->zone_pgdat;
>>>> +    unsigned long sum, nr_pinned;
>>>>       unsigned long watermark;
>>>> +
>>>> +    sum = node_page_state(pgdat, NR_INACTIVE_FILE) +
>>>> +        node_page_state(pgdat, NR_INACTIVE_ANON) +
>>>> +        node_page_state(pgdat, NR_ACTIVE_FILE) +
>>>> +        node_page_state(pgdat, NR_ACTIVE_ANON);
>>>> +
>>>> +    nr_pinned = node_page_state(pgdat, NR_FOLL_PIN_ACQUIRED) -
>>>> +        node_page_state(pgdat, NR_FOLL_PIN_RELEASED);
>>>> +
>>>> +    /*
>>>> +     * Gup-pinned pages are non-migratable. After subtracting these pages,
>>>> +     * we need to check if the remaining pages are sufficient for memory
>>>> +     * compaction.
>>>> +     */
>>>> +    if ((sum - nr_pinned) < (1 << order))
>>>> +        return false;
>>>> +
>>>
>>> IMO, using the node's statistics to determine whether the zone is
>>> suitable for compaction doesn't make sense. It is possible that even
>>> though the normal zone has long-term pinned pages, the movable zone
>>> can still be suitable for compaction.
>> If all the memory used on a node is pinned, then this memory cannot be
>> migrated anymore, and memory compaction operations would not succeed.
>> I haven't used the movable zone before. Can you explain why memory
>> compaction is still necessary? Thank you.
>
> Please consider unevictable folios that are not in the active/inactive
> file/anon LRU lists, yet can still be migrated.

Ok, thanks.
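
A minimal userspace sketch of the objection above, not kernel code. It
assumes a hypothetical struct node_stats standing in for the per-node
counters read with node_page_state(), and a hypothetical unevictable
field standing in for folios that sit outside the four active/inactive
LRU lists but can still be migrated. The first helper mirrors the check
in the posted patch, with an underflow guard that is an assumption of
this sketch and is not in the patch. The second helper shows how
counting unevictable folios can change the answer.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the per-node counters; not a kernel type. */
struct node_stats {
	unsigned long inactive_file;
	unsigned long inactive_anon;
	unsigned long active_file;
	unsigned long active_anon;
	unsigned long unevictable;   /* assumed: migratable, but on no LRU list above */
	unsigned long pin_acquired;
	unsigned long pin_released;
};

/* Pages on the four LRU lists that the posted patch sums up. */
static unsigned long lru_pages(const struct node_stats *s)
{
	return s->inactive_file + s->inactive_anon +
	       s->active_file + s->active_anon;
}

/* True if at least 2^order candidates remain after removing pinned ones. */
static bool enough_after_pins(unsigned long candidates,
			      const struct node_stats *s, int order)
{
	unsigned long nr_pinned = s->pin_acquired - s->pin_released;

	/* Guard added for this sketch: avoid unsigned wrap-around. */
	if (nr_pinned >= candidates)
		return false;
	return (candidates - nr_pinned) >= (1UL << order);
}

/* Models the check as posted: only LRU pages count as migratable. */
static bool enough_migratable_posted(const struct node_stats *s, int order)
{
	return enough_after_pins(lru_pages(s), s, order);
}

/* Variant that also counts unevictable folios as migration candidates. */
static bool enough_migratable_with_unevictable(const struct node_stats *s,
					       int order)
{
	return enough_after_pins(lru_pages(s) + s->unevictable, s, order);
}
```

For a node with 1000 active anon pages, all of them pinned, plus 4096
unevictable pages, the first helper reports too few migratable pages at
order 9 while the second does not. The sketch still works on per-node
totals, so it does not address the separate per-zone objection.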