Message-ID: <3651bce1-f84b-4537-bc57-ef6d7460749f@126.com>
Date: Fri, 13 Dec 2024 16:43:55 +0800
Subject: Re: [PATCH] mm, compaction: don't use ALLOC_CMA in long term GUP flow
From: Ge Yang
To: Baolin Wang, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, vbabka@suse.cz, liuzixing@hygon.cn
References: <1734075432-14131-1-git-send-email-yangge1116@126.com>

On 2024/12/13 16:23, Baolin Wang wrote:
>
>
> On 2024/12/13 15:37, yangge1116@126.com wrote:
>> From: yangge
>>
>> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
>> in __compaction_suitable()") allows compaction to proceed when free
>> pages required for compaction reside in the CMA pageblocks, it's
>> possible that __compaction_suitable() always returns true, and in
>> some cases, it's not acceptable.
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>> and starting a 32GB virtual machine with device passthrough is
>> extremely slow, taking almost an hour.
>>
>> During the start-up of the virtual machine, it will call
>> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
>> Long term GUP cannot allocate memory from the CMA area, so a maximum
>> of 16GB of non-CMA memory on a NUMA node can be used as virtual
>> machine memory. Since there is 16GB of free CMA memory on the NUMA
>> node, the watermark for order-0 is always met for compaction, so
>> __compaction_suitable() always returns true, even if the node is
>> unable to allocate non-CMA memory for the virtual machine.
>>
>> For costly allocations, because __compaction_suitable() always
>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>> place, resulting in excessively long virtual machine startup times.
>> Call trace:
>> __alloc_pages_slowpath
>>      if (compact_result == COMPACT_SKIPPED ||
>>          compact_result == COMPACT_DEFERRED)
>>          goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>> To sum up, during long term GUP flow, we should remove ALLOC_CMA
>> both in __compaction_suitable() and __isolate_free_page().
>>
>> Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in
>> __compaction_suitable()")
>> Cc:
>> Signed-off-by: yangge
>> ---
>>   mm/compaction.c | 8 +++++---
>>   mm/page_alloc.c | 4 +++-
>>   2 files changed, 8 insertions(+), 4 deletions(-)
>>
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 07bd227..044c2247 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -2384,6 +2384,7 @@ static bool __compaction_suitable(struct zone
>> *zone, int order,
>>                     unsigned long wmark_target)
>>   {
>>       unsigned long watermark;
>> +    bool pin;
>>       /*
>>        * Watermarks for order-0 must be met for compaction to be able to
>>        * isolate free pages for migration targets. This means that the
>> @@ -2395,14 +2396,15 @@ static bool __compaction_suitable(struct zone
>> *zone, int order,
>>        * even if compaction succeeds.
>>        * For costly orders, we require low watermark instead of min for
>>        * compaction to proceed to increase its chances.
>> -     * ALLOC_CMA is used, as pages in CMA pageblocks are considered
>> -     * suitable migration targets
>> +     * In addition to long term GUP flow, ALLOC_CMA is used, as pages in
>> +     * CMA pageblocks are considered suitable migration targets
>>        */
>>       watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
>>                   low_wmark_pages(zone) : min_wmark_pages(zone);
>>       watermark += compact_gap(order);
>> +    pin = !!(current->flags & PF_MEMALLOC_PIN);
>>       return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
>> -                   ALLOC_CMA, wmark_target);
>> +                   pin ? 0 : ALLOC_CMA, wmark_target);
>>   }
>
> Seems a little hacky to me. Using the 'cc->alloc_flags' passed from the
> caller to determine if ‘ALLOC_CMA’ is needed looks more reasonable to me.

Ok, thanks.
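
To make the suggested direction concrete, here is a minimal userspace sketch, not kernel code, of a suitability check that takes the caller's allocation flags instead of reading the task's state. The names toy_zone, toy_watermark_ok() and toy_compaction_suitable() are hypothetical, the ALLOC_CMA value is illustrative only, and the watermark test is a deliberately simplified model of the real one.

```c
#include <stdbool.h>

/* Illustrative value only; not claimed to match the kernel's definition. */
#define ALLOC_CMA 0x80u

/* Hypothetical, heavily simplified stand-in for a zone's free page counters. */
struct toy_zone {
	unsigned long free_pages;     /* all free pages, CMA included */
	unsigned long free_cma_pages; /* free pages inside CMA pageblocks */
};

/*
 * Simplified order-0 check: pages the allocation may actually use must
 * exceed the mark. Without ALLOC_CMA, free CMA pages do not count.
 */
static bool toy_watermark_ok(const struct toy_zone *z, unsigned long mark,
			     unsigned int alloc_flags)
{
	unsigned long usable = z->free_pages;

	if (!(alloc_flags & ALLOC_CMA)) {
		/* Clamp so inconsistent counters cannot underflow. */
		usable -= (z->free_cma_pages < usable) ? z->free_cma_pages
							: usable;
	}
	return usable > mark;
}

/*
 * The caller decides whether CMA pages are usable and passes that in;
 * only the ALLOC_CMA bit of its flags is forwarded to the check.
 */
static bool toy_compaction_suitable(const struct toy_zone *z,
				    unsigned long mark,
				    unsigned int caller_alloc_flags)
{
	return toy_watermark_ok(z, mark, caller_alloc_flags & ALLOC_CMA);
}
```

With a zone whose free memory is almost entirely CMA, the check passes when the caller's flags include ALLOC_CMA and fails when they do not, which is the behavior a long term GUP caller would need.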
>
>>   /*
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index dde19db..9a5dfda 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2813,6 +2813,7 @@ int __isolate_free_page(struct page *page,
>> unsigned int order)
>>   {
>>       struct zone *zone = page_zone(page);
>>       int mt = get_pageblock_migratetype(page);
>> +    bool pin;
>>       if (!is_migrate_isolate(mt)) {
>>           unsigned long watermark;
>> @@ -2823,7 +2824,8 @@ int __isolate_free_page(struct page *page,
>> unsigned int order)
>>            * exists.
>>            */
>>           watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
>> -        if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
>> +        pin = !!(current->flags & PF_MEMALLOC_PIN);
>> +        if (!zone_watermark_ok(zone, 0, watermark, 0, pin ? 0 :
>> ALLOC_CMA))
>>               return 0;
>>       }
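
The numbers from the report can be plugged into a small userspace model of the isolation check above. This is only a sketch under stated assumptions: 4 KiB pages, a made-up min watermark, and hypothetical names (toy_node, toy_isolate_allowed()). It keeps the shape of the patched condition, min watermark plus (1UL << order), and counts free CMA pages only when the task is not pinning.

```c
#include <stdbool.h>

#define TOY_PAGE_SHIFT 12 /* assumption: 4 KiB pages */
#define GIB_TO_PAGES(g) ((unsigned long)(g) << (30 - TOY_PAGE_SHIFT))

/* Hypothetical per-node counters, in pages. */
struct toy_node {
	unsigned long free_noncma; /* free pages outside CMA pageblocks */
	unsigned long free_cma;    /* free pages inside CMA pageblocks */
	unsigned long wmark_min;   /* made-up min watermark */
};

/*
 * Same shape as the patched check: the watermark is min + (1UL << order),
 * and free CMA pages only count when the task is not doing a long term pin.
 */
static bool toy_isolate_allowed(const struct toy_node *n, unsigned int order,
				bool pin)
{
	unsigned long watermark = n->wmark_min + (1UL << order);
	unsigned long usable = n->free_noncma + (pin ? 0 : n->free_cma);

	return usable > watermark;
}
```

For a node whose 16GB of non-CMA memory is exhausted while its 16GB of CMA memory is still free (free_noncma = 0, free_cma = GIB_TO_PAGES(16)), the unpinned check passes for an order-9 page and the pinned check fails, matching the situation described in the commit message.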