From: Baolin Wang
Date: Fri, 13 Dec 2024 16:23:16 +0800
To: yangge1116@126.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com, vbabka@suse.cz, liuzixing@hygon.cn
Subject: Re: [PATCH] mm, compaction: don't use ALLOC_CMA in long term GUP flow
References: <1734075432-14131-1-git-send-email-yangge1116@126.com>
In-Reply-To: <1734075432-14131-1-git-send-email-yangge1116@126.com>

On 2024/12/13 15:37, yangge1116@126.com wrote:
> From: yangge
>
> Since commit 984fdba6a32e ("mm, compaction: use proper alloc_flags
> in __compaction_suitable()")
> allows compaction to proceed when the free pages required for
> compaction reside in CMA pageblocks, __compaction_suitable() may
> always return true, which is not acceptable in some cases.
>
> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
> of memory. I have configured 16GB of CMA memory on each NUMA node,
> and starting a 32GB virtual machine with device passthrough is
> extremely slow, taking almost an hour.
>
> During virtual machine start-up, pin_user_pages_remote(...,
> FOLL_LONGTERM, ...) is called to allocate memory. Long-term GUP
> cannot allocate memory from the CMA area, so at most 16GB of
> non-CMA memory per NUMA node can be used as virtual machine memory.
> Since there is 16GB of free CMA memory on the NUMA node, the order-0
> watermark for compaction is always met, so __compaction_suitable()
> always returns true, even if the node is unable to allocate non-CMA
> memory for the virtual machine.
>
> For costly allocations, because __compaction_suitable() always
> returns true, __alloc_pages_slowpath() can't exit at the appropriate
> place, resulting in excessively long virtual machine startup times.
> Call trace:
> __alloc_pages_slowpath
>     if (compact_result == COMPACT_SKIPPED ||
>         compact_result == COMPACT_DEFERRED)
>         goto nopage; // should exit __alloc_pages_slowpath() from here
>
> To sum up, during the long-term GUP flow, we should remove ALLOC_CMA
> both in __compaction_suitable() and in __isolate_free_page().
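To make the watermark argument above concrete, here is a small user-space model of the order-0 check. It is a sketch, not kernel code: it assumes the check reduces to "free pages, minus free CMA pages when ALLOC_CMA is absent, must exceed the mark", and it leaves out lowmem_reserve and the high-order part of __zone_watermark_ok(). The names zone_model, model_watermark_ok(), model_compaction_suitable(), the enum values and the ALLOC_CMA value are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative value only; the real flag is defined in mm/internal.h. */
#define ALLOC_CMA 0x80u

/* Hypothetical, cut-down zone: only what the order-0 check needs. */
struct zone_model {
	long free_pages;     /* all free pages, CMA pageblocks included */
	long free_cma_pages; /* the part of free_pages sitting in CMA pageblocks */
};

/*
 * Simplified order-0 watermark check: free CMA pages count as usable only
 * when ALLOC_CMA is set. lowmem_reserve and high-order checks are omitted.
 */
static bool model_watermark_ok(const struct zone_model *z, long mark,
			       unsigned int alloc_flags)
{
	long usable = z->free_pages;

	if (!(alloc_flags & ALLOC_CMA))
		usable -= z->free_cma_pages;
	return usable > mark;
}

/* The two outcomes that matter for the quoted __alloc_pages_slowpath() check. */
enum model_compact_result { MODEL_COMPACT_SKIPPED, MODEL_COMPACT_CONTINUE };

static enum model_compact_result
model_compaction_suitable(const struct zone_model *z, long mark,
			  unsigned int alloc_flags)
{
	/* A "skipped" result is what lets the slowpath take the goto nopage exit. */
	return model_watermark_ok(z, mark, alloc_flags) ?
	       MODEL_COMPACT_CONTINUE : MODEL_COMPACT_SKIPPED;
}
```

With 4KiB pages, 16GB is 4194304 pages. If all of a node's free memory is CMA, the modeled check passes with ALLOC_CMA and fails without it, so only the second case lets the allocator give up early. The mark of 1024 pages used when trying this out is an arbitrary example value.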
>
> Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()")
> Cc:
> Signed-off-by: yangge
> ---
>   mm/compaction.c | 8 +++++---
>   mm/page_alloc.c | 4 +++-
>   2 files changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 07bd227..044c2247 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -2384,6 +2384,7 @@ static bool __compaction_suitable(struct zone *zone, int order,
>  				      unsigned long wmark_target)
>  {
>  	unsigned long watermark;
> +	bool pin;
>  	/*
>  	 * Watermarks for order-0 must be met for compaction to be able to
>  	 * isolate free pages for migration targets. This means that the
> @@ -2395,14 +2396,15 @@ static bool __compaction_suitable(struct zone *zone, int order,
>  	 * even if compaction succeeds.
>  	 * For costly orders, we require low watermark instead of min for
>  	 * compaction to proceed to increase its chances.
> -	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
> -	 * suitable migration targets
> +	 * In addition to long term GUP flow, ALLOC_CMA is used, as pages in
> +	 * CMA pageblocks are considered suitable migration targets
>  	 */
>  	watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ?
>  				low_wmark_pages(zone) : min_wmark_pages(zone);
>  	watermark += compact_gap(order);
> +	pin = !!(current->flags & PF_MEMALLOC_PIN);
>  	return __zone_watermark_ok(zone, 0, watermark, highest_zoneidx,
> -				   ALLOC_CMA, wmark_target);
> +				   pin ? 0 : ALLOC_CMA, wmark_target);
>  }

This seems a little hacky to me. Using the 'cc->alloc_flags' passed from
the caller to determine whether ALLOC_CMA is needed looks more reasonable
to me.

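The alternative suggested above can be sketched in the same user-space style: instead of reading the task context, the suitability check forwards only the ALLOC_CMA bit of the flags the triggering allocation already carries. This sketch assumes that a long-term pin's alloc_flags already lack ALLOC_CMA; compact_control_model, zone_model, the model_* functions and the ALLOC_CMA value are hypothetical, and the real struct compact_control has many more fields.

```c
#include <assert.h>
#include <stdbool.h>

#define ALLOC_CMA 0x80u /* illustrative value only */

/* Hypothetical, cut-down compaction control: only the field used here. */
struct compact_control_model {
	unsigned int alloc_flags; /* flags of the allocation that triggered compaction */
};

/* Hypothetical, cut-down zone: total free pages and the CMA share of them. */
struct zone_model {
	long free_pages;
	long free_cma_pages;
};

/* Simplified order-0 check: CMA pages count only when ALLOC_CMA is set. */
static bool model_watermark_ok(const struct zone_model *z, long mark,
			       unsigned int alloc_flags)
{
	long usable = z->free_pages;

	if (!(alloc_flags & ALLOC_CMA))
		usable -= z->free_cma_pages;
	return usable > mark;
}

/*
 * Suggested direction, sketched: pass through the caller's ALLOC_CMA bit
 * rather than hard-coding ALLOC_CMA or inspecting current->flags.
 */
static bool model_compaction_suitable(const struct compact_control_model *cc,
				      const struct zone_model *z, long mark)
{
	return model_watermark_ok(z, mark, cc->alloc_flags & ALLOC_CMA);
}
```

The design point is that the decision follows the allocation request itself, so every caller of compaction gets the right answer without a special case for one task-context flag.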
>
>  /*
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index dde19db..9a5dfda 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2813,6 +2813,7 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  {
>  	struct zone *zone = page_zone(page);
>  	int mt = get_pageblock_migratetype(page);
> +	bool pin;
>
>  	if (!is_migrate_isolate(mt)) {
>  		unsigned long watermark;
> @@ -2823,7 +2824,8 @@ int __isolate_free_page(struct page *page, unsigned int order)
>  		 * exists.
>  		 */
>  		watermark = zone->_watermark[WMARK_MIN] + (1UL << order);
> -		if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA))
> +		pin = !!(current->flags & PF_MEMALLOC_PIN);
> +		if (!zone_watermark_ok(zone, 0, watermark, 0, pin ? 0 : ALLOC_CMA))
>  			return 0;
>  	}
>
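For comparison with the suggestion above, the patch as posted keys both checks off PF_MEMALLOC_PIN in current->flags, a flag that memalloc_pin_save()/memalloc_pin_restore() set and clear around a scope such as long-term GUP. The user-space sketch below models that scoped-flag pattern; it assumes a thread-local word can stand in for current->flags, and task_flags, the model_* functions and both flag values are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define ALLOC_CMA       0x80u       /* illustrative value only */
#define PF_MEMALLOC_PIN 0x10000000u /* illustrative value only */

/* Stand-in for current->flags: one task-flags word per thread. */
static _Thread_local unsigned int task_flags;

/* Modeled on memalloc_pin_save(): set the flag, return its previous state. */
static unsigned int model_pin_save(void)
{
	unsigned int old = task_flags & PF_MEMALLOC_PIN;

	task_flags |= PF_MEMALLOC_PIN;
	return old;
}

/* Modeled on memalloc_pin_restore(): put the flag back as it was. */
static void model_pin_restore(unsigned int old)
{
	task_flags = (task_flags & ~PF_MEMALLOC_PIN) | old;
}

/* The patch's selection logic: drop ALLOC_CMA while inside a pin scope. */
static unsigned int model_isolate_alloc_flags(void)
{
	bool pin = !!(task_flags & PF_MEMALLOC_PIN);

	return pin ? 0 : ALLOC_CMA;
}
```

Saving and restoring only the PF_MEMALLOC_PIN bit keeps nested scopes correct. The trade-off is that the watermark check now depends on who is running rather than on what is being allocated, which is the concern raised in the reply.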