From: Baolin Wang
Date: Tue, 17 Dec 2024 15:31:36 +0800
Subject: Re: [PATCH V6] mm, compaction: don't use ALLOC_CMA in long term GUP flow
To: Barry Song <21cnbao@gmail.com>, yangge1116@126.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, david@redhat.com, vbabka@suse.cz, liuzixing@hygon.cn
Message-ID: <03d09def-2509-4e87-ad14-cf616ac90908@linux.alibaba.com>
References: <1734406405-25847-1-git-send-email-yangge1116@126.com>

On 2024/12/17 14:14, Barry Song wrote:
> On Tue, Dec 17, 2024 at 4:33 PM wrote:
>>
>> From: yangge
>>
>> Since commit 984fdba6a32e ("mm, compaction:
>> use proper alloc_flags
>> in __compaction_suitable()") allow compaction to proceed when free
>> pages required for compaction reside in the CMA pageblocks, it's
>> possible that __compaction_suitable() always returns true, and in
>> some cases, it's not acceptable.
>>
>> There are 4 NUMA nodes on my machine, and each NUMA node has 32GB
>> of memory. I have configured 16GB of CMA memory on each NUMA node,
>> and starting a 32GB virtual machine with device passthrough is
>> extremely slow, taking almost an hour.
>>
>> During the start-up of the virtual machine, it will call
>> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to allocate memory.
>> Long term GUP cannot allocate memory from CMA area, so a maximum
>> of 16 GB of no-CMA memory on a NUMA node can be used as virtual
>> machine memory. Since there is 16G of free CMA memory on the NUMA
>
> Other unmovable allocations, like dma_buf, which can be large in a
> Linux system, are
> also unable to allocate memory from CMA. My question is whether the issue you
> described applies to these allocations as well.
>
>> node, watermark for order-0 always be met for compaction, so
>> __compaction_suitable() always returns true, even if the node is
>> unable to allocate non-CMA memory for the virtual machine.
>>
>> For costly allocations, because __compaction_suitable() always
>> returns true, __alloc_pages_slowpath() can't exit at the appropriate
>> place, resulting in excessively long virtual machine startup times.
>> Call trace:
>> __alloc_pages_slowpath
>>     if (compact_result == COMPACT_SKIPPED ||
>>         compact_result == COMPACT_DEFERRED)
>>         goto nopage; // should exit __alloc_pages_slowpath() from here
>>
>
> Do we face the same issue if we allocate dma-buf while CMA has plenty
> of free memory, but non-CMA has none?
>
>> In order to quickly fall back to remote node, we should remove
>> ALLOC_CMA both in __compaction_suitable() and __isolate_free_page()
>> in long term GUP flow.
>> After this fix, starting a 32GB virtual machine
>> with device passthrough takes only a few seconds.
>>
>> Fixes: 984fdba6a32e ("mm, compaction: use proper alloc_flags in __compaction_suitable()")
>> Cc:
>> Signed-off-by: yangge
>> Reviewed-by: Baolin Wang
>> ---
>>
>> V6:
>> -- update cc->alloc_flags to keep the original loginc
>>
>> V5:
>> - add 'alloc_flags' parameter for __isolate_free_page()
>> - remove 'usa_cma' variable
>>
>> V4:
>> - rich the commit log description
>>
>> V3:
>> - fix build errors
>> - add ALLOC_CMA both in should_continue_reclaim() and compaction_ready()
>>
>> V2:
>> - using the 'cc->alloc_flags' to determin if 'ALLOC_CMA' is needed
>> - rich the commit log description
>>
>>  include/linux/compaction.h |  6 ++++--
>>  mm/compaction.c            | 26 +++++++++++++++-----------
>>  mm/internal.h              |  3 ++-
>>  mm/page_alloc.c            |  7 +++++--
>>  mm/page_isolation.c        |  3 ++-
>>  mm/page_reporting.c        |  2 +-
>>  mm/vmscan.c                |  4 ++--
>>  7 files changed, 31 insertions(+), 20 deletions(-)
>>
>> diff --git a/include/linux/compaction.h b/include/linux/compaction.h
>> index e947764..b4c3ac3 100644
>> --- a/include/linux/compaction.h
>> +++ b/include/linux/compaction.h
>> @@ -90,7 +90,8 @@ extern enum compact_result try_to_compact_pages(gfp_t gfp_mask,
>>  		struct page **page);
>>  extern void reset_isolation_suitable(pg_data_t *pgdat);
>>  extern bool compaction_suitable(struct zone *zone, int order,
>> -					       int highest_zoneidx);
>> +					       int highest_zoneidx,
>> +					       unsigned int alloc_flags);
>>
>>  extern void compaction_defer_reset(struct zone *zone, int order,
>>  				bool alloc_success);
>> @@ -108,7 +109,8 @@ static inline void reset_isolation_suitable(pg_data_t *pgdat)
>>  }
>>
>>  static inline bool compaction_suitable(struct zone *zone, int order,
>> -						      int highest_zoneidx)
>> +						      int highest_zoneidx,
>> +						      unsigned int alloc_flags)
>>  {
>>  	return false;
>>  }
>> diff --git a/mm/compaction.c b/mm/compaction.c
>> index 07bd227..d92ba6c 100644
>> --- a/mm/compaction.c
>> +++ b/mm/compaction.c
>> @@ -655,7 +655,7 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>>
>>  		/* Found a free page, will break it into order-0 pages */
>>  		order = buddy_order(page);
>> -		isolated = __isolate_free_page(page, order);
>> +		isolated = __isolate_free_page(page, order, cc->alloc_flags);
>>  		if (!isolated)
>>  			break;
>>  		set_page_private(page, order);
>> @@ -1634,7 +1634,7 @@ static void fast_isolate_freepages(struct compact_control *cc)
>>
>>  		/* Isolate the page if available */
>>  		if (page) {
>> -			if (__isolate_free_page(page, order)) {
>> +			if (__isolate_free_page(page, order, cc->alloc_flags)) {
>>  				set_page_private(page, order);
>>  				nr_isolated = 1 << order;
>>  				nr_scanned += nr_isolated - 1;
>> @@ -2381,6 +2381,7 @@ static enum compact_result compact_finished(struct compact_control *cc)
>>
>>  static bool __compaction_suitable(struct zone *zone, int order,
>>  				  int highest_zoneidx,
>> +				  unsigned int alloc_flags,
>>  				  unsigned long wmark_target)
>>  {
>>  	unsigned long watermark;
>> @@ -2395,25 +2396,26 @@ static bool __compaction_suitable(struct zone *zone, int order,
>>  	 * even if compaction succeeds.
>>  	 * For costly orders, we require low watermark instead of min for
>>  	 * compaction to proceed to increase its chances.
>> -	 * ALLOC_CMA is used, as pages in CMA pageblocks are considered
>> -	 * suitable migration targets
>> +	 * In addition to long term GUP flow, ALLOC_CMA is used, as pages in
>> +	 * CMA pageblocks are considered suitable migration targets
>
> I'm not sure if this document is correct for cases other than GUP.

Yes, we should update the document for other cases where CMA cannot be
used. That's why we use the passed 'alloc_flags' to determine if
'ALLOC_CMA' is needed, instead of using 'current->flags &
PF_MEMALLOC_PIN'.
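
To make the effect of the passed 'alloc_flags' concrete, here is a minimal userspace sketch of the order-0 watermark check. It is not the kernel code: model_zone, model_watermark_ok(), MODEL_ALLOC_CMA and the watermark value are hypothetical names and numbers chosen for illustration. The only assumption is that free pages in CMA pageblocks are discounted when the caller is not allowed to allocate from CMA.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's ALLOC_CMA flag. */
#define MODEL_ALLOC_CMA 0x1u

/* Hypothetical, much reduced view of a zone's free-page counters. */
struct model_zone {
	unsigned long free_pages;     /* all free pages, CMA included */
	unsigned long free_cma_pages; /* free pages in CMA pageblocks */
};

/*
 * Simplified order-0 watermark check: free CMA pages only count as
 * usable when the caller may allocate from CMA.
 */
bool model_watermark_ok(const struct model_zone *z,
			unsigned long watermark,
			unsigned int alloc_flags)
{
	unsigned long usable = z->free_pages;

	if (!(alloc_flags & MODEL_ALLOC_CMA))
		usable -= z->free_cma_pages;

	return usable > watermark;
}

/* Scenario from the commit log: 16GB of free CMA, almost no free non-CMA. */
void model_scenario(void)
{
	const unsigned long pages_per_gb = (1UL << 30) / 4096;
	struct model_zone node = {
		.free_pages = 16 * pages_per_gb + 100,
		.free_cma_pages = 16 * pages_per_gb,
	};
	unsigned long watermark = 1024; /* made-up value */

	/* With the CMA flag the check passes, so compaction looks suitable. */
	assert(model_watermark_ok(&node, watermark, MODEL_ALLOC_CMA));
	/* Without it the node is correctly seen as short of usable pages. */
	assert(!model_watermark_ok(&node, watermark, 0));
}
```

In this model, a caller that cannot use CMA (long term GUP in the patch) passes flags without the CMA bit and gets a failing check, which is what would let the slow path give up on the node early instead of retrying compaction.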