From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page
To: Barry Song <21cnbao@gmail.com>, Baolin Wang
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, liuzixing@hygon.cn, Johannes Weiner, Vlastimil Babka, Zi Yan
References: <1717492460-19457-1-git-send-email-yangge1116@126.com>
 <82d31425-86d7-16fa-d09b-fcb203de0986@126.com>
 <7087d0af-93d8-4d49-94f4-dc846a4e2b98@linux.alibaba.com>
 <6dc8df31-eb01-4382-8467-c5510f75531e@linux.alibaba.com>
From: yangge1116
Message-ID: <163d69f4-3e66-f1b8-87c5-d58cd0c1b18f@126.com>
Date: Tue, 18 Jun 2024 11:31:55 +0800

On 2024/6/17 7:55 PM, Barry Song wrote:
> On Mon, Jun 17, 2024 at 7:36 PM Baolin Wang wrote:
>>
>> On 2024/6/17 18:43, Barry Song wrote:
>>> On Thu, Jun 6, 2024 at 3:07 PM Baolin Wang wrote:
>>>>
>>>> On 2024/6/4 20:36, yangge1116 wrote:
>>>>>
>>>>> On 2024/6/4 8:01 PM, Baolin Wang wrote:
>>>>>> Cc Johannes, Zi and Vlastimil.
>>>>>>
>>>>>> On 2024/6/4 17:14, yangge1116@126.com wrote:
>>>>>>> From: yangge
>>>>>>>
>>>>>>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>>>>>>> THP-sized allocations") no longer differentiates the migration type
>>>>>>> of pages in THP-sized PCP list, it's possible to get a CMA page from
>>>>>>> the list, in some cases, it's not acceptable, for example, allocating
>>>>>>> a non-CMA page with PF_MEMALLOC_PIN flag returns a CMA page.
>>>>>>>
>>>>>>> The patch forbids allocating non-CMA THP-sized page from THP-sized
>>>>>>> PCP list to avoid the issue above.
>>>>>>>
>>>>>>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>>>>>>> THP-sized allocations")
>>>>>>> Signed-off-by: yangge
>>>>>>> ---
>>>>>>>  mm/page_alloc.c | 10 ++++++++++
>>>>>>>  1 file changed, 10 insertions(+)
>>>>>>>
>>>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>>>> index 2e22ce5..0bdf471 100644
>>>>>>> --- a/mm/page_alloc.c
>>>>>>> +++ b/mm/page_alloc.c
>>>>>>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
>>>>>>>  	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>>>>>>>  	if (likely(pcp_allowed_order(order))) {
>>>>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>>>> +		if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
>>>>>>> +				order != HPAGE_PMD_ORDER) {
>>>>>>
>>>>>> Seems you will also miss the non-CMA THP from the PCP, so I wonder if
>>>>>> we can add a migratetype comparison in __rmqueue_pcplist(), and if
>>>>>> it's not suitable, then fallback to buddy?
>>>>>
>>>>> Yes, we may miss some non-CMA THPs in the PCP. But, if add a migratetype
>>>>> comparison in __rmqueue_pcplist(), we may need to compare many times
>>>>> because of pcp batch.
>>>>
>>>> I mean we can only compare once, focusing on CMA pages.
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 3734fe7e67c0..960a3b5744d8 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -2973,6 +2973,11 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
>>>>  		}
>>>>
>>>>  		page = list_first_entry(list, struct page, pcp_list);
>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>> +		if (order == HPAGE_PMD_ORDER && !is_migrate_movable(migratetype) &&
>>>> +		    is_migrate_cma(get_pageblock_migratetype(page)))
>>>> +			return NULL;
>>>> +#endif
>>>
>>> This doesn't seem ideal either. It's possible that the PCP still has many
>>> non-CMA folios, but due to bad luck, the first entry is "always" CMA.
>>> In this case, allocations with is_migrate_movable(migratetype) == false
>>> will always lose the chance to use the PCP. It also appears to incur a
>>> PCP spin lock/unlock.
>>
>> Yes, just some ideas to mitigate the issue...
>>
>>>
>>> I don't see an ideal solution unless we bring back the CMA PCP :-)
>>
>> Tend to agree, and adding a CMA PCP seems the overhead can be acceptable?
>
> yes. probably. Hi Ge,
>
> Could we printk the size before and after adding 1 to NR_PCP_LISTS?
> Does it increase one cacheline?
>
> struct per_cpu_pages {
> 	spinlock_t lock;	/* Protects lists field */
> 	int count;		/* number of pages in the list */
> 	int high;		/* high watermark, emptying needed */
> 	int high_min;		/* min high watermark */
> 	int high_max;		/* max high watermark */
> 	int batch;		/* chunk size for buddy add/remove */
> 	u8 flags;		/* protected by pcp->lock */
> 	u8 alloc_factor;	/* batch scaling factor during allocate */
> #ifdef CONFIG_NUMA
> 	u8 expire;		/* When 0, remote pagesets are drained */
> #endif
> 	short free_count;	/* consecutive free count */
>
> 	/* Lists of pages, one per migrate type stored on the pcp-lists */
> 	struct list_head lists[NR_PCP_LISTS];
> } ____cacheline_aligned_in_smp;
>

OK.

The size of struct per_cpu_pages is 256 bytes in the current code, which
contains commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
THP-sized allocations").

crash> struct per_cpu_pages
struct per_cpu_pages {
    spinlock_t lock;
    int count;
    int high;
    int high_min;
    int high_max;
    int batch;
    u8 flags;
    u8 alloc_factor;
    u8 expire;
    short free_count;
    struct list_head lists[13];
}
SIZE: 256

After reverting commit 5d0a661d808f ("mm/page_alloc: use only one PCP list
for THP-sized allocations"), the size of struct per_cpu_pages is 272 bytes.

crash> struct per_cpu_pages
struct per_cpu_pages {
    spinlock_t lock;
    int count;
    int high;
    int high_min;
    int high_max;
    int batch;
    u8 flags;
    u8 alloc_factor;
    u8 expire;
    short free_count;
    struct list_head lists[15];
}
SIZE: 272

So it seems that commit 5d0a661d808f ("mm/page_alloc: use only one PCP list
for THP-sized allocations") saves one cacheline.