From: yangge1116
To: Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, baolin.wang@linux.alibaba.com, liuzixing@hygon.cn
Date: Tue, 18 Jun 2024 11:40:17 +0800
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page

On 2024/6/18 9:55 AM, Barry Song wrote:
> On Tue, Jun 18, 2024 at 9:36 AM yangge1116 wrote:
>>
>> On 2024/6/17 8:47 PM, yangge1116 wrote:
>>>
>>> On 2024/6/17 6:26 PM, Barry Song wrote:
>>>> On Tue, Jun 4, 2024 at 9:15 PM wrote:
>>>>>
>>>>> From: yangge
>>>>>
>>>>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
>>>>> THP-sized allocations") no longer differentiates the migration type
>>>>> of pages in the THP-sized PCP list, it is possible to get a CMA page
>>>>> from that list. In some cases this is not acceptable: for example,
>>>>> allocating a non-CMA page with the PF_MEMALLOC_PIN flag may return a
>>>>> CMA page.
>>>>>
>>>>> This patch forbids allocating non-CMA THP-sized pages from the
>>>>> THP-sized PCP list to avoid the issue above.
>>>>
>>>> Could you please describe the impact on users in the commit log?
>>>
>>> If a large amount of CMA memory is configured in the system (for
>>> example, CMA accounts for 50% of system memory), starting a virtual
>>> machine with device passthrough will get stuck.
>>>
>>> While the virtual machine is starting, pin_user_pages_remote(...,
>>> FOLL_LONGTERM, ...) is called to pin its memory.
>>> If a page is in a CMA area, pin_user_pages_remote() will migrate it
>>> from the CMA area to a non-CMA area because of the FOLL_LONGTERM flag.
>>> If non-movable allocation requests return CMA memory,
>>> pin_user_pages_remote() will loop endlessly.
>>>
>>> backtrace:
>>> pin_user_pages_remote
>>> ----__gup_longterm_locked // loops endlessly in this function
>>> --------__get_user_pages_locked
>>> --------check_and_migrate_movable_pages // check always fails, so it keeps migrating
>>> ------------migrate_longterm_unpinnable_pages
>>> ----------------alloc_migration_target // non-movable allocation
>>>
>>>> Is it possible that some CMA memory might be used by non-movable
>>>> allocation requests?
>>>
>>> Yes.
>>>
>>>> If so, will CMA somehow become unable to migrate, causing cma_alloc()
>>>> to fail?
>>>
>>> No, it will cause an endless loop in __gup_longterm_locked(). If
>>> non-movable allocation requests return CMA memory,
>>> migrate_longterm_unpinnable_pages() will migrate a CMA page to another
>>> CMA page, which is useless and causes an endless loop in
>>> __gup_longterm_locked().
>
> This is only one perspective. We also need to consider the impact on
> CMA itself. For example, when CMA memory is borrowed by THP and we need
> to reclaim it through cma_alloc() or dma_alloc_coherent(), we must move
> those pages out so that CMA's users can get that contiguous memory back.
>
> Currently, CMA memory is occupied by non-movable pages, which means we
> can't relocate them. As a result, cma_alloc() is more likely to fail.
>
>>>>>
>>>>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")
>>>>> Signed-off-by: yangge
>>>>> ---
>>>>>  mm/page_alloc.c | 10 ++++++++++
>>>>>  1 file changed, 10 insertions(+)
>>>>>
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index 2e22ce5..0bdf471 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
>>>>>  	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
>>>>>
>>>>>  	if (likely(pcp_allowed_order(order))) {
>>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
>>>>> +		if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
>>>>> +		    order != HPAGE_PMD_ORDER) {
>>>>> +			page = rmqueue_pcplist(preferred_zone, zone, order,
>>>>> +					       migratetype, alloc_flags);
>>>>> +			if (likely(page))
>>>>> +				goto out;
>>>>> +		}
>>>>
>>>> This does not seem ideal, because non-CMA THP gets no chance to use the
>>>> PCP list. But it still seems better than causing CMA allocations to fail.
>>>>
>>>> Is there a possible approach to avoid adding CMA THP to the PCP list from
>>>> the very beginning? Otherwise, we might need a separate PCP list for CMA.
>>>>
>>
>> The vast majority of THP-sized allocations are GFP_MOVABLE, so avoiding
>> adding CMA THP to the PCP list may incur a slight performance penalty.
>>
>
> But the majority of movable pages aren't CMA, right?

Yes.

> Do we have an estimate for adding back a CMA THP PCP list? Would
> per_cpu_pages then need a new cacheline, which is what the original
> THP change tried to avoid by having only one PCP list [1]?
>
> [1] https://patchwork.kernel.org/project/linux-mm/patch/20220624125423.6126-3-mgorman@techsingularity.net/
>
>> Commit 1d91df85f399 uses a similar filtering approach, and I mainly
>> referred to it.
>>
>>>>> +#else
>>>>>  		page = rmqueue_pcplist(preferred_zone, zone, order,
>>>>>  				       migratetype, alloc_flags);
>>>>>  		if (likely(page))
>>>>>  			goto out;
>>>>> +#endif
>>>>>  	}
>>>>>
>>>>>  	page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
>>>>> --
>>>>> 2.7.4
>>>>
>>>> Thanks
>>>> Barry