From: Barry Song <21cnbao@gmail.com>
Date: Tue, 18 Jun 2024 16:10:08 +1200
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page
To: yangge1116
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, baolin.wang@linux.alibaba.com, liuzixing@hygon.cn
In-Reply-To: <4482bf69-eb07-0ec9-f777-28ce40f96589@126.com>
References: <1717492460-19457-1-git-send-email-yangge1116@126.com> <2e3a3a3f-737c-ed01-f820-87efee0adc93@126.com> <9b227c9d-f59b-a8b0-b353-7876a56c0bde@126.com> <4482bf69-eb07-0ec9-f777-28ce40f96589@126.com>

On Tue, Jun 18, 2024 at 3:32 PM yangge1116 wrote:
>
> On 2024/6/18 9:55 AM, Barry Song wrote:
> > On Tue, Jun 18, 2024 at 9:36 AM yangge1116 wrote:
> >>
> >> On 2024/6/17 8:47 PM, yangge1116 wrote:
> >>>
> >>> On 2024/6/17 6:26 PM, Barry Song wrote:
> >>>> On Tue, Jun 4, 2024 at 9:15 PM wrote:
> >>>>>
> >>>>> From: yangge
> >>>>>
> >>>>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
> >>>>> THP-sized allocations") no longer differentiates the migration type
> >>>>> of pages in the THP-sized PCP list, it is possible to get a CMA page
> >>>>> from the list. In some cases that is not acceptable; for example,
> >>>>> allocating a non-CMA page under the PF_MEMALLOC_PIN flag can still
> >>>>> return a CMA page.
> >>>>>
> >>>>> This patch forbids allocating a non-CMA THP-sized page from the
> >>>>> THP-sized PCP list to avoid the issue above.
> >>>>
> >>>> Could you please describe the impact on users in the commit log?
> >>>
> >>> If a large amount of CMA memory is configured in the system (for
> >>> example, CMA memory accounts for 50% of system memory), starting a
> >>> virtual machine with device passthrough will get stuck.
> >>>
> >>> While starting the virtual machine, pin_user_pages_remote(...,
> >>> FOLL_LONGTERM, ...) is called to pin memory. If a page is in a CMA
> >>> area, pin_user_pages_remote() will migrate it from the CMA area to a
> >>> non-CMA area because of the FOLL_LONGTERM flag. If the non-movable
> >>> allocation request returns CMA memory, pin_user_pages_remote() enters
> >>> an endless loop.
> >>>
> >>> backtrace:
> >>> pin_user_pages_remote
> >>> ----__gup_longterm_locked // causes endless loops in this function
> >>> --------__get_user_pages_locked
> >>> --------check_and_migrate_movable_pages // always fails the check and continues to migrate
> >>> ------------migrate_longterm_unpinnable_pages
> >>> ----------------alloc_migration_target // non-movable allocation
> >>>
> >>>> Is it possible that some CMA memory might be used by non-movable
> >>>> allocation requests?
> >>>
> >>> Yes.
> >>>
> >>>> If so, will CMA somehow become unable to migrate, causing cma_alloc()
> >>>> to fail?
> >>>
> >>> No, it will cause endless loops in __gup_longterm_locked(). If a
> >>> non-movable allocation request returns CMA memory,
> >>> migrate_longterm_unpinnable_pages() will migrate the CMA page to
> >>> another CMA page, which is useless and causes endless loops in
> >>> __gup_longterm_locked().
> >
> > This is only one perspective. We also need to consider the impact on
> > CMA itself.
> > For example, when CMA is borrowed by THP and we need to reclaim it
> > through cma_alloc() or dma_alloc_coherent(), we must move those pages
> > out to ensure CMA's users can retrieve that contiguous memory.
> >
> > Currently, CMA's memory is occupied by non-movable pages, meaning we
> > can't relocate them. As a result, cma_alloc() is more likely to fail.
> >
> >>>
> >>> backtrace:
> >>> pin_user_pages_remote
> >>> ----__gup_longterm_locked // causes endless loops in this function
> >>> --------__get_user_pages_locked
> >>> --------check_and_migrate_movable_pages // always fails the check and continues to migrate
> >>> ------------migrate_longterm_unpinnable_pages
> >>>
> >>>>>
> >>>>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for
> >>>>> THP-sized allocations")
> >>>>> Signed-off-by: yangge
> >>>>> ---
> >>>>>  mm/page_alloc.c | 10 ++++++++++
> >>>>>  1 file changed, 10 insertions(+)
> >>>>>
> >>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>>>> index 2e22ce5..0bdf471 100644
> >>>>> --- a/mm/page_alloc.c
> >>>>> +++ b/mm/page_alloc.c
> >>>>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
> >>>>>  	WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> >>>>>
> >>>>>  	if (likely(pcp_allowed_order(order))) {
> >>>>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>>>> +		if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> >>>>> +				order != HPAGE_PMD_ORDER) {
> >>>>> +			page = rmqueue_pcplist(preferred_zone, zone, order,
> >>>>> +					migratetype, alloc_flags);
> >>>>> +			if (likely(page))
> >>>>> +				goto out;
> >>>>> +		}
> >>>>
> >>>> This seems not ideal, because non-CMA THP gets no chance to use the
> >>>> PCP. But it still seems better than causing the failure of CMA
> >>>> allocation.
> >>>>
> >>>> Is there a possible approach to avoid adding CMA THP into the pcp
> >>>> from the very beginning? Otherwise, we might need a separate PCP for
> >>>> CMA.
> >>>>
> >>
> >> The vast majority of THP-sized allocations are GFP_MOVABLE, so avoiding
> >> adding CMA THP into the pcp may incur a slight performance penalty.
> >>
> >
> > But the majority of movable pages aren't CMA, right? Do we have an
> > estimate for adding back a CMA THP PCP? Will per_cpu_pages grow by a new
> > cacheline, which having only one THP PCP list was originally meant to
> > avoid[1]?
> >
> > [1] https://patchwork.kernel.org/project/linux-mm/patch/20220624125423.6126-3-mgorman@techsingularity.net/
> >
>
> The size of struct per_cpu_pages is 256 bytes in the current code, which
> contains commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
> THP-sized allocations").
> crash> struct per_cpu_pages
> struct per_cpu_pages {
>     spinlock_t lock;
>     int count;
>     int high;
>     int high_min;
>     int high_max;
>     int batch;
>     u8 flags;
>     u8 alloc_factor;
>     u8 expire;
>     short free_count;
>     struct list_head lists[13];
> }
> SIZE: 256
>
> After reverting commit 5d0a661d808f ("mm/page_alloc: use only one PCP list
> for THP-sized allocations"), the size of struct per_cpu_pages is 272 bytes.
> crash> struct per_cpu_pages
> struct per_cpu_pages {
>     spinlock_t lock;
>     int count;
>     int high;
>     int high_min;
>     int high_max;
>     int batch;
>     u8 flags;
>     u8 alloc_factor;
>     u8 expire;
>     short free_count;
>     struct list_head lists[15];
> }
> SIZE: 272
>
> It seems commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
> THP-sized allocations") saves one cacheline.

The proposal is not to revert that patch but to add one extra pcp list for
CMA THPs, i.e. "struct list_head lists[14]"; in that case, is the size
still 256 bytes?
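
Something along these lines, completely untested and only meant to sketch
the pcp index mapping on top of the existing NR_LOWORDER_PCP_LISTS /
is_migrate_cma() / HPAGE_PMD_ORDER definitions, not a worked-out patch:

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
/* THP-sized pages: one pcp list for CMA pageblocks, one for the rest */
#define NR_PCP_THP 2
#else
#define NR_PCP_THP 0
#endif
#define NR_LOWORDER_PCP_LISTS (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
#define NR_PCP_LISTS (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)

static inline unsigned int order_to_pindex(int migratetype, int order)
{
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	if (order > PAGE_ALLOC_COSTLY_ORDER) {
		VM_BUG_ON(order != HPAGE_PMD_ORDER);
		/*
		 * Keep CMA THPs on their own pcp list so that requests
		 * without ALLOC_CMA never pick them up from the pcp.
		 */
		return NR_LOWORDER_PCP_LISTS + !!is_migrate_cma(migratetype);
	}
#else
	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
#endif

	return (MIGRATE_PCPTYPES * order) + migratetype;
}

static inline int pindex_to_order(unsigned int pindex)
{
	int order = pindex / MIGRATE_PCPTYPES;

#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	/* both THP pcp lists map back to the PMD order */
	if (pindex >= NR_LOWORDER_PCP_LISTS)
		order = HPAGE_PMD_ORDER;
#else
	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
#endif

	return order;
}

The messier part is the plumbing around it: IIRC free_unref_page() folds
MIGRATE_CMA into MIGRATE_MOVABLE before indexing the pcp, so the free path
would have to keep that information for THP-sized pages, and
rmqueue_pcplist() should only look at the CMA list when ALLOC_CMA is set.
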
> >
> >
> >> Commit 1d91df85f399 takes a similar approach to filter, and I mainly
> >> referred to it.
> >>
> >>
> >>>>> +#else
> >>>>>  		page = rmqueue_pcplist(preferred_zone, zone, order,
> >>>>>  				migratetype, alloc_flags);
> >>>>>  		if (likely(page))
> >>>>>  			goto out;
> >>>>> +#endif
> >>>>>  	}
> >>>>>
> >>>>>  	page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
> >>>>> --
> >>>>> 2.7.4
> >>>>
> >>>> Thanks
> >>>> Barry
> >>>>
> >>
> >>
>