From: Barry Song <21cnbao@gmail.com>
Date: Thu, 20 Jun 2024 10:28:28 +1200
Subject: Re: [PATCH] mm/page_alloc: add one PCP list for THP
To: yangge1116@126.com
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, stable@vger.kernel.org, baolin.wang@linux.alibaba.com, mgorman@techsingularity.net, liuzixing@hygon.cn
References: <1718801672-30152-1-git-send-email-yangge1116@126.com>
In-Reply-To: <1718801672-30152-1-git-send-email-yangge1116@126.com>

On Thu, Jun 20, 2024 at 12:55 AM wrote:
>
> From: yangge
>
> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
> THP-sized allocations") no longer differentiates the migration type
> of pages in THP-sized PCP list, it's possible that non-movable
> allocation requests may get a CMA page from the list, in some cases,
> it's not acceptable.
>
> If a large number of CMA memory are configured in system (for
> example, the CMA memory accounts for 50% of the system memory),
> starting a virtual machine with device passthrough will get stuck.
> During starting the virtual machine, it will call
> pin_user_pages_remote(..., FOLL_LONGTERM, ...) to pin memory. Normally
> if a page is present and in CMA area, pin_user_pages_remote() will
> migrate the page from CMA area to non-CMA area because of
> FOLL_LONGTERM flag. But if non-movable allocation requests return
> CMA memory, migrate_longterm_unpinnable_pages() will migrate a CMA
> page to another CMA page, which will fail to pass the check in
> check_and_migrate_movable_pages() and cause migration endless.
> Call trace:
> pin_user_pages_remote
> --__gup_longterm_locked // endless loops in this function
> ----_get_user_pages_locked
> ----check_and_migrate_movable_pages
> ------migrate_longterm_unpinnable_pages
> --------alloc_migration_target
>
> This problem will also have a negative impact on CMA itself. For
> example, when CMA is borrowed by THP, and we need to reclaim it
> through cma_alloc() or dma_alloc_coherent(), we must move those
> pages out to ensure CMA's users can retrieve that contiguous memory.
> Currently, CMA's memory is occupied by non-movable pages, meaning
> we can't relocate them. As a result, cma_alloc() is more likely to
> fail.
>
> To fix the problem above, we add one PCP list for THP, which will
> not introduce a new cacheline for struct per_cpu_pages. THP will
> have 2 PCP lists, one PCP list is used by MOVABLE allocation, and
> the other PCP list is used by UNMOVABLE allocation. MOVABLE
> allocation contains GFP_MOVABLE, and UNMOVABLE allocation contains
> GFP_UNMOVABLE and GFP_RECLAIMABLE.
>
> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for THP-sized allocations")

Please add the below tag

Cc:

And I don't think 'mm/page_alloc: add one PCP list for THP' is a good
title.
Maybe:
'mm/page_alloc: Separate THP PCP into movable and non-movable categories'

Whenever you send a new version, please add things like 'PATCH V2', 'PATCH V3'.
You have already missed several version numbers, so we may have to start from
V2, even though V2 is not the accurate count.

> Signed-off-by: yangge
> ---
>  include/linux/mmzone.h | 9 ++++-----
>  mm/page_alloc.c        | 9 +++++++--
>  2 files changed, 11 insertions(+), 7 deletions(-)
>
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index b7546dd..cb7f265 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -656,13 +656,12 @@ enum zone_watermarks {
>  };
>
>  /*
> - * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. One additional list
> - * for THP which will usually be GFP_MOVABLE. Even if it is another type,
> - * it should not contribute to serious fragmentation causing THP allocation
> - * failures.
> + * One per migratetype for each PAGE_ALLOC_COSTLY_ORDER. Two additional lists
> + * are added for THP. One PCP list is used by GPF_MOVABLE, and the other PCP list
> + * is used by GFP_UNMOVABLE and GFP_RECLAIMABLE.
>   */
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -#define NR_PCP_THP 1
> +#define NR_PCP_THP 2
>  #else
>  #define NR_PCP_THP 0
>  #endif
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 8f416a0..0a837e6 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -504,10 +504,15 @@ static void bad_page(struct page *page, const char *reason)
>
>  static inline unsigned int order_to_pindex(int migratetype, int order)
>  {
> +	bool __maybe_unused movable;
> +
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
>  	if (order > PAGE_ALLOC_COSTLY_ORDER) {
>  		VM_BUG_ON(order != HPAGE_PMD_ORDER);
> -		return NR_LOWORDER_PCP_LISTS;
> +
> +		movable = migratetype == MIGRATE_MOVABLE;
> +
> +		return NR_LOWORDER_PCP_LISTS + movable;
>  	}
>  #else
>  	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
> @@ -521,7 +526,7 @@ static inline int pindex_to_order(unsigned int pindex)
>  	int order = pindex / MIGRATE_PCPTYPES;
>
>  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> -	if (pindex == NR_LOWORDER_PCP_LISTS)
> +	if (pindex >= NR_LOWORDER_PCP_LISTS)
>  		order = HPAGE_PMD_ORDER;
>  #else
>  	VM_BUG_ON(order > PAGE_ALLOC_COSTLY_ORDER);
> --
> 2.7.4
>
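
For readers following the index arithmetic in the quoted diff, here is a rough
userspace sketch of the order <-> pindex mapping with two THP lists. It is only
a model, not the kernel code: the constant values (PAGE_ALLOC_COSTLY_ORDER = 3,
HPAGE_PMD_ORDER = 9, three PCP migratetypes) are assumptions typical of x86-64
with 4 KiB pages, the low-order formula is assumed to be the usual
MIGRATE_PCPTYPES * order + migratetype, assert() stands in for VM_BUG_ON(), and
pcp_roundtrip_ok() is a hypothetical helper added purely for illustration.

```c
#include <assert.h>

/* Assumed values for illustration only; real kernels derive these per config. */
enum { MIGRATE_UNMOVABLE, MIGRATE_MOVABLE, MIGRATE_RECLAIMABLE, MIGRATE_PCPTYPES };
#define PAGE_ALLOC_COSTLY_ORDER 3
#define HPAGE_PMD_ORDER         9
#define NR_LOWORDER_PCP_LISTS   (MIGRATE_PCPTYPES * (PAGE_ALLOC_COSTLY_ORDER + 1))
#define NR_PCP_THP              2	/* was 1 before the quoted patch */
#define NR_PCP_LISTS            (NR_LOWORDER_PCP_LISTS + NR_PCP_THP)

/* Map (migratetype, order) to a PCP list index, as in the patched function. */
static unsigned int order_to_pindex(int migratetype, int order)
{
	if (order > PAGE_ALLOC_COSTLY_ORDER) {
		assert(order == HPAGE_PMD_ORDER);	/* stand-in for VM_BUG_ON() */
		/* First THP list: unmovable + reclaimable; second: movable. */
		return NR_LOWORDER_PCP_LISTS + (migratetype == MIGRATE_MOVABLE);
	}
	return (MIGRATE_PCPTYPES * order) + migratetype;
}

/* Inverse direction: every index at or past the low-order lists is THP. */
static int pindex_to_order(unsigned int pindex)
{
	if (pindex >= NR_LOWORDER_PCP_LISTS)
		return HPAGE_PMD_ORDER;
	return pindex / MIGRATE_PCPTYPES;
}

/* Hypothetical helper: returns 1 if every valid pair round-trips in range. */
static int pcp_roundtrip_ok(void)
{
	int mt, order;

	for (mt = 0; mt < MIGRATE_PCPTYPES; mt++) {
		for (order = 0; order <= PAGE_ALLOC_COSTLY_ORDER; order++) {
			unsigned int idx = order_to_pindex(mt, order);

			if (idx >= NR_LOWORDER_PCP_LISTS ||
			    pindex_to_order(idx) != order)
				return 0;
		}
		if (order_to_pindex(mt, HPAGE_PMD_ORDER) >= NR_PCP_LISTS ||
		    pindex_to_order(order_to_pindex(mt, HPAGE_PMD_ORDER)) !=
		    HPAGE_PMD_ORDER)
			return 0;
	}
	return 1;
}
```

Under these assumed constants the low-order lists occupy indices 0..11, and the
two THP lists sit at 12 (unmovable and reclaimable) and 13 (movable). This is
why the '==' in pindex_to_order() has to become '>=': with the old comparison,
the second THP index would be divided by MIGRATE_PCPTYPES and decoded as a low
order instead of HPAGE_PMD_ORDER.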