From: Barry Song <21cnbao@gmail.com>
Date: Tue, 18 Jun 2024 09:55:24 +0800
Subject: Re: [PATCH] mm/page_alloc: skip THP-sized PCP list when allocating non-CMA THP-sized page
To: yangge1116
Cc: akpm@linux-foundation.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, baolin.wang@linux.alibaba.com, liuzixing@hygon.cn
In-Reply-To: <9b227c9d-f59b-a8b0-b353-7876a56c0bde@126.com>
References: <1717492460-19457-1-git-send-email-yangge1116@126.com> <2e3a3a3f-737c-ed01-f820-87efee0adc93@126.com> <9b227c9d-f59b-a8b0-b353-7876a56c0bde@126.com>
On Tue, Jun 18, 2024 at 9:36 AM yangge1116 wrote:
>
>
>
> On 2024/6/17 8:47 PM, yangge1116 wrote:
> >
> >
> > On 2024/6/17 6:26 PM, Barry Song wrote:
> >> On Tue, Jun 4, 2024 at 9:15 PM wrote:
> >>>
> >>> From: yangge
> >>>
> >>> Since commit 5d0a661d808f ("mm/page_alloc: use only one PCP list for
> >>> THP-sized allocations") no longer differentiates the migration type
> >>> of pages in the THP-sized PCP list, it's possible to get a CMA page
> >>> from the list. In some cases this is not acceptable; for example,
> >>> allocating a non-CMA page with the PF_MEMALLOC_PIN flag may return
> >>> a CMA page.
> >>>
> >>> This patch forbids allocating a non-CMA THP-sized page from the
> >>> THP-sized PCP list to avoid the issue above.
> >>
> >> Could you please describe the impact on users in the commit log?
> >
> > If a large amount of CMA memory is configured in the system (for
> > example, CMA memory accounts for 50% of system memory), starting a
> > virtual machine with device passthrough will get stuck.
> >
> > While starting the virtual machine, it calls pin_user_pages_remote(...,
> > FOLL_LONGTERM, ...) to pin memory. If a page is in a CMA area,
> > pin_user_pages_remote() will migrate the page out of the CMA area
> > because of the FOLL_LONGTERM flag. If non-movable allocation requests
> > return CMA memory, pin_user_pages_remote() will enter an endless loop.
> >
> > backtrace:
> > pin_user_pages_remote
> > ----__gup_longterm_locked                // endless loop in this function
> > --------__get_user_pages_locked
> > --------check_and_migrate_movable_pages  // check always fails, keeps migrating
> > ------------migrate_longterm_unpinnable_pages
> > ----------------alloc_migration_target   // non-movable allocation
> >
> >> Is it possible that some CMA memory might be used by non-movable
> >> allocation requests?
> >
> > Yes.
> >
> >
> >> If so, will CMA somehow become unable to migrate, causing cma_alloc()
> >> to fail?
> >
> >
> > No, it will cause endless loops in __gup_longterm_locked().
> > If non-movable allocation requests return CMA memory,
> > migrate_longterm_unpinnable_pages() will migrate a CMA page to another
> > CMA page, which is useless and causes endless loops in
> > __gup_longterm_locked().

This is only one perspective. We also need to consider the impact on
CMA itself. For example, when CMA is borrowed by THP and we need to
reclaim it through cma_alloc() or dma_alloc_coherent(), we must move
those pages out to ensure CMA's users can retrieve that contiguous
memory. If CMA's memory is occupied by non-movable pages, we can't
relocate them; as a result, cma_alloc() is more likely to fail.

> >
> > backtrace:
> > pin_user_pages_remote
> > ----__gup_longterm_locked                // endless loop in this function
> > --------__get_user_pages_locked
> > --------check_and_migrate_movable_pages  // check always fails, keeps migrating
> > ------------migrate_longterm_unpinnable_pages
> >
> >
> >
> >
> >>>
> >>> Fixes: 5d0a661d808f ("mm/page_alloc: use only one PCP list for
> >>> THP-sized allocations")
> >>> Signed-off-by: yangge
> >>> ---
> >>>  mm/page_alloc.c | 10 ++++++++++
> >>>  1 file changed, 10 insertions(+)
> >>>
> >>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> >>> index 2e22ce5..0bdf471 100644
> >>> --- a/mm/page_alloc.c
> >>> +++ b/mm/page_alloc.c
> >>> @@ -2987,10 +2987,20 @@ struct page *rmqueue(struct zone *preferred_zone,
> >>>         WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> >>>
> >>>         if (likely(pcp_allowed_order(order))) {
> >>> +#ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >>> +               if (!IS_ENABLED(CONFIG_CMA) || alloc_flags & ALLOC_CMA ||
> >>> +                               order != HPAGE_PMD_ORDER) {
> >>> +                       page = rmqueue_pcplist(preferred_zone, zone, order,
> >>> +                                       migratetype, alloc_flags);
> >>> +                       if (likely(page))
> >>> +                               goto out;
> >>> +               }
> >>
> >> This seems not ideal, because non-CMA THP gets no chance to use PCP.
> >> But it still seems better than causing the failure of CMA allocation.
> >>
> >> Is there a possible approach to avoiding adding CMA THP into the PCP
> >> from the very beginning? Otherwise, we might need a separate PCP for
> >> CMA.
> >>
>
> The vast majority of THP-sized allocations are GFP_MOVABLE; avoiding
> adding CMA THP into the PCP may incur a slight performance penalty.
>

But the majority of movable pages aren't CMA, right? Do we have an
estimate of the cost of adding back a CMA THP PCP? Will per_cpu_pages
introduce a new cacheline, which the original intention for THP was to
avoid by having only one PCP [1]?

[1] https://patchwork.kernel.org/project/linux-mm/patch/20220624125423.6126-3-mgorman@techsingularity.net/

> Commit 1d91df85f399 takes a similar approach to filter, and I mainly
> referred to it.
>
> >>> +#else
> >>>                 page = rmqueue_pcplist(preferred_zone, zone, order,
> >>>                                 migratetype, alloc_flags);
> >>>                 if (likely(page))
> >>>                         goto out;
> >>> +#endif
> >>>         }
> >>>
> >>>         page = rmqueue_buddy(preferred_zone, zone, order, alloc_flags,
> >>> --
> >>> 2.7.4
> >>
> >> Thanks
> >> Barry
> >>
> >