From: Juan Yescas <jyescas@google.com>
Date: Tue, 21 Jan 2025 20:06:16 -0800
Subject: Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
To: Zi Yan
Cc: David Hildenbrand, Barry Song <21cnbao@gmail.com>, linux-mm@kvack.org, muchun.song@linux.dev, rppt@kernel.org, osalvador@suse.de, akpm@linux-foundation.org, lorenzo.stoakes@oracle.com, Jann Horn, Liam.Howlett@oracle.com, minchan@kernel.org, jaewon31.kim@samsung.com, Suren Baghdasaryan, Kalesh Singh, "T.J.
Mercier", Isaac Manjarres, iamjoonsoo.kim@lge.com, quic_charante@quicinc.com

On Tue, Jan 21, 2025 at 6:24 PM Zi Yan wrote:
>
> On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
> > On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand wrote:
> > >
> > > On 20.01.25 16:29, Zi Yan wrote:
> > > > On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> > > >> On 20.01.25 01:39, Zi Yan wrote:
> > > >>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> > > >>>
> > > >>>>>>>> However, with this workaround, we can't use transparent hugepages.
> > > >>>>>>>>
> > > >>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
> > > >>>>>
> > > >>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> > > >>>>> is equal to the pageblock size. Enabling THP just bumps the pageblock size.
> > > >>>>
> >
> > Thanks, I can see the initialization in include/linux/pageblock-flags.h:
> >
> > #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> >
> > > >>>> Currently, THP might be mTHP, which can have a significantly smaller
> > > >>>> size than 32MB. For example, on arm64 systems with a 16KiB page size,
> > > >>>> a 2MB CONT-PTE mTHP is possible. Additionally, mTHP relies on the
> > > >>>> CONFIG_TRANSPARENT_HUGEPAGE configuration.
> > > >>>>
> > > >>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> > > >>>> without necessarily using 32MiB THP. If we use other sizes, such as
> > > >>>> 64KiB, perhaps a large pageblock size wouldn't be necessary?
> >
> > Do you mean with mTHP? We haven't explored that option.
>
> Yes. Unless your applications have special demands for PMD THPs, 2MB
> mTHP should work.
>
> > > >>>
> > > >>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> > > >>> the experiment. But MAX_PAGE_ORDER is a macro right now, so Kconfig
> > > >>> needs to be changed and the kernel needs to be recompiled. Not sure if
> > > >>> it is OK for Juan's use case.
> > > >>
> >
> > The main goal is to reserve only the necessary CMA memory for the
> > drivers, which is usually the same for 4KiB and 16KiB page size kernels.
>
> Got it.
> Based on your experiment, you changed MAX_PAGE_ORDER to get the
> minimal CMA alignment size. Can you deploy that kernel to production?

We can't deploy that because many Android partners are using PMD THP
instead of mTHP.

> If yes, you can use mTHP instead of PMD THP and still get the CMA
> alignment you want.
>
> > > >>
> > > >> IIRC, we set pageblock size == THP size because this is the granularity
> > > >> we want to optimize defragmentation for. ("try keep pageblock
> > > >> granularity of the same memory type: movable vs. unmovable")
> > > >
> > > > Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the
> > > > user does not care about PMD THP (32MB in the ARM64 16KB base page case)
> > > > and mTHP (2MB mTHP here) is good enough, reducing the pageblock size works.
> > > >
> > > >>
> > > >> However, the buddy already supports having different pagetypes for large
> > > >> allocations.
> > > >
> > > > Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> > > > MIGRATE_MOVABLE can be merged.
> > >
> > > Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
> > >
> > > >>
> > > >> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> > > >> in these setups. pageblock size is already variable on some
> > > >> architectures IIRC.
> > > >
> >
> > Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
> > 16KiB page size kernel, I tried these 2 configurations:
> >
> > #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
> >
> > and
> >
> > #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
> >
> > With both of them, the kernel failed to boot.
>
> CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
> So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by the
> pageblock size. The pageblock size is determined by the pageblock order,
> which is affected by MAX_PAGE_ORDER.
>
> > > > Making pageblock size a boot time variable?
> > > > We might want to warn the
> > > > sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
> > >
> > > Yes, some way to configure it.
> > >
> > > >>
> > > >> We'd only have to check if all of the THP logic can deal with pageblock
> > > >> size < THP size.
> > > >
> >
> > The reason that THP was disabled in my experiment is that this assertion
> > failed in mm/huge_memory.c:
> >
> > /*
> >  * hugepages can't be allocated by the buddy allocator
> >  */
> > MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
> >
> > when
> >
> > config ARCH_FORCE_MAX_ORDER
> >         int
> >         .....
> >         default "8" if ARM64_16K_PAGES
>
> You can remove that BUILD_BUG_ON, turn on mTHP, and see if mTHP works.

We'll do that and post the results.

> > > > Probably yes, pageblock should be independent of THP logic, although
> > > > compaction (used to create THPs) logic is based on pageblock.
> > >
> > > Right. As raised in the past, we need a higher-level mechanism that
> > > tries to group pageblocks together during compaction/conversion to limit
> > > fragmentation on a higher level.
> > >
> > > I assume that many use cases would be fine with not using 32MB/512MB
> > > THPs at all for now -- and instead using 2MB ones. Of course, for very
> > > large installations it might be different.
> > >
> > > >>
> > > >> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
> > > >
> >
> > I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
> >
> > PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
> > 4KiB      | 15                 | 4KiB  * 32Ki = 128MiB
> > 16KiB     | 13                 | 16KiB * 8Ki  = 128MiB
> > 64KiB     | 13                 | 64KiB * 8Ki  = 512MiB
> >
> > > > This is also good for virtio-mem, since the offline memory block size
> > > > can also be reduced. I remember you complained about it before.
> > >
> > > Yes, yes, yes! :)
>
> David's proposal should work in general, but might take a non-trivial
> amount of work:
>
> 1. keep the pageblock size always at 4MB for all arches.
> 2. adjust existing pageblock users, like compaction, to work on a
>    different range, independent of pageblock.
>    a. for the anti-fragmentation mechanism, multiple pageblocks might have
>       different migratetypes but would be compacted to generate huge
>       pages; how to align their migratetypes is TBD.
> 3. other corner case handling.
>
> The final question is that Barry mentioned that over-reserved CMA areas
> can be used for movable page allocations. Why does it not work for you?

I need to run more experiments to see what type of page allocation in the
system is the dominant one (unmovable or movable). If it is movable,
over-reserved CMA areas should be fine.

> --
> Best Regards,
> Yan, Zi