From: Juan Yescas <jyescas@google.com>
Date: Wed, 21 May 2025 09:51:58 -0700
Subject: Re: [PATCH v6] mm: Add CONFIG_PAGE_BLOCK_ORDER to select page block order
To: David Hildenbrand
Cc: Andrew Morton, Lorenzo Stoakes, Liam R. Howlett, Vlastimil Babka,
 Mike Rapoport, Suren Baghdasaryan, Michal Hocko, Zi Yan,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org, tjmercier@google.com,
 isaacmanjarres@google.com, kaleshsingh@google.com, masahiroy@kernel.org,
 Minchan Kim
Howlett" , Vlastimil Babka , Mike Rapoport , Suren Baghdasaryan , Michal Hocko , Zi Yan , linux-mm@kvack.org, linux-kernel@vger.kernel.org, tjmercier@google.com, isaacmanjarres@google.com, kaleshsingh@google.com, masahiroy@kernel.org, Minchan Kim Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: tngsdqdoaxgacjc8aazeq7c8a63jsiqh X-Rspamd-Queue-Id: 10FDF1C0004 X-Rspam-User: X-Rspamd-Server: rspam02 X-HE-Tag: 1747846331-312165 X-HE-Meta: U2FsdGVkX1++6ToiivvP3QoEx6uwTR5ie3XqCbgh00QEmvsoTF6HOQDJdVEwwE+2a3z8VlQlNKFC/HirVZ5JyE83PbzmqCudS9aqbxypO9loA4ZrmnMv/bKO9McW4srYD88UApOshwptfT/MA8vgpiP1BPd3xswAoGzjG8biP2xgKjyQhMfFnwLRbBBf3aGjPy6jmOZBV58Up6aVqzYd/yhrV1M7PNkiTmVHO4q33851Mem984XWBBUWooRnpHYMBvxY7zCYiHrUS62/xT0VAePMB1rjTDkP8yEH6Iz74BahkqSvSpNOcazSpUPid+sbu5Fy8y/upLRd4GytH0LECAruCbo/P4jnznfBznEMz4OtK8MaQNZZvFK86PhhF3eWIfNonHwM5t9iT/5I71Y3+A9ktNgMh98icLnmQbbhEui38TDtku5MB91LzT12SJ/nK++zQfexnyUr8a6Gv1P2KN9BK54YKfJBRIlxptd71VLZA1RITSqTr50ds6cqGwhwAxVqgCIF35Y7mu6B+191+/s1ba7b2Clg4p+p+vGd7E7aTP4lxc8wpM6di3gMIqETaBwTBaqf+oOVnFo3RX/5/KVh1VQ3OCISsvJSocx0qXIJu95tsZw/A3R/bJ8Q2Mkd51ckU5ysv6/YfoZ92tAhQ2++0JiM1by6mA3ejbCilr0aJHlXjEdcBJnC6/dzGDDfnhR9WZg47ZDKKAAAkzDCAhVep2yO+orABNXjwsckjyPHwstAB3b7/O84O7gpARR483Kn2TlsghXeOJMg6PdCxcfj6ZNBCcDrnMXt4P3Vo4lyCoZ2umhLOHrlGq9XflqeZRoEZZhsiH6WAtWOhs9Q9QIOdf7fsHuUNIBFcAYfqoXfb6gZ8KmqK/PHBMCEjmZaQQuEdlnnuS+8Q54woDBl/DCw2yUc6PDKQ0u0R63Tj8NNPPZpx/ZPMhf5PnBL6OlCrnUyVSSIS7jaHPZYrEx mE1cHe55 Feal9bvJadOcvL4MUiaDaeRbJyyyTsadJ1xns5sI8y9O9PIXebCEChvoppsyI3byc/3DOEXJDgP43FtMSlzQUjFv0YTUeSy6HC/3YAyUp1l7le04OXFyxiNzSeWldiLF1Q3tseTxISTdchHuK6O18iHkCe1EBFQC82iPlLD86tpDECBsh0R89nO9M0VTM7Qw0WYj2rFq83TOKHBMKJaTaIvarXMMtbkSuEf9qWKLP3sCjA4oA7Qep0hL+dht4SQQTKcYYtMycV7MPl93HFalLv2phLEcA8bJ9emvihVlqdbSSPPfyUf0SwSYmB4zGfOJk6VoQtYLFG3d6RGjbPqa87QELr2AU+r42bo9jrS7aSiiICoer1Qd5gvyQHjoQRLKCVF8ozDoQL5EQ5PYAuueRM+3DniKiMx8IPnDHmmFzmqDOHds= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, May 20, 2025 at 11:47=E2=80=AFPM David Hildenbrand wrote: > > On 21.05.25 00:59, Juan Yescas wrote: > > Problem: On large page size configurations (16KiB, 64KiB), the CMA > > alignment requirement (CMA_MIN_ALIGNMENT_BYTES) increases considerably, > > and this causes the CMA reservations to be larger than necessary. > > This means that system will have less available MIGRATE_UNMOVABLE and > > MIGRATE_RECLAIMABLE page blocks since MIGRATE_CMA can't fallback to the= m. > > > > The CMA_MIN_ALIGNMENT_BYTES increases because it depends on > > MAX_PAGE_ORDER which depends on ARCH_FORCE_MAX_ORDER. The value of > > ARCH_FORCE_MAX_ORDER increases on 16k and 64k kernels. > > > > For example, in ARM, the CMA alignment requirement when: > > > > - CONFIG_ARCH_FORCE_MAX_ORDER default value is used > > - CONFIG_TRANSPARENT_HUGEPAGE is set: > > > > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES > > ----------------------------------------------------------------------- > > 4KiB | 10 | 10 | 4KiB * (2 ^ 10) =3D = 4MiB > > Why is pageblock_nr_pages 10 in that case? > > #define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_= PAGE_ORDER) > > So it should be 2 MiB (order-9)? > That is right. I will update the description to set it to 2 MiB. 
> > 16KiB     | 11             | 11              | 16KiB * (2 ^ 11) = 32MiB
> > 64KiB     | 13             | 13              | 64KiB * (2 ^ 13) = 512MiB
> >
> > There are some extreme cases for the CMA alignment requirement when:
> >
> > - the CONFIG_ARCH_FORCE_MAX_ORDER maximum value is set
> > - CONFIG_TRANSPARENT_HUGEPAGE is NOT set
> > - CONFIG_HUGETLB_PAGE is NOT set
>
> I think we should just always group at HPAGE_PMD_ORDER also in this case. But that's
> a different thing to sort out :)
>
> >
> > PAGE_SIZE | MAX_PAGE_ORDER | pageblock_order | CMA_MIN_ALIGNMENT_BYTES
> > ------------------------------------------------------------------------
> > 4KiB      | 15             | 15              | 4KiB * (2 ^ 15) = 128MiB
> > 16KiB     | 13             | 13              | 16KiB * (2 ^ 13) = 128MiB
> > 64KiB     | 13             | 13              | 64KiB * (2 ^ 13) = 512MiB
> >
> > This affects the CMA reservations for the drivers. If a driver in a
> > 4KiB kernel needs 4MiB of CMA memory, in a 16KiB kernel the minimal
> > reservation has to be 32MiB due to the alignment requirements:
> >
> > reserved-memory {
> >     ...
> >     cma_test_reserve: cma_test_reserve {
> >         compatible = "shared-dma-pool";
> >         size = <0x0 0x400000>; /* 4 MiB */
> >         ...
> >     };
> > };
> >
> > reserved-memory {
> >     ...
> >     cma_test_reserve: cma_test_reserve {
> >         compatible = "shared-dma-pool";
> >         size = <0x0 0x2000000>; /* 32 MiB */
> >         ...
> >     };
> > };
> >
> > Solution: Add a new config CONFIG_PAGE_BLOCK_ORDER that allows setting
> > the page block order in all the architectures. The maximum page block
> > order will be given by ARCH_FORCE_MAX_ORDER.
> >
> > By default, CONFIG_PAGE_BLOCK_ORDER will have the same value as
> > ARCH_FORCE_MAX_ORDER. This makes sure that current kernel
> > configurations won't be affected by this change. It is an opt-in
> > change.
> >
> > This patch allows kernels with large page sizes (16KiB, 64KiB) to
> > have the same CMA alignment requirements as 4KiB kernels by setting
> > a lower pageblock_order.
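A quick sanity check of the intended effect: a minimal, illustrative
user-space sketch (not part of the patch), assuming CONFIG_PAGE_BLOCK_ORDER=7
on a 16KiB kernel:

#include <assert.h>
#include <stdio.h>

int main(void)
{
	unsigned long page_size = 16UL << 10;  /* 16 KiB pages */
	unsigned int pb_order = 7;             /* assumed CONFIG_PAGE_BLOCK_ORDER=7 */
	unsigned long cma_min_align = page_size << pb_order;

	/* 16 KiB << 7 == 2 MiB, the same requirement as on 4 KiB kernels. */
	assert(cma_min_align == 2UL << 20);
	printf("min CMA alignment: %lu MiB\n", cma_min_align >> 20);
	return 0;
}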
> >
> > Tests:
> >
> > - Verified that HugeTLB pages work when pageblock_order is 1, 7, 10
> >   on 4k and 16k kernels.
> >
> > - Verified that Transparent Huge Pages work when pageblock_order
> >   is 1, 7, 10 on 4k and 16k kernels.
> >
> > - Verified that dma-buf heap allocations work when pageblock_order
> >   is 1, 7, 10 on 4k and 16k kernels.
> >
> > Benchmarks:
> >
> > The benchmarks compare 16KiB kernels with pageblock_order 10 and 7.
> > The reason for pageblock_order 7 is that this value makes the min CMA
> > alignment requirement the same as in 4KiB kernels (2MiB).
> >
> > - Perform 100K dma-buf heap (/dev/dma_heap/system) allocations of
> >   SZ_8M, SZ_4M, SZ_2M, SZ_1M, SZ_64, SZ_8, SZ_4. Use simpleperf
> >   (https://developer.android.com/ndk/guides/simpleperf) to measure
> >   the # of instructions and page-faults on 16k kernels. The benchmark
> >   was executed 10 times. The averages are below:
> >
> >        # instructions          |     # page-faults
> >    order 10     |    order 7   | order 10 | order 7
> > --------------------------------------------------------
> > 13,891,765,770 | 11,425,777,314 |  220   |  217
> > 14,456,293,487 | 12,660,819,302 |  224   |  219
> > 13,924,261,018 | 13,243,970,736 |  217   |  221
> > 13,910,886,504 | 13,845,519,630 |  217   |  221
> > 14,388,071,190 | 13,498,583,098 |  223   |  224
> > 13,656,442,167 | 12,915,831,681 |  216   |  218
> > 13,300,268,343 | 12,930,484,776 |  222   |  218
> > 13,625,470,223 | 14,234,092,777 |  219   |  218
> > 13,508,964,965 | 13,432,689,094 |  225   |  219
> > 13,368,950,667 | 13,683,587,37  |  219   |  225
> > -------------------------------------------------------------------
> > 13,803,137,433 | 13,131,974,268 |  220   |  220   Averages
> >
> > There were about 4.86% fewer instructions when the order was 7, in
> > comparison with order 10:
> >
> > 13,131,974,268 - 13,803,137,433 = -671,163,165 (-4.86%)
> >
> > The number of page faults for order 7 and order 10 was the same.
> >
> > These results didn't show any significant regression when
> > pageblock_order is set to 7 on 16KiB kernels.
> >
> > - Run Speedometer 3.1 (https://browserbench.org/Speedometer3.1/) 5 times
> >   on the 16k kernels with pageblock_order 7 and 10.
> >
> > order 10 | order 7 | order 7 - order 10 | (order 7 - order 10) %
> > -------------------------------------------------------------------
> > 15.8     | 16.4    |  0.6               |  3.80%
> > 16.4     | 16.2    | -0.2               | -1.22%
> > 16.6     | 16.3    | -0.3               | -1.81%
> > 16.8     | 16.3    | -0.5               | -2.98%
> > 16.6     | 16.8    |  0.2               |  1.20%
> > -------------------------------------------------------------------
> > 16.44    | 16.4    | -0.04              | -0.24%   Averages
> >
> > The results didn't show any significant regression when
> > pageblock_order is set to 7 on 16KiB kernels.
> >
>
> Sorry for the late reply. I think using a boot-time option might have
> saved us some of the headache. :)

No worries. The boot-time option sounds good; however, there are these
tradeoffs:

- The bootloader needs to be updated to find out the kernel page size and
  calculate the pageblock_order to pass to the kernel.
- If the pageblock_order changes, it is likely that some CMA reservations
  might need to be updated, so the DTS needs to be recompiled.

> [...]
>
> > +/* Defines the order for the number of pages that have a migrate type. */
> > +#ifndef CONFIG_PAGE_BLOCK_ORDER
> > +#define PAGE_BLOCK_ORDER MAX_PAGE_ORDER
> > +#else
> > +#define PAGE_BLOCK_ORDER CONFIG_PAGE_BLOCK_ORDER
> > +#endif /* CONFIG_PAGE_BLOCK_ORDER */
> > +
> > +/*
> > + * The MAX_PAGE_ORDER, which defines the max order of pages to be allocated
> > + * by the buddy allocator, has to be larger or equal to the PAGE_BLOCK_ORDER,
> > + * which defines the order for the number of pages that can have a migrate type.
> > + */
> > +#if (PAGE_BLOCK_ORDER > MAX_PAGE_ORDER)
> > +#error MAX_PAGE_ORDER must be >= PAGE_BLOCK_ORDER
> > +#endif
> > +
> >  /*
> >   * PAGE_ALLOC_COSTLY_ORDER is the order at which allocations are deemed
> >   * costly to service. That is between allocation orders which should
> >
> > diff --git a/include/linux/pageblock-flags.h b/include/linux/pageblock-flags.h
> > index fc6b9c87cb0a..e73a4292ef02 100644
> > --- a/include/linux/pageblock-flags.h
> > +++ b/include/linux/pageblock-flags.h
> > @@ -41,18 +41,18 @@ extern unsigned int pageblock_order;
> >   * Huge pages are a constant size, but don't exceed the maximum allocation
> >   * granularity.
> >   */
>
> How is CONFIG_HUGETLB_PAGE_SIZE_VARIABLE handled?
That is a powerpc configuration, and the pageblock_order variable is
initialized in mm/mm_init.c:

#ifdef CONFIG_HUGETLB_PAGE_SIZE_VARIABLE
/* Initialise the number of pages represented by NR_PAGEBLOCK_BITS */
void __init set_pageblock_order(void)
{
	unsigned int order = MAX_PAGE_ORDER;

	/* Check that pageblock_nr_pages has not already been setup */
	if (pageblock_order)
		return;

	/* Don't let pageblocks exceed the maximum allocation granularity. */
	if (HPAGE_SHIFT > PAGE_SHIFT && HUGETLB_PAGE_ORDER < order)
		order = HUGETLB_PAGE_ORDER;

	/*
	 * Assume the largest contiguous order of interest is a huge page.
	 * This value may be variable depending on boot parameters on powerpc.
	 */
	pageblock_order = order;
}

Should this line be updated?

https://elixir.bootlin.com/linux/v6.15-rc7/source/mm/mm_init.c#L1513

	unsigned int order = MAX_PAGE_ORDER;

> > -#define pageblock_order	MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> > +#define pageblock_order	MIN_T(unsigned int, HUGETLB_PAGE_ORDER, PAGE_BLOCK_ORDER)
> >
> >  #endif /* CONFIG_HUGETLB_PAGE_SIZE_VARIABLE */
> >
> >  #elif defined(CONFIG_TRANSPARENT_HUGEPAGE)
> >
> > -#define pageblock_order	MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> > +#define pageblock_order	MIN_T(unsigned int, HPAGE_PMD_ORDER, PAGE_BLOCK_ORDER)
>
> Wait, why are we using the MIN_T in that case? If someone requests 4 MiB,
> why would we reduce it to 2 MiB even though MAX_PAGE_ORDER allows for it?
>

I don't have the context for that change. I think Vlastimil might know why
it is needed. That change was introduced in this patch:

https://lore.kernel.org/all/20240426040258.AD47FC113CD@smtp.kernel.org/

Thanks
Juan

>
> Maybe we really have to clean all that up first :/
>
> --
> Cheers,
>
> David / dhildenb
>