* mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
@ 2025-01-17 22:51 Juan Yescas
2025-01-17 22:52 ` Juan Yescas
0 siblings, 1 reply; 21+ messages in thread
From: Juan Yescas @ 2025-01-17 22:51 UTC (permalink / raw)
To: linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, 21cnbao, minchan,
jaewon31.kim, charante
Hi Linux memory team,

When drivers reserve CMA memory on 16KiB page size kernels, the minimum
alignment is 32 MiB, as per CMA_MIN_ALIGNMENT_BYTES. However, on 4KiB
kernels, the CMA alignment is only 4 MiB.
This forces drivers to reserve more memory on 16KiB kernels, even when
they only need 4 MiB or 8 MiB.
reserved-memory {
	#address-cells = <2>;
	#size-cells = <2>;
	ranges;

	tpu_cma_reserve: tpu_cma_reserve {
		compatible = "shared-dma-pool";
		reusable;
		size = <0x0 0x2000000>; /* 32 MiB */
	};
};
One workaround to continue using 4 MiB alignment is:

- Disable CONFIG_TRANSPARENT_HUGEPAGE so the buddy allocator does NOT
  have to allocate huge pages (32 MiB with 16KiB pages)
- Set ARCH_FORCE_MAX_ORDER for ARM64_16K_PAGES to "8", instead of
  "11", so CMA_MIN_ALIGNMENT_BYTES is equal to 4 MiB
config ARCH_FORCE_MAX_ORDER
	int
	default "13" if ARM64_64K_PAGES
	default "8" if ARM64_16K_PAGES
	default "10"
#define MAX_PAGE_ORDER CONFIG_ARCH_FORCE_MAX_ORDER // 8
#define pageblock_order MAX_PAGE_ORDER // 8
#define pageblock_nr_pages (1UL << pageblock_order) // 256
#define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages // 256
#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
// 16384 * 256 = 4194304 = 4 MiB
After compiling the kernel with these changes, the kernel boots without
warnings and the memory is reserved:
[ 0.000000] Reserved memory: created CMA memory pool at 0x000000007f800000, size 8 MiB
[ 0.000000] OF: reserved mem: initialized node tpu_cma_reserve, compatible id shared-dma-pool
[ 0.000000] OF: reserved mem: 0x000000007f800000..0x000000007fffffff (8192 KiB) map reusable tpu_cma_reserve
# uname -a
Linux buildroot 6.12.9-dirty
# zcat /proc/config.gz | grep ARM64_16K
CONFIG_ARM64_16K_PAGES=y
# zcat /proc/config.gz | grep TRANSPARENT_HUGE
CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
# CONFIG_TRANSPARENT_HUGEPAGE is not set
# cat /proc/pagetypeinfo
Page block order: 8
Pages per block: 256
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8
Node 0, zone      DMA, type    Unmovable         1      1     13      6      5      2      0      0      1
Node 0, zone      DMA, type      Movable         9     16     19     13     13      5      2      0    182
Node 0, zone      DMA, type  Reclaimable         0      1      0      1      1      0      0      1      0
Node 0, zone      DMA, type   HighAtomic         0      0      0      0      0      0      0      0      0
Node 0, zone      DMA, type          CMA         1      0      0      0      0      0      0      0     49
Node 0, zone      DMA, type      Isolate         0      0      0      0      0      0      0      0      0

Number of blocks type     Unmovable  Movable  Reclaimable  HighAtomic  CMA  Isolate
Node 0, zone DMA                  6      199            1           0   50        0
However, with this workaround, we can't use transparent huge pages.
Is the CMA_MIN_ALIGNMENT_BYTES alignment requirement only there to support huge pages?
Is there another option to reduce the CMA_MIN_ALIGNMENT_BYTES alignment?
Thanks
Juan
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-17 22:51 mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel Juan Yescas
@ 2025-01-17 22:52 ` Juan Yescas
2025-01-17 23:00 ` Juan Yescas
0 siblings, 1 reply; 21+ messages in thread
From: Juan Yescas @ 2025-01-17 22:52 UTC (permalink / raw)
To: linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, 21cnbao, minchan,
jaewon31.kim, charante, Suren Baghdasaryan, Kalesh Singh,
T.J. Mercier, Isaac Manjarres
+Suren Baghdasaryan
+Kalesh Singh
+T.J. Mercier
+Isaac Manjarres
On Fri, Jan 17, 2025 at 2:51 PM Juan Yescas <jyescas@google.com> wrote:
>
> [...]
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-17 22:52 ` Juan Yescas
@ 2025-01-17 23:00 ` Juan Yescas
2025-01-17 23:19 ` Zi Yan
2025-01-20 0:17 ` Barry Song
0 siblings, 2 replies; 21+ messages in thread
From: Juan Yescas @ 2025-01-17 23:00 UTC (permalink / raw)
To: linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, 21cnbao, minchan,
jaewon31.kim, charante, Suren Baghdasaryan, Kalesh Singh,
T.J. Mercier, Isaac Manjarres, iamjoonsoo.kim, quic_charante
+ iamjoonsoo.kim@lge.com
+ quic_charante@quicinc.com
On Fri, Jan 17, 2025 at 2:52 PM Juan Yescas <jyescas@google.com> wrote:
> [...]
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-17 23:00 ` Juan Yescas
@ 2025-01-17 23:19 ` Zi Yan
2025-01-19 23:55 ` Barry Song
2025-01-20 0:17 ` Barry Song
1 sibling, 1 reply; 21+ messages in thread
From: Zi Yan @ 2025-01-17 23:19 UTC (permalink / raw)
To: Juan Yescas
Cc: linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, 21cnbao, minchan,
jaewon31.kim, charante, Suren Baghdasaryan, Kalesh Singh,
T.J. Mercier, Isaac Manjarres, iamjoonsoo.kim, quic_charante
On 17 Jan 2025, at 18:00, Juan Yescas wrote:
>>> [...]
>>> However, with this workaround, we can't use transparent huge pages.
>>>
>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>> Is there another option to reduce the CMA_MIN_ALIGNMENT_BYTES alignment?
Not easily. CMA is reserved at the pageblock level; there is a MIGRATE_CMA
migratetype for pageblocks used by CMA. You would need to change the buddy
allocator to let normal page allocations use MIGRATE_CMA pageblocks in order
to reclaim over-reserved CMA memory (and CMA_MIN_ALIGNMENT_BYTES would not
change in that case).
You can find more background in the patchset "Use pageblock_order for cma
and alloc_contig_range alignment", starting from commit b48d8a8e5ce5
("mm: page_isolation: move has_unmovable_pages() to mm/page_isolation.c") [1].
[1] https://lore.kernel.org/all/20220425143118.2850746-1-zi.yan@sent.com/
Best Regards,
Yan, Zi
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-17 23:19 ` Zi Yan
@ 2025-01-19 23:55 ` Barry Song
2025-01-20 0:39 ` Zi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Barry Song @ 2025-01-19 23:55 UTC (permalink / raw)
To: Zi Yan
Cc: Juan Yescas, linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Sat, Jan 18, 2025 at 12:19 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 17 Jan 2025, at 18:00, Juan Yescas wrote:
>
> >>> [...]
> >>> However, with this workaround, we can't use transparent huge pages.
> >>>
> >>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> is equal to pageblock size. Enabling THP just bumps the pageblock size.
Currently, THP might be mTHP, which can be significantly smaller than 32MB.
For example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP is
possible. Additionally, mTHP relies on CONFIG_TRANSPARENT_HUGEPAGE.

I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE without
necessarily using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a
large pageblock size wouldn't be necessary?
> [...]
Thanks
Barry
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-17 23:00 ` Juan Yescas
2025-01-17 23:19 ` Zi Yan
@ 2025-01-20 0:17 ` Barry Song
2025-01-20 0:26 ` Zi Yan
1 sibling, 1 reply; 21+ messages in thread
From: Barry Song @ 2025-01-20 0:17 UTC (permalink / raw)
To: Juan Yescas
Cc: linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Sat, Jan 18, 2025 at 12:00 PM Juan Yescas <jyescas@google.com> wrote:
>
> > > [...]
> > > However, with this workaround, we can't use transparent huge pages.
I don’t think this is accurate. You can still use mTHP with a size equal to
or smaller than 4MiB, right?

By the way, what specific regression have you observed when reserving a
larger size like 32MB? For CMA, the over-reserved memory is still available
to the system for movable folios. 28MiB doesn’t seem significant enough to
cause a noticeable regression, does it?
> > > [...]
Thanks
barry
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-20 0:17 ` Barry Song
@ 2025-01-20 0:26 ` Zi Yan
2025-01-20 0:38 ` Barry Song
0 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2025-01-20 0:26 UTC (permalink / raw)
To: Barry Song
Cc: Juan Yescas, linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On 19 Jan 2025, at 19:17, Barry Song wrote:
> >>>> [...]
>>>> However, with this workaround, we can't use transparent huge pages.
>
> I don’t think this is accurate. You can still use mTHP with a size
> equal to or smaller than 4MiB,
> right?
>
> By the way, what specific regression have you observed when reserving
> a larger size like
> 32MB?
> For CMA, the over-reserved memory is still available to the system for
> movable folios. 28MiB
The fallbacks table does not have MIGRATE_CMA as a fallback for any
migratetype. How can it be used for movable folios? Am I missing something?
> doesn’t seem significant enough to cause a noticeable regression, does it?
--
Best Regards,
Yan, Zi
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-20 0:26 ` Zi Yan
@ 2025-01-20 0:38 ` Barry Song
2025-01-20 0:45 ` Zi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Barry Song @ 2025-01-20 0:38 UTC (permalink / raw)
To: Zi Yan
Cc: Juan Yescas, linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Mon, Jan 20, 2025 at 1:26 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On 19 Jan 2025, at 19:17, Barry Song wrote:
>
> >>>> [...]
> >>>> However, with this workaround, we can't use transparent huge pages.
> >
> > I don’t think this is accurate. You can still use mTHP with a size
> > equal to or smaller than 4MiB,
> > right?
> >
> > By the way, what specific regression have you observed when reserving
> > a larger size like
> > 32MB?
> > For CMA, the over-reserved memory is still available to the system for
> > movable folios. 28MiB
>
> The fallbacks table does not have MIGRATE_CMA as a fallback for any
> migratetype. How can it be used for movable folios? Am I missing something?
The whole purpose of CMA is to let the memory reserved for a
device's dma_alloc_coherent() or other contiguous memory needs
be freely used by movable allocations while the device doesn't
require it. When the device's DMA needs the memory, the movable
folios can be migrated out to make it available for the device.
/* Must be called after current_gfp_context() which can change gfp_mask */
static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
						  unsigned int alloc_flags)
{
#ifdef CONFIG_CMA
	if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
		alloc_flags |= ALLOC_CMA;
#endif

	return alloc_flags;
}
So there’s no waste here. CMA memory can still be used by the normal buddy allocator.
>
> > doesn’t seem significant enough to cause a noticeable regression, does it?
>
> --
> Best Regards,
> Yan, Zi
Thanks
Barry
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-19 23:55 ` Barry Song
@ 2025-01-20 0:39 ` Zi Yan
2025-01-20 8:14 ` David Hildenbrand
0 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2025-01-20 0:39 UTC (permalink / raw)
To: Barry Song, Juan Yescas
Cc: linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
<snip>
>> >>>
>> >>>
>> >>> However, with this workaround, we can't use transparent huge pages.
>> >>>
>> >>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>
> Currently, THP might be mTHP, which can have a significantly smaller
> size than 32MB. For
> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
> is possible.
> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>
> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> without necessarily
> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
> pageblock size wouldn't
> be necessary?
I think this should work by reducing MAX_PAGE_ORDER as Juan did for
the experiment. But MAX_PAGE_ORDER is a macro right now, so Kconfig
needs to be changed and the kernel recompiled. Not sure if that is OK
for Juan's use case.
--
Best Regards,
Yan, Zi
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-20 0:38 ` Barry Song
@ 2025-01-20 0:45 ` Zi Yan
0 siblings, 0 replies; 21+ messages in thread
From: Zi Yan @ 2025-01-20 0:45 UTC (permalink / raw)
To: Barry Song
Cc: Juan Yescas, linux-mm, muchun.song, rppt, david, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Sun Jan 19, 2025 at 7:38 PM EST, Barry Song wrote:
> On Mon, Jan 20, 2025 at 1:26 PM Zi Yan <ziy@nvidia.com> wrote:
>>
>> On 19 Jan 2025, at 19:17, Barry Song wrote:
>>
>> > On Sat, Jan 18, 2025 at 12:00 PM Juan Yescas <jyescas@google.com> wrote:
>> >>
>> >> + iamjoonsoo.kim@lge.com
>> >> + quic_charante@quicinc.com
>> >>
>> >> On Fri, Jan 17, 2025 at 2:52 PM Juan Yescas <jyescas@google.com> wrote:
>> >>>
>> >>> +Suren Baghdasaryan
>> >>> +Kalesh Singh
>> >>> +T.J. Mercier
>> >>> +Isaac Manjarres
>> >>>
>> >>> On Fri, Jan 17, 2025 at 2:51 PM Juan Yescas <jyescas@google.com> wrote:
>> >>>>
>> >>>> Hi Linux memory team
>> >>>>
>> >>>> When the drivers reserve CMA memory in 16KiB kernels, the minimum
>> >>>> alignment is 32 MiB as per CMA_MIN_ALIGNMENT_BYTES. However, in 4KiB
>> >>>> kernels, the CMA alignment is 4MiB.
>> >>>>
>> >>>> This is forcing the drivers to reserve more memory in 16KiB kernels,
>> >>>> even if they only require 4MiB or 8MiB.
>> >>>>
>> >>>> reserved-memory {
>> >>>> #address-cells = <2>;
>> >>>> #size-cells = <2>;
>> >>>> ranges;
>> >>>> tpu_cma_reserve: tpu_cma_reserve {
>> >>>> compatible = "shared-dma-pool";
>> >>>> reusable;
>> >>>> size = <0x0 0x2000000>; /* 32 MiB */
>> >>>> }
>> >>>>
>> >>>> One workaround to continue using 4MiB alignment is:
>> >>>>
>> >>>> - Disable CONFIG_TRANSPARENT_HUGEPAGE so the buddy allocator does NOT
>> >>>> have to allocate huge pages (32 MiB in 16KiB page sizes)
>> >>>> - Set ARCH_FORCE_MAX_ORDER for ARM64_16K_PAGES to "8", instead of
>> >>>> "11", so CMA_MIN_ALIGNMENT_BYTES is equals to 4 MiB
>> >>>>
>> >>>> config ARCH_FORCE_MAX_ORDER
>> >>>> int
>> >>>> default "13" if ARM64_64K_PAGES
>> >>>> default "8" if ARM64_16K_PAGES
>> >>>> default "10"
>> >>>>
>> >>>> #define MAX_PAGE_ORDER CONFIG_ARCH_FORCE_MAX_ORDER // 8
>> >>>> #define pageblock_order MAX_PAGE_ORDER // 8
>> >>>> #define pageblock_nr_pages (1UL << pageblock_order) // 256
>> >>>> #define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages // 256
>> >>>> #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
>> >>>> // 16384 * 256 = 4194304 = 4 MiB
>> >>>>
>> >>>> After compiling the kernel with this changes, the kernel boots without
>> >>>> warnings and the memory is reserved:
>> >>>>
>> >>>> [ 0.000000] Reserved memory: created CMA memory pool at
>> >>>> 0x000000007f800000, size 8 MiB
>> >>>> [ 0.000000] OF: reserved mem: initialized node tpu_cma_reserve,
>> >>>> compatible id shared-dma-pool
>> >>>> [ 0.000000] OF: reserved mem:
>> >>>> 0x000000007f800000..0x000000007fffffff (8192 KiB) map reusable
>> >>>> tpu_cma_reserve
>> >>>>
>> >>>> # uname -a
>> >>>> Linux buildroot 6.12.9-dirty
>> >>>> # zcat /proc/config.gz | grep ARM64_16K
>> >>>> CONFIG_ARM64_16K_PAGES=y
>> >>>> # zcat /proc/config.gz | grep TRANSPARENT_HUGE
>> >>>> CONFIG_HAVE_ARCH_TRANSPARENT_HUGEPAGE=y
>> >>>> # CONFIG_TRANSPARENT_HUGEPAGE is not set
>> >>>> # cat /proc/pagetypeinfo
>> >>>> Page block order: 8
>> >>>> Pages per block: 256
>> >>>>
>> >>>> Free pages count per migrate type at order 0 1 2
>> >>>> 3 4 5 6 7 8
>> >>>> Node 0, zone DMA, type Unmovable 1 1 13
>> >>>> 6 5 2 0 0 1
>> >>>> Node 0, zone DMA, type Movable 9 16 19
>> >>>> 13 13 5 2 0 182
>> >>>> Node 0, zone DMA, type Reclaimable 0 1 0
>> >>>> 1 1 0 0 1 0
>> >>>> Node 0, zone DMA, type HighAtomic 0 0 0
>> >>>> 0 0 0 0 0 0
>> >>>> Node 0, zone DMA, type CMA 1 0 0
>> >>>> 0 0 0 0 0 49
>> >>>> Node 0, zone DMA, type Isolate 0 0 0
>> >>>> 0 0 0 0 0 0
>> >>>> Number of blocks type Unmovable Movable Reclaimable
>> >>>> HighAtomic CMA Isolate
>> >>>> Node 0, zone DMA 6 199 1
>> >>>> 0 50 0
>> >>>>
>> >>>>
>> >>>> However, with this workaround, we can't use transparent huge pages.
>> >
>> > I don’t think this is accurate. You can still use mTHP with a size
>> > equal to or smaller than 4MiB,
>> > right?
>> >
>> > By the way, what specific regression have you observed when reserving
>> > a larger size like
>> > 32MB?
>> > For CMA, the over-reserved memory is still available to the system for
>> > movable folios. 28MiB
>>
>> The fallbacks table does not have MIGRATE_CMA as a fallback for any
>> migratetype. How can it be used for movable folios? Am I missing something?
>
> The whole purpose of CMA is to allow the memory reserved for a
> device's dma_alloc_coherent or other contiguous memory needs to
> be freely used by movable allocations when the device doesn't
> require it. When the device's DMA needs the memory, the movable
> folios can be migrated to make it available for the device.
>
> /* Must be called after current_gfp_context() which can change gfp_mask */
> static inline unsigned int gfp_to_alloc_flags_cma(gfp_t gfp_mask,
> unsigned int alloc_flags)
> {
> #ifdef CONFIG_CMA
> if (gfp_migratetype(gfp_mask) == MIGRATE_MOVABLE)
> alloc_flags |= ALLOC_CMA;
> #endif
>
> return alloc_flags;
> }
>
> So there’s no waste here. cma can be used by normal buddy.
Ah, you are right. I missed the above code, which adds ALLOC_CMA. I
agree with you that there is no waste unless the system makes heavy
use of unmovable allocations.
--
Best Regards,
Yan, Zi
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-20 0:39 ` Zi Yan
@ 2025-01-20 8:14 ` David Hildenbrand
2025-01-20 15:29 ` Zi Yan
0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2025-01-20 8:14 UTC (permalink / raw)
To: Zi Yan, Barry Song, Juan Yescas
Cc: linux-mm, muchun.song, rppt, osalvador, akpm, lorenzo.stoakes,
Jann Horn, Liam.Howlett, minchan, jaewon31.kim, charante,
Suren Baghdasaryan, Kalesh Singh, T.J. Mercier, Isaac Manjarres,
iamjoonsoo.kim, quic_charante
On 20.01.25 01:39, Zi Yan wrote:
> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> <snip>
>>>>>>
>>>>>>
>>>>>> However, with this workaround, we can't use transparent huge pages.
>>>>>>
>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>
>> Currently, THP might be mTHP, which can have a significantly smaller
>> size than 32MB. For
>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
>> is possible.
>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>>
>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
>> without necessarily
>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
>> pageblock size wouldn't
>> be necessary?
>
> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
> to be changed and kernel needs to be recompiled. Not sure if it is OK
> for Juan's use case.
IIRC, we set pageblock size == THP size because this is the granularity
we want to optimize defragmentation for. ("try keep pageblock
granularity of the same memory type: movable vs. unmovable")
However, the buddy already supports having different pagetypes for large
allocations.
So we could leave MAX_ORDER alone and try adjusting the pageblock size
in these setups. pageblock size is already variable on some
architectures IIRC.
We'd only have to check if all of the THP logic can deal with pageblock
size < THP size.
This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
--
Cheers,
David / dhildenb
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-20 8:14 ` David Hildenbrand
@ 2025-01-20 15:29 ` Zi Yan
2025-01-20 17:59 ` David Hildenbrand
0 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2025-01-20 15:29 UTC (permalink / raw)
To: David Hildenbrand, Barry Song, Juan Yescas
Cc: linux-mm, muchun.song, rppt, osalvador, akpm, lorenzo.stoakes,
Jann Horn, Liam.Howlett, minchan, jaewon31.kim, charante,
Suren Baghdasaryan, Kalesh Singh, T.J. Mercier, Isaac Manjarres,
iamjoonsoo.kim, quic_charante
On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> On 20.01.25 01:39, Zi Yan wrote:
>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
>> <snip>
>>>>>>>
>>>>>>>
>>>>>>> However, with this workaround, we can't use transparent huge pages.
>>>>>>>
>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>>
>>> Currently, THP might be mTHP, which can have a significantly smaller
>>> size than 32MB. For
>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
>>> is possible.
>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>>>
>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
>>> without necessarily
>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
>>> pageblock size wouldn't
>>> be necessary?
>>
>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
>> to be changed and kernel needs to be recompiled. Not sure if it is OK
>> for Juan's use case.
>
>
> IIRC, we set pageblock size == THP size because this is the granularity
> we want to optimize defragmentation for. ("try keep pageblock
> granularity of the same memory type: movable vs. unmovable")
Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the
user does not care about PMD THP (32MB in the ARM64 16KB base page case)
and mTHP (2MB mTHP here) is good enough, reducing the pageblock size works.
>
> However, the buddy already supports having different pagetypes for large
> allocations.
Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
MIGRATE_MOVABLE can be merged.
>
> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> in these setups. pageblock size is already variable on some
> architectures IIRC.
Making pageblock size a boot-time variable? We might want to warn the
sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
>
> We'd only have to check if all of the THP logic can deal with pageblock
> size < THP size.
Probably yes; pageblock should be independent of the THP logic, although
the compaction logic (used to create THPs) is based on pageblock.
>
> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
This is also good for virtio-mem, since the offline memory block size
can also be reduced. I remember you complained about it before.
--
Best Regards,
Yan, Zi
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-20 15:29 ` Zi Yan
@ 2025-01-20 17:59 ` David Hildenbrand
2025-01-22 2:08 ` Juan Yescas
0 siblings, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2025-01-20 17:59 UTC (permalink / raw)
To: Zi Yan, Barry Song, Juan Yescas
Cc: linux-mm, muchun.song, rppt, osalvador, akpm, lorenzo.stoakes,
Jann Horn, Liam.Howlett, minchan, jaewon31.kim, charante,
Suren Baghdasaryan, Kalesh Singh, T.J. Mercier, Isaac Manjarres,
iamjoonsoo.kim, quic_charante
On 20.01.25 16:29, Zi Yan wrote:
> On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
>> On 20.01.25 01:39, Zi Yan wrote:
>>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
>>> <snip>
>>>>>>>>
>>>>>>>>
>>>>>>>> However, with this workaround, we can't use transparent huge pages.
>>>>>>>>
>>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>>>
>>>> Currently, THP might be mTHP, which can have a significantly smaller
>>>> size than 32MB. For
>>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
>>>> is possible.
>>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>>>>
>>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
>>>> without necessarily
>>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
>>>> pageblock size wouldn't
>>>> be necessary?
>>>
>>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
>>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
>>> to be changed and kernel needs to be recompiled. Not sure if it is OK
>>> for Juan's use case.
>>
>>
>> IIRC, we set pageblock size == THP size because this is the granularity
>> we want to optimize defragmentation for. ("try keep pageblock
>> granularity of the same memory type: movable vs. unmovable")
>
> Right. In past, it is optimized for PMD THP. Now we have mTHP. If user
> does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
> (2MB mTHP here) is good enough, reducing pageblock size works.
>
>>
>> However, the buddy already supports having different pagetypes for large
>> allocations.
>
> Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> MIGRATE_MOVABLE can be merged.
Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
>
>>
>> So we could leave MAX_ORDER alone and try adjusting the pageblock size
>> in these setups. pageblock size is already variable on some
>> architectures IIRC.
>
> Making pageblock size a boot time variable? We might want to warn
> sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
Yes, some way to configure it.
>
>>
>> We'd only have to check if all of the THP logic can deal with pageblock
>> size < THP size.
>
> Probably yes, pageblock should be independent of THP logic, although
> compaction (used to create THPs) logic is based on pageblock.
Right. As raised in the past, we need a higher-level mechanism that
tries to group pageblocks together during compaction/conversion to limit
fragmentation at a higher level.
I assume that many use cases would be fine with not using 32MB/512MB
THPs at all for now -- and instead using 2 MB ones. Of course, for very
large installations it might be different.
>>
>> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
>
> This is also good for virtio-mem, since the offline memory block size
> can also be reduced. I remember you complained about it before.
Yes, yes, yes! :)
--
Cheers,
David / dhildenb
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-20 17:59 ` David Hildenbrand
@ 2025-01-22 2:08 ` Juan Yescas
2025-01-22 2:24 ` Zi Yan
0 siblings, 1 reply; 21+ messages in thread
From: Juan Yescas @ 2025-01-22 2:08 UTC (permalink / raw)
To: David Hildenbrand
Cc: Zi Yan, Barry Song, linux-mm, muchun.song, rppt, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 20.01.25 16:29, Zi Yan wrote:
> > On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> >> On 20.01.25 01:39, Zi Yan wrote:
> >>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> >>> <snip>
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> However, with this workaround, we can't use transparent huge pages.
> >>>>>>>>
> >>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
> >>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> >>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
> >>>>
Thanks, I can see the initialization in include/linux/pageblock-flags.h
#define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> >>>> Currently, THP might be mTHP, which can have a significantly smaller
> >>>> size than 32MB. For
> >>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
> >>>> is possible.
> >>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
> >>>>
> >>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> >>>> without necessarily
> >>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
> >>>> pageblock size wouldn't
> >>>> be necessary?
Do you mean with mTHP? We haven't explored that option.
> >>>
> >>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> >>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
> >>> to be changed and kernel needs to be recompiled. Not sure if it is OK
> >>> for Juan's use case.
> >>
The main goal is to reserve only the necessary CMA memory for the
drivers, which is usually the same for 4KiB and 16KiB page size kernels.
> >>
> >> IIRC, we set pageblock size == THP size because this is the granularity
> >> we want to optimize defragmentation for. ("try keep pageblock
> >> granularity of the same memory type: movable vs. unmovable")
> >
> > Right. In past, it is optimized for PMD THP. Now we have mTHP. If user
> > does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
> > (2MB mTHP here) is good enough, reducing pageblock size works.
> >
> >>
> >> However, the buddy already supports having different pagetypes for large
> >> allocations.
> >
> > Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> > MIGRATE_MOVABLE can be merged.
>
> Yes! An a THP cannot span partial MIGRATE_CMA, which would be fine.
>
> >
> >>
> >> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> >> in these setups. pageblock size is already variable on some
> >> architectures IIRC.
> >
Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
16KiB page size kernel,
I tried these 2 configurations:
#define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
and
#define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
with both of them, the kernel failed to boot.
> > Making pageblock size a boot time variable? We might want to warn
> > sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
>
> Yes, some way to configure it.
>
> >
> >>
> >> We'd only have to check if all of the THP logic can deal with pageblock
> >> size < THP size.
> >
The reason THP was disabled in my experiment is that this assertion failed:
mm/huge_memory.c
/*
* hugepages can't be allocated by the buddy allocator
*/
MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
when
config ARCH_FORCE_MAX_ORDER
int
.....
default "8" if ARM64_16K_PAGES
> > Probably yes, pageblock should be independent of THP logic, although
> > compaction (used to create THPs) logic is based on pageblock.
>
> Right. As raised in the past, we need a higher level mechanism that
> tries to group pageblocks together during comapction/conversion to limit
> fragmentation on a higher level.
>
> I assume that many use cases would be fine with not using 32MB/512MB
> THPs at all for now -- and instead using 2 MB ones. Of course, for very
> large installations it might be different.
>
> >>
> >> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
> >
I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
4KiB      | 15                 | 4KiB  * 32768 pages = 128MiB
16KiB     | 13                 | 16KiB *  8192 pages = 128MiB
64KiB     | 13                 | 64KiB *  8192 pages = 512MiB
> > This is also good for virtio-mem, since the offline memory block size
> > can also be reduced. I remember you complained about it before.
>
> Yes, yes, yes! :)
>
> --
> Cheers,
>
> David / dhildenb
>
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-22 2:08 ` Juan Yescas
@ 2025-01-22 2:24 ` Zi Yan
2025-01-22 4:06 ` Juan Yescas
2025-01-22 8:11 ` David Hildenbrand
0 siblings, 2 replies; 21+ messages in thread
From: Zi Yan @ 2025-01-22 2:24 UTC (permalink / raw)
To: Juan Yescas, David Hildenbrand
Cc: Barry Song, linux-mm, muchun.song, rppt, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
> On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 20.01.25 16:29, Zi Yan wrote:
> > > On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> > >> On 20.01.25 01:39, Zi Yan wrote:
> > >>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> > >>> <snip>
> > >>>>>>>>
> > >>>>>>>>
> > >>>>>>>> However, with this workaround, we can't use transparent huge pages.
> > >>>>>>>>
> > >>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
> > >>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> > >>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
> > >>>>
>
> Thanks, I can see the initialization in include/linux/pageblock-flags.h
>
> #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
>
> > >>>> Currently, THP might be mTHP, which can have a significantly smaller
> > >>>> size than 32MB. For
> > >>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
> > >>>> is possible.
> > >>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
> > >>>>
> > >>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> > >>>> without necessarily
> > >>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
> > >>>> pageblock size wouldn't
> > >>>> be necessary?
>
> Do you mean with mTHP? We haven't explored that option.
Yes. Unless your applications have special demands for PMD THPs, 2MB
mTHP should work.
>
> > >>>
> > >>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> > >>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
> > >>> to be changed and kernel needs to be recompiled. Not sure if it is OK
> > >>> for Juan's use case.
> > >>
>
> The main goal is to reserve only the necessary CMA memory for the
> drivers, which is
> usually the same for 4kb and 16kb page size kernels.
Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
minimal CMA alignment size. Can you deploy that kernel to production?
If yes, you can use mTHP instead of PMD THP and still get the CMA
alignment you want.
>
> > >>
> > >> IIRC, we set pageblock size == THP size because this is the granularity
> > >> we want to optimize defragmentation for. ("try keep pageblock
> > >> granularity of the same memory type: movable vs. unmovable")
> > >
> > > Right. In past, it is optimized for PMD THP. Now we have mTHP. If user
> > > does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
> > > (2MB mTHP here) is good enough, reducing pageblock size works.
> > >
> > >>
> > >> However, the buddy already supports having different pagetypes for large
> > >> allocations.
> > >
> > > Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> > > MIGRATE_MOVABLE can be merged.
> >
> > Yes! An a THP cannot span partial MIGRATE_CMA, which would be fine.
> >
> > >
> > >>
> > >> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> > >> in these setups. pageblock size is already variable on some
> > >> architectures IIRC.
> > >
>
> Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
> 16KiB page size kernel,
> I tried these 2 configurations:
>
> #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
>
> and
>
> #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
>
> with both of them, the kernel failed to boot.
CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
size. pageblock size is determined by pageblock order, which is
affected by MAX_PAGE_ORDER.
>
> > > Making pageblock size a boot time variable? We might want to warn
> > > sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
> >
> > Yes, some way to configure it.
> >
> > >
> > >>
> > >> We'd only have to check if all of the THP logic can deal with pageblock
> > >> size < THP size.
> > >
>
> The reason that THP was disabled in my experiment is because this
> assertion failed
>
> mm/huge_memory.c
> /*
> * hugepages can't be allocated by the buddy allocator
> */
> MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
>
> when
>
> config ARCH_FORCE_MAX_ORDER
> int
> .....
> default "8" if ARM64_16K_PAGES
>
You can remove that BUILD_BUG_ON, turn on mTHP, and see if it works.
>
> > > Probably yes, pageblock should be independent of THP logic, although
> > > compaction (used to create THPs) logic is based on pageblock.
> >
> > Right. As raised in the past, we need a higher level mechanism that
> > tries to group pageblocks together during comapction/conversion to limit
> > fragmentation on a higher level.
> >
> > I assume that many use cases would be fine with not using 32MB/512MB
> > THPs at all for now -- and instead using 2 MB ones. Of course, for very
> > large installations it might be different.
> >
> > >>
> > >> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
> > >
>
> I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
>
> PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
> 4KiB | 15 | 4KiB
> * 32KiB = 128MiB
> 16KiB | 13 | 16KiB
> * 8KiB = 128MiB
> 64KiB | 13 | 64KiB
> * 8KiB = 512MiB
>
> > > This is also good for virtio-mem, since the offline memory block size
> > > can also be reduced. I remember you complained about it before.
> >
> > Yes, yes, yes! :)
> >
David's proposal should work in general, but might take a non-trivial
amount of work:
1. keep the pageblock size always at 4MB for all architectures.
2. adjust existing pageblock users, like compaction, to work on a
   different range, independent of pageblock.
   a. for the anti-fragmentation mechanism, multiple pageblocks might
      have different migratetypes but would be compacted to generate
      huge pages; how to align their migratetypes is TBD.
3. other corner case handling.
The final question: Barry mentioned that over-reserved CMA areas
can still be used for movable page allocations. Why does that not work
for you?
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-22 2:24 ` Zi Yan
@ 2025-01-22 4:06 ` Juan Yescas
2025-01-22 6:52 ` Barry Song
2025-01-22 8:11 ` David Hildenbrand
1 sibling, 1 reply; 21+ messages in thread
From: Juan Yescas @ 2025-01-22 4:06 UTC (permalink / raw)
To: Zi Yan
Cc: David Hildenbrand, Barry Song, linux-mm, muchun.song, rppt,
osalvador, akpm, lorenzo.stoakes, Jann Horn, Liam.Howlett,
minchan, jaewon31.kim, Suren Baghdasaryan, Kalesh Singh,
T.J. Mercier, Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Tue, Jan 21, 2025 at 6:24 PM Zi Yan <ziy@nvidia.com> wrote:
>
> On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
> > On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 20.01.25 16:29, Zi Yan wrote:
> > > > On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> > > >> On 20.01.25 01:39, Zi Yan wrote:
> > > >>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> > > >>> <snip>
> > > >>>>>>>>
> > > >>>>>>>>
> > > >>>>>>>> However, with this workaround, we can't use transparent huge pages.
> > > >>>>>>>>
> > > >>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
> > > >>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> > > >>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
> > > >>>>
> >
> > Thanks, I can see the initialization in include/linux/pageblock-flags.h
> >
> > #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> >
> > > >>>> Currently, THP might be mTHP, which can have a significantly smaller
> > > >>>> size than 32MB. For
> > > >>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
> > > >>>> is possible.
> > > >>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
> > > >>>>
> > > >>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> > > >>>> without necessarily
> > > >>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
> > > >>>> pageblock size wouldn't
> > > >>>> be necessary?
> >
> > Do you mean with mTHP? We haven't explored that option.
>
> Yes. Unless your applications have special demands for PMD THPs. 2MB
> mTHP should work.
>
> >
> > > >>>
> > > >>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> > > >>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
> > > >>> to be changed and kernel needs to be recompiled. Not sure if it is OK
> > > >>> for Juan's use case.
> > > >>
> >
> > The main goal is to reserve only the necessary CMA memory for the
> > drivers, which is usually the same for 4KiB and 16KiB page size kernels.
>
> Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
> minimal CMA alignment size. Can you deploy that kernel to production?
We can't deploy that because many Android partners are using PMD THP instead
of mTHP.
> If yes, you can use mTHP instead of PMD THP and still get the CMA
> alignment you want.
>
> >
> > > >>
> > > >> IIRC, we set pageblock size == THP size because this is the granularity
> > > >> we want to optimize defragmentation for. ("try keep pageblock
> > > >> granularity of the same memory type: movable vs. unmovable")
> > > >
> > > > Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the user
> > > > does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
> > > > (2MB mTHP here) is good enough, reducing pageblock size works.
> > > >
> > > >>
> > > >> However, the buddy already supports having different pagetypes for large
> > > >> allocations.
> > > >
> > > > Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> > > > MIGRATE_MOVABLE can be merged.
> > >
> > > Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
> > >
> > > >
> > > >>
> > > >> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> > > >> in these setups. pageblock size is already variable on some
> > > >> architectures IIRC.
> > > >
> >
> > Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
> > 16KiB page size kernel,
> > I tried these 2 configurations:
> >
> > #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
> >
> > and
> >
> > #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
> >
> > with both of them, the kernel failed to boot.
>
> CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
> So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
> size. pageblock size is determined by pageblock order, which is
> affected by MAX_PAGE_ORDER.
>
> >
> > > > Making pageblock size a boot time variable? We might want to warn
> > > > sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
> > >
> > > Yes, some way to configure it.
> > >
> > > >
> > > >>
> > > >> We'd only have to check if all of the THP logic can deal with pageblock
> > > >> size < THP size.
> > > >
> >
> > The reason that THP was disabled in my experiment is because this
> > assertion failed
> >
> > mm/huge_memory.c
> > /*
> > * hugepages can't be allocated by the buddy allocator
> > */
> > MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
> >
> > when
> >
> > config ARCH_FORCE_MAX_ORDER
> > int
> > .....
> > default "8" if ARM64_16K_PAGES
> >
>
> You can remove that BUILD_BUG_ON and turn on mTHP and see if mTHP works.
>
We'll do that and post the results.
> >
> > > > Probably yes, pageblock should be independent of THP logic, although
> > > > compaction (used to create THPs) logic is based on pageblock.
> > >
> > > Right. As raised in the past, we need a higher level mechanism that
> > > tries to group pageblocks together during compaction/conversion to limit
> > > fragmentation on a higher level.
> > >
> > > I assume that many use cases would be fine with not using 32MB/512MB
> > > THPs at all for now -- and instead using 2 MB ones. Of course, for very
> > > large installations it might be different.
> > >
> > > >>
> > > >> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
> > > >
> >
> > I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
> >
> > PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
> > 4KiB      | 15                 | 4KiB * 32Ki pages = 128MiB
> > 16KiB     | 13                 | 16KiB * 8Ki pages = 128MiB
> > 64KiB     | 13                 | 64KiB * 8Ki pages = 512MiB
> >
> > > > This is also good for virtio-mem, since the offline memory block size
> > > > can also be reduced. I remember you complained about it before.
> > >
> > > Yes, yes, yes! :)
> > >
>
> David's proposal should work in general, but might take a non-trivial
> amount of work:
>
> 1. keep pageblock size always at 4MB for all arch.
> 2. adjust existing pageblock users, like compaction, to work on a
> different range, independent of pageblock.
> a. for anti-fragmentation mechanism, multiple pageblocks might have
> different migratetypes but would be compacted to generate huge
> pages, but how to align their migratetypes is TBD.
> 3. other corner case handlings.
>
>
> The final question is that Barry mentioned that over-reserved CMA areas
> can be used for movable page allocations. Why does it not work for you?
I need to run more experiments to see which type of page allocation
(unmovable or movable) dominates in the system. If it is movable,
over-reserved CMA areas should be fine.
>
> --
> Best Regards,
> Yan, Zi
>
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-22 4:06 ` Juan Yescas
@ 2025-01-22 6:52 ` Barry Song
2025-01-22 8:04 ` David Hildenbrand
0 siblings, 1 reply; 21+ messages in thread
From: Barry Song @ 2025-01-22 6:52 UTC (permalink / raw)
To: Juan Yescas
Cc: Zi Yan, David Hildenbrand, linux-mm, muchun.song, rppt,
osalvador, akpm, lorenzo.stoakes, Jann Horn, Liam.Howlett,
minchan, jaewon31.kim, Suren Baghdasaryan, Kalesh Singh,
T.J. Mercier, Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Wed, Jan 22, 2025 at 5:06 PM Juan Yescas <jyescas@google.com> wrote:
>
> On Tue, Jan 21, 2025 at 6:24 PM Zi Yan <ziy@nvidia.com> wrote:
> >
> > On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
> > > On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
> > > >
> > > > On 20.01.25 16:29, Zi Yan wrote:
> > > > > On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
> > > > >> On 20.01.25 01:39, Zi Yan wrote:
> > > > >>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
> > > > >>> <snip>
> > > > >>>>>>>>
> > > > >>>>>>>>
> > > > >>>>>>>> However, with this workaround, we can't use transparent huge pages.
> > > > >>>>>>>>
> > > > >>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
> > > > >>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
> > > > >>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
> > > > >>>>
> > >
> > > Thanks, I can see the initialization in include/linux/pageblock-flags.h
> > >
> > > #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
> > >
> > > > >>>> Currently, THP might be mTHP, which can have a significantly smaller
> > > > >>>> size than 32MB. For
> > > > >>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
> > > > >>>> is possible.
> > > > >>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
> > > > >>>>
> > > > >>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
> > > > >>>> without necessarily
> > > > >>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
> > > > >>>> pageblock size wouldn't
> > > > >>>> be necessary?
> > >
> > > Do you mean with mTHP? We haven't explored that option.
> >
> > Yes. Unless your applications have special demands for PMD THPs. 2MB
> > mTHP should work.
> >
> > >
> > > > >>>
> > > > >>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
> > > > >>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
> > > > >>> to be changed and kernel needs to be recompiled. Not sure if it is OK
> > > > >>> for Juan's use case.
> > > > >>
> > >
> > > The main goal is to reserve only the necessary CMA memory for the
> > > drivers, which is usually the same for 4KiB and 16KiB page size kernels.
> >
> > Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
> > minimal CMA alignment size. Can you deploy that kernel to production?
>
> We can't deploy that because many Android partners are using PMD THP instead
> of mTHP.
>
> > If yes, you can use mTHP instead of PMD THP and still get the CMA
> > alignment you want.
> >
> > >
> > > > >>
> > > > >> IIRC, we set pageblock size == THP size because this is the granularity
> > > > >> we want to optimize defragmentation for. ("try keep pageblock
> > > > >> granularity of the same memory type: movable vs. unmovable")
> > > > >
> > > > > Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the user
> > > > > does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
> > > > > (2MB mTHP here) is good enough, reducing pageblock size works.
> > > > >
> > > > >>
> > > > >> However, the buddy already supports having different pagetypes for large
> > > > >> allocations.
> > > > >
> > > > > Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
> > > > > MIGRATE_MOVABLE can be merged.
> > > >
> > > > Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
> > > >
> > > > >
> > > > >>
> > > > >> So we could leave MAX_ORDER alone and try adjusting the pageblock size
> > > > >> in these setups. pageblock size is already variable on some
> > > > >> architectures IIRC.
> > > > >
> > >
> > > Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
> > > 16KiB page size kernel,
> > > I tried these 2 configurations:
> > >
> > > #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
> > >
> > > and
> > >
> > > #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
> > >
> > > with both of them, the kernel failed to boot.
> >
> > CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
> > So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
> > size. pageblock size is determined by pageblock order, which is
> > affected by MAX_PAGE_ORDER.
> >
> > >
> > > > > Making pageblock size a boot time variable? We might want to warn
> > > > > sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
> > > >
> > > > Yes, some way to configure it.
> > > >
> > > > >
> > > > >>
> > > > >> We'd only have to check if all of the THP logic can deal with pageblock
> > > > >> size < THP size.
> > > > >
> > >
> > > The reason that THP was disabled in my experiment is because this
> > > assertion failed
> > >
> > > mm/huge_memory.c
> > > /*
> > > * hugepages can't be allocated by the buddy allocator
> > > */
> > > MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
> > >
> > > when
> > >
> > > config ARCH_FORCE_MAX_ORDER
> > > int
> > > .....
> > > default "8" if ARM64_16K_PAGES
> > >
> >
> > You can remove that BUILD_BUG_ON and turn on mTHP and see if mTHP works.
> >
>
> We'll do that and post the results.
>
> > >
> > > > > Probably yes, pageblock should be independent of THP logic, although
> > > > > compaction (used to create THPs) logic is based on pageblock.
> > > >
> > > > Right. As raised in the past, we need a higher level mechanism that
> > > > tries to group pageblocks together during compaction/conversion to limit
> > > > fragmentation on a higher level.
> > > >
> > > > I assume that many use cases would be fine with not using 32MB/512MB
> > > > THPs at all for now -- and instead using 2 MB ones. Of course, for very
> > > > large installations it might be different.
> > > >
> > > > >>
> > > > >> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
> > > > >
> > >
> > > I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
> > >
> > > PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
> > > 4KiB      | 15                 | 4KiB * 32Ki pages = 128MiB
> > > 16KiB     | 13                 | 16KiB * 8Ki pages = 128MiB
> > > 64KiB     | 13                 | 64KiB * 8Ki pages = 512MiB
> > >
> > > > > This is also good for virtio-mem, since the offline memory block size
> > > > > can also be reduced. I remember you complained about it before.
> > > >
> > > > Yes, yes, yes! :)
> > > >
> >
> > David's proposal should work in general, but might take a non-trivial
> > amount of work:
> >
> > 1. keep pageblock size always at 4MB for all arch.
> > 2. adjust existing pageblock users, like compaction, to work on a
> > different range, independent of pageblock.
> > a. for anti-fragmentation mechanism, multiple pageblocks might have
> > different migratetypes but would be compacted to generate huge
> > pages, but how to align their migratetypes is TBD.
> > 3. other corner case handlings.
> >
> >
> > The final question is that Barry mentioned that over-reserved CMA areas
> > can be used for movable page allocations. Why does it not work for you?
>
> I need to run more experiments to see which type of page allocation
> (unmovable or movable) dominates in the system. If it is movable,
> over-reserved CMA areas should be fine.
My understanding is that over-reserving 28MiB is unlikely to cause
noticeable regression, given that we frequently handle allocations like
GFP_HIGHUSER_MOVABLE or similar, which are significantly larger
than 28MiB. However, David also mentioned a reservation of 512MiB
for a 64KiB page size. In that case, 512MiB might be large enough to
potentially impact the balance between movable and unmovable
allocations. For instance, if we still have 512MiB reserved in CMA
but are allocating unmovable folios (for example, dma-buf), we could
fail an allocation even when there’s actually capacity. So, in any case,
there is still work to be done here.
By the way, is 512MiB truly a reasonable size for THP? It seems
that 2MiB is a more suitable default size for THP. All of the 4KiB,
16KiB, and 64KiB page sizes support 2MiB large folios: for 4KiB it is
PMD-mapped; for 16KiB and 64KiB, it is cont-pte.
>
> >
> > --
> > Best Regards,
> > Yan, Zi
> >
Thanks
Barry
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-22 6:52 ` Barry Song
@ 2025-01-22 8:04 ` David Hildenbrand
0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-01-22 8:04 UTC (permalink / raw)
To: Barry Song, Juan Yescas
Cc: Zi Yan, linux-mm, muchun.song, rppt, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
Suren Baghdasaryan, Kalesh Singh, T.J. Mercier, Isaac Manjarres,
iamjoonsoo.kim, quic_charante
On 22.01.25 07:52, Barry Song wrote:
> On Wed, Jan 22, 2025 at 5:06 PM Juan Yescas <jyescas@google.com> wrote:
>>
>> On Tue, Jan 21, 2025 at 6:24 PM Zi Yan <ziy@nvidia.com> wrote:
>>>
>>> On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
>>>> On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>>>>>
>>>>> On 20.01.25 16:29, Zi Yan wrote:
>>>>>> On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
>>>>>>> On 20.01.25 01:39, Zi Yan wrote:
>>>>>>>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
>>>>>>>> <snip>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> However, with this workaround, we can't use transparent huge pages.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>>>>>>>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>>>>>>>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>>>>>>>>
>>>>
>>>> Thanks, I can see the initialization in include/linux/pageblock-flags.h
>>>>
>>>> #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
>>>>
>>>>>>>>> Currently, THP might be mTHP, which can have a significantly smaller
>>>>>>>>> size than 32MB. For
>>>>>>>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
>>>>>>>>> is possible.
>>>>>>>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>>>>>>>>>
>>>>>>>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
>>>>>>>>> without necessarily
>>>>>>>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
>>>>>>>>> pageblock size wouldn't
>>>>>>>>> be necessary?
>>>>
>>>> Do you mean with mTHP? We haven't explored that option.
>>>
>>> Yes. Unless your applications have special demands for PMD THPs. 2MB
>>> mTHP should work.
>>>
>>>>
>>>>>>>>
>>>>>>>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
>>>>>>>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
>>>>>>>> to be changed and kernel needs to be recompiled. Not sure if it is OK
>>>>>>>> for Juan's use case.
>>>>>>>
>>>>
>>>> The main goal is to reserve only the necessary CMA memory for the
>>>> drivers, which is usually the same for 4KiB and 16KiB page size kernels.
>>>
>>> Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
>>> minimal CMA alignment size. Can you deploy that kernel to production?
>>
>> We can't deploy that because many Android partners are using PMD THP instead
>> of mTHP.
>>
>>> If yes, you can use mTHP instead of PMD THP and still get the CMA
>>> alignment you want.
>>>
>>>>
>>>>>>>
>>>>>>> IIRC, we set pageblock size == THP size because this is the granularity
>>>>>>> we want to optimize defragmentation for. ("try keep pageblock
>>>>>>> granularity of the same memory type: movable vs. unmovable")
>>>>>>
>>>>>> Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the user
>>>>>> does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
>>>>>> (2MB mTHP here) is good enough, reducing pageblock size works.
>>>>>>
>>>>>>>
>>>>>>> However, the buddy already supports having different pagetypes for large
>>>>>>> allocations.
>>>>>>
>>>>>> Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
>>>>>> MIGRATE_MOVABLE can be merged.
>>>>>
>>>>> Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> So we could leave MAX_ORDER alone and try adjusting the pageblock size
>>>>>>> in these setups. pageblock size is already variable on some
>>>>>>> architectures IIRC.
>>>>>>
>>>>
>>>> Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
>>>> 16KiB page size kernel,
>>>> I tried these 2 configurations:
>>>>
>>>> #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
>>>>
>>>> and
>>>>
>>>> #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
>>>>
>>>> with both of them, the kernel failed to boot.
>>>
>>> CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
>>> So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
>>> size. pageblock size is determined by pageblock order, which is
>>> affected by MAX_PAGE_ORDER.
>>>
>>>>
>>>>>> Making pageblock size a boot time variable? We might want to warn
>>>>>> sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
>>>>>
>>>>> Yes, some way to configure it.
>>>>>
>>>>>>
>>>>>>>
>>>>>>> We'd only have to check if all of the THP logic can deal with pageblock
>>>>>>> size < THP size.
>>>>>>
>>>>
>>>> The reason that THP was disabled in my experiment is because this
>>>> assertion failed
>>>>
>>>> mm/huge_memory.c
>>>> /*
>>>> * hugepages can't be allocated by the buddy allocator
>>>> */
>>>> MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
>>>>
>>>> when
>>>>
>>>> config ARCH_FORCE_MAX_ORDER
>>>> int
>>>> .....
>>>> default "8" if ARM64_16K_PAGES
>>>>
>>>
>>> You can remove that BUILD_BUG_ON and turn on mTHP and see if mTHP works.
>>>
>>
>> We'll do that and post the results.
>>
>>>>
>>>>>> Probably yes, pageblock should be independent of THP logic, although
>>>>>> compaction (used to create THPs) logic is based on pageblock.
>>>>>
>>>>> Right. As raised in the past, we need a higher level mechanism that
>>>>> tries to group pageblocks together during compaction/conversion to limit
>>>>> fragmentation on a higher level.
>>>>>
>>>>> I assume that many use cases would be fine with not using 32MB/512MB
>>>>> THPs at all for now -- and instead using 2 MB ones. Of course, for very
>>>>> large installations it might be different.
>>>>>
>>>>>>>
>>>>>>> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
>>>>>>
>>>>
>>>> I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
>>>>
>>>> PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
>>>> 4KiB      | 15                 | 4KiB * 32Ki pages = 128MiB
>>>> 16KiB     | 13                 | 16KiB * 8Ki pages = 128MiB
>>>> 64KiB     | 13                 | 64KiB * 8Ki pages = 512MiB
>>>>
>>>>>> This is also good for virtio-mem, since the offline memory block size
>>>>>> can also be reduced. I remember you complained about it before.
>>>>>
>>>>> Yes, yes, yes! :)
>>>>>
>>>
>>> David's proposal should work in general, but might take a non-trivial
>>> amount of work:
>>>
>>> 1. keep pageblock size always at 4MB for all arch.
>>> 2. adjust existing pageblock users, like compaction, to work on a
>>> different range, independent of pageblock.
>>> a. for anti-fragmentation mechanism, multiple pageblocks might have
>>> different migratetypes but would be compacted to generate huge
>>> pages, but how to align their migratetypes is TBD.
>>> 3. other corner case handlings.
>>>
>>>
>>> The final question is that Barry mentioned that over-reserved CMA areas
>>> can be used for movable page allocations. Why does it not work for you?
>>
>> I need to run more experiments to see which type of page allocation
>> (unmovable or movable) dominates in the system. If it is movable,
>> over-reserved CMA areas should be fine.
>
> My understanding is that over-reserving 28MiB is unlikely to cause
> noticeable regression, given that we frequently handle allocations like
> GFP_HIGHUSER_MOVABLE or similar, which are significantly larger
> than 28MiB. However, David also mentioned a reservation of 512MiB
> for a 64KiB page size. In that case, 512MiB might be large enough to
> potentially impact the balance between movable and unmovable
> allocations. For instance, if we still have 512MiB reserved in CMA
> but are allocating unmovable folios (for example, dma-buf), we could
> fail an allocation even when there’s actually capacity. So, in any case,
> there is still work to be done here.
>
> By the way, is 512MiB truly a reasonable size for THP?
No, it's absolutely stupid for most setups.
Just think of a small VM with 4 GiB: great, you have 8 pageblocks and
probably never get a single THP.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-22 2:24 ` Zi Yan
2025-01-22 4:06 ` Juan Yescas
@ 2025-01-22 8:11 ` David Hildenbrand
2025-01-22 12:49 ` Zi Yan
1 sibling, 1 reply; 21+ messages in thread
From: David Hildenbrand @ 2025-01-22 8:11 UTC (permalink / raw)
To: Zi Yan, Juan Yescas
Cc: Barry Song, linux-mm, muchun.song, rppt, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On 22.01.25 03:24, Zi Yan wrote:
> On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
>> On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 20.01.25 16:29, Zi Yan wrote:
>>>> On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
>>>>> On 20.01.25 01:39, Zi Yan wrote:
>>>>>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
>>>>>> <snip>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> However, with this workaround, we can't use transparent huge pages.
>>>>>>>>>>>
>>>>>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>>>>>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>>>>>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>>>>>>
>>
>> Thanks, I can see the initialization in include/linux/pageblock-flags.h
>>
>> #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
>>
>>>>>>> Currently, THP might be mTHP, which can have a significantly smaller
>>>>>>> size than 32MB. For
>>>>>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
>>>>>>> is possible.
>>>>>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>>>>>>>
>>>>>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
>>>>>>> without necessarily
>>>>>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
>>>>>>> pageblock size wouldn't
>>>>>>> be necessary?
>>
>> Do you mean with mTHP? We haven't explored that option.
>
> Yes. Unless your applications have special demands for PMD THPs. 2MB
> mTHP should work.
>
>>
>>>>>>
>>>>>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
>>>>>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
>>>>>> to be changed and kernel needs to be recompiled. Not sure if it is OK
>>>>>> for Juan's use case.
>>>>>
>>
>> The main goal is to reserve only the necessary CMA memory for the
>> drivers, which is usually the same for 4KiB and 16KiB page size kernels.
>
> Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
> minimal CMA alignment size. Can you deploy that kernel to production?
> If yes, you can use mTHP instead of PMD THP and still get the CMA
> alignment you want.
>
>>
>>>>>
>>>>> IIRC, we set pageblock size == THP size because this is the granularity
>>>>> we want to optimize defragmentation for. ("try keep pageblock
>>>>> granularity of the same memory type: movable vs. unmovable")
>>>>
>>>> Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the user
>>>> does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
>>>> (2MB mTHP here) is good enough, reducing pageblock size works.
>>>>
>>>>>
>>>>> However, the buddy already supports having different pagetypes for large
>>>>> allocations.
>>>>
>>>> Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
>>>> MIGRATE_MOVABLE can be merged.
>>>
>>> Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
>>>
>>>>
>>>>>
>>>>> So we could leave MAX_ORDER alone and try adjusting the pageblock size
>>>>> in these setups. pageblock size is already variable on some
>>>>> architectures IIRC.
>>>>
>>
>> Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
>> 16KiB page size kernel,
>> I tried these 2 configurations:
>>
>> #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
>>
>> and
>>
>> #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
>>
>> with both of them, the kernel failed to boot.
>
> CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
> So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
> size. pageblock size is determined by pageblock order, which is
> affected by MAX_PAGE_ORDER.
Yes, most importantly we must not exceed MAX_PAGE_ORDER. Going smaller
is the common case.
>
>>
>>>> Making pageblock size a boot time variable? We might want to warn
>>>> sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
>>>
>>> Yes, some way to configure it.
>>>
>>>>
>>>>>
>>>>> We'd only have to check if all of the THP logic can deal with pageblock
>>>>> size < THP size.
>>>>
>>
>> The reason that THP was disabled in my experiment is because this
>> assertion failed
>>
>> mm/huge_memory.c
>> /*
>> * hugepages can't be allocated by the buddy allocator
>> */
>> MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
>>
>> when
>>
>> config ARCH_FORCE_MAX_ORDER
>> int
>> .....
>> default "8" if ARM64_16K_PAGES
>>
>
> You can remove that BUILD_BUG_ON and turn on mTHP and see if mTHP works.
>
>>
>>>> Probably yes, pageblock should be independent of THP logic, although
>>>> compaction (used to create THPs) logic is based on pageblock.
>>>
>>> Right. As raised in the past, we need a higher level mechanism that
>>> tries to group pageblocks together during compaction/conversion to limit
>>> fragmentation on a higher level.
>>>
>>> I assume that many use cases would be fine with not using 32MB/512MB
>>> THPs at all for now -- and instead using 2 MB ones. Of course, for very
>>> large installations it might be different.
>>>
>>>>>
>>>>> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
>>>>
>>
>> I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
>>
>> PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
>> 4KiB      | 15                 | 4KiB * 32Ki pages = 128MiB
>> 16KiB     | 13                 | 16KiB * 8Ki pages = 128MiB
>> 64KiB     | 13                 | 64KiB * 8Ki pages = 512MiB
>>
>>>> This is also good for virtio-mem, since the offline memory block size
>>>> can also be reduced. I remember you complained about it before.
>>>
>>> Yes, yes, yes! :)
>>>
>
> David's proposal should work in general, but might take a non-trivial
> amount of work:
>
> 1. keep pageblock size always at 4MB for all arch.
My proposal was to leave it unchanged for most archs, but allow for
overriding it on aarch64 as a first step.
s390x is happy with 1MiB, x86 with 2MiB. It's aarch64 that does
questionable things :)
CONFIG_HUGETLB_PAGE_SIZE_VARIABLE already allows for variable
pageblock_order. That whole code likely needs some love, but most of it
should already be there.
In the future, I could imagine just going for a smaller pageblock size
on aarch64, and handling fragmentation avoidance for larger THPs (512
MiB really is close to 1 GiB on x86) differently, not using pageblocks.
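
For reference, the per-architecture pageblock sizes mentioned above follow
directly from pageblock_order = min(HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER). A
minimal sketch of that arithmetic (plain Python, helper name invented for
illustration, not kernel code):

```python
# Pageblock size in bytes, mirroring the kernel's
#   #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
# The function name is hypothetical; only the arithmetic is from the thread.
def pageblock_bytes(page_size, hugetlb_order, max_page_order):
    pageblock_order = min(hugetlb_order, max_page_order)
    return page_size << pageblock_order

MiB = 1 << 20

# x86-64: 4KiB pages, a PMD maps 512 pages (order 9) -> 2 MiB pageblocks
print(pageblock_bytes(4096, 9, 10) // MiB)    # 2

# arm64/16K: a PMD maps 2048 pages (order 11) -> 32 MiB pageblocks
print(pageblock_bytes(16384, 11, 11) // MiB)  # 32

# arm64/64K: a PMD maps 8192 pages (order 13) -> 512 MiB pageblocks
print(pageblock_bytes(65536, 13, 13) // MiB)  # 512
```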
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-22 8:11 ` David Hildenbrand
@ 2025-01-22 12:49 ` Zi Yan
2025-01-22 13:58 ` David Hildenbrand
0 siblings, 1 reply; 21+ messages in thread
From: Zi Yan @ 2025-01-22 12:49 UTC (permalink / raw)
To: David Hildenbrand, Juan Yescas
Cc: Barry Song, linux-mm, muchun.song, rppt, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
On Wed Jan 22, 2025 at 3:11 AM EST, David Hildenbrand wrote:
> On 22.01.25 03:24, Zi Yan wrote:
>> On Tue Jan 21, 2025 at 9:08 PM EST, Juan Yescas wrote:
>>> On Mon, Jan 20, 2025 at 9:59 AM David Hildenbrand <david@redhat.com> wrote:
>>>>
>>>> On 20.01.25 16:29, Zi Yan wrote:
>>>>> On Mon Jan 20, 2025 at 3:14 AM EST, David Hildenbrand wrote:
>>>>>> On 20.01.25 01:39, Zi Yan wrote:
>>>>>>> On Sun Jan 19, 2025 at 6:55 PM EST, Barry Song wrote:
>>>>>>> <snip>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> However, with this workaround, we can't use transparent huge pages.
>>>>>>>>>>>>
>>>>>>>>>>>> Is the CMA_MIN_ALIGNMENT_BYTES requirement alignment only to support huge pages?
>>>>>>>>> No. CMA_MIN_ALIGNMENT_BYTES is limited by CMA_MIN_ALIGNMENT_PAGES, which
>>>>>>>>> is equal to pageblock size. Enabling THP just bumps the pageblock size.
>>>>>>>>
>>>
>>> Thanks, I can see the initialization in include/linux/pageblock-flags.h
>>>
>>> #define pageblock_order MIN_T(unsigned int, HUGETLB_PAGE_ORDER, MAX_PAGE_ORDER)
>>>
>>>>>>>> Currently, THP might be mTHP, which can have a significantly smaller
>>>>>>>> size than 32MB. For
>>>>>>>> example, on arm64 systems with a 16KiB page size, a 2MB CONT-PTE mTHP
>>>>>>>> is possible.
>>>>>>>> Additionally, mTHP relies on the CONFIG_TRANSPARENT_HUGEPAGE configuration.
>>>>>>>>
>>>>>>>> I wonder if it's possible to enable CONFIG_TRANSPARENT_HUGEPAGE
>>>>>>>> without necessarily
>>>>>>>> using 32MiB THP. If we use other sizes, such as 64KiB, perhaps a large
>>>>>>>> pageblock size wouldn't
>>>>>>>> be necessary?
>>>
>>> Do you mean with mTHP? We haven't explored that option.
>>
>> Yes. Unless your applications have special demands for PMD THPs. 2MB
>> mTHP should work.
>>
>>>
>>>>>>>
>>>>>>> I think this should work by reducing MAX_PAGE_ORDER like Juan did for
>>>>>>> the experiment. But MAX_PAGE_ORDER is a macro right now, Kconfig needs
>>>>>>> to be changed and kernel needs to be recompiled. Not sure if it is OK
>>>>>>> for Juan's use case.
>>>>>>
>>>
>>> The main goal is to reserve only the necessary CMA memory for the
>>> drivers, which is
>>> usually the same for 4kb and 16kb page size kernels.
>>
>> Got it. Based on your experiment, you changed MAX_PAGE_ORDER to get the
>> minimal CMA alignment size. Can you deploy that kernel to production?
>> If yes, you can use mTHP instead of PMD THP and still get the CMA
>> alignment you want.
>>
>>>
>>>>>>
>>>>>> IIRC, we set pageblock size == THP size because this is the granularity
>>>>>> we want to optimize defragmentation for. ("try keep pageblock
>>>>>> granularity of the same memory type: movable vs. unmovable")
>>>>>
>>>>> Right. In the past, it was optimized for PMD THP. Now we have mTHP. If the
>>>>> user does not care about PMD THP (32MB in ARM64 16KB base page case) and mTHP
>>>>> (2MB mTHP here) is good enough, reducing pageblock size works.
>>>>>
>>>>>>
>>>>>> However, the buddy already supports having different pagetypes for large
>>>>>> allocations.
>>>>>
>>>>> Right. To be clear, only MIGRATE_UNMOVABLE, MIGRATE_RECLAIMABLE, and
>>>>> MIGRATE_MOVABLE can be merged.
>>>>
>>>> Yes! And a THP cannot span partial MIGRATE_CMA, which would be fine.
>>>>
>>>>>
>>>>>>
>>>>>> So we could leave MAX_ORDER alone and try adjusting the pageblock size
>>>>>> in these setups. pageblock size is already variable on some
>>>>>> architectures IIRC.
>>>>>
>>>
>>> Which values would work for the CMA_MIN_ALIGNMENT_BYTES macro? In the
>>> 16KiB page size kernel,
>>> I tried these 2 configurations:
>>>
>>> #define CMA_MIN_ALIGNMENT_BYTES (2048 * CMA_MIN_ALIGNMENT_PAGES)
>>>
>>> and
>>>
>>> #define CMA_MIN_ALIGNMENT_BYTES (4096 * CMA_MIN_ALIGNMENT_PAGES)
>>>
>>> with both of them, the kernel failed to boot.
>>
>> CMA_MIN_ALIGNMENT_BYTES needs to be PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES.
>> So you need to adjust CMA_MIN_ALIGNMENT_PAGES, which is set by pageblock
>> size. pageblock size is determined by pageblock order, which is
>> affected by MAX_PAGE_ORDER.
>
> Yes, most importantly we must not exceed MAX_PAGE_ORDER. Going smaller
> is the common case.
>
>>
>>>
>>>>> Making pageblock size a boot time variable? We might want to warn
>>>>> sysadmin/user that >pageblock_order THP/mTHP creation will suffer.
>>>>
>>>> Yes, some way to configure it.
>>>>
>>>>>
>>>>>>
>>>>>> We'd only have to check if all of the THP logic can deal with pageblock
>>>>>> size < THP size.
>>>>>
>>>
>>> The reason that THP was disabled in my experiment is because this
>>> assertion failed
>>>
>>> mm/huge_memory.c
>>> /*
>>> * hugepages can't be allocated by the buddy allocator
>>> */
>>> MAYBE_BUILD_BUG_ON(HPAGE_PMD_ORDER > MAX_PAGE_ORDER);
>>>
>>> when
>>>
>>> config ARCH_FORCE_MAX_ORDER
>>> int
>>> .....
>>> default "8" if ARM64_16K_PAGES
>>>
>>
>> You can remove that BUILD_BUG_ON and turn on mTHP and see if mTHP works.
>>
>>>
>>>>> Probably yes, pageblock should be independent of THP logic, although
>>>>> compaction (used to create THPs) logic is based on pageblock.
>>>>
>>>> Right. As raised in the past, we need a higher level mechanism that
>>>> tries to group pageblocks together during compaction/conversion to limit
>>>> fragmentation on a higher level.
>>>>
>>>> I assume that many use cases would be fine with not using 32MB/512MB
>>>> THPs at all for now -- and instead using 2 MB ones. Of course, for very
>>>> large installations it might be different.
>>>>
>>>>>>
>>>>>> This issue is even more severe on arm64 with 64k (pageblock = 512MiB).
>>>>>
>>>
>>> I agree, and if ARCH_FORCE_MAX_ORDER is configured to the max value we get:
>>>
>>> PAGE_SIZE | max MAX_PAGE_ORDER | CMA_MIN_ALIGNMENT_BYTES
>>> 4KiB      | 15                 | 4KiB * 32Ki pages = 128MiB
>>> 16KiB     | 13                 | 16KiB * 8Ki pages = 128MiB
>>> 64KiB     | 13                 | 64KiB * 8Ki pages = 512MiB
>>>
>>>>> This is also good for virtio-mem, since the offline memory block size
>>>>> can also be reduced. I remember you complained about it before.
>>>>
>>>> Yes, yes, yes! :)
>>>>
>>
>> David's proposal should work in general, but might take a non-trivial
>> amount of work:
>>
>> 1. keep pageblock size always at 4MB for all arch.
>
> My proposal was to leave it unchanged for most archs, but allow for
> overriding it on aarch64 as a first step.
Got it. Makes sense.
>
> s390x is happy with 1MiB, x86 with 2MiB. It's aarch64 that does
> questionable things :)
>
> CONFIG_HUGETLB_PAGE_SIZE_VARIABLE already allows for variable
> pageblock_order. That whole code likely needs some love, but most of it
> should already be there.
>
>
> In the future, I could imagine just going for a smaller pageblock size
> on aarch64, and handling fragmentation avoidance for larger THPs (512
> MiB really is close to 1 GiB on x86) differently, not using pageblocks.
Right. That is what I meant by "...compaction, to work on a different
range, independent of pageblock". But based on Juan's reply[1], 32MB PMD
THP is still needed for the deployment. That means the "future" work will
have to happen to fully satisfy Juan's needs. :)
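
To make the trade-off concrete, here is a small sketch (plain Python,
hypothetical helper name, not kernel code) of how capping MAX_PAGE_ORDER
shrinks CMA_MIN_ALIGNMENT_BYTES on arm64/16K, at the cost of making the
32 MiB PMD order unreachable for the buddy allocator:

```python
# CMA_MIN_ALIGNMENT_BYTES = PAGE_SIZE * pageblock_nr_pages, where (with THP on)
# pageblock_order = min(HPAGE_PMD_ORDER, MAX_PAGE_ORDER).
# Function name is invented for illustration; the numbers match the thread.
def cma_min_alignment_bytes(page_size, pmd_order, max_page_order):
    pageblock_order = min(pmd_order, max_page_order)
    return page_size * (1 << pageblock_order)

MiB = 1 << 20

# arm64/16K defaults: MAX_PAGE_ORDER = 11, HPAGE_PMD_ORDER = 11 -> 32 MiB
print(cma_min_alignment_bytes(16384, 11, 11) // MiB)  # 32

# arm64/16K with ARCH_FORCE_MAX_ORDER = 8 (Juan's experiment) -> 4 MiB,
# but now HPAGE_PMD_ORDER (11) > MAX_PAGE_ORDER (8), which is exactly what
# trips the MAYBE_BUILD_BUG_ON in mm/huge_memory.c: no 32 MiB PMD THP.
print(cma_min_alignment_bytes(16384, 11, 8) // MiB)   # 4
```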
--
Best Regards,
Yan, Zi
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel.
2025-01-22 12:49 ` Zi Yan
@ 2025-01-22 13:58 ` David Hildenbrand
0 siblings, 0 replies; 21+ messages in thread
From: David Hildenbrand @ 2025-01-22 13:58 UTC (permalink / raw)
To: Zi Yan, Juan Yescas
Cc: Barry Song, linux-mm, muchun.song, rppt, osalvador, akpm,
lorenzo.stoakes, Jann Horn, Liam.Howlett, minchan, jaewon31.kim,
charante, Suren Baghdasaryan, Kalesh Singh, T.J. Mercier,
Isaac Manjarres, iamjoonsoo.kim, quic_charante
>> In the future, I could imagine just going for a smaller pageblock size
>> on aarch64, and handling fragmentation avoidance for larger THPs (512
>> MiB really is close to 1 GiB on x86) differently, not using pageblocks.
>
> Right. That is what I meant by "...compaction, to work on a different
> range, independent of pageblock". But based on Juan's reply[1], 32MB PMD
> THP is still needed for the deployment. That means "the future" needs to
> be done to fully satisfy Juan's needs. :)
Yes, larger systems will also want 512 MiB THP.
--
Cheers,
David / dhildenb
^ permalink raw reply [flat|nested] 21+ messages in thread
end of thread, other threads:[~2025-01-22 13:58 UTC | newest]
Thread overview: 21+ messages
2025-01-17 22:51 mm: CMA reservations require 32MiB alignment in 16KiB page size kernels instead of 8MiB in 4KiB page size kernel Juan Yescas
2025-01-17 22:52 ` Juan Yescas
2025-01-17 23:00 ` Juan Yescas
2025-01-17 23:19 ` Zi Yan
2025-01-19 23:55 ` Barry Song
2025-01-20 0:39 ` Zi Yan
2025-01-20 8:14 ` David Hildenbrand
2025-01-20 15:29 ` Zi Yan
2025-01-20 17:59 ` David Hildenbrand
2025-01-22 2:08 ` Juan Yescas
2025-01-22 2:24 ` Zi Yan
2025-01-22 4:06 ` Juan Yescas
2025-01-22 6:52 ` Barry Song
2025-01-22 8:04 ` David Hildenbrand
2025-01-22 8:11 ` David Hildenbrand
2025-01-22 12:49 ` Zi Yan
2025-01-22 13:58 ` David Hildenbrand
2025-01-20 0:17 ` Barry Song
2025-01-20 0:26 ` Zi Yan
2025-01-20 0:38 ` Barry Song
2025-01-20 0:45 ` Zi Yan