[LSF/MM/BPF TOPIC] CMA reservation optimizations

linux-mm.kvack.org archive mirror

* [LSF/MM/BPF TOPIC] CMA reservation optimizations
@ 2025-01-28  1:04 Juan Yescas
  2025-01-28  9:58 ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Juan Yescas @ 2025-01-28  1:04 UTC (permalink / raw)
  To: lsf-pc, Suren Baghdasaryan, linux-mm, Kalesh Singh,
	Isaac Manjarres, T.J. Mercier, Zi Yan, Barry Song,
	David Hildenbrand

Hi LSF organizers,

I would like to continue discussing this topic with the mm community:

"CMA reservation optimizations"

Note: There is already an email in the linux-mm mailing list that is
discussing this issue. The title is:

"CMA reservations require 32MiB alignment in 16KiB page size kernels
instead of 8MiB in 4KiB page size kernel"

Background

When drivers reserve CMA memory on 16KiB page size kernels, the minimum
alignment is 32 MiB, as set by CMA_MIN_ALIGNMENT_BYTES. On 4KiB page size
kernels, however, the CMA alignment is 4MiB.

This forces drivers to reserve more memory than they need on 16KiB
kernels, even if they only require 4MiB or 8MiB.

reserved-memory {
      #address-cells = <2>;
      #size-cells = <2>;
      ranges;
      tpu_cma_reserve: tpu_cma_reserve {
            compatible = "shared-dma-pool";
            reusable;
            size = <0x0 0x2000000>; /* 32 MiB */
      };
};
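
To make the cost concrete, here is a minimal sketch of the arithmetic,
assuming a driver simply picks the smallest size that satisfies the minimum
alignment. The helper names (round_up_pow2, cma_overhead) are hypothetical
and are not kernel APIs.

```c
#include <stdint.h>

#define MIB (1024ULL * 1024ULL)

/* Round size up to the next multiple of align (align must be a power of two). */
static uint64_t round_up_pow2(uint64_t size, uint64_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Bytes reserved beyond what the driver actually needs (hypothetical helper). */
static uint64_t cma_overhead(uint64_t requested, uint64_t min_align)
{
	return round_up_pow2(requested, min_align) - requested;
}
```

Under these assumptions, cma_overhead(8 * MIB, 32 * MIB) is 24 MiB, while
cma_overhead(8 * MIB, 4 * MIB) is 0.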

Thanks
Juan


* Re: [LSF/MM/BPF TOPIC] CMA reservation optimizations
  2025-01-28  1:04 [LSF/MM/BPF TOPIC] CMA reservation optimizations Juan Yescas
@ 2025-01-28  9:58 ` David Hildenbrand
  2025-01-28 17:07   ` Juan Yescas
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-01-28  9:58 UTC (permalink / raw)
  To: Juan Yescas, lsf-pc, Suren Baghdasaryan, linux-mm, Kalesh Singh,
	Isaac Manjarres, T.J. Mercier, Zi Yan, Barry Song

On 28.01.25 02:04, Juan Yescas wrote:
> Hi LSF organizers,
> 
> I would like to continue discussing this topic with the mm community:
> 
> "CMA reservation optimizations"
> 
> Note: There is already an email in the linux-mm mailing list that is
> discussing this issue. The title is:
> 
> "CMA reservations require 32MiB alignment in 16KiB page size kernels
> instead of 8MiB in 4KiB page size kernel"
> 
> Background
> 
> When the drivers reserve CMA memory in 16KiB kernels, the minimum
> alignment is 32 MiB as per CMA_MIN_ALIGNMENT_BYTES. However, in 4KiB
> kernels, the CMA alignment is 4MiB.

I'm curious, here you say 4 MiB, above 8 MiB.

But nowadays it's usually 2 MiB (pageblock size), no?

-- 
Cheers,

David / dhildenb




* Re: [LSF/MM/BPF TOPIC] CMA reservation optimizations
  2025-01-28  9:58 ` David Hildenbrand
@ 2025-01-28 17:07   ` Juan Yescas
  2025-01-28 18:33     ` David Hildenbrand
  0 siblings, 1 reply; 6+ messages in thread
From: Juan Yescas @ 2025-01-28 17:07 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: lsf-pc, Suren Baghdasaryan, linux-mm, Kalesh Singh,
	Isaac Manjarres, T.J. Mercier, Zi Yan, Barry Song

On Tue, Jan 28, 2025 at 1:58 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 28.01.25 02:04, Juan Yescas wrote:
> > Hi LSF organizers,
> >
> > I would like to continue discussing this topic with the mm community:
> >
> > "CMA reservation optimizations"
> >
> > Note: There is already an email in the linux-mm mailing list that is
> > discussing this issue. The title is:
> >
> > "CMA reservations require 32MiB alignment in 16KiB page size kernels
> > instead of 8MiB in 4KiB page size kernel"
> >
> > Background
> >
> > When the drivers reserve CMA memory in 16KiB kernels, the minimum
> > alignment is 32 MiB as per CMA_MIN_ALIGNMENT_BYTES. However, in 4KiB
> > kernels, the CMA alignment is 4MiB.
>
> I'm curious, here you say 4 MiB, above 8 MiB.
>

My bad, it is a typo. I meant 4 MiB.

> But nowadays it's usually 2 MiB (pageblock size), no?

That's right for the case when THPs are enabled in 4KiB page size configs.

#define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L50

This evals to pageblock_order = min(21 - 12, 10) = 9

#define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages
#define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
https://elixir.bootlin.com/linux/v6.13/source/include/linux/cma.h#L21

CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 9) = (4096 * 512) = 2097152 = 2 MiB

However, when THPs are disabled, we get:

#define pageblock_order MAX_PAGE_ORDER // 10
https://elixir.bootlin.com/linux/v6.13/source/arch/arm64/Kconfig#L1630
https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L55

CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 10) = (4096 * 1024) = 4194304 = 4 MiB
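
The same arithmetic can be sketched for other page sizes. This is only an
illustration under stated assumptions: page-table entries are 8 bytes, so
HPAGE_PMD_ORDER = PAGE_SHIFT - 3, and max_page_order is passed in by the
caller (10 for 4KiB pages as above; 11 for 16KiB and 13 for 64KiB are
assumed arm64 defaults). The function names are hypothetical.

```c
#include <stdint.h>

/* Follows MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER) as quoted above. */
static unsigned int pageblock_order(unsigned int page_shift,
				    unsigned int max_page_order,
				    int thp_enabled)
{
	/* Assumption: 8-byte PTEs, so PMD_SHIFT = 2 * page_shift - 3. */
	unsigned int hpage_pmd_order = page_shift - 3;

	if (!thp_enabled)
		return max_page_order;
	return hpage_pmd_order < max_page_order ? hpage_pmd_order : max_page_order;
}

/* CMA_MIN_ALIGNMENT_BYTES = PAGE_SIZE * pageblock_nr_pages */
static uint64_t cma_min_alignment_bytes(unsigned int page_shift,
					unsigned int max_page_order,
					int thp_enabled)
{
	return (UINT64_C(1) << page_shift)
	       << pageblock_order(page_shift, max_page_order, thp_enabled);
}
```

With these inputs, (12, 10, 1) gives 2 MiB, (12, 10, 0) gives 4 MiB, and
(14, 11, 1) gives the 32 MiB from the original report.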

>
> --
> Cheers,
>
> David / dhildenb
>



* Re: [LSF/MM/BPF TOPIC] CMA reservation optimizations
  2025-01-28 17:07   ` Juan Yescas
@ 2025-01-28 18:33     ` David Hildenbrand
  2025-01-28 20:15       ` Zi Yan
  0 siblings, 1 reply; 6+ messages in thread
From: David Hildenbrand @ 2025-01-28 18:33 UTC (permalink / raw)
  To: Juan Yescas
  Cc: lsf-pc, Suren Baghdasaryan, linux-mm, Kalesh Singh,
	Isaac Manjarres, T.J. Mercier, Zi Yan, Barry Song

On 28.01.25 18:07, Juan Yescas wrote:
> On Tue, Jan 28, 2025 at 1:58 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 28.01.25 02:04, Juan Yescas wrote:
>>> Hi LSF organizers,
>>>
>>> I would like to continue discussing this topic with the mm community:
>>>
>>> "CMA reservation optimizations"
>>>
>>> Note: There is already an email in the linux-mm mailing list that is
>>> discussing this issue. The title is:
>>>
>>> "CMA reservations require 32MiB alignment in 16KiB page size kernels
>>> instead of 8MiB in 4KiB page size kernel"
>>>
>>> Background
>>>
>>> When the drivers reserve CMA memory in 16KiB kernels, the minimum
>>> alignment is 32 MiB as per CMA_MIN_ALIGNMENT_BYTES. However, in 4KiB
>>> kernels, the CMA alignment is 4MiB.
>>
>> I'm curious, here you say 4 MiB, above 8 MiB.
>>
> 
> My bad, it is a typo. I meant 4 MiB.

That makes sense.

> 
>> But nowadays it's usually 2 MiB (pageblock size), no?
> 
> That's right for the case when THPs are enabled in 4KiB page size configs.
> 
> #define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L50
> 
> This evals to pageblock_order = min(21 - 12, 10) = 9
> 
> #define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages
> #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
> https://elixir.bootlin.com/linux/v6.13/source/include/linux/cma.h#L21
> 
> CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 9) = (4096 * 512) = 2097152 = 2 MiB
> 
> However, when THPs are disabled, we get:
> 
> #define pageblock_order MAX_PAGE_ORDER // 10
> https://elixir.bootlin.com/linux/v6.13/source/arch/arm64/Kconfig#L1630
> https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L55
> 
> CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 10) = (4096 * 1024) = 4194304 = 4 MiB

Right, and it can depend on ARCH_FORCE_MAX_ORDER.

I've been wondering for a while if pageblock_order should nowadays 
default to HPAGE_PMD_ORDER, with the option to make it smaller/larger 
(likely smaller) -- as discussed.

As discussed, the topic you are touching on is also relevant for 
virtio-mem, which can add/remove memory currently in pageblock 
granularity: 512 MiB on arm64 are not particularly helpful. I think we 
could support adding/removing smaller granularity, but it requires a bit 
of work, and always isolating 512MiB worth of pages just to effectively 
allocate e.g., 2 MiB worth of pages is rather suboptimal. Same applies 
to CMA I assume.

So there is more infrastructure that could benefit from pageblocks to 
rather be on the smaller side, even when hugetlb+THP might not be around 
in a config.

-- 
Cheers,

David / dhildenb




* Re: [LSF/MM/BPF TOPIC] CMA reservation optimizations
  2025-01-28 18:33     ` David Hildenbrand
@ 2025-01-28 20:15       ` Zi Yan
  2025-04-02 18:01         ` Juan Yescas
  0 siblings, 1 reply; 6+ messages in thread
From: Zi Yan @ 2025-01-28 20:15 UTC (permalink / raw)
  To: David Hildenbrand
  Cc: Juan Yescas, lsf-pc, Suren Baghdasaryan, linux-mm, Kalesh Singh,
	Isaac Manjarres, T.J. Mercier, Barry Song, Mel Gorman,
	Vlastimil Babka

On 28 Jan 2025, at 13:33, David Hildenbrand wrote:

> On 28.01.25 18:07, Juan Yescas wrote:
>> On Tue, Jan 28, 2025 at 1:58 AM David Hildenbrand <david@redhat.com> wrote:
>>>
>>> On 28.01.25 02:04, Juan Yescas wrote:
>>>> Hi LSF organizers,
>>>>
>>>> I would like to continue discussing this topic with the mm community:
>>>>
>>>> "CMA reservation optimizations"
>>>>
>>>> Note: There is already an email in the linux-mm mailing list that is
>>>> discussing this issue. The title is:
>>>>
>>>> "CMA reservations require 32MiB alignment in 16KiB page size kernels
>>>> instead of 8MiB in 4KiB page size kernel"
>>>>
>>>> Background
>>>>
>>>> When the drivers reserve CMA memory in 16KiB kernels, the minimum
>>>> alignment is 32 MiB as per CMA_MIN_ALIGNMENT_BYTES. However, in 4KiB
>>>> kernels, the CMA alignment is 4MiB.
>>>
>>> I'm curious, here you say 4 MiB, above 8 MiB.
>>>
>>
>> My bad, it is a typo. I meant 4 MiB.
>
> That makes sense.
>
>>
>>> But nowadays it's usually 2 MiB (pageblock size), no?
>>
>> That's right for the case when THPs are enabled in 4KiB page size configs.
>>
>> #define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
>> https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L50
>>
>> This evals to pageblock_order = min(21 - 12, 10) = 9
>>
>> #define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages
>> #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
>> https://elixir.bootlin.com/linux/v6.13/source/include/linux/cma.h#L21
>>
>> CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 9) = (4096 * 512) = 2097152 = 2 MiB
>>
>> However, when THPs are disabled, we get:
>>
>> #define pageblock_order MAX_PAGE_ORDER // 10
>> https://elixir.bootlin.com/linux/v6.13/source/arch/arm64/Kconfig#L1630
>> https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L55
>>
>> CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 10) = (4096 * 1024) = 4194304 = 4 MiB
>
> Right, and it can depend on ARCH_FORCE_MAX_ORDER.
>
> I've been wondering for a while if pageblock_order should nowadays default to HPAGE_PMD_ORDER, with the option to make it smaller/larger (likely smaller) -- as discussed.
>
> As discussed, the topic you are touching on is also relevant for virtio-mem, which can add/remove memory currently in pageblock granularity: 512 MiB on arm64 are not particularly helpful. I think we could support adding/removing smaller granularity, but it requires a bit of work, and always isolating 512MiB worth of pages just to effectively allocate e.g., 2 MiB worth of pages is rather suboptimal. Same applies to CMA I assume.
>
> So there is more infrastructure that could benefit from pageblocks to rather be on the smaller side, even when hugetlb+THP might not be around in a config.


This is related to the anti-fragmentation mechanism in the kernel (Mel and
Vlastimil are cc’d; feel free to add more, since I must be missing others).

If the pageblock size is smaller than the PMD THP size, the current compaction
code will not be able to defragment memory efficiently for PMD THP creation,
since compaction works at pageblock granularity. This means we need to
decouple compaction granularity from the pageblock size. Pageblocks use
different migratetypes (UNMOVABLE, MOVABLE, RECLAIMABLE, ...) to help reduce
memory fragmentation by grouping pages by mobility. When compaction works on
multiple pageblocks to generate a PMD THP, it needs all pageblocks within the
range to share the same migratetype. Otherwise, compacting memory within a
region that contains MIGRATE_UNMOVABLE pageblocks would very likely be a waste
of time, since unmovable pages just prevent a big free page from being
created. So the key question is: how do we prevent MIGRATE_UNMOVABLE
pageblocks from fragmenting a PMD-THP-sized pageblock range, i.e. how do we
do anti-fragmentation for pageblocks?

Do we want super-pageblocks for that? Or do we allow sub-pageblocks for
virtio-mem and CMA reservations? That is probably what we want to discuss
on the infrastructure side.
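
As a rough illustration of the constraint (not actual compaction code), the
check below walks the pageblocks that would make up one PMD-sized range and
reports whether they all share one migratetype that is not unmovable. The
enum and the function name range_is_compactable are hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel's migratetypes. */
enum migratetype { MT_UNMOVABLE, MT_MOVABLE, MT_RECLAIMABLE };

/*
 * Return true if the nr pageblocks starting at index first all share the
 * same migratetype and that type is not MT_UNMOVABLE, i.e. the range is a
 * plausible compaction target for one PMD-sized THP.
 */
static bool range_is_compactable(const enum migratetype *blocks,
				 size_t first, size_t nr)
{
	if (nr == 0 || blocks[first] == MT_UNMOVABLE)
		return false;

	for (size_t i = 1; i < nr; i++) {
		if (blocks[first + i] != blocks[first])
			return false;
	}
	return true;
}
```

With sub-PMD pageblocks, a single unmovable pageblock anywhere in the range
makes this return false, which is the fragmentation case described above.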


Best Regards,
Yan, Zi



* Re: [LSF/MM/BPF TOPIC] CMA reservation optimizations
  2025-01-28 20:15       ` Zi Yan
@ 2025-04-02 18:01         ` Juan Yescas
  0 siblings, 0 replies; 6+ messages in thread
From: Juan Yescas @ 2025-04-02 18:01 UTC (permalink / raw)
  To: Zi Yan
  Cc: David Hildenbrand, lsf-pc, Suren Baghdasaryan, linux-mm,
	Kalesh Singh, Isaac Manjarres, T.J. Mercier, Barry Song,
	Mel Gorman, Vlastimil Babka


Thanks all for the comments and suggestions.

The slides for my presentation "CMA optimization alignment" are attached.

Thanks,
Juan

On Tue, Jan 28, 2025 at 12:15 PM Zi Yan <ziy@nvidia.com> wrote:

> On 28 Jan 2025, at 13:33, David Hildenbrand wrote:
>
> > On 28.01.25 18:07, Juan Yescas wrote:
> >> On Tue, Jan 28, 2025 at 1:58 AM David Hildenbrand <david@redhat.com> wrote:
> >>>
> >>> On 28.01.25 02:04, Juan Yescas wrote:
> >>>> Hi LSF organizers,
> >>>>
> >>>> I would like to continue discussing this topic with the mm community:
> >>>>
> >>>> "CMA reservation optimizations"
> >>>>
> >>>> Note: There is already an email in the linux-mm mailing list that is
> >>>> discussing this issue. The title is:
> >>>>
> >>>> "CMA reservations require 32MiB alignment in 16KiB page size kernels
> >>>> instead of 8MiB in 4KiB page size kernel"
> >>>>
> >>>> Background
> >>>>
> >>>> When the drivers reserve CMA memory in 16KiB kernels, the minimum
> >>>> alignment is 32 MiB as per CMA_MIN_ALIGNMENT_BYTES. However, in 4KiB
> >>>> kernels, the CMA alignment is 4MiB.
> >>>
> >>> I'm curious, here you say 4 MiB, above 8 MiB.
> >>>
> >>
> >> My bad, it is a typo. I meant 4 MiB.
> >
> > That makes sense.
> >
> >>
> >>> But nowadays it's usually 2 MiB (pageblock size), no?
> >>
> >> That's right for the case when THPs are enabled in 4KiB page size configs.
> >>
> >> #define pageblock_order MIN_T(unsigned int, HPAGE_PMD_ORDER, MAX_PAGE_ORDER)
> >> https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L50
> >>
> >> This evals to pageblock_order = min(21 - 12, 10) = 9
> >>
> >> #define CMA_MIN_ALIGNMENT_PAGES pageblock_nr_pages
> >> #define CMA_MIN_ALIGNMENT_BYTES (PAGE_SIZE * CMA_MIN_ALIGNMENT_PAGES)
> >> https://elixir.bootlin.com/linux/v6.13/source/include/linux/cma.h#L21
> >>
> >> CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 9) = (4096 * 512) = 2097152 = 2 MiB
> >>
> >> However, when THPs are disabled, we get:
> >>
> >> #define pageblock_order MAX_PAGE_ORDER // 10
> >> https://elixir.bootlin.com/linux/v6.13/source/arch/arm64/Kconfig#L1630
> >> https://elixir.bootlin.com/linux/v6.13/source/include/linux/pageblock-flags.h#L55
> >>
> >> CMA_MIN_ALIGNMENT_BYTES = (4096 * 2 ^ 10) = (4096 * 1024) = 4194304 = 4 MiB
> >
> > Right, and it can depend on ARCH_FORCE_MAX_ORDER.
> >
> > I've been wondering for a while if pageblock_order should nowadays
> > default to HPAGE_PMD_ORDER, with the option to make it smaller/larger
> > (likely smaller) -- as discussed.
> >
> > As discussed, the topic you are touching on is also relevant for
> > virtio-mem, which can add/remove memory currently in pageblock
> > granularity: 512 MiB on arm64 are not particularly helpful. I think we
> > could support adding/removing smaller granularity, but it requires a bit
> > of work, and always isolating 512MiB worth of pages just to effectively
> > allocate e.g., 2 MiB worth of pages is rather suboptimal. Same applies
> > to CMA I assume.
> >
> > So there is more infrastructure that could benefit from pageblocks to
> > rather be on the smaller side, even when hugetlb+THP might not be around
> > in a config.
>
>
> This is related to the anti-fragmentation mechanism in the kernel (Mel and
> Vlastimil are cc’d; feel free to add more, since I must be missing others).
>
> If the pageblock size is smaller than the PMD THP size, the current compaction
> code will not be able to defragment memory efficiently for PMD THP creation,
> since compaction works at pageblock granularity. This means we need to
> decouple compaction granularity from the pageblock size. Pageblocks use
> different migratetypes (UNMOVABLE, MOVABLE, RECLAIMABLE, ...) to help reduce
> memory fragmentation by grouping pages by mobility. When compaction works on
> multiple pageblocks to generate a PMD THP, it needs all pageblocks within the
> range to share the same migratetype. Otherwise, compacting memory within a
> region that contains MIGRATE_UNMOVABLE pageblocks would very likely be a waste
> of time, since unmovable pages just prevent a big free page from being
> created. So the key question is: how do we prevent MIGRATE_UNMOVABLE
> pageblocks from fragmenting a PMD-THP-sized pageblock range, i.e. how do we
> do anti-fragmentation for pageblocks?
>
> Do we want super-pageblocks for that? Or do we allow sub-pageblocks for
> virtio-mem and CMA reservations? That is probably what we want to discuss
> on the infrastructure side.
>
>
> Best Regards,
> Yan, Zi
>

[-- Attachment #2: CMA-optimizations-alignment.pdf --]
[-- Type: application/pdf, Size: 575707 bytes --]
