* CMA, memdescs and folios
@ 2024-02-27 22:02 Matthew Wilcox
  2024-03-05  5:30 ` Aneesh Kumar K V
  2024-03-05 13:20 ` David Hildenbrand
  0 siblings, 2 replies; 3+ messages in thread
From: Matthew Wilcox @ 2024-02-27 22:02 UTC (permalink / raw)
  To: Marek Szyprowski, Michal Nazarewicz, Aneesh Kumar K.V, Joonsoo Kim
  Cc: Jianfeng Wang, linux-mm

It may be helpful to look at

https://kernelnewbies.org/MatthewWilcox/Memdescs

I don't yet have a plan for what CMA should look like in the memdesc
future.  Partly I just don't know CMA very well.  Some help would
be appreciated ...

First, I'm pretty sure that cma allocations are freed as a single
unit; there's no intended support for "allocate 2000MB from CMA, free
500MB-1500MB, use the first 500MB for one thing and the last 500MB for
something else".  Right?

Second, CMA doesn't actually grub around inside struct page itself,
so it has no dependencies on what struct page contains.  Is that true?

Third, I don't see where CMA manipulates the page refcount today.
Does it rely on somebody else setting the page refcount to 1 before
giving the pages to CMA?

Fourth, do users of CMA rely on pages being individually refcounted?
Is there a reason you've never implemented an equivalent to __GFP_COMP
before?

---

My strawman proposal is that, in a memdesc world, the individual pages
that are free within CMA get a type 0 subtype to make them readily
identifiable in memory dumps.  At allocation time, the caller will pass
in a memdesc to manage the pages (and CMA will assign it to all the pages,
just like the BuddyAllocator will).

As a step towards that, we can change CMA soon to return pages which have
a zero refcount.  That should catch any users which rely on individual
refcounts.
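
To make that concrete, here is a minimal sketch of what the allocation
side might look like.  cma_alloc_memdesc(), page_set_memdesc() and
struct memdesc are made-up names, purely for illustration of the idea
that the caller supplies the memdesc and CMA stamps it on the range:

/*
 * Hypothetical sketch only: the caller supplies the memdesc and CMA
 * assigns it to every page in the range, the same way the buddy
 * allocator would.  In this scheme the pages come back with a zero
 * refcount; lifetime is managed through the memdesc.
 */
struct page *cma_alloc_memdesc(struct cma *cma, unsigned long count,
			       unsigned int align, struct memdesc *desc)
{
	struct page *page = cma_alloc(cma, count, align, false);
	unsigned long i;

	if (!page)
		return NULL;

	for (i = 0; i < count; i++)
		page_set_memdesc(page + i, desc);	/* hypothetical */

	return page;
}

Freeing would go the other way: the caller hands the range back and CMA
resets each page to the "free CMA page" type 0 subtype.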



* Re: CMA, memdescs and folios
  2024-02-27 22:02 CMA, memdescs and folios Matthew Wilcox
@ 2024-03-05  5:30 ` Aneesh Kumar K V
  2024-03-05 13:20 ` David Hildenbrand
  1 sibling, 0 replies; 3+ messages in thread
From: Aneesh Kumar K V @ 2024-03-05  5:30 UTC (permalink / raw)
  To: Matthew Wilcox, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Joonsoo Kim, npiggin
  Cc: Jianfeng Wang, linux-mm

On 2/28/24 3:32 AM, Matthew Wilcox wrote:
> It may be helpful to look at
> 
> https://kernelnewbies.org/MatthewWilcox/Memdescs
> 
> I don't yet have a plan for what CMA should look like in the memdesc
> future.  Partly I just don't know CMA very well.  Some help would
> be appreciated ...
> 
> First, I'm pretty sure that cma allocations are freed as a single
> unit; there's no intended support for "allocate 2000MB from CMA, free
> 500MB-1500MB, use the first 500MB for one thing and the last 500MB for
> something else".  Right?
> 
> Second, CMA doesn't actually grub around inside struct page itself,
> so it has no dependencies on what struct page contains.  Is that true?
> 
> Third, I don't see where CMA manipulates the page refcount today.
> Does it rely on somebody else setting the page refcount to 1 before
> giving the pages to CMA?
> 

The refcount gets set to 1 on the allocation path:

isolate_freepages_range() -> split_map_pages() -> post_alloc_hook() -> set_page_refcounted()
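
set_page_refcounted() is where the count goes from 0 to 1; roughly, from
mm/internal.h (paraphrased, details may differ by kernel version):

static inline void set_page_refcounted(struct page *page)
{
	VM_BUG_ON_PAGE(PageTail(page), page);
	VM_BUG_ON_PAGE(page_ref_count(page), page);
	set_page_count(page, 1);
}

So CMA itself never touches the refcount; it is set when the isolated
range is split up and run through post_alloc_hook().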

> Fourth, do users of CMA rely on pages being individually refcounted?
> Is there a reason you've never implemented an equivalent to __GFP_COMP
> before?

For the powerpc KVM hash page table usage (kvm_alloc_hpt_cma()/kvm_free_hpt_cma()),
I guess we don't expect the pages to be individually refcounted.
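
Those are thin wrappers around cma_alloc()/cma_release(); roughly, from
arch/powerpc/kvm/book3s_hv_builtin.c (paraphrased):

struct page *kvm_alloc_hpt_cma(unsigned long nr_pages)
{
	VM_BUG_ON(order_base_2(nr_pages) < KVM_CMA_CHUNK_ORDER - PAGE_SHIFT);

	return cma_alloc(kvm_cma, nr_pages,
			 order_base_2(HPT_ALIGN_PAGES), false);
}

void kvm_free_hpt_cma(struct page *page, unsigned long nr_pages)
{
	cma_release(kvm_cma, page, nr_pages);
}

The whole hash page table is allocated and released as one range, so
nothing here should care about per-page refcounts.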

> 
> ---
> 
> My strawman proposal is that, in a memdesc world, the individual pages
> that are free within CMA get a type 0 subtype to make them readily
> identifiable in memory dumps.  At allocation time, the caller will pass
> in a memdesc to manage the pages (and CMA will assign it to all the pages,
> just like the BuddyAllocator will).
> 
> As a step towards that, we can change CMA soon to return pages which have
> a zero refcount.  That should catch any users which rely on individual
> refcounts.




* Re: CMA, memdescs and folios
  2024-02-27 22:02 CMA, memdescs and folios Matthew Wilcox
  2024-03-05  5:30 ` Aneesh Kumar K V
@ 2024-03-05 13:20 ` David Hildenbrand
  1 sibling, 0 replies; 3+ messages in thread
From: David Hildenbrand @ 2024-03-05 13:20 UTC (permalink / raw)
  To: Matthew Wilcox, Marek Szyprowski, Michal Nazarewicz,
	Aneesh Kumar K.V, Joonsoo Kim
  Cc: Jianfeng Wang, linux-mm

On 27.02.24 23:02, Matthew Wilcox wrote:
> It may be helpful to look at
> 
> https://kernelnewbies.org/MatthewWilcox/Memdescs
> 
> I don't yet have a plan for what CMA should look like in the memdesc
> future.  Partly I just don't know CMA very well.  Some help would
> be appreciated ...
> 
> First, I'm pretty sure that cma allocations are freed as a single
> unit; there's no intended support for "allocate 2000MB from CMA, free
> 500MB-1500MB, use the first 500MB for one thing and the last 500MB for
> something else".  Right?

hugetlb uses CMA to allocate gigantic folios.

I think we can demote them (e.g., 1 GiB -> 512 x 2 MiB), and then free 
individual demoted (is that a word?) folios back via CMA.

At least, there is a comment in __update_and_free_hugetlb_folio():

"Non-gigantic pages demoted from CMA allocated gigantic pages need to be 
given back to CMA in free_gigantic_folio."
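
And free_gigantic_folio() indeed hands the (possibly demoted) range back
through cma_release(); roughly, from mm/hugetlb.c (paraphrased):

static void free_gigantic_folio(struct folio *folio, unsigned int order)
{
#ifdef CONFIG_CMA
	int nid = folio_nid(folio);

	if (cma_release(hugetlb_cma[nid], &folio->page, 1 << order))
		return;
#endif
	free_contig_range(folio_pfn(folio), 1 << order);
}

After demotion, 'order' here is the demoted (e.g. 2 MiB) order, not the
order the gigantic page was originally allocated with.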


And in fact, hugetlb_cma_reserve() prepares exactly for that: it sets 
the CMA granularity to the smallest folio size we could get after demoting.

"Note that 'order per bit' is based on smallest size that may be 
returned to CMA allocator in the case of huge page demotion."
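
The setup is roughly (mm/hugetlb.c, paraphrased; the exact argument list
varies between kernel versions):

res = cma_declare_contiguous_nid(0, size, 0,
				 PAGE_SIZE << HUGETLB_PAGE_ORDER,
				 HUGETLB_PAGE_ORDER /* order_per_bit */,
				 false, name, &hugetlb_cma[nid], nid);

i.e. the CMA bitmap tracks chunks of HUGETLB_PAGE_ORDER (2 MiB on
x86-64), which is why releasing the demoted folios one at a time works.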

I suspect there are other such users that free at a different granularity
than they allocated.

-- 
Cheers,

David / dhildenb



