* Next steps towards shrinking struct page
From: Matthew Wilcox @ 2024-09-25 17:39 UTC (permalink / raw)
To: linux-mm; +Cc: Yu Zhao
Yu Zhao asked what useful steps he could take towards shrinking struct
page. I outlined them in the THP Cabal meeting today, and here's the
"next level" of detail.
As mentioned in https://kernelnewbies.org/MatthewWilcox/Memdescs/Path
we want to get struct page from 64 to 32 bytes, at least for common
configurations.
I'm currently working on getting rid of references to page->index and
page->mapping. I have some commits that I haven't sent out yet (we're
still in the merge window, after all). So here are some things other
people could do:
1. KMSAN
Someone needs to figure out if KMSAN is really per-page or
per-allocation. Do kmsan_shadow and kmsan_origin need to live in
struct page, or do they need to live in each memdesc? And do we care
about KMSAN overhead for production builds? (does Android turn it on?)
2. Bump allocator
This chunk needs to be pulled out of struct page:
struct {	/* page_pool used by netstack */
	/**
	 * @pp_magic: magic value to avoid recycling non
	 *            page_pool allocated pages.
	 */
	unsigned long pp_magic;
	struct page_pool *pp;
	unsigned long _pp_mapping_pad;
	unsigned long dma_addr;
	atomic_long_t pp_ref_count;
};
Similarly to struct slab, it needs to become its own struct. DavidH
and I talked about it at Plumbers and decided that it needs to be:
struct bump {
	unsigned long _page_flags;
	unsigned long bump_magic;
	struct page_pool *bump_pp;
	unsigned long _page_mapping_pad;
	unsigned long dma_addr;
	atomic_long_t bump_ref_count;
	unsigned int _page_type;
	atomic_t _refcount;
};
Some collaboration with the networking people would be good here, since
they're the primary user today. In particular Ilias Apalodimas has a
lot of history with this code.
I did some work on this (at the time I called it netmem; the networking
people have subsequently used the name netmem for their own purposes)
https://lore.kernel.org/linux-mm/20230111042214.907030-1-willy@infradead.org/
3. Reviewing the zpdesc series
https://lore.kernel.org/lkml/20240902072136.578720-1-alexs@kernel.org/
is the latest version
4. Make sure that we're good with memcg_data. I think we are (it's only
used in folios and slabs today, I _think_), but it'll be good to be
sure. Someone who understands memcg better than I do can probably find
some stuff to clean up.
There are probably other things that can be done to move us towards a
shrunken page, but these are the ones which come to mind. Happy to
elaborate on any points.
* Re: Next steps towards shrinking struct page
From: David Hildenbrand @ 2024-10-08 14:16 UTC (permalink / raw)
To: Matthew Wilcox, linux-mm; +Cc: Yu Zhao
>
> 4. Make sure that we're good with memcg_data. I think we are (it's only
> used in folios and slabs today, I _think_), but it'll be good to be
> sure. Someone who understands memcg better than I do can probably find
> some stuff to clean up.
Last time I looked at this (1 month ago), I had the same impression.
--
Cheers,
David / dhildenb
* Re: Next steps towards shrinking struct page
From: Matthew Wilcox @ 2024-10-18 10:46 UTC (permalink / raw)
To: David Hildenbrand; +Cc: linux-mm, Yu Zhao
On Tue, Oct 08, 2024 at 04:16:39PM +0200, David Hildenbrand wrote:
> >
> > 4. Make sure that we're good with memcg_data. I think we are (it's only
> > used in folios and slabs today, I _think_), but it'll be good to be
> > sure. Someone who understands memcg better than I do can probably find
> > some stuff to clean up.
>
> Last time I looked at this (1 month ago), I had the same impression.
I took a shot at this. The problem is mm/page_alloc.c:
__alloc_pages_noprof:
	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
		__free_pages(page, order);
		page = NULL;
	}
I think someone (maybe Yosry?) tried to explain this problem to me at
LSFMM, but I didn't understand.
Most of the places that use GFP_KERNEL_ACCOUNT are either page tables
or kmalloc/kvmalloc. But I think we're going to need a new memdesc
type for accounted memory.
struct accounted_mem {
	unsigned long flags;
	struct obj_cgroup *objcg;
};
(we'll be able to drop MEMCG_DATA_* once we have the memdesc type, but
we need to keep it until we're there).
So I guess that's my next step -- adding:
struct accounted_mem {
	unsigned long flags;
	unsigned long padding[5];
	unsigned int padding2[2];
	unsigned long objcg;
};
and the various assertions that objcg & flags occupy the same bytes in
accounted_mem as they do in page, until we separate them entirely.
* Re: Next steps towards shrinking struct page
From: David Hildenbrand @ 2024-10-18 11:07 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: linux-mm, Yu Zhao
On 18.10.24 12:46, Matthew Wilcox wrote:
> On Tue, Oct 08, 2024 at 04:16:39PM +0200, David Hildenbrand wrote:
>>>
>>> 4. Make sure that we're good with memcg_data. I think we are (it's only
>>> used in folios and slabs today, I _think_), but it'll be good to be
>>> sure. Someone who understands memcg better than I do can probably find
>>> some stuff to clean up.
>>
>> Last time I looked at this (1 month ago), I had the same impression.
>
> I took a shot at this. The problem is mm/page_alloc.c:
>
> __alloc_pages_noprof:
> 	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
> 	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> 		__free_pages(page, order);
> 		page = NULL;
> 	}
>
Right, I recall that we are only touching the first page. IIRC, it's not
necessarily a compound page.
> I think someone (maybe Yosry?) tried to explain this problem to me at
> LSFMM, but I didn't understand.
>
> Most of the places that use GFP_KERNEL_ACCOUNT are either page tables
> or kmalloc/kvmalloc. But I think we're going to need a new memdesc
> type for accounted memory.
>
> struct accounted_mem {
> 	unsigned long flags;
> 	struct obj_cgroup *objcg;
> };
>
> (we'll be able to drop MEMCG_DATA_* once we have the memdesc type, but
> we need to keep it until we're there).
>
> So I guess that's my next step -- adding:
>
> struct accounted_mem {
> 	unsigned long flags;
> 	unsigned long padding[5];
> 	unsigned int padding2[2];
> 	unsigned long objcg;
> };
>
> and the various assertions that objcg & flags occupy the same bytes in
> accounted_mem as they do in page, until we separate them entirely.
But how to do that without a compound page?
--
Cheers,
David / dhildenb
* Re: Next steps towards shrinking struct page
From: Matthew Wilcox @ 2024-10-18 12:44 UTC (permalink / raw)
To: David Hildenbrand; +Cc: linux-mm, Yu Zhao
On Fri, Oct 18, 2024 at 01:07:43PM +0200, David Hildenbrand wrote:
> On 18.10.24 12:46, Matthew Wilcox wrote:
> > On Tue, Oct 08, 2024 at 04:16:39PM +0200, David Hildenbrand wrote:
> > > >
> > > > 4. Make sure that we're good with memcg_data. I think we are (it's only
> > > > used in folios and slabs today, I _think_), but it'll be good to be
> > > > sure. Someone who understands memcg better than I do can probably find
> > > > some stuff to clean up.
> > >
> > > Last time I looked at this (1 month ago), I had the same impression.
> >
> > I took a shot at this. The problem is mm/page_alloc.c:
> >
> > __alloc_pages_noprof:
> > 	if (memcg_kmem_online() && (gfp & __GFP_ACCOUNT) && page &&
> > 	    unlikely(__memcg_kmem_charge_page(page, gfp, order) != 0)) {
> > 		__free_pages(page, order);
> > 		page = NULL;
> > 	}
> >
>
> Right, I recall that we are only touching the first page. IIRC, it's not
> necessarily a compound page.
That fits with how split_page_memcg() operates.
> > struct accounted_mem {
> > 	unsigned long flags;
> > 	struct obj_cgroup *objcg;
> > };
[...]
> > So I guess that's my next step -- adding:
> >
> > struct accounted_mem {
> > 	unsigned long flags;
> > 	unsigned long padding[5];
> > 	unsigned int padding2[2];
> > 	unsigned long objcg;
> > };
> >
> > and the various assertions that objcg & flags occupy the same bytes in
> > accounted_mem as they do in page, until we separate them entirely.
>
> But how to do that without a compound page?
The first step is just an exercise in typing. We'll use the exact same
bits for the exact same purpose, just eliminating all references to
page->memcg_data. Later, we'll progress to separately allocating a
16-byte accounted_mem.