From: Dave Airlie <airlied@gmail.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Kairui Song <ryncsn@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: list_lru isolate callback question?
Date: Fri, 6 Jun 2025 08:59:16 +1000
Message-ID: <CAPM=9tzaB8DBWHegPD-8+iT3S5g0TtGKoOWXp7v9Psbbbr+uBg@mail.gmail.com>
In-Reply-To: <aEIcit0uqCCNXU-d@dread.disaster.area>
On Fri, 6 Jun 2025 at 08:39, Dave Chinner <david@fromorbit.com> wrote:
>
> On Thu, Jun 05, 2025 at 07:22:23PM +1000, Dave Airlie wrote:
> > On Thu, 5 Jun 2025 at 17:55, Kairui Song <ryncsn@gmail.com> wrote:
> > >
> > > On Thu, Jun 5, 2025 at 10:17 AM Dave Airlie <airlied@gmail.com> wrote:
> > > >
> > > > I've hit a case where I think it might be valuable to have the nid +
> > > > struct mem_cgroup for the item being iterated available in the isolate
> > > > callback. I know in theory we should be able to retrieve it from the
> > > > item, but I'm also not convinced we should need to, since we have it
> > > > already in the outer function?
> > > >
> > > > typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item,
> > > >                                             struct list_lru_one *list,
> > > >                                             int nid,
> > > >                                             struct mem_cgroup *memcg,
> > > >                                             void *cb_arg);
> > > >
> > >
> > > Hi Dave,
> > >
> > > > It's probably not essential (I think I can get the nid back easily,
> > > > not sure about the memcg yet), but I thought I'd ask if there would be
> > >
> > > If it's a slab object you should be able to get it easily with:
> > > memcg = mem_cgroup_from_slab_obj(item);
> > > nid = page_to_nid(virt_to_page(item));
> > >
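For reference, a minimal sketch of using those helpers from inside an
isolate callback, assuming the current (item, list, cb_arg) callback
signature and a slab-allocated item - pool_isolate is an invented name:

static enum lru_status pool_isolate(struct list_head *item,
                                    struct list_lru_one *list,
                                    void *cb_arg)
{
        /* Recover placement info from the object itself. */
        struct mem_cgroup *memcg = mem_cgroup_from_slab_obj(item);
        int nid = page_to_nid(virt_to_page(item));

        /* ... use nid/memcg to pick the right counters, then isolate
         * the object as usual ... */
        return LRU_REMOVED;
}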
> >
> > It's in relation to some work trying to tie GPU system memory
> > allocations into memcg properly.
> >
> > Not slab objects, but I do have pages, so I'm using page_to_nid right
> > now. However, these pages aren't currently setting p->memcg_data, as I
> > don't need that for this, but maybe this gives me a reason to go down
> > that road.
>
> How are you accounting the page to the memcg if the page is not
> marked as owned by a specific memcg?
>
> Are you relying on the page being indexed in a specific list_lru to
> account for the page correctly in reclaim contexts, and is that why
> you need this information in the walk context?
>
> I'd actually like to know more details of the problem you are trying
> to solve - all I've heard is "we're trying to do <something> with
> GPUs and memcgs with list_lrus", but I don't know what it is so I
> can't really give decent feedback on your questions....
>
Big picture problem: GPU drivers do a lot of memory allocations for
userspace applications that historically have not gone via memcg
accounting. This has been pointed out to be bad and should be fixed.

As part of that problem, GPU drivers have the ability to hand out
uncached/writecombined pages to userspace. Creating these pages
requires changing page attributes, and as such is a heavyweight
operation, which necessitates page pools. These page pools currently
only have a global shrinker and roll their own NUMA awareness. The
uncached/writecombined memory isn't core to most userspace usage
patterns, but since we want to do things right it seems like a good
idea to clean up this space first.
The goal is proper vmstat/memcg tracking for all allocations done for
the GPU. These can be very large, so I think we should add core mm
counters for them, and memcg ones as well, so userspace can see them
and make more educated decisions.
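(Illustrative only - NR_GPU_TOTAL and MEMCG_GPU are invented names; the
idea is a node-level vmstat counter plus a matching memcg counter,
charged from the side that already knows the memcg rather than via
page->memcg_data:)

/* Hypothetical: enum node_stat_item gains NR_GPU_TOTAL and
 * enum memcg_stat_item gains MEMCG_GPU. At allocation time: */
mod_node_page_state(page_pgdat(page), NR_GPU_TOTAL, 1 << order);
mod_memcg_state(memcg, MEMCG_GPU, 1 << order);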
We don't need page-level memcg tracking: the pages are either allocated
to the process as part of a larger buffer object, or sitting in the
pool, which has the memcg info, so we aren't intending to use
__GFP_ACCOUNT at this stage. I also don't really like having this as
part of kmem; these are mostly userspace-only allocations, used by the
GPU on behalf of userspace.
My rough plan:
1. Convert TTM page pools over to list_lru and use a NUMA-aware
shrinker (a rough sketch follows below).
2. Add global and memcg counters and tracking.
3. Convert TTM page pools over to a memcg-aware shrinker, so we get
proper operation inside a memcg for some niche use cases.
4. Figure out how to deal with memory evictions from VRAM - this is
probably the hardest problem to solve, as there is no great policy.

Also, handwaving: shouldn't this all be folios at some point?
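To make step 1 concrete, here's a rough sketch against the current
shrinker/list_lru APIs - ttm_pool_lru and the ttm_pool_* functions are
invented names for illustration, not the real TTM code:

struct ttm_pool_lru {
        struct list_lru lru;
        struct shrinker *shrinker;
};

static enum lru_status ttm_pool_isolate(struct list_head *item,
                                        struct list_lru_one *list,
                                        void *cb_arg)
{
        /* Drop the page from the pool; actual freeing would happen
         * after the walk, off the list_lru lock. */
        list_lru_isolate(list, item);
        return LRU_REMOVED;
}

static unsigned long ttm_pool_count(struct shrinker *shrink,
                                    struct shrink_control *sc)
{
        struct ttm_pool_lru *pool = shrink->private_data;

        /* Counts only the node being shrunk (and the memcg, once the
         * lru is made memcg aware in step 3). */
        return list_lru_shrink_count(&pool->lru, sc);
}

static unsigned long ttm_pool_scan(struct shrinker *shrink,
                                   struct shrink_control *sc)
{
        struct ttm_pool_lru *pool = shrink->private_data;

        return list_lru_shrink_walk(&pool->lru, sc, ttm_pool_isolate, pool);
}

static int ttm_pool_lru_init(struct ttm_pool_lru *pool)
{
        int ret;

        /* NUMA aware but not memcg aware yet; step 3 would switch this
         * to list_lru_init_memcg() plus SHRINKER_MEMCG_AWARE. */
        ret = list_lru_init(&pool->lru);
        if (ret)
                return ret;

        pool->shrinker = shrinker_alloc(SHRINKER_NUMA_AWARE, "ttm-pool");
        if (!pool->shrinker) {
                list_lru_destroy(&pool->lru);
                return -ENOMEM;
        }
        pool->shrinker->count_objects = ttm_pool_count;
        pool->shrinker->scan_objects = ttm_pool_scan;
        pool->shrinker->private_data = pool;
        shrinker_register(pool->shrinker);
        return 0;
}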
>
> The walk function is passed a struct list_lru_one. If there is a
> need to get the {nid,memcg} of the objects efficiently from walk
> contexts, then we should encode them into the struct list_lru_one
> at init time and retrieve them from there.
Oh interesting, that might also be a decent option.
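If I'm following, something like this - hypothetical, the nid/memcg
fields don't exist in the upstream struct today:

struct list_lru_one {
        struct list_head        list;
        long                    nr_items;
        /* existing lock etc. omitted; the new fields below would be
         * set once at init/memcg-creation time, read-only afterwards */
        int                     nid;
        struct mem_cgroup       *memcg;
};

static enum lru_status pool_isolate(struct list_head *item,
                                    struct list_lru_one *list,
                                    void *cb_arg)
{
        int nid = list->nid;
        struct mem_cgroup *memcg = list->memcg;

        /* ... account against nid/memcg without touching the page ... */
        return LRU_REMOVED;
}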
Dave.