From: Balbir Singh <balbirs@nvidia.com>
To: Dave Airlie <airlied@gmail.com>, Dave Chinner <david@fromorbit.com>
Cc: Kairui Song <ryncsn@gmail.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Linux Memory Management List <linux-mm@kvack.org>
Subject: Re: list_lru isolate callback question?
Date: Wed, 11 Jun 2025 09:07:05 +1000
Message-ID: <206e24d8-9027-4a14-9e9a-710bc57921d6@nvidia.com>
In-Reply-To: <CAPM=9tzaB8DBWHegPD-8+iT3S5g0TtGKoOWXp7v9Psbbbr+uBg@mail.gmail.com>
On 6/6/25 08:59, Dave Airlie wrote:
> On Fri, 6 Jun 2025 at 08:39, Dave Chinner <david@fromorbit.com> wrote:
>>
>> On Thu, Jun 05, 2025 at 07:22:23PM +1000, Dave Airlie wrote:
>>> On Thu, 5 Jun 2025 at 17:55, Kairui Song <ryncsn@gmail.com> wrote:
>>>>
>>>> On Thu, Jun 5, 2025 at 10:17 AM Dave Airlie <airlied@gmail.com> wrote:
>>>>>
>>>>> I've hit a case where I think it might be valuable to have the nid +
>>>>> struct memcg for the item being iterated available in the isolate
>>>>> callback, I know in theory we should be able to retrieve it from the
>>>>> item, but I'm also not convinced we should need to since we have it
>>>>> already in the outer function?
>>>>>
>>>>> typedef enum lru_status (*list_lru_walk_cb)(struct list_head *item,
>>>>>                                              struct list_lru_one *list,
>>>>>                                              int nid,
>>>>>                                              struct mem_cgroup *memcg,
>>>>>                                              void *cb_arg);
>>>>>
>>>>
>>>> Hi Dave,
>>>>
>>>>> It's probably not essential (I think I can get the nid back easily,
>>>>> not sure about the memcg yet), but I thought I'd ask if there would be
>>>>
>>>> If it's a slab object you should be able to get it easily with:
>>>> memcg = mem_cgroup_from_slab_obj(item);
>>>> nid = page_to_nid(virt_to_page(item));
>>>>
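
For reference, a minimal (untested) sketch of how that recovery could
look inside an isolate callback with the current three-argument
signature; the function and pool names here are made up, and this only
works for slab-allocated items, per Kairui's suggestion above:

#include <linux/list_lru.h>
#include <linux/memcontrol.h>
#include <linux/mm.h>

/*
 * Hypothetical isolate callback: recovers nid/memcg from the item
 * itself rather than taking them as extra arguments.
 */
static enum lru_status pool_isolate(struct list_head *item,
                                    struct list_lru_one *list,
                                    void *cb_arg)
{
        struct mem_cgroup *memcg = mem_cgroup_from_slab_obj(item);
        int nid = page_to_nid(virt_to_page(item));

        /* ... use nid/memcg to locate per-node/per-memcg pool state ... */

        /* LRU_REMOVED requires the item to actually be taken off the list */
        list_lru_isolate(list, item);
        return LRU_REMOVED;
}
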
>>>
>>> It's in relation to some work trying to tie GPU system memory
>>> allocations into memcg properly.
>>>
>>> Not slab objects, but I do have pages, so I'm using page_to_nid right
>>> now; however, these pages aren't currently setting p->memcg_data, as I
>>> don't need that for this, but maybe this gives me a reason to go down
>>> that road.
>>
>> How are you accounting the page to the memcg if the page is not
>> marked as owned by a specific memcg?
>>
>> Are you relying on the page being indexed in a specific list_lru to
>> account for the page correctly in reclaim contexts, and that's why
>> you need this information in the walk context?
>>
>> I'd actually like to know more details of the problem you are trying
>> to solve - all I've heard is "we're trying to do <something> with
>> GPUs and memcgs with list_lrus", but I don't know what it is so I
>> can't really give decent feedback on your questions....
>>
>
> Big picture problem, GPU drivers do a lot of memory allocations for
> userspace applications that historically have not gone via memcg
> accounting. This has been pointed out to be bad and should be fixed.
>
> As part of that problem, GPU drivers have the ability to hand out
> uncached/writecombined pages to userspace, creating these pages
> requires changing attributes and as such is a heavyweight operation
> which necessitates page pools. These page pools only currently have a
> global shrinker and roll their own NUMA awareness. The
> uncached/writecombined memory isn't a core feature of userspace usage
> patterns, but since we want to do things right it seems like a good
> idea to clean up the space first.
>
> The goal: get proper vmstat/memcg tracking for all allocations done
> for the GPU. These can be very large, so I think we should add core mm
> counters for them, and memcg ones as well, so userspace can see them
> and make more educated decisions.
>
> We don't need page-level memcg tracking, as the pages are either
> allocated to the process as part of a larger buffer object, or sitting
> in the pool, which has the memcg info, so we aren't intending to use
> __GFP_ACCOUNT at this stage. I also don't really like having this as
> part of kmem; these really are userspace-facing things, mostly used by
> the GPU on behalf of userspace.
>
> My rough plan:
> 1. convert TTM page pools over to list_lru and use a NUMA-aware shrinker
> 2. add global and memcg counters and tracking.
> 3. convert TTM page pools over to a memcg-aware shrinker so we get
> proper operation inside a memcg for some niche use cases.
> 4. Figure out how to deal with memory evictions from VRAM - this is
> probably the hardest problem to solve as there is no great policy.
>
> Also, handwave: shouldn't this all be folios at some point?
>
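
For steps 1 and 3, a rough sketch of how this typically wires together
with the current (6.7+) shrinker API, reusing the pool_isolate()
sketched above; names are made up and this is untested:

#include <linux/list_lru.h>
#include <linux/shrinker.h>

static struct list_lru pool_lru;

static unsigned long pool_count(struct shrinker *shrink,
                                struct shrink_control *sc)
{
        /* list_lru does the NUMA/memcg bucketing via sc->nid/sc->memcg */
        return list_lru_shrink_count(&pool_lru, sc);
}

static unsigned long pool_scan(struct shrinker *shrink,
                               struct shrink_control *sc)
{
        return list_lru_shrink_walk(&pool_lru, sc, pool_isolate, NULL);
}

static int pool_shrinker_init(void)
{
        struct shrinker *shrink;
        int err;

        shrink = shrinker_alloc(SHRINKER_NUMA_AWARE | SHRINKER_MEMCG_AWARE,
                                "ttm-pool");
        if (!shrink)
                return -ENOMEM;

        /* memcg-aware list_lru needs the shrinker for its shrinker_id */
        err = list_lru_init_memcg(&pool_lru, shrink);
        if (err) {
                shrinker_free(shrink);
                return err;
        }

        shrink->count_objects = pool_count;
        shrink->scan_objects = pool_scan;
        shrinker_register(shrink);
        return 0;
}
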
The key requirement for memcg would be to track the mm on whose behalf
the allocation was made. kmemcg (__GFP_ACCOUNT) tracks only kernel
allocations (meant for kernel overheads); we don't really need it here,
and you've already mentioned this.
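
A hypothetical way to capture the owning memcg at allocation time
without __GFP_ACCOUNT, since the pool rather than the page would carry
it (note that get_mem_cgroup_from_mm() may need an export for module
use):

#include <linux/memcontrol.h>
#include <linux/sched.h>

/*
 * Capture the allocating task's memcg when a buffer object or pool
 * entry is created, so the pages need not set page->memcg_data.
 * Takes a reference; pair with mem_cgroup_put() on free.
 */
static struct mem_cgroup *pool_entry_memcg(void)
{
        return get_mem_cgroup_from_mm(current->mm);
}
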
For memcg evictions, reference counting and reclaim are used today; I
guess in #4 you are referring to getting that information for VRAM?
Is the overall goal to overcommit VRAM, to restrict the amount of VRAM
usage, or a combination of both?
Balbir Singh