From: Dave Airlie <airlied@gmail.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Maxime Ripard <mripard@redhat.com>,
"T.J. Mercier" <tjmercier@google.com>,
Eric Chanudet <echanude@redhat.com>,
Sumit Semwal <sumit.semwal@linaro.org>,
Benjamin Gaignard <benjamin.gaignard@collabora.com>,
Brian Starkey <Brian.Starkey@arm.com>,
John Stultz <jstultz@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
David Hildenbrand <david@kernel.org>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
"Liam R. Howlett" <Liam.Howlett@oracle.com>,
Vlastimil Babka <vbabka@suse.cz>,
Mike Rapoport <rppt@kernel.org>,
Suren Baghdasaryan <surenb@google.com>,
Michal Hocko <mhocko@suse.com>,
linux-media@vger.kernel.org, dri-devel@lists.freedesktop.org,
linaro-mm-sig@lists.linaro.org, linux-kernel@vger.kernel.org,
Albert Esteve <aesteve@redhat.com>,
linux-mm@kvack.org, Yosry Ahmed <yosryahmed@google.com>,
Shakeel Butt <shakeel.butt@linux.dev>
Subject: Re: [PATCH v2 0/3] dma-buf: heaps: cma: enable dmem cgroup accounting
Date: Mon, 2 Mar 2026 15:26:05 +1000 [thread overview]
Message-ID: <CAPM=9twnKZYOGchQ0cziSt5yUQxCXNWoKyBiQib2XWvkMiN=GA@mail.gmail.com> (raw)
In-Reply-To: <d1b287c9-46ff-4345-a410-7e1cfefb5c66@amd.com>
On Thu, 26 Feb 2026 at 21:32, Christian König <christian.koenig@amd.com> wrote:
>
> On 2/26/26 00:43, Dave Airlie wrote:
> >>>>
> >>>> Using module parameters to enable/disable it globally is just a
> >>>> workaround as far as I can see.
> >>>
> >>> That's a pretty good idea! It would indeed be a solution that could
> >>> satisfy everyone (I assume?).
> >>
> >> I think so yeah.
> >>
> >> From what I have seen we have three different use cases:
> >>
> >> 1. local device memory (VRAM), GTT/CMA and memcg are completely separate domains and you want to have completely separate values as limit for them.
> >>
> >> 2. local device memory (VRAM) is separate. GTT/CMA are accounted to memcg, you can still have separate values as limit so that nobody over allocates CMA (for example).
> >>
> >> 3. All three are accounted to memcg because system memory is actually used as fallback if applications over allocate device local memory.
> >>
> >> It's debatable what the default should be, but we clearly need to handle all three use cases. Potentially even on the same system.
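For what it's worth, the difference between the three charging policies can be sketched in a toy model (purely illustrative; these class and function names are made up and are not kernel interfaces):

```python
# Toy model of the three charging policies from the list above.
# "memcg" and "dmem" are just two counters with limits; nothing here
# is real kernel code.

class Cgroup:
    def __init__(self, memcg_max, dmem_max):
        self.memcg_max, self.dmem_max = memcg_max, dmem_max
        self.memcg_cur = self.dmem_cur = 0

    def charge(self, size, *, to_memcg, to_dmem):
        """Charge an allocation; return False if any limit would be hit."""
        new_mem = self.memcg_cur + (size if to_memcg else 0)
        new_dmem = self.dmem_cur + (size if to_dmem else 0)
        if new_mem > self.memcg_max or new_dmem > self.dmem_max:
            return False
        self.memcg_cur, self.dmem_cur = new_mem, new_dmem
        return True

def alloc(cg, size, kind, policy):
    """kind is "vram" or "gtt"; policy is 1, 2 or 3 as in the list."""
    if policy == 1:
        # VRAM and GTT/CMA live in dmem only, memcg never sees them.
        return cg.charge(size, to_memcg=False, to_dmem=True)
    if policy == 2:
        # VRAM stays dmem-only, GTT/CMA is charged to both pools.
        if kind == "vram":
            return cg.charge(size, to_memcg=False, to_dmem=True)
        return cg.charge(size, to_memcg=True, to_dmem=True)
    # policy 3: everything is charged to memcg as well.
    return cg.charge(size, to_memcg=True, to_dmem=True)
```

The point the model makes visible: in policy 2 a GTT allocation has to fit under both limits at once, so whichever pool fills up first rejects it.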
> >
> >
> > Give me cases where 1 or 3 actually make sense in the real world.
> >
> > I can maybe accept 1 if CMA is just old-school CMA carved out pre-boot
> > so it's not in the main memory pool, but in that case it's really just
> > equivalent to device memory.
>
> Well I think #1 is pretty much the default for dGPUs on a desktop. That's why I mentioned it first.
But I don't think that's what we want: if someone allocates a
system memory object then we should account it to memcg. This is also
the scenario where we really have to face eviction, and maybe here it
makes sense to say that we need to reserve memcg space for swapping
objects, both out of VRAM and into swap itself.
I'm starting to think there isn't another good way to deal with
dynamic power management and suspend/resume unless we have some
accounting for moving objects out of VRAM into system memory; the
question is just whether we can do something special to account for
it without destroying one process on behalf of another process doing
the wrong thing.
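A toy sketch of that "reserve memcg headroom for eviction" idea (again purely illustrative, not real kernel code; all names are made up): eviction transfers the buffer's charge from dmem to memcg and fails cleanly when the memcg side has no headroom, rather than silently blowing past the limit:

```python
# Illustrative sketch: on eviction a buffer's charge moves from the
# dmem pool to the memcg pool, and the move is refused up front if the
# destination has no headroom left.

class EvictionAccounting:
    def __init__(self, dmem_max, memcg_max):
        self.dmem_max, self.memcg_max = dmem_max, memcg_max
        self.dmem_cur = self.memcg_cur = 0

    def alloc_vram(self, size):
        """Charge a VRAM allocation against the dmem limit only."""
        if self.dmem_cur + size > self.dmem_max:
            return False
        self.dmem_cur += size
        return True

    def evict_to_sysmem(self, size):
        """Move `size` bytes of charge from dmem to memcg."""
        if self.memcg_cur + size > self.memcg_max:
            return False  # no headroom: caller must reclaim or swap first
        self.dmem_cur -= size
        self.memcg_cur += size
        return True
```

The failure mode this models is exactly the hard case in the thread: eviction needs system memory, and if the owning cgroup has no memcg headroom reserved, something else has to give.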
>
> > If something is in the main memory pool, it should be accounted for
> > using memcg. You cannot remove memory from the main memory pool
> > without accounting for it.
>
> That's what I'm strongly disagreeing on. See, the page cache is not accounted to memcg either; when you open a file and the kernel caches the backing pages, that doesn't reduce the amount you can allocate through malloc, does it?
According to Shakeel the page cache actually is accounted to memcg,
so we need to find another example. I really think this is a bad
idea: partitioning a single resource into two competing pools isn't
going to work that well.
>
> In other words system memory becomes the swap of device local memory. Just think about why memcg doesn't limit swap itself but only how much is swapped out.
But we still need swap for system memory as well, and there are
systems with no swap configured at all; on those I think we need to
be integrated with memcg anyway to make this work.
> For those use cases you want to have a hard static limit on how much system memory can be used as swap. That's why we originally used to have the per driver gttsize, the global TTM page limit etc...
>
> The problem is that we weakened those limitations because of the APU use case and that in turn resulted in all those problems with browsers over allocating system memory etc....
>
> Now cgroups should provide an alternative and I still think that this is the right approach to solve this, but in this alternative I think we want to preserve the original idea of separate domains for dGPUs.
>
> > Now we can add GPU limits to memcg; that was going to be a
> > next step in my series.
> >
> > Whether we have that as a percentage or a hard limit, we would just
> > say GPU can consume 95% of the configured max for this cgroup.
>
> That is only useful on APUs which don't have local memory because those make all of their allocations through system memory.
>
> dGPUs should be much more limited in that regard.
So you think we should limit the system memory allocations on dGPUs.
I'm worried about GTT|VRAM allocations which, once evicted, may have
no reason to be pushed back into VRAM; that ends up as a backdoor to
allocating a lot of system memory while bypassing memcg. I don't
really like the idea of bypassing memcg at all.
>
> > 3 to me just sounds like we haven't figured out fallback or
> > suspend/resume accounting yet, which is true, but I'm not sure there
> > is a reason for 3 to exist beyond the fact that we don't know how to
> > account for temporary storage of swapped-out VRAM objects.
>
> Mario has fixed or is at least working on the suspend/resume problems. So I don't consider that an issue any more.
>
> The use case 3 happens on HPC systems where device local memory is basically just a cache. For example this one here: https://en.wikipedia.org/wiki/Frontier_(supercomputer)
>
> In this use case you don't care whether a buffer is in device local memory or system memory; what you care about is that things are reliable, and for that your task at hand shouldn't exceed a certain limit.
>
> E.g. you run computation A which can use 100GB of resources and when computation B starts concurrently you don't want A to suddenly fail because it now fights with B for resources.
>
> > It might be that we need a limited transfer pool of system memory
> > for VRAM objects to "live in", moving them to swap as soon as
> > possible once we hit the limit on that pool. What we do on systems
> > where no swap is available gets into "I've no idea" territory.
> >
> > Statically partitioning memory up into dmem and memcg pools isn't
> > going to solve this; we should solve it inside memcg.
>
> Well it's certainly possible to solve all of this in memcg, but I don't think it's very elegant.
>
> Static partitioning between memcg and dmem for the dGPU case, merged accounting for the APU case by default, and then giving the system administrator the option to eventually switch to use case 3 sounds much more flexible to me.
>
> At least the obvious advantage is that you don't start to add module parameters to TTM, DMA-buf heaps and drivers if they should or should not account to memcg, but rather keep all the logic inside cgroups.
I don't think we should have to statically partition at all here;
it's just asking for problems later, and without proper accounting it
will cause a bunch of unnecessary reclaim.
Dave.