From: "Christian König" <christian.koenig@amd.com>
To: David Airlie <airlied@redhat.com>
Cc: Dave Airlie <airlied@gmail.com>,
	dri-devel@lists.freedesktop.org,
	Matthew Brost <matthew.brost@intel.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [PATCH 1/2] mm: add gpu active/reclaim per-node stat counters (v2)
Date: Wed, 25 Jun 2025 13:55:00 +0200
Message-ID: <7dd0885a-7e7c-41a9-ae81-811fc344caf5@amd.com>
In-Reply-To: <CAMwc25ruHtW165VRuDv5_tjaZGcL5H9CWeTjcCstXK09bDPhdw@mail.gmail.com>

On 24.06.25 03:12, David Airlie wrote:
> On Mon, Jun 23, 2025 at 6:54 PM Christian König
> <christian.koenig@amd.com> wrote:
>>
>> On 6/19/25 09:20, Dave Airlie wrote:
>>> From: Dave Airlie <airlied@redhat.com>
>>>
>>> While discussing memcg integration with GPU memory allocations,
>>> it was pointed out that there were no NUMA/system counters for
>>> GPU memory allocations.
>>>
>>> With more integrated memory GPU server systems turning up, and
>>> more requirements for memory tracking it seems we should start
>>> closing the gap.
>>>
>>> Add two counters to track GPU per-node system memory allocations.
>>>
>>> The first tracks memory currently allocated to GPU objects, and the
>>> second tracks memory stored in GPU page pools that can be reclaimed
>>> by the shrinker.
>>>
>>> Cc: Christian Koenig <christian.koenig@amd.com>
>>> Cc: Matthew Brost <matthew.brost@intel.com>
>>> Cc: Johannes Weiner <hannes@cmpxchg.org>
>>> Cc: linux-mm@kvack.org
>>> Cc: Andrew Morton <akpm@linux-foundation.org>
>>> Signed-off-by: Dave Airlie <airlied@redhat.com>
>>>
>>> ---
>>>
>>> v2: add more info to the documentation on this memory.
>>>
>>> I'd like to get acks to merge this via the drm tree, if possible,
>>>
>>> Dave.
>>> ---
>>>  Documentation/filesystems/proc.rst | 8 ++++++++
>>>  drivers/base/node.c                | 5 +++++
>>>  fs/proc/meminfo.c                  | 6 ++++++
>>>  include/linux/mmzone.h             | 2 ++
>>>  mm/show_mem.c                      | 9 +++++++--
>>>  mm/vmstat.c                        | 2 ++
>>>  6 files changed, 30 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst
>>> index 5236cb52e357..7cc5a9185190 100644
>>> --- a/Documentation/filesystems/proc.rst
>>> +++ b/Documentation/filesystems/proc.rst
>>> @@ -1095,6 +1095,8 @@ Example output. You may not have all of these fields.
>>>      CmaFree:               0 kB
>>>      Unaccepted:            0 kB
>>>      Balloon:               0 kB
>>> +    GPUActive:             0 kB
>>> +    GPUReclaim:            0 kB
>>
>> Active certainly makes sense, but I think we should rather disable the pool on newer CPUs than add reclaimable memory here.
> 
> I'm not just concerned about newer platforms though, even on Fedora 42
> on my test ryzen1+7900xt machine, with a desktop session running
> 
> nr_gpu_active 7473
> nr_gpu_reclaim 6656
> 
> It's not an insignificant amount of memory.

That was not what I meant. You are right that quite a lot of memory is allocated to the GPU.
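
To put the quoted counters in perspective, here is a rough conversion. It assumes the nr_gpu_* values are page counts (as the nr_ prefix suggests) and that the page size is 4 KiB; neither is stated above, and the helper names are made up for illustration.

```python
# Assumption: the nr_gpu_* values are page counts and pages are 4 KiB.
PAGE_SIZE_KIB = 4


def parse_counters(text):
    """Parse 'name value' lines into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def pages_to_kib(pages, page_size_kib=PAGE_SIZE_KIB):
    """Convert a page count to KiB under the assumed page size."""
    return pages * page_size_kib


# The two values quoted from the test machine.
sample = """nr_gpu_active 7473
nr_gpu_reclaim 6656"""

counters = parse_counters(sample)
for name, pages in counters.items():
    print(f"{name}: {pages} pages = {pages_to_kib(pages)} KiB")
```

Under those assumptions that is roughly 29 MiB active and 26 MiB sitting in the pool.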

The problem is rather that we use the pool for way too many things, which is actually not necessary.

But granted, this is orthogonal to this patch.

> I also think if we get to
> some sort of discardable GTT objects with a shrinker they should
> probably be accounted in reclaim.

The problem is that this is extremely driver specific.

On amdgpu we have some temporary buffers which can be reclaimed immediately, but the really big chunk is, for example, what XE does with its shrinker.

See Thomas' TTM patches from a few months ago. Whether memory is active or reclaimable does not depend on how it is allocated, but on how it is used.

So the accounting needs to be at the driver level if you really want to distinguish between the two states.
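
As a toy illustration of that point, here is a userspace model, not TTM or any driver's actual code; every class and method name in it is hypothetical. The allocation path only knows that pages were handed out. The driver knows whether a buffer is still in use, so it is the driver that moves pages between the two counters.

```python
class NodeGpuStats:
    """Toy per-node counters mirroring the two proposed stats (in pages)."""

    def __init__(self):
        self.active = 0
        self.reclaim = 0


class ToyDriver:
    """Hypothetical driver that decides which state each buffer is in."""

    def __init__(self, stats):
        self.stats = stats
        self.buffers = {}  # name -> (pages, reclaimable)

    def alloc(self, name, pages):
        # At allocation time all we can say is that the pages are in use.
        self.buffers[name] = (pages, False)
        self.stats.active += pages

    def mark_reclaimable(self, name):
        # Only the driver knows when a buffer's contents may be dropped.
        pages, reclaimable = self.buffers[name]
        if not reclaimable:
            self.buffers[name] = (pages, True)
            self.stats.active -= pages
            self.stats.reclaim += pages

    def shrink(self):
        # Free everything currently marked reclaimable; return pages freed.
        freed = 0
        for name, (pages, reclaimable) in list(self.buffers.items()):
            if reclaimable:
                del self.buffers[name]
                self.stats.reclaim -= pages
                freed += pages
        return freed


stats = NodeGpuStats()
driver = ToyDriver(stats)
driver.alloc("framebuffer", 2048)
driver.alloc("scratch", 512)
driver.mark_reclaimable("scratch")
print(stats.active, stats.reclaim)  # 2048 512
freed = driver.shrink()
print(freed, stats.active, stats.reclaim)  # 512 2048 0
```

Both buffers came from the same allocation path; only the later driver decision separates them.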

Christian.

> 
> Dave.
> 




