From: David Airlie <airlied@redhat.com>
To: "Christian König" <christian.koenig@amd.com>
Cc: Dave Airlie <airlied@gmail.com>,
dri-devel@lists.freedesktop.org, linux-mm@kvack.org,
Johannes Weiner <hannes@cmpxchg.org>,
Dave Chinner <david@fromorbit.com>,
Kairui Song <kasong@tencent.com>
Subject: Re: [PATCH 12/17] ttm: add objcg pointer to bo and tt
Date: Tue, 1 Jul 2025 18:06:42 +1000
Message-ID: <CAMwc25q-kBRGDrphU+iAyqENZhgdRtEnSrR9z6b5bQ_JFzzK2g@mail.gmail.com>
In-Reply-To: <cf6cb95f-df79-40ae-95d5-dc5a7620a136@amd.com>
On Tue, Jul 1, 2025 at 5:22 PM Christian König <christian.koenig@amd.com> wrote:
>
> On 30.06.25 23:33, David Airlie wrote:
> > On Mon, Jun 30, 2025 at 8:24 PM Christian König
> > <christian.koenig@amd.com> wrote:
> >>
> >> On 30.06.25 06:49, Dave Airlie wrote:
> >>> From: Dave Airlie <airlied@redhat.com>
> >>>
> >>> This just adds the obj cgroup pointer to the bo and tt structs,
> >>> and sets it between them.
> >>>
> >>> Signed-off-by: Dave Airlie <airlied@redhat.com>
> >>> ---
> >>> drivers/gpu/drm/ttm/ttm_tt.c | 1 +
> >>> include/drm/ttm/ttm_bo.h | 6 ++++++
> >>> include/drm/ttm/ttm_tt.h | 2 ++
> >>> 3 files changed, 9 insertions(+)
> >>>
> >>> diff --git a/drivers/gpu/drm/ttm/ttm_tt.c b/drivers/gpu/drm/ttm/ttm_tt.c
> >>> index 8f38de3b2f1c..0c54d5e2bfdd 100644
> >>> --- a/drivers/gpu/drm/ttm/ttm_tt.c
> >>> +++ b/drivers/gpu/drm/ttm/ttm_tt.c
> >>> @@ -162,6 +162,7 @@ static void ttm_tt_init_fields(struct ttm_tt *ttm,
> >>> ttm->caching = caching;
> >>> ttm->restore = NULL;
> >>> ttm->backup = NULL;
> >>> + ttm->objcg = bo->objcg;
> >>> }
> >>>
> >>> int ttm_tt_init(struct ttm_tt *ttm, struct ttm_buffer_object *bo,
> >>> diff --git a/include/drm/ttm/ttm_bo.h b/include/drm/ttm/ttm_bo.h
> >>> index 099dc2604baa..f26ec0a0273f 100644
> >>> --- a/include/drm/ttm/ttm_bo.h
> >>> +++ b/include/drm/ttm/ttm_bo.h
> >>> @@ -135,6 +135,12 @@ struct ttm_buffer_object {
> >>> * reservation lock.
> >>> */
> >>> struct sg_table *sg;
> >>> +
> >>> + /**
> >>> + * @objcg: object cgroup to charge this to if it ends up using system memory.
> >>> + * NULL means don't charge.
> >>> + */
> >>> + struct obj_cgroup *objcg;
> >>> };
> >>>
> >>> #define TTM_BO_MAP_IOMEM_MASK 0x80
> >>> diff --git a/include/drm/ttm/ttm_tt.h b/include/drm/ttm/ttm_tt.h
> >>> index 15d4019685f6..c13fea4c2915 100644
> >>> --- a/include/drm/ttm/ttm_tt.h
> >>> +++ b/include/drm/ttm/ttm_tt.h
> >>> @@ -126,6 +126,8 @@ struct ttm_tt {
> >>> enum ttm_caching caching;
> >>> /** @restore: Partial restoration from backup state. TTM private */
> >>> struct ttm_pool_tt_restore *restore;
> >>> + /** @objcg: Object cgroup for this TT allocation */
> >>> + struct obj_cgroup *objcg;
> >>> };
> >>
> >> We should probably keep that out of the pool and account the memory to the BO instead.
> >>
> >
> > I tried that like 2-3 patch posting iterations ago, you suggested it
> > then, it didn't work. It has to be done at the pool level, I think it
> > was due to swap handling.
>
> When you do it at the pool level the swap/shrink handling is broken as well, just not for amdgpu.
>
> See xe_bo_shrink() and drivers/gpu/drm/xe/xe_shrinker.c on how XE does it.

I've read all of that, but I don't think it needs changing yet. I do
think I probably need to do a bit more work on the ttm backup/restore
paths to account things, but again we suffer from the "what happens if
your cgroup runs out of space on a restore path" problem, similar to
eviction.

Blocking the problems we can solve now on the problems we've no idea
how to solve means nobody gets experience with solving anything.

> So the best we can do is to do it at the resource level because that is common for everybody.
>
> This doesn't take swapping on amdgpu into account, but that should not be that relevant since we wanted to remove that and switch to the XE approach anyway.

I don't understand. We cannot do it at the resource level: I sent
patches to try, they fundamentally don't work properly, so it isn't
going to fly. We can solve it at the pool level, so we should. If we
somehow rearchitect things later we could solve it at the resource
level, but I feel we'd have to make swap handling operate at the
resource level instead of the tt level to have any chance.

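As a rough illustration of what charging at the pool level means here, this is a small userspace toy model, not the TTM code. Every model_* name and the page-budget struct are made up for the example; the only things taken from the patch are the pointer copy (ttm->objcg = bo->objcg) and the rule that a NULL objcg means don't charge.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct obj_cgroup: just a page budget. */
struct model_objcg {
	size_t limit_pages;
	size_t charged_pages;	/* invariant: charged_pages <= limit_pages */
};

/* As in the patch, the BO carries the pointer; NULL means "don't charge". */
struct model_bo {
	struct model_objcg *objcg;
};

struct model_tt {
	struct model_objcg *objcg;
	size_t num_pages;
	bool populated;
};

/* Same idea as "ttm->objcg = bo->objcg" in ttm_tt_init_fields(). */
void model_tt_init(struct model_tt *tt, const struct model_bo *bo,
		   size_t num_pages)
{
	tt->objcg = bo->objcg;
	tt->num_pages = num_pages;
	tt->populated = false;
}

/* Pool-level allocation: charge at the point pages are handed out. */
bool model_pool_alloc(struct model_tt *tt)
{
	struct model_objcg *cg = tt->objcg;

	if (tt->populated)
		return true;
	if (cg) {
		/* Refuse the allocation if it would exceed the budget. */
		if (tt->num_pages > cg->limit_pages - cg->charged_pages)
			return false;
		cg->charged_pages += tt->num_pages;
	}
	tt->populated = true;
	return true;
}

/* Pool-level free: uncharge exactly what the allocation charged. */
void model_pool_free(struct model_tt *tt)
{
	if (!tt->populated)
		return;
	if (tt->objcg) {
		assert(tt->objcg->charged_pages >= tt->num_pages);
		tt->objcg->charged_pages -= tt->num_pages;
	}
	tt->populated = false;
}
```

In this model the charge and uncharge sit next to the page allocation and free, and a tt whose BO has no objcg is never counted.
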
Swapping via the backup/restore paths should be accounted properly,
since moving pages out to swap is one way cgroups can reduce their
memory usage. If swapped-out pages aren't removed from the page count,
then it isn't going to work properly.

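To make the backup/restore accounting concern concrete, here is a userspace toy model; it is an assumption-laden sketch, not the TTM backup code. The swap_model_* names and the counters are hypothetical. It shows backup dropping the resident page count, and restore having to recharge, which can fail when the cgroup has filled up in the meantime. What to do on that failure is the open problem; the sketch only reports it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-cgroup counters, in pages. */
struct swap_model_cgroup {
	size_t limit_pages;
	size_t resident_pages;	/* invariant: resident_pages <= limit_pages */
	size_t swapped_pages;
};

struct swap_model_tt {
	struct swap_model_cgroup *cg;
	size_t num_pages;
	bool backed_up;
};

/* Backup (swap-out): the pages leave RAM, so the resident count must drop. */
void swap_model_backup(struct swap_model_tt *tt)
{
	struct swap_model_cgroup *cg = tt->cg;

	if (tt->backed_up)
		return;
	assert(cg->resident_pages >= tt->num_pages);
	cg->resident_pages -= tt->num_pages;
	cg->swapped_pages += tt->num_pages;
	tt->backed_up = true;
}

/*
 * Restore (swap-in): the pages must be charged again. Returns false,
 * leaving the counters untouched, if the cgroup has no room left.
 */
bool swap_model_restore(struct swap_model_tt *tt)
{
	struct swap_model_cgroup *cg = tt->cg;

	if (!tt->backed_up)
		return true;
	if (tt->num_pages > cg->limit_pages - cg->resident_pages)
		return false;
	cg->resident_pages += tt->num_pages;
	cg->swapped_pages -= tt->num_pages;
	tt->backed_up = false;
	return true;
}
```

If backup did not decrement the resident count, swapping would never bring the cgroup back under its limit, which is the failure described above.
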
Dave.