linux-mm.kvack.org archive mirror
* A path forward to cleaning up dying cgroups?
@ 2025-02-05 17:48 Hamza Mahfooz
  2025-02-05 17:50 ` Hamza Mahfooz
  0 siblings, 1 reply; 10+ messages in thread
From: Hamza Mahfooz @ 2025-02-05 17:48 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Johannes Weiner, Shakeel Butt, Andrew Morton,
	cgroups, linux-kernel, Tejun Heo, Michal Koutný,
	Michal Hocko, Muchun Song, Allen Pais, Hamza Mahfooz,
	Yosry Ahmed

I was just curious as to what the status of the issue described in [1]
is. It appears that the last time someone took a stab at it was in [2].

Though it seems like there has been relative silence regarding it since
then. So, has there been any discussion regarding the issue since then
and does anyone know if there is consensus on how we should go about
resolving this issue within the kernel?

BR,
Hamza

[1] https://lwn.net/Articles/895431/
[2] https://lore.kernel.org/r/20220621125658.64935-1-songmuchun@bytedance.com/



* Re: A path forward to cleaning up dying cgroups?
  2025-02-05 17:48 A path forward to cleaning up dying cgroups? Hamza Mahfooz
@ 2025-02-05 17:50 ` Hamza Mahfooz
  2025-02-05 18:08   ` Johannes Weiner
  0 siblings, 1 reply; 10+ messages in thread
From: Hamza Mahfooz @ 2025-02-05 17:50 UTC (permalink / raw)
  To: linux-mm
  Cc: Roman Gushchin, Johannes Weiner, Shakeel Butt, Andrew Morton,
	cgroups, linux-kernel, Tejun Heo, Michal Koutný,
	Michal Hocko, Muchun Song, Allen Pais, Yosry Ahmed

Cc: Shakeel Butt <shakeel.butt@linux.dev>

On 2/5/25 12:48, Hamza Mahfooz wrote:
> I was just curious as to what the status of the issue described in [1]
> is. It appears that the last time someone took a stab at it was in [2].
> 
> Though it seems like there has been relative silence regarding it since
> then. So, has there been any discussion regarding the issue since then
> and does anyone know if there is consensus on how we should go about
> resolving this issue within the kernel?
> 
> BR,
> Hamza
> 
> [1] https://lwn.net/Articles/895431/
> [2] https://lore.kernel.org/r/20220621125658.64935-1-songmuchun@bytedance.com/




* Re: A path forward to cleaning up dying cgroups?
  2025-02-05 17:50 ` Hamza Mahfooz
@ 2025-02-05 18:08   ` Johannes Weiner
  2025-02-05 18:16     ` Yosry Ahmed
                       ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Johannes Weiner @ 2025-02-05 18:08 UTC (permalink / raw)
  To: Hamza Mahfooz
  Cc: linux-mm, Roman Gushchin, Shakeel Butt, Andrew Morton, cgroups,
	linux-kernel, Tejun Heo, Michal Koutný,
	Michal Hocko, Muchun Song, Allen Pais, Yosry Ahmed

On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
> Cc: Shakeel Butt <shakeel.butt@linux.dev>
> 
> On 2/5/25 12:48, Hamza Mahfooz wrote:
> > I was just curious as to what the status of the issue described in [1]
> > is. It appears that the last time someone took a stab at it was in [2].

If memory serves, the sticking point was whether pages should indeed
be reparented on cgroup death, or whether they could be moved
arbitrarily to other cgroups that are still using them.

It's a bit unfortunate, because the reparenting patches were tested
and reviewed, and the arbitrary recharging was just an idea that
ttbomk nobody seriously followed up on afterwards.

We also recently removed the charge moving code from cgroup1, along
with the subtle page access/locking/accounting rules it imposed on the
rest of the MM. I'm doubtful there is much appetite in either camp for
bringing this back.

So I would still love to see Muchun's patches merged. They fix a
seemingly universally experienced operational issue in memcg, and we
shouldn't hold it up unless somebody actually posts alternative code.

Thoughts?



* Re: A path forward to cleaning up dying cgroups?
  2025-02-05 18:08   ` Johannes Weiner
@ 2025-02-05 18:16     ` Yosry Ahmed
  2025-02-06  4:56       ` Kairui Song
  2025-02-05 18:31     ` Roman Gushchin
  2025-02-05 18:46     ` Shakeel Butt
  2 siblings, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2025-02-05 18:16 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hamza Mahfooz, linux-mm, Roman Gushchin, Shakeel Butt,
	Andrew Morton, cgroups, linux-kernel, Tejun Heo,
	Michal Koutný,
	Michal Hocko, Muchun Song, Zach O'Keefe, Kinsey Ho,
	Yosry Ahmed, Allen Pais

On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
> On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > 
> > On 2/5/25 12:48, Hamza Mahfooz wrote:
> > > I was just curious as to what the status of the issue described in [1]
> > > is. It appears that the last time someone took a stab at it was in [2].
> 
> If memory serves, the sticking point was whether pages should indeed
> be reparented on cgroup death, or whether they could be moved
> arbitrarily to other cgroups that are still using them.
> 
> It's a bit unfortunate, because the reparenting patches were tested
> and reviewed, and the arbitrary recharging was just an idea that
> ttbomk nobody seriously followed up on afterwards.

There was an RFC series [1] for the recharging, but all memcg
maintainers hated it :P

[1] https://lore.kernel.org/lkml/20230720070825.992023-1-yosryahmed@google.com/

> 
> We also recently removed the charge moving code from cgroup1, along
> with the subtle page access/locking/accounting rules it imposed on the
> rest of the MM. I'm doubtful there is much appetite in either camp for
> bringing this back.

Yeah with the charge moving code gone the case for recharging grows
weaker.

> 
> So I would still love to see Muchun's patches merged. They fix a
> seemingly universally experienced operational issue in memcg, and we
> shouldn't hold it up unless somebody actually posts alternative code.
> 
> Thoughts?

Adding Zach and Kinsey who were recently looking into this from the
Google side.



* Re: A path forward to cleaning up dying cgroups?
  2025-02-05 18:08   ` Johannes Weiner
  2025-02-05 18:16     ` Yosry Ahmed
@ 2025-02-05 18:31     ` Roman Gushchin
  2025-02-05 18:46     ` Shakeel Butt
  2 siblings, 0 replies; 10+ messages in thread
From: Roman Gushchin @ 2025-02-05 18:31 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hamza Mahfooz, linux-mm, Shakeel Butt, Andrew Morton, cgroups,
	linux-kernel, Tejun Heo, Michal Koutný,
	Michal Hocko, Muchun Song, Allen Pais, Yosry Ahmed

On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
> On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > 
> > On 2/5/25 12:48, Hamza Mahfooz wrote:
> > > I was just curious as to what the status of the issue described in [1]
> > > is. It appears that the last time someone took a stab at it was in [2].
> 
> If memory serves, the sticking point was whether pages should indeed
> be reparented on cgroup death, or whether they could be moved
> arbitrarily to other cgroups that are still using them.
> 
> It's a bit unfortunate, because the reparenting patches were tested
> and reviewed, and the arbitrary recharging was just an idea that
> ttbomk nobody seriously followed up on afterwards.
> 
> We also recently removed the charge moving code from cgroup1, along
> with the subtle page access/locking/accounting rules it imposed on the
> rest of the MM. I'm doubtful there is much appetite in either camp for
> bringing this back.
> 
> So I would still love to see Muchun's patches merged. They fix a
> seemingly universally experienced operational issue in memcg, and we
> shouldn't hold it up unless somebody actually posts alternative code.
> 
> Thoughts?

I don't have a strong opinion here. Reparenting is clearly not perfect,
but I agree that we don't have any better solutions, only vague ideas.
I believe Muchun's code would need a refresh, but it is generally fine
to merge.

This all comes down to the handling of memory shared between cgroups.
Sharing can be spatial (two or more cgroups that exist simultaneously) or
temporal (a cgroup is deleted and recreated, and the workload tries to
reuse the old pages). Reparenting turns temporal sharing into spatial
sharing. It helps with dying cgroups, but comes at the cost of permanently
wrong accounting and issues with memory protection.
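
To make the trade-off concrete, here is a minimal toy model in Python.
It is only a sketch: the Cgroup class, its methods, and the page names
are hypothetical and do not correspond to kernel data structures or to
Muchun's patches. It assumes a dead group lingers for as long as any
page is charged to it, and that reparenting simply hands those charges
to the parent.

```python
class Cgroup:
    """Toy stand-in for a memory cgroup (hypothetical, not kernel code)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.pages = set()   # pages currently charged to this group
        self.online = True

    def charge(self, page):
        self.pages.add(page)

    def offline(self):
        # Models rmdir: the group goes offline but may linger as "dying".
        self.online = False

    def is_dying(self):
        # Offline, yet still pinned by leftover charges.
        return not self.online and bool(self.pages)

    def reparent(self):
        # Hand every remaining charge to the parent so nothing pins us.
        self.parent.pages |= self.pages
        self.pages.clear()


root = Cgroup("root")
job = Cgroup("job", parent=root)
job.charge("libfoo.so:0")
job.charge("data.db:7")

job.offline()
dying_before = job.is_dying()   # leftover page cache pins the dead group

job.reparent()
dying_after = job.is_dying()    # nothing left to pin it

# Temporal sharing: the job is recreated and reuses the same cached pages.
job2 = Cgroup("job", parent=root)
reused = {"libfoo.so:0", "data.db:7"}
charged_to_job2 = reused & job2.pages
charged_to_root = reused & root.pages
```

In this model the dead group can go away after reparent(), but the
recreated job uses pages that stay charged to its parent, which is the
"permanently wrong accounting" side of the trade.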

Thanks!



* Re: A path forward to cleaning up dying cgroups?
  2025-02-05 18:08   ` Johannes Weiner
  2025-02-05 18:16     ` Yosry Ahmed
  2025-02-05 18:31     ` Roman Gushchin
@ 2025-02-05 18:46     ` Shakeel Butt
  2025-02-06  3:30       ` Muchun Song
  2 siblings, 1 reply; 10+ messages in thread
From: Shakeel Butt @ 2025-02-05 18:46 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Hamza Mahfooz, linux-mm, Roman Gushchin, Andrew Morton, cgroups,
	linux-kernel, Tejun Heo, Michal Koutný,
	Michal Hocko, Muchun Song, Allen Pais, Yosry Ahmed

On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
> On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
> > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > 
> > On 2/5/25 12:48, Hamza Mahfooz wrote:
> > > I was just curious as to what the status of the issue described in [1]
> > > is. It appears that the last time someone took a stab at it was in [2].
> 
> If memory serves, the sticking point was whether pages should indeed
> be reparented on cgroup death, or whether they could be moved
> arbitrarily to other cgroups that are still using them.
> 
> It's a bit unfortunate, because the reparenting patches were tested
> and reviewed, and the arbitrary recharging was just an idea that
> ttbomk nobody seriously followed up on afterwards.
> 
> We also recently removed the charge moving code from cgroup1, along
> with the subtle page access/locking/accounting rules it imposed on the
> rest of the MM. I'm doubtful there is much appetite in either camp for
> bringing this back.
> 
> So I would still love to see Muchun's patches merged. They fix a
> seemingly universally experienced operational issue in memcg, and we
> shouldn't hold it up unless somebody actually posts alternative code.
> 
> Thoughts?

I think the recharging (or whatever the alternative is) can be a
follow-up to this. I agree this is a good change.



* Re: A path forward to cleaning up dying cgroups?
  2025-02-05 18:46     ` Shakeel Butt
@ 2025-02-06  3:30       ` Muchun Song
  2025-02-06  3:34         ` Waiman Long
  2025-02-06 15:51         ` Kamalesh Babulal
  0 siblings, 2 replies; 10+ messages in thread
From: Muchun Song @ 2025-02-06  3:30 UTC (permalink / raw)
  To: Shakeel Butt
  Cc: Johannes Weiner, Hamza Mahfooz, linux-mm, Roman Gushchin,
	Andrew Morton, cgroups, linux-kernel, Tejun Heo,
	Michal Koutný,
	Michal Hocko, Allen Pais, Yosry Ahmed



> On Feb 6, 2025, at 02:46, Shakeel Butt <shakeel.butt@linux.dev> wrote:
> 
> On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
>> On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
>>> Cc: Shakeel Butt <shakeel.butt@linux.dev>
>>> 
>>> On 2/5/25 12:48, Hamza Mahfooz wrote:
>>>> I was just curious as to what the status of the issue described in [1]
>>>> is. It appears that the last time someone took a stab at it was in [2].
>> 
>> If memory serves, the sticking point was whether pages should indeed
>> be reparented on cgroup death, or whether they could be moved
>> arbitrarily to other cgroups that are still using them.
>> 
>> It's a bit unfortunate, because the reparenting patches were tested
>> and reviewed, and the arbitrary recharging was just an idea that
>> ttbomk nobody seriously followed up on afterwards.
>> 
>> We also recently removed the charge moving code from cgroup1, along
>> with the subtle page access/locking/accounting rules it imposed on the
>> rest of the MM. I'm doubtful there is much appetite in either camp for
>> bringing this back.
>> 
>> So I would still love to see Muchun's patches merged. They fix a
>> seemingly universally experienced operational issue in memcg, and we
>> shouldn't hold it up unless somebody actually posts alternative code.
>> 
>> Thoughts?
> 
> I think the recharging (or whatever the alternative) can be a followup
> to this. I agree this is a good change.

I agree with you. We've been encountering dying memory cgroup issues for
years on our servers. As Roman said, my patches need a refresh, so I
will need some time for that.

Muchun,
Thanks.




* Re: A path forward to cleaning up dying cgroups?
  2025-02-06  3:30       ` Muchun Song
@ 2025-02-06  3:34         ` Waiman Long
  2025-02-06 15:51         ` Kamalesh Babulal
  1 sibling, 0 replies; 10+ messages in thread
From: Waiman Long @ 2025-02-06  3:34 UTC (permalink / raw)
  To: Muchun Song, Shakeel Butt
  Cc: Johannes Weiner, Hamza Mahfooz, linux-mm, Roman Gushchin,
	Andrew Morton, cgroups, linux-kernel, Tejun Heo,
	Michal Koutný,
	Michal Hocko, Allen Pais, Yosry Ahmed

On 2/5/25 10:30 PM, Muchun Song wrote:
>
>> On Feb 6, 2025, at 02:46, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>
>> On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
>>> On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
>>>> Cc: Shakeel Butt <shakeel.butt@linux.dev>
>>>>
>>>> On 2/5/25 12:48, Hamza Mahfooz wrote:
>>>>> I was just curious as to what the status of the issue described in [1]
>>>>> is. It appears that the last time someone took a stab at it was in [2].
>>> If memory serves, the sticking point was whether pages should indeed
>>> be reparented on cgroup death, or whether they could be moved
>>> arbitrarily to other cgroups that are still using them.
>>>
>>> It's a bit unfortunate, because the reparenting patches were tested
>>> and reviewed, and the arbitrary recharging was just an idea that
>>> ttbomk nobody seriously followed up on afterwards.
>>>
>>> We also recently removed the charge moving code from cgroup1, along
>>> with the subtle page access/locking/accounting rules it imposed on the
>>> rest of the MM. I'm doubtful there is much appetite in either camp for
>>> bringing this back.
>>>
>>> So I would still love to see Muchun's patches merged. They fix a
>>> seemingly universally experienced operational issue in memcg, and we
>>> shouldn't hold it up unless somebody actually posts alternative code.
>>>
>>> Thoughts?
>> I think the recharging (or whatever the alternative) can be a followup
>> to this. I agree this is a good change.
> I agree with you. We've been encountering dying memory issues for years
> on our servers. As Roman said, I need to refresh my patches. So I need
> some time for refreshing.

Glad to hear that. I have been waiting for a resolution of the dying 
memory cgroup problems for years :-)

Cheers,
Longman




* Re: A path forward to cleaning up dying cgroups?
  2025-02-05 18:16     ` Yosry Ahmed
@ 2025-02-06  4:56       ` Kairui Song
  0 siblings, 0 replies; 10+ messages in thread
From: Kairui Song @ 2025-02-06  4:56 UTC (permalink / raw)
  To: Yosry Ahmed, Muchun Song
  Cc: Johannes Weiner, Hamza Mahfooz, linux-mm, Roman Gushchin,
	Shakeel Butt, Andrew Morton, cgroups, linux-kernel, Tejun Heo,
	Michal Koutný,
	Michal Hocko, Zach O'Keefe, Kinsey Ho, Yosry Ahmed,
	Allen Pais

On Thu, Feb 6, 2025 at 2:16 AM Yosry Ahmed <yosry.ahmed@linux.dev> wrote:
>
> On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
> > On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
> > > Cc: Shakeel Butt <shakeel.butt@linux.dev>
> > >
> > > On 2/5/25 12:48, Hamza Mahfooz wrote:
> > > > I was just curious as to what the status of the issue described in [1]
> > > > is. It appears that the last time someone took a stab at it was in [2].
> >
> > If memory serves, the sticking point was whether pages should indeed
> > be reparented on cgroup death, or whether they could be moved
> > arbitrarily to other cgroups that are still using them.
> >
> > It's a bit unfortunate, because the reparenting patches were tested
> > and reviewed, and the arbitrary recharging was just an idea that
> > ttbomk nobody seriously followed up on afterwards.
>
> There was an RFC series [1] for the recharging, but all memcg
> maintainers hated it :P
>
> [1] https://lore.kernel.org/lkml/20230720070825.992023-1-yosryahmed@google.com/

We have been suffering from dying cgroup issues for years too, and I
just saw this series. Would it be a good idea to combine this with
reparenting instead (if we go with the reparenting approach)?
Using the objcg API to charge the folios does help speed up
reparenting, but it also adds some overhead and complexity. Just walking
and reparenting the folios seems like a more direct approach.

Another idea: per our observation, dying cgroups have few mapped
pages, as the processes have all exited. Most folios are just
cache. Shared mapped pages are a minority, especially for containers. So
a deferred recharge on access seems good enough? Mapped folios may also
eventually be unmapped and get recharged. And at least this makes
accounting more accurate.
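
As a rough illustration of the deferred-recharge idea, here is a toy
Python sketch. Memcg, Folio, and access() are hypothetical names made up
for this example; they are not kernel APIs and not code from the RFC
series. The only assumption is the rule stated above: a folio moves to
the accessing cgroup when its current owner is already offline.

```python
class Memcg:
    """Toy memory cgroup with a page counter (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.online = True
        self.nr_pages = 0


class Folio:
    """Toy folio that remembers which group it is charged to."""

    def __init__(self, owner):
        self.owner = owner
        owner.nr_pages += 1


def access(folio, accessor):
    """Recharge lazily: only when the owner is offline and a live group touches the folio."""
    if not folio.owner.online and accessor.online:
        folio.owner.nr_pages -= 1
        accessor.nr_pages += 1
        folio.owner = accessor
    return folio.owner


old = Memcg("old")
cache = [Folio(old) for _ in range(3)]
old.online = False              # the old group was removed

new = Memcg("new")
access(cache[0], new)           # recharged to the live group
access(cache[1], new)           # recharged to the live group
access(cache[0], new)           # owner is already live, nothing changes
```

In this sketch the folio that is never touched again stays charged to
the offline group, so such an approach would presumably still need
reclaim or reparenting to release the dying cgroup completely.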



* Re: A path forward to cleaning up dying cgroups?
  2025-02-06  3:30       ` Muchun Song
  2025-02-06  3:34         ` Waiman Long
@ 2025-02-06 15:51         ` Kamalesh Babulal
  1 sibling, 0 replies; 10+ messages in thread
From: Kamalesh Babulal @ 2025-02-06 15:51 UTC (permalink / raw)
  To: Muchun Song, Shakeel Butt
  Cc: Johannes Weiner, Hamza Mahfooz, linux-mm, Roman Gushchin,
	Andrew Morton, cgroups, linux-kernel, Tejun Heo,
	Michal Koutný,
	Michal Hocko, Allen Pais, Yosry Ahmed



On 06/02/25 09:00, Muchun Song wrote:
> 
> 
>> On Feb 6, 2025, at 02:46, Shakeel Butt <shakeel.butt@linux.dev> wrote:
>>
>> On Wed, Feb 05, 2025 at 01:08:42PM -0500, Johannes Weiner wrote:
>>> On Wed, Feb 05, 2025 at 12:50:19PM -0500, Hamza Mahfooz wrote:
>>>> Cc: Shakeel Butt <shakeel.butt@linux.dev>
>>>>
>>>> On 2/5/25 12:48, Hamza Mahfooz wrote:
>>>>> I was just curious as to what the status of the issue described in [1]
>>>>> is. It appears that the last time someone took a stab at it was in [2].
>>>
>>> If memory serves, the sticking point was whether pages should indeed
>>> be reparented on cgroup death, or whether they could be moved
>>> arbitrarily to other cgroups that are still using them.
>>>
>>> It's a bit unfortunate, because the reparenting patches were tested
>>> and reviewed, and the arbitrary recharging was just an idea that
>>> ttbomk nobody seriously followed up on afterwards.
>>>
>>> We also recently removed the charge moving code from cgroup1, along
>>> with the subtle page access/locking/accounting rules it imposed on the
>>> rest of the MM. I'm doubtful there is much appetite in either camp for
>>> bringing this back.
>>>
>>> So I would still love to see Muchun's patches merged. They fix a
>>> seemingly universally experienced operational issue in memcg, and we
>>> shouldn't hold it up unless somebody actually posts alternative code.
>>>
>>> Thoughts?
>>
>> I think the recharging (or whatever the alternative) can be a followup
>> to this. I agree this is a good change.
> 
> I agree with you. We've been encountering dying memory issues for years
> on our servers. As Roman said, I need to refresh my patches. So I need
> some time for refreshing.
> 

We have seen the dying cgroups issue too and look forward to your patches.
Happy to help with testing/reviewing.

-- 
Thanks,
Kamalesh




Thread overview: 10+ messages
2025-02-05 17:48 A path forward to cleaning up dying cgroups? Hamza Mahfooz
2025-02-05 17:50 ` Hamza Mahfooz
2025-02-05 18:08   ` Johannes Weiner
2025-02-05 18:16     ` Yosry Ahmed
2025-02-06  4:56       ` Kairui Song
2025-02-05 18:31     ` Roman Gushchin
2025-02-05 18:46     ` Shakeel Butt
2025-02-06  3:30       ` Muchun Song
2025-02-06  3:34         ` Waiman Long
2025-02-06 15:51         ` Kamalesh Babulal
