linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug
@ 2025-09-06  3:21 Andrew Guerrero
  2025-09-07 13:10 ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Guerrero @ 2025-09-06  3:21 UTC (permalink / raw)
  To: cgroups, linux-mm, linux-kernel
  Cc: hannes, mhocko, vdavydov.dev, akpm, shakeelb, guro, gunnarku, stable

A filesystem writeback performance issue was discovered by repeatedly
running CPU hotplug operations while a process in a cgroup with memory
and io controllers enabled wrote to an ext4 file in a loop.

When a CPU is offlined, the memcg_hotplug_cpu_dead() callback function
flushes per-cpu vmstats counters. However, instead of applying a per-cpu
counter once to each cgroup in the heirarchy, the per-cpu counter is
applied repeatedly just to the nested cgroup. Under certain conditions,
the per-cpu NR_FILE_DIRTY counter is routinely positive during hotplug
events and the dirty file count artifically inflates. Once the dirty
file count grows past the dirty_freerun_ceiling(), balance_dirty_pages()
starts a backgroup writeback each time a file page is marked dirty
within the nested cgroup.

This change fixes memcg_hotplug_cpu_dead() so that the per-cpu vmstats
and vmevents counters are applied once to each cgroup in the heirarchy,
similar to __mod_memcg_state() and __count_memcg_events().

Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
Signed-off-by: Andrew Guerrero <ajgja@amazon.com>
Reviewed-by: Gunnar Kudrjavets <gunnarku@amazon.com>
---
Hey all,

This patch is intended for the 5.10 longterm release branch. It will not apply
cleanly to mainline and is inadvertantly fixed by a larger series of changes in 
later release branches:
a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").

In 5.15, the counter flushing code is completely removed. This may be another
viable option here too, though it's a larger change.

Thanks!
---
 mm/memcontrol.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 142b4d5e08fe..8e085a4f45b7 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2394,7 +2394,7 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 			x = this_cpu_xchg(memcg->vmstats_percpu->stat[i], 0);
 			if (x)
 				for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
-					atomic_long_add(x, &memcg->vmstats[i]);
+					atomic_long_add(x, &mi->vmstats[i]);
 
 			if (i >= NR_VM_NODE_STAT_ITEMS)
 				continue;
@@ -2417,7 +2417,7 @@ static int memcg_hotplug_cpu_dead(unsigned int cpu)
 			x = this_cpu_xchg(memcg->vmstats_percpu->events[i], 0);
 			if (x)
 				for (mi = memcg; mi; mi = parent_mem_cgroup(mi))
-					atomic_long_add(x, &memcg->vmevents[i]);
+					atomic_long_add(x, &mi->vmevents[i]);
 		}
 	}
 

base-commit: c30b4019ea89633d790f0bfcbb03234f0d006f87
-- 
2.47.3



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug
  2025-09-06  3:21 [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug Andrew Guerrero
@ 2025-09-07 13:10 ` Greg KH
  2025-09-08 21:09   ` Andrew Guerrero
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2025-09-07 13:10 UTC (permalink / raw)
  To: Andrew Guerrero
  Cc: cgroups, linux-mm, linux-kernel, hannes, mhocko, vdavydov.dev,
	akpm, shakeelb, guro, gunnarku, stable

On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
> A filesystem writeback performance issue was discovered by repeatedly
> running CPU hotplug operations while a process in a cgroup with memory
> and io controllers enabled wrote to an ext4 file in a loop.
> 
> When a CPU is offlined, the memcg_hotplug_cpu_dead() callback function
> flushes per-cpu vmstats counters. However, instead of applying a per-cpu
> counter once to each cgroup in the heirarchy, the per-cpu counter is
> applied repeatedly just to the nested cgroup. Under certain conditions,
> the per-cpu NR_FILE_DIRTY counter is routinely positive during hotplug
> events and the dirty file count artifically inflates. Once the dirty
> file count grows past the dirty_freerun_ceiling(), balance_dirty_pages()
> starts a backgroup writeback each time a file page is marked dirty
> within the nested cgroup.
> 
> This change fixes memcg_hotplug_cpu_dead() so that the per-cpu vmstats
> and vmevents counters are applied once to each cgroup in the heirarchy,
> similar to __mod_memcg_state() and __count_memcg_events().
> 
> Fixes: 42a300353577 ("mm: memcontrol: fix recursive statistics correctness & scalabilty")
> Signed-off-by: Andrew Guerrero <ajgja@amazon.com>
> Reviewed-by: Gunnar Kudrjavets <gunnarku@amazon.com>
> ---
> Hey all,
> 
> This patch is intended for the 5.10 longterm release branch. It will not apply
> cleanly to mainline and is inadvertantly fixed by a larger series of changes in 
> later release branches:
> a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").

Why can't we take those instead?

> In 5.15, the counter flushing code is completely removed. This may be another
> viable option here too, though it's a larger change.

If it's not needed anymore, why not just remove it with the upstream
commits as well?

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug
  2025-09-07 13:10 ` Greg KH
@ 2025-09-08 21:09   ` Andrew Guerrero
  2025-09-12 12:45     ` Greg KH
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Guerrero @ 2025-09-08 21:09 UTC (permalink / raw)
  To: gregkh
  Cc: ajgja, akpm, cgroups, gunnarku, guro, hannes, roman.gushchin,
	linux-kernel, linux-mm, mhocko, shakeel.butt, muchun.song,
	stable, vdavydov.dev

On 2025-09-07 13:10 UTC, Greg KH wrote:
> On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
> > This patch is intended for the 5.10 longterm release branch. It will not apply
> > cleanly to mainline and is inadvertantly fixed by a larger series of changes in 
> > later release branches:
> > a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
> 
> Why can't we take those instead?
> 
> > In 5.15, the counter flushing code is completely removed. This may be another
> > viable option here too, though it's a larger change.
> 
> If it's not needed anymore, why not just remove it with the upstream
> commits as well?

Yeah, my understanding is the typical flow is to pull commits from upstream into
stable branches. However, I'm not confident I know the the answer to "which
upstream commits?" To get started,

`git log -L :memcg_hotplug_cpu_dead:mm/memcontrol.c linux-5.10.y..linux-5.15.y`

tells me that the upstream changes to pull are:

- https://lore.kernel.org/all/20210209163304.77088-1-hannes@cmpxchg.org/T/#u
- https://lore.kernel.org/all/20210716212137.1391164-1-shakeelb@google.com/T/#u

However, these are substantial features that "fix" the issue indirectly by
transitioning the memcg accounting system over to rstats. I can pick these 10
upstream commits, but I'm worried I may overlook some additional patches from
5.15.y that need to go along with them. I may need some guidance if we go this
route.

Another reasonable option is to take neither route. We can maintain this patch
internally and then drop it once we upgrade to a new kernel version.

Let me know how you would like to proceed.

Thanks!
Andrew


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug
  2025-09-08 21:09   ` Andrew Guerrero
@ 2025-09-12 12:45     ` Greg KH
  2025-09-16 17:10       ` Andrew Guerrero
  0 siblings, 1 reply; 5+ messages in thread
From: Greg KH @ 2025-09-12 12:45 UTC (permalink / raw)
  To: Andrew Guerrero
  Cc: akpm, cgroups, gunnarku, guro, hannes, roman.gushchin,
	linux-kernel, linux-mm, mhocko, shakeel.butt, muchun.song,
	stable, vdavydov.dev

On Mon, Sep 08, 2025 at 09:09:00PM +0000, Andrew Guerrero wrote:
> On 2025-09-07 13:10 UTC, Greg KH wrote:
> > On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
> > > This patch is intended for the 5.10 longterm release branch. It will not apply
> > > cleanly to mainline and is inadvertantly fixed by a larger series of changes in 
> > > later release branches:
> > > a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
> > 
> > Why can't we take those instead?
> > 
> > > In 5.15, the counter flushing code is completely removed. This may be another
> > > viable option here too, though it's a larger change.
> > 
> > If it's not needed anymore, why not just remove it with the upstream
> > commits as well?
> 
> Yeah, my understanding is the typical flow is to pull commits from upstream into
> stable branches. However, I'm not confident I know the the answer to "which
> upstream commits?" To get started,
> 
> `git log -L :memcg_hotplug_cpu_dead:mm/memcontrol.c linux-5.10.y..linux-5.15.y`
> 
> tells me that the upstream changes to pull are:
> 
> - https://lore.kernel.org/all/20210209163304.77088-1-hannes@cmpxchg.org/T/#u
> - https://lore.kernel.org/all/20210716212137.1391164-1-shakeelb@google.com/T/#u
> 
> However, these are substantial features that "fix" the issue indirectly by
> transitioning the memcg accounting system over to rstats. I can pick these 10
> upstream commits, but I'm worried I may overlook some additional patches from
> 5.15.y that need to go along with them. I may need some guidance if we go this
> route.

Testing is key :)

> Another reasonable option is to take neither route. We can maintain this patch
> internally and then drop it once we upgrade to a new kernel version.

Perhaps just do that for now if you all are hitting this issue?  It
seems to be the only report I've seen so far.

thanks,

greg k-h


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug
  2025-09-12 12:45     ` Greg KH
@ 2025-09-16 17:10       ` Andrew Guerrero
  0 siblings, 0 replies; 5+ messages in thread
From: Andrew Guerrero @ 2025-09-16 17:10 UTC (permalink / raw)
  To: gregkh
  Cc: ajgja, akpm, cgroups, gunnarku, guro, hannes, linux-kernel,
	linux-mm, mhocko, muchun.song, roman.gushchin, shakeel.butt,
	stable, vdavydov.dev

On 2025-09-12 12:45 UTC, Greg KH wrote:
> On Mon, Sep 08, 2025 at 09:09:00PM +0000, Andrew Guerrero wrote:
> > On 2025-09-07 13:10 UTC, Greg KH wrote:
> > > On Sat, Sep 06, 2025 at 03:21:08AM +0000, Andrew Guerrero wrote:
> > > > This patch is intended for the 5.10 longterm release branch. It will not apply
> > > > cleanly to mainline and is inadvertantly fixed by a larger series of changes in 
> > > > later release branches:
> > > > a3d4c05a4474 ("mm: memcontrol: fix cpuhotplug statistics flushing").
> > > 
> > > Why can't we take those instead?
> > > 
> > > > In 5.15, the counter flushing code is completely removed. This may be another
> > > > viable option here too, though it's a larger change.
> > > 
> > > If it's not needed anymore, why not just remove it with the upstream
> > > commits as well?
> > 
> > Yeah, my understanding is the typical flow is to pull commits from upstream into
> > stable branches. However, I'm not confident I know the the answer to "which
> > upstream commits?" To get started,
> > 
> > `git log -L :memcg_hotplug_cpu_dead:mm/memcontrol.c linux-5.10.y..linux-5.15.y`
> > 
> > tells me that the upstream changes to pull are:
> > 
> > - https://lore.kernel.org/all/20210209163304.77088-1-hannes@cmpxchg.org/T/#u
> > - https://lore.kernel.org/all/20210716212137.1391164-1-shakeelb@google.com/T/#u
> > 
> > However, these are substantial features that "fix" the issue indirectly by
> > transitioning the memcg accounting system over to rstats. I can pick these 10
> > upstream commits, but I'm worried I may overlook some additional patches from
> > 5.15.y that need to go along with them. I may need some guidance if we go this
> > route.
> 
> Testing is key :)
> 
> > Another reasonable option is to take neither route. We can maintain this patch
> > internally and then drop it once we upgrade to a new kernel version.
> 
> Perhaps just do that for now if you all are hitting this issue?  It
> seems to be the only report I've seen so far.

We are hitting this issue only in a stress test, and I think we got lucky with
experiencing it, so I wouldn't be too surprised if this is the first and only
report.

Thanks for taking a look!

Andrew


^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2025-09-16 17:10 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-09-06  3:21 [PATCH] mm: memcontrol: fix memcg accounting during cpu hotplug Andrew Guerrero
2025-09-07 13:10 ` Greg KH
2025-09-08 21:09   ` Andrew Guerrero
2025-09-12 12:45     ` Greg KH
2025-09-16 17:10       ` Andrew Guerrero

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox