From: Shakeel Butt <shakeel.butt@linux.dev>
To: Leon Huang Fu <leon.huangfu@shopee.com>
Cc: akpm@linux-foundation.org, cgroups@vger.kernel.org,
corbet@lwn.net, hannes@cmpxchg.org, inwardvessel@gmail.com,
jack@suse.cz, joel.granados@kernel.org, kyle.meyer@hpe.com,
lance.yang@linux.dev, laoar.shao@gmail.com,
linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-mm@kvack.org, mclapinski@google.com, mhocko@kernel.org,
muchun.song@linux.dev, roman.gushchin@linux.dev,
yosry.ahmed@linux.dev
Subject: Re: [PATCH mm-new v2] mm/memcontrol: Flush stats when write stat file
Date: Thu, 6 Nov 2025 15:55:59 -0800 [thread overview]
Message-ID: <blygjeudtqyxk7bhw5ycveofo4e322nycxyvupdnzq3eg7qtpo@cya4bifb2dlk> (raw)
In-Reply-To: <20251106033045.41607-1-leon.huangfu@shopee.com>
On Thu, Nov 06, 2025 at 11:30:45AM +0800, Leon Huang Fu wrote:
> On Thu, Nov 6, 2025 at 9:19 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > +Yosry, JP
> >
> > On Wed, Nov 05, 2025 at 03:49:16PM +0800, Leon Huang Fu wrote:
> > > On high-core count systems, memory cgroup statistics can become stale
> > > due to per-CPU caching and deferred aggregation. Monitoring tools and
> > > management applications sometimes need guaranteed up-to-date statistics
> > > at specific points in time to make accurate decisions.
> >
> > Can you explain a bit more on your environment where you are seeing
> > stale stats? More specifically, how often the management applications
> > are reading the memcg stats and if these applications are reading memcg
> > stats for each nodes of the cgroup tree.
> >
> > We force flush all the memcg stats at root level every 2 seconds but it
> > seems like that is not enough for your case. I am fine with an explicit
> > way for users to flush the memcg stats. In that way only users who want
> > to has to pay for the flush cost.
> >
>
> Thanks for the feedback. I encountered this issue while running the LTP
> memcontrol02 test case [1] on a 256-core server with the 6.6.y kernel on XFS,
> where it consistently failed.
>
> I was aware that Yosry had improved the memory statistics refresh mechanism
> in "mm: memcg: subtree stats flushing and thresholds" [2], so I attempted to
> backport that patchset to 6.6.y [3]. However, even on the 6.15.0-061500-generic
> kernel with those improvements, the test still fails intermittently on XFS.
>
> I've created a simplified reproducer that mirrors the LTP test behavior. The
> test allocates 50 MiB of page cache and then verifies that memory.current and
> memory.stat's "file" field are approximately equal (within 5% tolerance).
>
> The failure pattern looks like:
>
> After alloc: memory.current=52690944, memory.stat.file=48496640, size=52428800
> Checks: current>=size=OK, file>0=OK, current~=file(5%)=FAIL
>
> Here's the reproducer code and test script (attached below for reference).
>
> To reproduce on XFS:
> sudo ./run.sh --xfs
> for i in {1..100}; do sudo ./run.sh --run; echo "==="; sleep 0.1; done
> sudo ./run.sh --cleanup
>
> The test fails sporadically, typically a few times out of 100 runs, confirming
> that the improved flush isn't sufficient for this workload pattern.
I was hoping that you have a real world workload/scenario which is
facing this issue. For the test a simple 'sleep 2' would be enough.
Anyways that is not an argument against adding an inteface for flushing.
next prev parent reply other threads:[~2025-11-06 23:56 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-11-05 7:49 Leon Huang Fu
2025-11-05 8:19 ` Michal Hocko
2025-11-05 8:39 ` Lance Yang
2025-11-05 8:51 ` Leon Huang Fu
2025-11-06 1:19 ` Shakeel Butt
2025-11-06 3:30 ` Leon Huang Fu
2025-11-06 5:35 ` JP Kobryn
2025-11-06 6:42 ` Leon Huang Fu
2025-11-06 23:55 ` Shakeel Butt [this message]
2025-11-10 6:37 ` Leon Huang Fu
2025-11-10 20:19 ` Yosry Ahmed
2025-11-06 17:02 ` JP Kobryn
2025-11-10 6:20 ` Leon Huang Fu
2025-11-10 19:24 ` JP Kobryn
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=blygjeudtqyxk7bhw5ycveofo4e322nycxyvupdnzq3eg7qtpo@cya4bifb2dlk \
--to=shakeel.butt@linux.dev \
--cc=akpm@linux-foundation.org \
--cc=cgroups@vger.kernel.org \
--cc=corbet@lwn.net \
--cc=hannes@cmpxchg.org \
--cc=inwardvessel@gmail.com \
--cc=jack@suse.cz \
--cc=joel.granados@kernel.org \
--cc=kyle.meyer@hpe.com \
--cc=lance.yang@linux.dev \
--cc=laoar.shao@gmail.com \
--cc=leon.huangfu@shopee.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mclapinski@google.com \
--cc=mhocko@kernel.org \
--cc=muchun.song@linux.dev \
--cc=roman.gushchin@linux.dev \
--cc=yosry.ahmed@linux.dev \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox