From: Shakeel Butt <shakeel.butt@linux.dev>
To: Harry Yoo <harry.yoo@oracle.com>
Cc: Dev Jain <dev.jain@arm.com>,
Andrew Morton <akpm@linux-foundation.org>,
Johannes Weiner <hannes@cmpxchg.org>,
Michal Hocko <mhocko@kernel.org>,
Roman Gushchin <roman.gushchin@linux.dev>,
Muchun Song <muchun.song@linux.dev>,
Qi Zheng <qi.zheng@linux.dev>, Vlastimil Babka <vbabka@suse.cz>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org,
Meta kernel team <kernel-team@meta.com>
Subject: Re: [PATCH 1/4] memcg: use mod_node_page_state to update stats
Date: Wed, 11 Feb 2026 01:24:35 -0800
Message-ID: <aYxIr5neJP8wBdZg@linux.dev>
In-Reply-To: <aYxDkkDI4mk3r011@hyeyoo>
On Wed, Feb 11, 2026 at 05:53:38PM +0900, Harry Yoo wrote:
> On Wed, Feb 11, 2026 at 01:07:40PM +0530, Dev Jain wrote:
> >
> > On 10/02/26 9:59 pm, Shakeel Butt wrote:
> > > On Tue, Feb 10, 2026 at 01:08:49PM +0530, Dev Jain wrote:
> > > [...]
> > >>> Oh, so it is an arm64-specific issue. I tested on an x86-64 machine and it
> > >>> solves the small regression it had before. So, on arm64 all this_cpu ops,
> > >>> i.e. the ones without the double underscore, use LL/SC instructions.
> > >>>
> > >>> Need more thought on this.
> > >>>
> > >>>>> Also can you confirm whether my analysis of the regression was correct?
> > >>>>> Because if it was, then this diff looks wrong - AFAIU preempt_disable()
> > >>>>> won't stop an irq handler from interrupting the execution, so this
> > >>>>> will introduce a bug for code paths running in irq context.
> > >>>>>
> > >>>> I was worried about the correctness too, but this_cpu_add() is safe
> > >>>> against IRQs and so the stat will be _eventually_ consistent?
> > >>>>
> > >>>> Ofc it's so confusing! Maybe I'm the one confused.
> > >>> Yeah, there is no issue with the proposed patch as it makes the function
> > >>> re-entrant safe.
> > >> Ah yes, this_cpu_add() does the addition in one shot without read-modify-write.
> > >>
> > >> I am still puzzled whether the original patch was a bug fix or an optimization.
> > > The original patch was a cleanup patch. The memcg stats update functions
> > > were already irq/nmi safe without disabling irqs, and that patch did the
> > > same for the numa stats. Though it seems that is causing a regression on
> > > arm64, as this_cpu* ops are expensive there.
> > >
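
(To spell out what "irq/nmi safe without disabling irqs" means here, a rough
sketch with made-up names, not the exact memcg code: the per-cpu counter is
bumped with a single this_cpu_add(), which on x86 is one percpu add
instruction and cannot be torn by an interrupt, whereas an open-coded update
would need local_irq_save() around it.)

  /* simplified sketch; my_stats, cnt and both helpers are hypothetical */
  static void stat_add_irq_safe(struct my_stats __percpu *stats, int idx,
                                long val)
  {
          /* one this_cpu op: safe even if an irq bumps the same
           * counter while this function runs */
          this_cpu_add(stats->cnt[idx], val);
  }

  static void stat_add_open_coded(struct my_stats __percpu *stats, int idx,
                                  long val)
  {
          unsigned long flags;

          /* __this_cpu_add() may be a plain load/add/store, so the
           * caller must prevent re-entrancy (here: irqs off), or an
           * update done from an interrupt in between could be lost */
          local_irq_save(flags);
          __this_cpu_add(stats->cnt[idx], val);
          local_irq_restore(flags);
  }
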
> > >> The patch description says that the node stat update uses an irq-unsafe
> > >> interface. Therefore, we had foo() calling __foo() wrapped in
> > >> local_irq_save/restore. But there were code paths which directly called
> > >> __foo() - so your patch fixes a bug, right?
> > > No, those places were already disabling irqs and should be fine.
> >
> > Please correct me if I am missing something here. Simply adding an
> > "if (!irqs_disabled()) dump_stack();" check in __lruvec_stat_mod_folio,
> > before the call to __mod_node_page_state, reveals:
> >
> > [ 6.486375] Call trace:
> > [ 6.486376] show_stack+0x20/0x38 (C)
> > [ 6.486379] dump_stack_lvl+0x74/0x90
> > [ 6.486382] dump_stack+0x18/0x28
> > [ 6.486383] __lruvec_stat_mod_folio+0x160/0x180
> > [ 6.486385] folio_add_file_rmap_ptes+0x128/0x480
> > [ 6.486388] set_pte_range+0xe8/0x320
> > [ 6.486389] finish_fault+0x260/0x508
> > [ 6.486390] do_fault+0x2d0/0x598
> > [ 6.486391] __handle_mm_fault+0x398/0xb60
> > [ 6.486393] handle_mm_fault+0x15c/0x298
> > [ 6.486394] __get_user_pages+0x204/0xb88
> > [ 6.486395] populate_vma_page_range+0xbc/0x1b8
> > [ 6.486396] __mm_populate+0xcc/0x1e0
> > [ 6.486397] __arm64_sys_mlockall+0x1d4/0x1f8
> > [ 6.486398] invoke_syscall+0x50/0x120
> > [ 6.486399] el0_svc_common.constprop.0+0x48/0xf0
> > [ 6.486400] do_el0_svc+0x24/0x38
> > [ 6.486400] el0_svc+0x34/0xf0
> > [ 6.486402] el0t_64_sync_handler+0xa0/0xe8
> > [ 6.486404] el0t_64_sync+0x198/0x1a0
> >
> > Indeed, finish_fault() takes the PTL spinlock without disabling irqs.
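
(For clarity, roughly what the debug check described above looks like, as a
sketch rather than the exact diff:)

  /* sketch: warn if __lruvec_stat_mod_folio() is reached with irqs on */
  if (!irqs_disabled())
          dump_stack();
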
>
> That indeed looks incorrect to me.
> I was assuming __foo() is always called with IRQs disabled!
Not necessarily. Stats which never get updated in IRQ context can be
updated via __foo() with just preemption disabled.
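
To make the convention concrete, a rough sketch of the usual foo()/__foo()
pairing (hypothetical names, in the style of mod_node_page_state() /
__mod_node_page_state(), not a copy of the real code):

  /* __foo(): caller guarantees no re-entrancy, i.e. irqs are off, or the
   * stat is never touched from irq context and preemption is disabled */
  static void __mod_foo_stat(struct foo_pcpu __percpu *pcpu, int item,
                             long delta)
  {
          __this_cpu_add(pcpu->stat[item], delta);
  }

  /* foo(): callable from any context, wraps __foo() with irqs disabled */
  static void mod_foo_stat(struct foo_pcpu __percpu *pcpu, int item,
                           long delta)
  {
          unsigned long flags;

          local_irq_save(flags);
          __mod_foo_stat(pcpu, item, delta);
          local_irq_restore(flags);
  }
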
>
> > > I am working on adding batched stats update functionality in the hope
> > > that it will fix the regression.
> >
> > Thanks! FYI, I have zeroed in on preempt_disable() as the culprit. Dropping
> > it from _pcpu_protect_return solves the regression.
>
> That's interesting; why is the cost of preempt disable/enable so high?
>
What made you (Dev) so convinced that preempt_disable() is that expensive?
> > Unlike x86, arm64 does a preempt_disable() when doing this_cpu_*. At a
> > cursory glance this seems unnecessary - since we do preempt_enable()
> > immediately after reading the pointer, CPU migration is possible anyway,
> > so there is nothing to be gained by reading the pcpu pointer with
> > preemption disabled. I am investigating whether we can simply drop this
> > in general.
>
[...]
>
> ... so, removing preempt disable _in general_ is probably not a good idea.
>
Yup, I agree here.
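
For reference, a rough sketch of what the arm64-style protected this_cpu
wrapper does (simplified, with hypothetical names; the real macros live in
arch/arm64/include/asm/percpu.h):

  /* __my_arch_percpu_add() stands in for the LL/SC or LSE per-cpu atomic */
  static inline void my_protected_pcpu_add(unsigned long __percpu *pcp,
                                           unsigned long val)
  {
          /*
           * Preemption must stay disabled across the whole
           * read-modify-write, not just around raw_cpu_ptr(): if the
           * task migrated after taking the pointer, the atomic would
           * hit the previous CPU's counter and race with that CPU's
           * plain __this_cpu_*() updates.
           */
          preempt_disable_notrace();
          __my_arch_percpu_add(raw_cpu_ptr(pcp), val);
          preempt_enable_notrace();
  }
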
> [1] https://lore.kernel.org/all/20190311164837.GD24275@lakrids.cambridge.arm.com
>
> --
> Cheers,
> Harry / Hyeonggon
>