From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Dev Jain <dev.jain@arm.com>
Cc: Jann Horn <jannh@google.com>,
kernel test robot <oliver.sang@intel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Barry Song <baohua@kernel.org>, Pedro Falcato <pfalcato@suse.de>,
Anshuman Khandual <anshuman.khandual@arm.com>,
Bang Li <libang.li@antgroup.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
bibo mao <maobibo@loongson.cn>,
David Hildenbrand <david@redhat.com>,
Hugh Dickins <hughd@google.com>, Ingo Molnar <mingo@kernel.org>,
Lance Yang <ioworker0@gmail.com>,
Liam Howlett <liam.howlett@oracle.com>,
Matthew Wilcox <willy@infradead.org>,
Peter Xu <peterx@redhat.com>,
Qi Zheng <zhengqi.arch@bytedance.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Vlastimil Babka <vbabka@suse.cz>,
Yang Shi <yang@os.amperecomputing.com>, Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org
Subject: Re: [linus:master] [mm] f822a9a81a: stress-ng.bigheap.realloc_calls_per_sec 37.3% regression
Date: Thu, 7 Aug 2025 18:53:25 +0100
Message-ID: <0f0a3e8d-e83e-43b9-8a26-7368a0655f45@lucifer.local>
In-Reply-To: <cc727530-8535-4d98-9fc4-f6a36941ca75@arm.com>
On Thu, Aug 07, 2025 at 11:20:13PM +0530, Dev Jain wrote:
>
> On 07/08/25 11:16 pm, Jann Horn wrote:
> > On Thu, Aug 7, 2025 at 7:41 PM Lorenzo Stoakes
> > <lorenzo.stoakes@oracle.com> wrote:
> > > On Thu, Aug 07, 2025 at 07:37:38PM +0200, Jann Horn wrote:
> > > > On Thu, Aug 7, 2025 at 10:28 AM Lorenzo Stoakes
> > > > <lorenzo.stoakes@oracle.com> wrote:
> > > > > On Thu, Aug 07, 2025 at 04:17:09PM +0800, kernel test robot wrote:
> > > > > > > 94dab12d86cf77ff f822a9a81a31311d67f260aea96
> > > > > > > ---------------- ---------------------------
> > > > > > >          %stddev     %change         %stddev
> > > > > > >              \          |                \
> > > > > > >      13777 ± 37%     +45.0%      19979 ± 27%  numa-vmstat.node1.nr_slab_reclaimable
> > > > > > >     367205            +2.3%     375703        vmstat.system.in
> > > > > > >      55106 ± 37%     +45.1%      79971 ± 27%  numa-meminfo.node1.KReclaimable
> > > > > > >      55106 ± 37%     +45.1%      79971 ± 27%  numa-meminfo.node1.SReclaimable
> > > > > > >     559381           -37.3%     350757        stress-ng.bigheap.realloc_calls_per_sec
> > > > > > >      11468            +1.2%      11603        stress-ng.time.system_time
> > > > > > >     296.25            +4.5%     309.70        stress-ng.time.user_time
> > > > > > >       0.81 ±187%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > > > > >       9.36 ±165%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > > > > >       0.81 ±187%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > > > > >       9.36 ±165%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.zap_pte_range.zap_pmd_range.isra.0
> > > > > > >       5.50 ± 17%    +390.9%      27.00 ± 56%  perf-c2c.DRAM.local
> > > > > > >     388.50 ± 10%    +114.7%     834.17 ± 33%  perf-c2c.DRAM.remote
> > > > > > >       1214 ± 13%    +107.3%       2517 ± 31%  perf-c2c.HITM.local
> > > > > > >     135.00 ± 19%    +130.9%     311.67 ± 32%  perf-c2c.HITM.remote
> > > > > > >       1349 ± 13%    +109.6%       2829 ± 31%  perf-c2c.HITM.total
> > > > > Yeah this also looks pretty consistent too...
> > > > > FWIW, HITM has different meanings depending on exactly which
> > > > microarchitecture that test happened on; the message says it is from
> > > > Sapphire Rapids, which is a successor of Ice Lake, so HITM is less
> > > > meaningful than if it came from a pre-IceLake system (see
> > > > https://lore.kernel.org/all/CAG48ez3RmV6SsVw9oyTXxQXHp3rqtKDk2qwJWo9TGvXCq7Xr-w@mail.gmail.com/).
> > > >
> > > > To me those numbers mainly look like you're accessing a lot more
> > > > cache-cold data. (On pre-IceLake they would indicate cacheline
> > > > bouncing, but I guess here they probably don't.) And that makes sense,
> > > > since before the patch, this path was just moving PTEs around without
> > > > looking at the associated pages/folios; basically more or less like a
> > > > memcpy() on x86-64. But after the patch, for every 8 bytes that you
> > > > copy, you have to load a cacheline from the vmemmap to get the page.
> > > Yup this is representative of what my investigation is showing.
> > >
> > > I've narrowed it down but want to wait to report until I'm sure...
> > >
> > > But yeah we're doing a _lot_ more work.
> > >
> > > I'm leaning towards disabling except for arm64 atm tbh, seems mremap is
> > > especially sensitive to this (I found issues with this with my abortive mremap
> > > anon merging stuff too, but really expected it there...)
> > Another approach would be to always read and write PTEs in
> > contpte-sized chunks here, without caring whether they're actually
> > contiguous or whatever, or something along those lines.
>
> The initial approach was to wrap all of this around pte_batch_hint(),
> effectively making the optimization only on arm64. I guess that sounds
> reasonable now.
>
I wish people would just wait for me to finish checking this on my box...
Anyway, in line with Jann's point, I have empirical evidence that yes,
it's the folio lookup that's the issue.
I was also thinking of trying exactly this hint approach to see what happens.
Let me try...
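
To make the cost discussed above concrete, here is a rough userspace
sketch, not kernel code. It contrasts moving 8-byte entries when every
entry needs a per-page metadata load against skipping that load when a
cheap per-entry hint says no batch is possible. Every name in it
(fake_pte_t, struct fake_page, fake_batch_hint(), move_ptes(), and the
FAKE_* constants) is hypothetical and invented for illustration; in
particular fake_batch_hint() only mimics the idea behind
pte_batch_hint() and is not its implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these are the kernel's types. */
typedef uint64_t fake_pte_t;            /* one 8-byte page table entry */
#define FAKE_PFN_SHIFT 12               /* pfn lives above the low 12 bits */
#define FAKE_PTE_CONT  ((fake_pte_t)1)  /* made-up "in a contiguous block" bit */
#define FAKE_CONT_PTES 16               /* made-up block size for the hint */

struct fake_page {                      /* stand-in for one vmemmap entry */
	uint64_t folio_nr_pages;        /* pages in the folio holding this page */
	uint64_t index_in_folio;        /* this page's position in that folio */
	uint64_t pad[6];                /* pad to 64 bytes */
};
static_assert(sizeof(struct fake_page) == 64, "one cacheline per page");

/* Cheap hint computed from the entry alone, with no metadata access. */
static size_t fake_batch_hint(fake_pte_t pte)
{
	return (pte & FAKE_PTE_CONT) ? FAKE_CONT_PTES : 1;
}

/*
 * Move n entries from src to dst, batching per folio, and return how many
 * fake_page loads were needed. Simplifying assumption: entries inside one
 * folio map consecutive pages.
 */
static size_t move_ptes(fake_pte_t *dst, fake_pte_t *src, size_t n,
			const struct fake_page *vmemmap, int use_hint)
{
	size_t lookups = 0;
	size_t i = 0;

	while (i < n) {
		size_t batch = 1;

		/* Without the hint, every entry costs one metadata load. */
		if (!use_hint || fake_batch_hint(src[i]) > 1) {
			const struct fake_page *page =
				&vmemmap[src[i] >> FAKE_PFN_SHIFT];

			lookups++;
			batch = page->folio_nr_pages - page->index_in_folio;
			if (batch < 1)
				batch = 1;
			if (batch > n - i)
				batch = n - i;
		}

		/* The move itself is the same in both modes. */
		for (size_t j = 0; j < batch; j++) {
			dst[i + j] = src[i + j];
			src[i + j] = 0;
		}
		i += batch;
	}
	return lookups;
}
```

In this sketch, when every page is its own single-page folio,
use_hint = 0 performs one metadata load per entry moved, while
use_hint = 1 performs none and degenerates to a plain copy loop.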