linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Mateusz Guzik <mjguzik@gmail.com>
To: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>,
	David Laight <david.laight.linux@gmail.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	David Hildenbrand <david@kernel.org>,
	 "Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	 Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	oliver.sang@intel.com,  linux-mm@kvack.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm: avoid use of BIT() macro for initialising VMA flags
Date: Fri, 12 Dec 2025 18:26:47 +0100	[thread overview]
Message-ID: <CAGudoHHMbyjO7uFGhpuYWmqTOk4oYwKt_qWoEfvNgsFtC0L3_g@mail.gmail.com> (raw)
In-Reply-To: <9cd62896-f259-45ce-9165-63b74f7c1f6a@lucifer.local>

On Fri, Dec 12, 2025 at 4:03 PM Lorenzo Stoakes
<lorenzo.stoakes@oracle.com> wrote:
>
> On Fri, Dec 12, 2025 at 01:24:57PM +0100, Mateusz Guzik wrote:
> > So I had a look where the timing difference is coming from and I think
> > I have the answer: init_ipc_ns does not have a guaranteed cacheline
> > placement and things get moved around with the patch.
> >
> > On my kernels (nm vmlinux-newbits | sort -nk 1 | less)
> >
> > before:
> > ffffffff839ffb60 T init_ipc_ns
> > ffffffff83a00020 t event_exit__msgrcv
> >
> > after:
> > ffffffff839ffbc0 T init_ipc_ns
> > ffffffff83a00080 t event_exit__msgrcv
> >
> > This is the pervasive problem of vars from all .o files placed
> > adjacent to each other, meaning changes in one .o file result in
> > offsets changing in other files and then you get performance
> > fluctuations as not-explicitly-padded variables share (or no longer
> > share) cachelines.
> >
> > I brought this up a year ago elsewhere:
> > https://gcc.gnu.org/pipermail/gcc/2024-October/245004.html
> >
> > maybe i should pick it up again and see it through
> >
> > as for the thing at hand, someone(tm) will want to make sure the
> > namespace is cacheline aligned and possibly pad its own internals
> > afterwards. Personally I can't be bothered.
>
> Thanks! Looking it seems we accumulate a bunch of offsets in:
>
>         print_fmt_dax_pte_fault_class
>         print_fmt_dax_pmd_load_hole_class
>         print_fmt_dax_pmd_fault_class
>
> (entries that the bloat-o-meter confirms changed in size)
>
> That continue on to eventually offset init_ipc_ns.
>
> It actually looked in my testing like performance _improved_ with the change,
> and I notice locally I end up aligned on the struct:
>
> ffffffff82be16a0 T init_ipc_ns
>
> ->
>
> ffffffff82be1700 T init_ipc_ns
>
> Examining stress-ng before 2b6a3f061f11:
>
> Command is:
> $ stress-ng --timeout 60 --times --verify --metrics --no-rand-seed --msg $(nproc)
>
> Results:
>
> stress-ng: metrc: [1662] stressor       bogo ops real time  usr time  sys time   bogo ops/s     bogo ops/s CPU used per       RSS Max
>
> stress-ng: metrc: [776] msg           964459758     60.00    154.63    632.77  16073484.12     1224879.38        21.17          2132
>
> After 2b6a3f061f11:
>
> stress-ng: metrc: [782] msg           1326214608     60.00    194.19    713.34  22102974.72     1461348.11        24.40          2140
>
> And if I simply do:
>
>  ipc/msgutil.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/ipc/msgutil.c b/ipc/msgutil.c
> index 7a03f6d03de3..df0d7a067bcf 100644
> --- a/ipc/msgutil.c
> +++ b/ipc/msgutil.c
> @@ -26,7 +26,7 @@ DEFINE_SPINLOCK(mq_lock);
>   * compiled when either CONFIG_SYSVIPC and CONFIG_POSIX_MQUEUE
>   * and not CONFIG_IPC_NS.
>   */
> -struct ipc_namespace init_ipc_ns = {
> +struct ipc_namespace init_ipc_ns ____cacheline_aligned_in_smp = {
>         .ns.__ns_ref = REFCOUNT_INIT(1),
>         .user_ns = &init_user_ns,
>         .ns.inum = ns_init_inum(&init_ipc_ns),
> --
> 2.52.0
>
> We observe:
>
> stress-ng: metrc: [764] msg           1321700290     60.00    196.73    723.82  22028700.93     1435779.72        24.75          2116 aligned
>
> So this really _does_ look like an alignment issue.
>
> So I think I should just submit the above patch right? Can you see how it behaves for you?

It is my recommendation to drop the matter now that we know the vm
bits patch is a victim of circumstance.

While slapping an alignment requirement on the var is the usual fix
and is a part of a full solution, the situation here is of the fucky
sort and stopping at precisely that might happen to be the bad call.

I state again there is contention on a spin lock and contention on a
rwsem separately, and most importantly that there is a level of load
on the rwsem which triggers pessimal behavior during which it abandons
spinning.

If the cacheline bouncing affects areas used by either of these locks,
you don't know if you got a win because you in fact *slowed it down*
and avoided the pessimal rwsem condition. Then if someone(tm) fixed up
rwsems, the ipc code would be leaving performance on the table with an
annotation specifically putting it in that spot.

So I would refrain from adding it as is.

If I was to work on this (which I wont), the first step would be to
stabilize the rwsem problem. Afterwards annotate the init_ipc_ns var
to stabilize the placement, afterwards rearrange the fields + add
padding as needed to reduce the bouncing. But this can't be done with
rwsem behavior at play.


  reply	other threads:[~2025-12-12 17:27 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-12-05 17:50 Lorenzo Stoakes
2025-12-05 17:52 ` Lorenzo Stoakes
2025-12-05 18:43 ` David Laight
2025-12-05 19:18   ` Lorenzo Stoakes
2025-12-05 21:34     ` David Laight
2025-12-06 16:43       ` Lorenzo Stoakes
2025-12-08 16:42         ` Lorenzo Stoakes
2025-12-08 18:57           ` David Laight
2025-12-09  8:28           ` Vlastimil Babka
2025-12-09  9:26             ` Mateusz Guzik
2025-12-10 16:18               ` Lorenzo Stoakes
2025-12-10 22:44                 ` Mateusz Guzik
2025-12-12 12:24               ` Mateusz Guzik
2025-12-12 13:02                 ` David Laight
2025-12-12 13:13                   ` Mateusz Guzik
2025-12-12 15:03                 ` Lorenzo Stoakes
2025-12-12 17:26                   ` Mateusz Guzik [this message]
2025-12-05 21:49     ` David Laight
2025-12-06 16:47       ` Lorenzo Stoakes
2025-12-05 19:56 ` John Hubbard
2025-12-06 16:42   ` Lorenzo Stoakes
2025-12-05 20:15 ` Andrew Morton
2025-12-05 20:18   ` David Hildenbrand (Red Hat)
2025-12-06  0:40   ` Stephen Rothwell
2025-12-06  3:12     ` Andrew Morton
2025-12-06 16:35       ` Lorenzo Stoakes
2025-12-06  1:14 ` Al Viro
2025-12-06  1:26   ` Al Viro
2025-12-06 12:35     ` Vlastimil Babka
2025-12-06 16:34       ` Lorenzo Stoakes

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAGudoHHMbyjO7uFGhpuYWmqTOk4oYwKt_qWoEfvNgsFtC0L3_g@mail.gmail.com \
    --to=mjguzik@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=david.laight.linux@gmail.com \
    --cc=david@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@suse.com \
    --cc=oliver.sang@intel.com \
    --cc=rppt@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox