From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>,
	Andrii Nakryiko <andrii@kernel.org>,
	 linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterz@infradead.org,  oleg@redhat.com, rostedt@goodmis.org,
	mhiramat@kernel.org,  bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, jolsa@kernel.org,
	 paulmck@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org,  mjguzik@gmail.com,
	brauner@kernel.org, jannh@google.com, mhocko@kernel.org,
	 vbabka@suse.cz, hannes@cmpxchg.org, Liam.Howlett@oracle.com,
	 lorenzo.stoakes@oracle.com
Subject: Re: [PATCH v3 tip/perf/core 2/4] mm: switch to 64-bit mm_lock_seq/vm_lock_seq on 64-bit architectures
Date: Thu, 17 Oct 2024 11:55:35 -0700
Message-ID: <CAEf4BzbOXrbixQA=fpg17QPBv+4myAQrHvCX42hVye0Ww9W2Aw@mail.gmail.com>
In-Reply-To: <CAJuCfpFpPvBLgZNxwHuT-kLsvBABWyK9H6tFCmsTCtVpOxET6Q@mail.gmail.com>

On Wed, Oct 16, 2024 at 7:02 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Sun, Oct 13, 2024 at 12:56 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> >
> > On Thu, Oct 10, 2024 at 01:56:42PM GMT, Andrii Nakryiko wrote:
> > > To increase mm->mm_lock_seq robustness, switch it from int to long, so
> > > that it's a 64-bit counter on 64-bit systems and we can stop worrying
> > > about it wrapping around in just ~4 billion iterations. Same goes for
> > > VMA's matching vm_lock_seq, which is derived from mm_lock_seq.
>
> vm_lock_seq does not need to be long but for consistency I guess that

How come? We literally assign vm_lock_seq from mm_lock_seq and do
direct comparisons. They have to be exactly the same type, no?
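
To make that concrete, the pattern in question is roughly the following
(a sketch, not the literal patch code):

  /* write side: VMA snapshots the mm-wide sequence */
  vma->vm_lock_seq = mm->mm_lock_seq;

  /* read side: the two fields are compared directly */
  if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
          ...

If the two fields had different widths, the assignment would silently
truncate and the comparison would go through an implicit conversion, so
keeping them the same type seems like the only sane option.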

> makes sense. While at it, can you please change these seq counters to
> be unsigned?

There is `vma->vm_lock_seq = -1;` in kernel/fork.c; should it be
switched to ULONG_MAX then? In general, unless this is critical for
correctness, I'd very much prefer that changes like this be done in the
mm tree afterwards, but it seems trivial enough, so if you insist I'll
do it.
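
(Note that if the field becomes unsigned long, `vma->vm_lock_seq = -1;`
already stores ULONG_MAX by C's integer conversion rules, so the switch
would be purely cosmetic:

  vma->vm_lock_seq = ULONG_MAX;   /* same value, just more explicit */

i.e., nothing changes for correctness either way.)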

> Also, did you check with pahole if the vm_area_struct layout change
> pushes some members into a different cacheline or creates new gaps?
>

Just did. We had a 3-byte hole after `bool detached;`; it has now grown
to 7 bytes (so +4), and vm_lock_seq itself is now 8 bytes (so another
+4). That does push rb and rb_subtree_last into *the same* cache line
(which sounds like an improvement to me). vm_lock_seq and vm_lock stay
in the same cache line. vm_pgoff and vm_file are now in the same cache
line as well, and given they are probably always accessed together,
that seems like a good accidental change too. See the pahole outputs
before and after below.
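
(For reference, the dumps below came from something along the lines of

  pahole -C vm_area_struct vmlinux

run against 64-bit builds before and after this patch.)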

That lone detached bool looks like a complete waste, tbh. Maybe it
would be better to roll it into vm_flags and save 8 bytes? (Not that I
want to do those mm changes in this patch set, of course...)
vm_area_struct is otherwise nicely packed.
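
If anyone picks that up, a hypothetical sketch (VM_DETACHED is a
made-up name here, and it assumes a spare vm_flags bit is actually
available) could look like:

  #define VM_DETACHED     BIT(63) /* hypothetical: needs a free bit */

  static inline bool vma_is_detached(struct vm_area_struct *vma)
  {
          return vma->vm_flags & VM_DETACHED;
  }

which would give back the byte (plus the alignment hole) that `bool
detached;` costs today.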

tl;dr: seems fine, and it would be best to get rid of detached if
possible (but that's a completely separate change).

BEFORE
======
struct vm_area_struct {
        union {
                struct {
                        long unsigned int vm_start;      /*     0     8 */
                        long unsigned int vm_end;        /*     8     8 */
                };                                       /*     0    16 */
                struct callback_head vm_rcu;             /*     0    16 */
        } __attribute__((__aligned__(8)));               /*     0    16 */
        struct mm_struct *         vm_mm;                /*    16     8 */
        pgprot_t                   vm_page_prot;         /*    24     8 */
        union {
                const vm_flags_t   vm_flags;             /*    32     8 */
                vm_flags_t         __vm_flags;           /*    32     8 */
        };                                               /*    32     8 */
        bool                       detached;             /*    40     1 */

        /* XXX 3 bytes hole, try to pack */

        int                        vm_lock_seq;          /*    44     4 */
        struct vma_lock *          vm_lock;              /*    48     8 */
        struct {
                struct rb_node     rb;                   /*    56    24 */
                /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
                long unsigned int  rb_subtree_last;      /*    80     8 */
        }                                                /*    56    32 */
        struct list_head           anon_vma_chain;       /*    88    16 */
        struct anon_vma *          anon_vma;             /*   104     8 */
        const struct vm_operations_struct  * vm_ops;     /*   112     8 */
        long unsigned int          vm_pgoff;             /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        struct file *              vm_file;              /*   128     8 */
        void *                     vm_private_data;      /*   136     8 */
        atomic_long_t              swap_readahead_info;  /*   144     8 */
        struct mempolicy *         vm_policy;            /*   152     8 */
        struct vma_numab_state *   numab_state;          /*   160     8 */
        struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   168     8 */

        /* size: 176, cachelines: 3, members: 18 */
        /* sum members: 173, holes: 1, sum holes: 3 */
        /* forced alignments: 2 */
        /* last cacheline: 48 bytes */
} __attribute__((__aligned__(8)));

AFTER
=====
struct vm_area_struct {
        union {
                struct {
                        long unsigned int vm_start;      /*     0     8 */
                        long unsigned int vm_end;        /*     8     8 */
                };                                       /*     0    16 */
                struct callback_head vm_rcu;             /*     0    16 */
        } __attribute__((__aligned__(8)));               /*     0    16 */
        struct mm_struct *         vm_mm;                /*    16     8 */
        pgprot_t                   vm_page_prot;         /*    24     8 */
        union {
                const vm_flags_t   vm_flags;             /*    32     8 */
                vm_flags_t         __vm_flags;           /*    32     8 */
        };                                               /*    32     8 */
        bool                       detached;             /*    40     1 */

        /* XXX 7 bytes hole, try to pack */

        long int                   vm_lock_seq;          /*    48     8 */
        struct vma_lock *          vm_lock;              /*    56     8 */
        /* --- cacheline 1 boundary (64 bytes) --- */
        struct {
                struct rb_node     rb;                   /*    64    24 */
                long unsigned int  rb_subtree_last;      /*    88     8 */
        }                                                /*    64    32 */
        struct list_head           anon_vma_chain;       /*    96    16 */
        struct anon_vma *          anon_vma;             /*   112     8 */
        const struct vm_operations_struct  * vm_ops;     /*   120     8 */
        /* --- cacheline 2 boundary (128 bytes) --- */
        long unsigned int          vm_pgoff;             /*   128     8 */
        struct file *              vm_file;              /*   136     8 */
        void *                     vm_private_data;      /*   144     8 */
        atomic_long_t              swap_readahead_info;  /*   152     8 */
        struct mempolicy *         vm_policy;            /*   160     8 */
        struct vma_numab_state *   numab_state;          /*   168     8 */
        struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   176     8 */

        /* size: 184, cachelines: 3, members: 18 */
        /* sum members: 177, holes: 1, sum holes: 7 */
        /* forced alignments: 2 */
        /* last cacheline: 56 bytes */
} __attribute__((__aligned__(8)));


> > >
> > > I didn't use __u64 outright to keep 32-bit architectures unaffected, but
> > > if it seems important enough, I have nothing against using __u64.
> > >
> > > Suggested-by: Jann Horn <jannh@google.com>
> > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> >
> > Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
>

