linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Shakeel Butt <shakeel.butt@linux.dev>,
	Andrii Nakryiko <andrii@kernel.org>,
	 linux-trace-kernel@vger.kernel.org, linux-mm@kvack.org,
	peterz@infradead.org,  oleg@redhat.com, rostedt@goodmis.org,
	mhiramat@kernel.org,  bpf@vger.kernel.org,
	linux-kernel@vger.kernel.org, jolsa@kernel.org,
	 paulmck@kernel.org, willy@infradead.org,
	akpm@linux-foundation.org,  mjguzik@gmail.com,
	brauner@kernel.org, jannh@google.com, mhocko@kernel.org,
	 vbabka@suse.cz, hannes@cmpxchg.org, Liam.Howlett@oracle.com,
	 lorenzo.stoakes@oracle.com
Subject: Re: [PATCH v3 tip/perf/core 2/4] mm: switch to 64-bit mm_lock_seq/vm_lock_seq on 64-bit architectures
Date: Thu, 17 Oct 2024 13:12:53 -0700	[thread overview]
Message-ID: <CAEf4BzatVr=70EH9AcHbEwSLy1asJ2x_NKZKsqiAYnkpFpihJg@mail.gmail.com> (raw)
In-Reply-To: <CAJuCfpHGjkPXMGsttb7bMVr0R1Crv8J_zw5_suj+1nCaT=1fBw@mail.gmail.com>

On Thu, Oct 17, 2024 at 12:42 PM Suren Baghdasaryan <surenb@google.com> wrote:
>
> On Thu, Oct 17, 2024 at 11:55 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
> >
> > On Wed, Oct 16, 2024 at 7:02 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > >
> > > On Sun, Oct 13, 2024 at 12:56 AM Shakeel Butt <shakeel.butt@linux.dev> wrote:
> > > >
> > > > On Thu, Oct 10, 2024 at 01:56:42PM GMT, Andrii Nakryiko wrote:
> > > > > To increase mm->mm_lock_seq robustness, switch it from int to long, so
> > > > > that it's a 64-bit counter on 64-bit systems and we can stop worrying
> > > > > about it wrapping around in just ~4 billion iterations. Same goes for
> > > > > VMA's matching vm_lock_seq, which is derived from mm_lock_seq.
> > >
> > > vm_lock_seq does not need to be long but for consistency I guess that
> >
> > How come, we literally assign vm_lock_seq from mm_lock_seq and do
> > direct comparisons. They have to be exactly the same type, no?
>
> Not necessarily. vm_lock_seq is a snapshot of the mm_lock_seq but it
> does not have to be a "complete" snapshot. Just something that has a
> very high probability of identifying a match and a rare false positive
> is not a problem (see comment in
> https://elixir.bootlin.com/linux/v6.11.3/source/include/linux/mm.h#L678).
> So, something like this for taking and comparing a snapshot would do:
>
> vma->vm_lock_seq = (unsigned int)mm->mm_lock_seq;
> if (vma->vm_lock_seq == (unsigned int)mm->mm_lock_seq)

Ah, ok, I see what's the idea behind it, makes sense.

>
> >
> > > makes sense. While at it, can you please change these seq counters to
> > > be unsigned?
> >
> > There is `vma->vm_lock_seq = -1;` in kernel/fork.c, should it be
> > switched to ULONG_MAX then? In general, unless this is critical for
> > correctness, I'd very much like stuff like this to be done in the mm
> > tree afterwards, but it seems trivial enough, so if you insist I'll do
> > it.
>
> Yeah, ULONG_MAX should work fine here. vma->vm_lock_seq is initialized
> to -1 to avoid false initial match with mm->mm_lock_seq which is
> initialized to 0. As I said, a false match is not a problem but if we
> can avoid it, that's better.

ok, ULONG_MAX and unsigned long it is, will update

>
> >
> > > Also, did you check with pahole if the vm_area_struct layout change
> > > pushes some members into a difference cacheline or creates new gaps?
> > >
> >
> > Just did. We had 3 byte hole after `bool detached;`, it now grew to 7
> > bytes (so +4) and then vm_lock_seq itself is now 8 bytes (so +4),
> > which now does push rb and rb_subtree_last into *THE SAME* cache line
> > (which sounds like an improvement to me). vm_lock_seq and vm_lock stay
> > in the same cache line. vm_pgoff and vm_file are now in the same cache
> > line, and given they are probably always accessed together, seems like
> > a good accidental change as well. See below pahole outputs before and
> > after.
>
> Ok, sounds good to me. Looks like keeping both sequence numbers 64bit
> is not an issue. Changing them to unsigned would be nice and trivial
> but I don't insist. You can add:
>
> Reviewed-by: Suren Baghdasaryan <surenb@google.com>

thanks, will switch to unsigned in the next revision (next week,
probably, to let some of the pending patches land)

>
> >
> > That singular detached bool looks like a complete waste, tbh. Maybe it
> > would be better to roll it into vm_flags and save 8 bytes? (not that I
> > want to do those mm changes in this patch set, of course...).
> > vm_area_struct is otherwise nicely tightly packed.
> >
> > tl;dr, seems fine, and detached would be best to get rid of, if
> > possible (but that's a completely separate thing)
>
> Yeah, I'll take a look at that. Thanks!
>
> >
> > BEFORE
> > ======
> > struct vm_area_struct {
> >         union {
> >                 struct {
> >                         long unsigned int vm_start;      /*     0     8 */
> >                         long unsigned int vm_end;        /*     8     8 */
> >                 };                                       /*     0    16 */
> >                 struct callback_head vm_rcu;             /*     0    16 */
> >         } __attribute__((__aligned__(8)));               /*     0    16 */
> >         struct mm_struct *         vm_mm;                /*    16     8 */
> >         pgprot_t                   vm_page_prot;         /*    24     8 */
> >         union {
> >                 const vm_flags_t   vm_flags;             /*    32     8 */
> >                 vm_flags_t         __vm_flags;           /*    32     8 */
> >         };                                               /*    32     8 */
> >         bool                       detached;             /*    40     1 */
> >
> >         /* XXX 3 bytes hole, try to pack */
> >
> >         int                        vm_lock_seq;          /*    44     4 */
> >         struct vma_lock *          vm_lock;              /*    48     8 */
> >         struct {
> >                 struct rb_node     rb;                   /*    56    24 */
> >                 /* --- cacheline 1 boundary (64 bytes) was 16 bytes ago --- */
> >                 long unsigned int  rb_subtree_last;      /*    80     8 */
> >         }                                                /*    56    32 */
> >         struct list_head           anon_vma_chain;       /*    88    16 */
> >         struct anon_vma *          anon_vma;             /*   104     8 */
> >         const struct vm_operations_struct  * vm_ops;     /*   112     8 */
> >         long unsigned int          vm_pgoff;             /*   120     8 */
> >         /* --- cacheline 2 boundary (128 bytes) --- */
> >         struct file *              vm_file;              /*   128     8 */
> >         void *                     vm_private_data;      /*   136     8 */
> >         atomic_long_t              swap_readahead_info;  /*   144     8 */
> >         struct mempolicy *         vm_policy;            /*   152     8 */
> >         struct vma_numab_state *   numab_state;          /*   160     8 */
> >         struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   168     8 */
> >
> >         /* size: 176, cachelines: 3, members: 18 */
> >         /* sum members: 173, holes: 1, sum holes: 3 */
> >         /* forced alignments: 2 */
> >         /* last cacheline: 48 bytes */
> > } __attribute__((__aligned__(8)));
> >
> > AFTER
> > =====
> > struct vm_area_struct {
> >         union {
> >                 struct {
> >                         long unsigned int vm_start;      /*     0     8 */
> >                         long unsigned int vm_end;        /*     8     8 */
> >                 };                                       /*     0    16 */
> >                 struct callback_head vm_rcu;             /*     0    16 */
> >         } __attribute__((__aligned__(8)));               /*     0    16 */
> >         struct mm_struct *         vm_mm;                /*    16     8 */
> >         pgprot_t                   vm_page_prot;         /*    24     8 */
> >         union {
> >                 const vm_flags_t   vm_flags;             /*    32     8 */
> >                 vm_flags_t         __vm_flags;           /*    32     8 */
> >         };                                               /*    32     8 */
> >         bool                       detached;             /*    40     1 */
> >
> >         /* XXX 7 bytes hole, try to pack */
> >
> >         long int                   vm_lock_seq;          /*    48     8 */
> >         struct vma_lock *          vm_lock;              /*    56     8 */
> >         /* --- cacheline 1 boundary (64 bytes) --- */
> >         struct {
> >                 struct rb_node     rb;                   /*    64    24 */
> >                 long unsigned int  rb_subtree_last;      /*    88     8 */
> >         }                                                /*    64    32 */
> >         struct list_head           anon_vma_chain;       /*    96    16 */
> >         struct anon_vma *          anon_vma;             /*   112     8 */
> >         const struct vm_operations_struct  * vm_ops;     /*   120     8 */
> >         /* --- cacheline 2 boundary (128 bytes) --- */
> >         long unsigned int          vm_pgoff;             /*   128     8 */
> >         struct file *              vm_file;              /*   136     8 */
> >         void *                     vm_private_data;      /*   144     8 */
> >         atomic_long_t              swap_readahead_info;  /*   152     8 */
> >         struct mempolicy *         vm_policy;            /*   160     8 */
> >         struct vma_numab_state *   numab_state;          /*   168     8 */
> >         struct vm_userfaultfd_ctx  vm_userfaultfd_ctx;   /*   176     8 */
> >
> >         /* size: 184, cachelines: 3, members: 18 */
> >         /* sum members: 177, holes: 1, sum holes: 7 */
> >         /* forced alignments: 2 */
> >         /* last cacheline: 56 bytes */
> > } __attribute__((__aligned__(8)));
> >
> >
> > > > >
> > > > > I didn't use __u64 outright to keep 32-bit architectures unaffected, but
> > > > > if it seems important enough, I have nothing against using __u64.
> > > > >
> > > > > Suggested-by: Jann Horn <jannh@google.com>
> > > > > Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
> > > >
> > > > Reviewed-by: Shakeel Butt <shakeel.butt@linux.dev>
> > >


  reply	other threads:[~2024-10-17 20:13 UTC|newest]

Thread overview: 29+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-10-10 20:56 [PATCH v3 tip/perf/core 0/4] uprobes,mm: speculative lockless VMA-to-uprobe lookup Andrii Nakryiko
2024-10-10 20:56 ` [PATCH v3 tip/perf/core 1/4] mm: introduce mmap_lock_speculation_{start|end} Andrii Nakryiko
2024-10-13  7:56   ` Shakeel Butt
2024-10-14 20:27     ` Andrii Nakryiko
2024-10-14 20:48       ` Suren Baghdasaryan
2024-10-23 20:10   ` Peter Zijlstra
2024-10-23 22:17     ` Suren Baghdasaryan
2024-10-24  9:56       ` Peter Zijlstra
2024-10-24 16:28         ` Suren Baghdasaryan
2024-10-24 21:04           ` Suren Baghdasaryan
2024-10-24 23:20             ` Andrii Nakryiko
2024-10-24 23:33               ` Suren Baghdasaryan
2024-10-25  5:12                 ` Andrii Nakryiko
2024-10-10 20:56 ` [PATCH v3 tip/perf/core 2/4] mm: switch to 64-bit mm_lock_seq/vm_lock_seq on 64-bit architectures Andrii Nakryiko
2024-10-13  7:56   ` Shakeel Butt
2024-10-17  2:01     ` Suren Baghdasaryan
2024-10-17 18:55       ` Andrii Nakryiko
2024-10-17 19:42         ` Suren Baghdasaryan
2024-10-17 20:12           ` Andrii Nakryiko [this message]
2024-10-23 19:02       ` Peter Zijlstra
2024-10-23 19:12         ` Andrii Nakryiko
2024-10-23 19:31   ` Peter Zijlstra
2024-10-10 20:56 ` [PATCH v3 tip/perf/core 3/4] uprobes: simplify find_active_uprobe_rcu() VMA checks Andrii Nakryiko
2024-10-10 20:56 ` [PATCH v3 tip/perf/core 4/4] uprobes: add speculative lockless VMA-to-inode-to-uprobe resolution Andrii Nakryiko
2024-10-11  5:01   ` Oleg Nesterov
2024-10-23 19:22   ` Peter Zijlstra
2024-10-23 20:02     ` Andrii Nakryiko
2024-10-23 20:19       ` Peter Zijlstra
2024-10-23 17:54 ` [PATCH v3 tip/perf/core 0/4] uprobes,mm: speculative lockless VMA-to-uprobe lookup Andrii Nakryiko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzatVr=70EH9AcHbEwSLy1asJ2x_NKZKsqiAYnkpFpihJg@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=brauner@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=jolsa@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux-trace-kernel@vger.kernel.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhiramat@kernel.org \
    --cc=mhocko@kernel.org \
    --cc=mjguzik@gmail.com \
    --cc=oleg@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=shakeel.butt@linux.dev \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox