linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Andrii Nakryiko <andrii.nakryiko@gmail.com>
To: Jann Horn <jannh@google.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
	akpm@linux-foundation.org, Liam.Howlett@oracle.com,
	 lorenzo.stoakes@oracle.com, david@redhat.com, vbabka@suse.cz,
	 peterx@redhat.com, hannes@cmpxchg.org, mhocko@kernel.org,
	paulmck@kernel.org,  shuah@kernel.org, adobriyan@gmail.com,
	brauner@kernel.org,  josef@toxicpanda.com, yebin10@huawei.com,
	linux@weissschuh.net,  willy@infradead.org, osalvador@suse.de,
	andrii@kernel.org,  ryan.roberts@arm.com,
	christophe.leroy@csgroup.eu,  linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org,  linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v3 8/8] mm/maps: execute PROCMAP_QUERY ioctl under RCU
Date: Thu, 1 May 2025 15:10:13 -0700	[thread overview]
Message-ID: <CAEf4BzarQAmj477Lyp2aS0i2RM4JaxnAVvem6Kz-Eh1a5x-=6A@mail.gmail.com> (raw)
In-Reply-To: <CAG48ez1cR+kXBsvk4murYDBBxSzg9g5FSU--P8-BCrMKV6A+KA@mail.gmail.com>

On Tue, Apr 29, 2025 at 10:25 AM Jann Horn <jannh@google.com> wrote:
>
> On Tue, Apr 29, 2025 at 7:15 PM Suren Baghdasaryan <surenb@google.com> wrote:
> > On Tue, Apr 29, 2025 at 8:56 AM Jann Horn <jannh@google.com> wrote:
> > > On Wed, Apr 23, 2025 at 12:54 AM Andrii Nakryiko
> > > <andrii.nakryiko@gmail.com> wrote:
> > > > On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > > Utilize speculative vma lookup to find and snapshot a vma without
> > > > > taking mmap_lock during PROCMAP_QUERY ioctl execution. Concurrent
> > > > > address space modifications are detected and the lookup is retried.
> > > > > While we take the mmap_lock for reading during such contention, we
> > > > > do that momentarily only to record new mm_wr_seq counter.
> > > >
> > > > PROCMAP_QUERY is an even more obvious candidate for fully lockless
> > > > speculation, IMO (because it's more obvious that vma's use is
> > > > localized to do_procmap_query(), instead of being spread across
> > > > m_start/m_next and m_show as with seq_file approach). We do
> > > > rcu_read_lock(), mmap_lock_speculate_try_begin(), query for VMA (no
> > > > mmap_read_lock), use that VMA to produce (speculative) output, and
> > > > then validate that VMA or mm_struct didn't change with
> > > > mmap_lock_speculate_retry(). If it did - retry, if not - we are done.
> > > > No need for vma_copy and any gets/puts, no?
> > >
> > > I really strongly dislike this "fully lockless" approach because it
> > > means we get data races all over the place, and it gets hard to reason
> > > about what happens especially if we do anything other than reading
> > > plain data from the VMA. When reading the implementation of
> > > do_procmap_query(), at basically every memory read you'd have to think
> > > twice as hard to figure out which fields can be concurrently updated
> > > elsewhere and whether the subsequent sequence count recheck can
> > > recover from the resulting badness.
> > >
> > > Just as one example, I think get_vma_name() could (depending on
> > > compiler optimizations) crash with a NULL deref if the VMA's ->vm_ops
> > > pointer is concurrently changed to &vma_dummy_vm_ops by vma_close()
> > > between "if (vma->vm_ops && vma->vm_ops->name)" and
> > > "vma->vm_ops->name(vma)". And I think this illustrates how the "fully
> > > lockless" approach creates more implicit assumptions about the
> > > behavior of core MM code, which could be broken by future changes to
> > > MM code.
> >
> > Yeah, I'll need to re-evaluate such an approach after your review. I
> > like having get_stable_vma() to obtain a completely stable version of
> > the vma in a localized place and then stop worrying about possible
> > races. If implemented correctly, would that be enough to address your
> > concern, Jann?
>
> Yes, I think a stable local snapshot of the VMA (where tricky data
> races are limited to the VMA snapshotting code) is a good tradeoff.

I'm not sure I agree with VMA snapshot being better either, tbh. It is
error-prone to have a byte-by-byte local copy of VMA (which isn't
really that VMA anymore), and passing it into ops callbacks (which
expect "real" VMA)... Who guarantees that this won't backfire,
depending on vm_ops implementations? And constantly copying 176+ bytes
just to access a few fields out of it is a bit unfortunate...

Also taking mmap_read_lock() sort of defeats the point of "RCU-only
access". It's still locking/unlocking and bouncing cache lines between
writer and reader frequently. How slow is per-VMA formatting? If we
take mmap_read_lock, format VMA information into a buffer under this
lock, and drop the mmap_read_lock, would it really be that much slower
compared to what Suren is doing in this patch set? And if no, that
would be so much simpler compared to this semi-locked/semi-RCU way
that is added in this patch set, no?

But I do agree that vma->vm_ops->name access is hard to do in a
completely lockless way reliably. But also how frequently VMAs have
custom names/anon_vma_name? What if we detect that VMA has some
"fancy" functionality (like this custom name thing), and just fallback
to mmap_read_lock-protected logic, which needs to be supported as a
fallback even for lockless approach?

This way we can process most (typical) VMAs completely locklessly,
while not adding any extra assumptions for all the potentially
complicated data pieces. WDYT?


  reply	other threads:[~2025-05-01 22:10 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-18 17:49 [PATCH v3 0/8] perform /proc/pid/maps read and PROCMAP_QUERY " Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 1/8] selftests/proc: add /proc/pid/maps tearing from vma split test Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 2/8] selftests/proc: extend /proc/pid/maps tearing test to include vma resizing Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 3/8] selftests/proc: extend /proc/pid/maps tearing test to include vma remapping Suren Baghdasaryan
2025-04-18 18:30   ` Lorenzo Stoakes
2025-04-18 19:31     ` Suren Baghdasaryan
2025-04-18 19:55       ` Lorenzo Stoakes
2025-04-18 20:03         ` Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 4/8] selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 5/8] selftests/proc: add verbose more for tests to facilitate debugging Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 6/8] mm: make vm_area_struct anon_name field RCU-safe Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU Suren Baghdasaryan
2025-04-22 22:48   ` Andrii Nakryiko
2025-04-23 21:49     ` Suren Baghdasaryan
2025-04-23 22:06       ` Andrii Nakryiko
2025-04-24  0:23         ` Liam R. Howlett
2025-04-24 15:20           ` Suren Baghdasaryan
2025-04-24 16:03             ` Andrii Nakryiko
2025-04-24 16:42               ` Liam R. Howlett
2025-04-24 17:38                 ` Suren Baghdasaryan
2025-04-24 17:44                 ` Andrii Nakryiko
2025-04-29 15:39   ` Jann Horn
2025-04-29 17:09     ` Suren Baghdasaryan
2025-04-29 17:20       ` Jann Horn
2025-04-29 18:04         ` Suren Baghdasaryan
2025-04-29 18:54           ` Jann Horn
2025-04-29 20:33             ` Suren Baghdasaryan
2025-04-29 20:43               ` Jann Horn
2025-05-05 13:50             ` Christian Brauner
2025-05-05 16:38               ` Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 8/8] mm/maps: execute PROCMAP_QUERY ioctl " Suren Baghdasaryan
2025-04-22 22:54   ` Andrii Nakryiko
2025-04-23 21:53     ` Suren Baghdasaryan
2025-04-29 15:55     ` Jann Horn
2025-04-29 17:14       ` Suren Baghdasaryan
2025-04-29 17:24         ` Jann Horn
2025-05-01 22:10           ` Andrii Nakryiko [this message]
2025-05-02 15:11             ` Jann Horn
2025-05-02 16:16               ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAEf4BzarQAmj477Lyp2aS0i2RM4JaxnAVvem6Kz-Eh1a5x-=6A@mail.gmail.com' \
    --to=andrii.nakryiko@gmail.com \
    --cc=Liam.Howlett@oracle.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii@kernel.org \
    --cc=brauner@kernel.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@kernel.org \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yebin10@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox