linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: "Liam R. Howlett" <Liam.Howlett@oracle.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>,
	akpm@linux-foundation.org, lorenzo.stoakes@oracle.com,
	david@redhat.com, vbabka@suse.cz, peterx@redhat.com,
	jannh@google.com, hannes@cmpxchg.org, mhocko@kernel.org,
	paulmck@kernel.org, shuah@kernel.org, adobriyan@gmail.com,
	brauner@kernel.org, josef@toxicpanda.com, yebin10@huawei.com,
	linux@weissschuh.net, willy@infradead.org, osalvador@suse.de,
	andrii@kernel.org, ryan.roberts@arm.com,
	christophe.leroy@csgroup.eu, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU
Date: Wed, 23 Apr 2025 20:23:54 -0400	[thread overview]
Message-ID: <6ay37xorr35nw4ljtptnfqchuaozu73ffvjpmwopat42n4t6vr@qnr6xvralx2o> (raw)
In-Reply-To: <CAEf4BzYctDuS4DRTzdRQyyhCYvFTggOz=wcbizXEYvC_z_SSng@mail.gmail.com>

* Andrii Nakryiko <andrii.nakryiko@gmail.com> [250423 18:06]:
> On Wed, Apr 23, 2025 at 2:49 PM Suren Baghdasaryan <surenb@google.com> wrote:
> >
> > On Tue, Apr 22, 2025 at 3:49 PM Andrii Nakryiko
> > <andrii.nakryiko@gmail.com> wrote:
> > >
> > > On Fri, Apr 18, 2025 at 10:50 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > >
> > > > With maple_tree supporting vma tree traversal under RCU and vma and
> > > > its important members being RCU-safe, /proc/pid/maps can be read under
> > > > RCU and without the need to read-lock mmap_lock. However vma content
> > > > can change from under us, therefore we make a copy of the vma and we
> > > > pin pointer fields used when generating the output (currently only
> > > > vm_file and anon_name). Afterwards we check for concurrent address
> > > > space modifications, wait for them to end and retry. While we take
> > > > the mmap_lock for reading during such contention, we do that momentarily
> > > > only to record new mm_wr_seq counter. This change is designed to reduce
> > >
> > > This is probably a stupid question, but why do we need to take a lock
> > > just to record this counter? uprobes get away without taking mmap_lock
> > > even for reads, and still record this seq counter. And then detect
> > > whether there were any modifications in between. Why does this change
> > > need more heavy-weight mmap_read_lock to do speculative reads?
> >
> > Not a stupid question. mmap_read_lock() is used to wait for the writer
> > to finish what it's doing and then we continue by recording a new
> > sequence counter value and call mmap_read_unlock. This is what
> > get_vma_snapshot() does. But your question made me realize that we can
> > optimize m_start() further by not taking mmap_read_lock at all.
> > Instead of taking mmap_read_lock then doing drop_mmap_lock() we can
> > try mmap_lock_speculate_try_begin() and only if it fails do the same
> > dance we do in the get_vma_snapshot(). I think that should work.
> 
> Ok, yeah, it would be great to avoid taking a lock in a common case!

We can check this counter once per 4k block and maintain the same
'tearing' that exists today instead of per-vma.  Not that anyone said
they had an issue with changing it, but since we're on this road anyways
I'd thought I'd point out where we could end up.

I am concerned about live locking in either scenario, but I haven't
looked too deep into this pattern.

I also don't love (as usual) the lack of ensured forward progress.

It seems like we have four cases for the vm area state now:
1. we want to read a stable vma or set of vmas (per-vma locking)
2. we want to read a stable mm state for reading (the very short named
mmap_lock_speculate_try_begin)
3. we ensure a stable vma/mm state for reading (mmap read lock)
4. we are writing - get out of my way (mmap write lock).

Cheers,
Liam



  reply	other threads:[~2025-04-24  0:24 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-04-18 17:49 [PATCH v3 0/8] perform /proc/pid/maps read and PROCMAP_QUERY " Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 1/8] selftests/proc: add /proc/pid/maps tearing from vma split test Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 2/8] selftests/proc: extend /proc/pid/maps tearing test to include vma resizing Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 3/8] selftests/proc: extend /proc/pid/maps tearing test to include vma remapping Suren Baghdasaryan
2025-04-18 18:30   ` Lorenzo Stoakes
2025-04-18 19:31     ` Suren Baghdasaryan
2025-04-18 19:55       ` Lorenzo Stoakes
2025-04-18 20:03         ` Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 4/8] selftests/proc: test PROCMAP_QUERY ioctl while vma is concurrently modified Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 5/8] selftests/proc: add verbose more for tests to facilitate debugging Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 6/8] mm: make vm_area_struct anon_name field RCU-safe Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 7/8] mm/maps: read proc/pid/maps under RCU Suren Baghdasaryan
2025-04-22 22:48   ` Andrii Nakryiko
2025-04-23 21:49     ` Suren Baghdasaryan
2025-04-23 22:06       ` Andrii Nakryiko
2025-04-24  0:23         ` Liam R. Howlett [this message]
2025-04-24 15:20           ` Suren Baghdasaryan
2025-04-24 16:03             ` Andrii Nakryiko
2025-04-24 16:42               ` Liam R. Howlett
2025-04-24 17:38                 ` Suren Baghdasaryan
2025-04-24 17:44                 ` Andrii Nakryiko
2025-04-29 15:39   ` Jann Horn
2025-04-29 17:09     ` Suren Baghdasaryan
2025-04-29 17:20       ` Jann Horn
2025-04-29 18:04         ` Suren Baghdasaryan
2025-04-29 18:54           ` Jann Horn
2025-04-29 20:33             ` Suren Baghdasaryan
2025-04-29 20:43               ` Jann Horn
2025-05-05 13:50             ` Christian Brauner
2025-05-05 16:38               ` Suren Baghdasaryan
2025-04-18 17:49 ` [PATCH v3 8/8] mm/maps: execute PROCMAP_QUERY ioctl " Suren Baghdasaryan
2025-04-22 22:54   ` Andrii Nakryiko
2025-04-23 21:53     ` Suren Baghdasaryan
2025-04-29 15:55     ` Jann Horn
2025-04-29 17:14       ` Suren Baghdasaryan
2025-04-29 17:24         ` Jann Horn
2025-05-01 22:10           ` Andrii Nakryiko
2025-05-02 15:11             ` Jann Horn
2025-05-02 16:16               ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6ay37xorr35nw4ljtptnfqchuaozu73ffvjpmwopat42n4t6vr@qnr6xvralx2o \
    --to=liam.howlett@oracle.com \
    --cc=adobriyan@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrii.nakryiko@gmail.com \
    --cc=andrii@kernel.org \
    --cc=brauner@kernel.org \
    --cc=christophe.leroy@csgroup.eu \
    --cc=david@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=jannh@google.com \
    --cc=josef@toxicpanda.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=linux@weissschuh.net \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mhocko@kernel.org \
    --cc=osalvador@suse.de \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=ryan.roberts@arm.com \
    --cc=shuah@kernel.org \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    --cc=yebin10@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox