linux-mm.kvack.org archive mirror
From: Vlastimil Babka <vbabka@suse.cz>
To: Suren Baghdasaryan <surenb@google.com>, akpm@linux-foundation.org
Cc: Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com,
	david@redhat.com, peterx@redhat.com, jannh@google.com,
	hannes@cmpxchg.org, mhocko@kernel.org, paulmck@kernel.org,
	shuah@kernel.org, adobriyan@gmail.com, brauner@kernel.org,
	josef@toxicpanda.com, yebin10@huawei.com, linux@weissschuh.net,
	willy@infradead.org, osalvador@suse.de, andrii@kernel.org,
	ryan.roberts@arm.com, christophe.leroy@csgroup.eu,
	tjmercier@google.com, kaleshsingh@google.com,
	aha310510@gmail.com, linux-kernel@vger.kernel.org,
	linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
	linux-kselftest@vger.kernel.org
Subject: Re: [PATCH v7 7/7] fs/proc/task_mmu: read proc/pid/maps under per-vma lock
Date: Wed, 16 Jul 2025 15:57:06 +0200
Message-ID: <dd88b2fc-6963-454b-8cc0-7bd3360a562e@suse.cz>
In-Reply-To: <20250716030557.1547501-8-surenb@google.com>

On 7/16/25 05:05, Suren Baghdasaryan wrote:
> With maple_tree supporting vma tree traversal under RCU and per-vma
> locks, /proc/pid/maps can be read while holding individual vma locks
> instead of locking the entire address space.
> A completely lockless approach (walking vma tree under RCU) would be
> quite complex with the main issue being get_vma_name() using callbacks
> which might not work correctly with a stable vma copy, requiring
> original (unstable) vma - see special_mapping_name() for example.
> 
> When per-vma lock acquisition fails, we take the mmap_lock for reading,
> lock the vma, release the mmap_lock and continue. This fallback to mmap
> read lock guarantees that the reader makes forward progress even during
> lock contention. This will interfere with the writer but for a very
> short time while we are acquiring the per-vma lock and only when there
> was contention on the vma the reader is interested in.
> 
> We shouldn't see a repeated fallback to mmap read locks in practice, as
> this requires a very unlikely series of lock contentions (for instance
> due to repeated vma split operations). However even if this did somehow
> happen, we would still progress.
> 
> One case requiring special handling is when a vma changes between the
> time it was found and the time it got locked. A problematic case would
> be if a vma got shrunk so that its vm_start moved higher in the address
> space and a new vma was installed at the beginning:
> 
> reader found:               |--------VMA A--------|
> VMA is modified:            |-VMA B-|----VMA A----|
> reader locks modified VMA A
> reader reports VMA A:       |  gap  |----VMA A----|
> 
> This would result in reporting a gap in the address space that does not
> exist. To prevent this we retry the lookup after locking the vma. However,
> we do that only when we identify a gap and detect that the address space
> was changed after we found the vma.
> 
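The retry condition described above can be modelled in a few lines of
plain C. This is a sketch under stated assumptions, not the patch
itself: toy_range, toy_space, toy_find() and toy_need_retry() are
hypothetical names, and the "address space was changed" check is
modelled as a simple sequence counter comparison.

```c
#include <stddef.h>

/* Hypothetical model of a vma: a half-open range [start, end). */
struct toy_range {
	unsigned long start, end;
};

/* Hypothetical model of an address space. */
struct toy_space {
	const struct toy_range *r;	/* sorted, non-overlapping ranges */
	size_t n;
	unsigned long seq;		/* assumed to be bumped on every change */
};

/* Return the first range that ends above addr, or NULL if none does. */
static const struct toy_range *toy_find(const struct toy_space *s,
					unsigned long addr)
{
	for (size_t i = 0; i < s->n; i++)
		if (s->r[i].end > addr)
			return &s->r[i];
	return NULL;
}

/*
 * Decide whether the lookup has to be repeated after the vma was locked.
 * A retry is needed only if the locked vma starts above the last
 * reported end (a gap would be reported) AND the address space changed
 * since the lookup (the gap might not be real).
 */
static int toy_need_retry(unsigned long last_end, unsigned long locked_start,
			  unsigned long seq_at_lookup, unsigned long seq_now)
{
	return locked_start > last_end && seq_at_lookup != seq_now;
}
```

In the diagram's scenario the reader would see a gap in front of the
shrunk VMA A together with a changed sequence count, repeat the lookup
from the last reported end, and find VMA B instead of reporting the gap.
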
> This change is designed to reduce mmap_lock contention and prevent a
> process reading /proc/pid/maps files (often a low priority task, such
> as monitoring/data collection services) from blocking address space
> updates. Note that this change has a userspace visible disadvantage:
> it allows for sub-page data tearing as opposed to the previous mechanism
> where data tearing could happen only between pages of generated output
> data. Since current userspace considers data tearing between pages to be
> acceptable, we assume it will be able to handle sub-page data tearing
> as well.
> 
> Signed-off-by: Suren Baghdasaryan <surenb@google.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

Nit: the previous patch changed lines from e.g. -2UL to -2, and this one
seems to be changing the same lines again to add a comment, e.g.
*ppos = -2; /* -2 indicates gate vma */

That comment could have been added in the previous patch already. Also if
you feel the need to add the comments, maybe it's time to just name those
special values with a #define or something :)
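
A minimal sketch of what naming the value could look like. The macro
and helper names are hypothetical, not taken from the series; only the
meaning of -2 (gate vma) comes from the comment quoted above, and
long long stands in for the kernel's loff_t here.

```c
#include <stdbool.h>

/* Hypothetical name; the thread only states that -2 marks the gate vma. */
#define TOY_POS_GATE_VMA	(-2LL)

/* Record that the next position to report is the gate vma. */
static void toy_set_gate_pos(long long *ppos)
{
	*ppos = TOY_POS_GATE_VMA;
}

/* Check whether a saved position refers to the gate vma. */
static bool toy_pos_is_gate(long long pos)
{
	return pos == TOY_POS_GATE_VMA;
}
```

With a name in place, the explanatory comment at each assignment site
becomes unnecessary.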


