linux-mm.kvack.org archive mirror
From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: Vlastimil Babka <vbabka@suse.cz>
Cc: Pedro Falcato <pfalcato@suse.de>,
	Andrew Morton <akpm@linux-foundation.org>,
	Jonathan Corbet <corbet@lwn.net>,
	David Hildenbrand <david@redhat.com>,
	"Liam R . Howlett" <Liam.Howlett@oracle.com>,
	Mike Rapoport <rppt@kernel.org>,
	Suren Baghdasaryan <surenb@google.com>,
	Michal Hocko <mhocko@suse.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Masami Hiramatsu <mhiramat@kernel.org>,
	Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Jann Horn <jannh@google.com>,
	linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-doc@vger.kernel.org, linux-mm@kvack.org,
	linux-trace-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org, Andrei Vagin <avagin@gmail.com>,
	Barry Song <21cnbao@gmail.com>
Subject: Re: [PATCH 1/3] mm: introduce VM_MAYBE_GUARD and make visible for guard regions
Date: Thu, 30 Oct 2025 19:47:23 +0000
Message-ID: <88b72728-fa3f-4a70-9ea2-40ff50673047@lucifer.local>
In-Reply-To: <3ae457cd-6c18-4870-a617-7f937b107cb4@suse.cz>

On Thu, Oct 30, 2025 at 07:47:34PM +0100, Vlastimil Babka wrote:
> On 10/30/25 19:31, Vlastimil Babka wrote:
> > On 10/30/25 17:43, Lorenzo Stoakes wrote:
> >> On Thu, Oct 30, 2025 at 04:31:56PM +0000, Pedro Falcato wrote:
> >>> On Thu, Oct 30, 2025 at 04:23:58PM +0000, Lorenzo Stoakes wrote:
> >>> > On Thu, Oct 30, 2025 at 04:16:20PM +0000, Pedro Falcato wrote:
> >>> > > On Wed, Oct 29, 2025 at 04:50:31PM +0000, Lorenzo Stoakes wrote:
> >>> > > > Currently, if a user needs to determine if guard regions are present in a
> >>> > > > range, they have to scan all VMAs (or have knowledge of which ones might
> >>> > > > have guard regions).
> >>> > > >
> >>> > > > Since commit 8e2f2aeb8b48 ("fs/proc/task_mmu: add guard region bit to
> >>> > > > pagemap") and the related commit a516403787e0 ("fs/proc: extend the
> >>> > > > PAGEMAP_SCAN ioctl to report guard regions"), users can use either
> >>> > > > /proc/$pid/pagemap or the PAGEMAP_SCAN functionality to perform this
> >>> > > > operation at a virtual address level.
> >>> > > >
> >>> > > > This is not ideal, and it gives no visibility at a /proc/$pid/smaps level
> >>> > > > that guard regions exist in ranges.
> >>> > > >
> >>> > > > This patch remedies the situation by establishing a new VMA flag,
> >>> > > > VM_MAYBE_GUARD, to indicate that a VMA may contain guard regions (it is
> >>> > > > uncertain because we cannot reasonably determine whether a
> >>> > > > MADV_GUARD_REMOVE call has removed all of the guard regions in a VMA, and
> >>> > > > additionally VMAs may change across merge/split).
> >>> > > >
> >>> > > > We utilise 0x800 for this flag which makes it available to 32-bit
> >>> > > > architectures also, a flag that was previously used by VM_DENYWRITE, which
> >>> > > > was removed in commit 8d0920bde5eb ("mm: remove VM_DENYWRITE") and hasn't
> >>> > > > been reused yet.
> >>> > > >
> >>> > > > The MADV_GUARD_INSTALL madvise() operation now must take an mmap write
> >>> > > > lock (and also VMA write lock) whereas previously it did not, but this
> >>> > > > seems a reasonable overhead.
> >>> > >
> >>> > > Do you though? Could it be possible to simply atomically set the flag with
> >>> > > the read lock held? This would make it so we can't split the VMA (and tightly
> >>> >
> >>> > VMA flags are not accessed atomically so no I don't think we can do that in any
> >>> > workable way.
> >>> >
> >>>
> >>> FWIW I think you could work it as an atomic flag and treat those races as benign
> >>> (this one, at least).
> >>
> >> It's not benign as we need to ensure that page tables are correctly propagated
> >> on fork.
> >
> > Could we use MADVISE_VMA_READ_LOCK mode (would be actually an improvement
> > over the current MADVISE_MMAP_READ_LOCK), together with the atomic flag
> > setting? I think the places that could race with us to cause RMW use vma
> > write lock so that would be excluded. Fork AFAICS unfortunately doesn't (for
> > the oldmm) and it probably wouldn't make sense to start doing it. Maybe we
> > could think of something to deal with this special case...
>
> During discussion with Pedro off-list I realized fork takes mmap lock for
> write on the old mm, so if we kept taking mmap sem for read, then vma lock
> for read in addition (which should be cheap enough, also we'd only need it
> in case VM_MAYBE_GUARD is not yet set), and set the flag atomically, perhaps
> that would cover all non-benign races?
>
>

In dup_mmap() we take the VMA write lock (vma_start_write()) on each mpnt
(old VMA).

We then vm_area_dup() the mpnt to the new VMA before calling:

copy_page_range()
-> vma_needs_copy()

Which is where the check is done.

So we are holding the VMA write lock at that point, which means a VMA read
lock on the madvise() side should suffice, no?

For belts + braces we could atomically read the flag in vma_needs_copy(),
though note that VM_COPY_ON_FORK is intended to be able to contain more
than one flag.

We could drop that for now and be explicit.
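
As a rough illustration of the two options, again a userspace sketch with
hypothetical model_* names and C11 atomics standing in for whatever
primitive the kernel side would actually use. The mask variant takes one
snapshot of the flags and tests it against every copy-on-fork flag at once;
the explicit variant tests only the guard flag. The contents of
MODEL_VM_COPY_ON_FORK are an assumption (just the one flag here).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; not the kernel's definitions. */
#define MODEL_VM_MAYBE_GUARD	0x800UL
/* Assumption: a mask that may later gain further flags. */
#define MODEL_VM_COPY_ON_FORK	(MODEL_VM_MAYBE_GUARD)

/* One atomic load, then test the snapshot against the whole mask. */
static bool model_needs_copy_mask(atomic_ulong *flags)
{
	unsigned long snap = atomic_load_explicit(flags, memory_order_relaxed);

	return (snap & MODEL_VM_COPY_ON_FORK) != 0;
}

/* Explicit variant: only the guard flag is considered. */
static bool model_needs_copy_explicit(atomic_ulong *flags)
{
	unsigned long snap = atomic_load_explicit(flags, memory_order_relaxed);

	return (snap & MODEL_VM_MAYBE_GUARD) != 0;
}
```

With a single flag in the mask the two behave identically, which is why
being explicit for now loses nothing.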

