From: Suren Baghdasaryan <surenb@google.com>
To: "Liam R. Howlett" <Liam.Howlett@oracle.com>,
Suren Baghdasaryan <surenb@google.com>,
Peter Zijlstra <peterz@infradead.org>,
akpm@linux-foundation.org, willy@infradead.org,
lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz,
hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
mgorman@techsingularity.net, david@redhat.com,
peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net,
paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com,
hdanton@sina.com, hughd@google.com, lokeshgidra@google.com,
minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
souravpanda@google.com, pasha.tatashin@soleen.com,
klarasmodin@gmail.com, corbet@lwn.net,
linux-doc@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count
Date: Wed, 18 Dec 2024 13:53:17 -0800 [thread overview]
Message-ID: <CAJuCfpHXRX=LLa67eWYvrK=UDxKMaOequFXfqOqDHbRrmsT9SQ@mail.gmail.com> (raw)
In-Reply-To: <ulbspoec633hfm54f3jzvoqs6ilskxou3qykk2u727pbaltvfl@lb53vjcaxnuf>
On Wed, Dec 18, 2024 at 12:38 PM Liam R. Howlett
<Liam.Howlett@oracle.com> wrote:
>
> * Suren Baghdasaryan <surenb@google.com> [241218 15:01]:
> > On Wed, Dec 18, 2024 at 11:38 AM 'Liam R. Howlett' via kernel-team
> > <kernel-team@android.com> wrote:
> > >
> > > * Suren Baghdasaryan <surenb@google.com> [241218 14:29]:
> > > > On Wed, Dec 18, 2024 at 11:07 AM Suren Baghdasaryan <surenb@google.com> wrote:
> > > > >
> > > > > On Wed, Dec 18, 2024 at 11:00 AM 'Liam R. Howlett' via kernel-team
> > > > > <kernel-team@android.com> wrote:
> > > > > >
> > > > > > * Suren Baghdasaryan <surenb@google.com> [241218 12:58]:
> > > > > > > On Wed, Dec 18, 2024 at 9:44 AM Peter Zijlstra <peterz@infradead.org> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 18, 2024 at 09:36:42AM -0800, Suren Baghdasaryan wrote:
> > > > > > > >
> > > > > > > > > > You will not. vms_complete_munmap_vmas() will call remove_vma() to
> > > > > > > > > > remove PTEs IIRC, and if you do start_write() and detach() before
> > > > > > > > > > dropping mmap_lock_write, you should be good.
> > > > > > > > >
> > > > > > > > > Ok, I think we will have to move mmap_write_downgrade() inside
> > > > > > > > > vms_complete_munmap_vmas() to be called after remove_vma().
> > > > > > > > > vms_clear_ptes() is using vmas, so we can't move remove_vma() before
> > > > > > > > > mmap_write_downgrade().
> > > > > > > >
> > > > > > > > Why ?!
> > > > > > > >
> > > > > > > > vms_clear_ptes() and remove_vma() are fine where they are -- there is no
> > > > > > > > concurrency left at this point.
> > > > > > > >
> > > > > > > > Note that by doing vma_start_write() inside vms_complete_munmap_vmas(),
> > > > > > > > which is *after* the vmas have been unhooked from the mm, you wait for
> > > > > > > > any concurrent user to go away.
> > > > > > > >
> > > > > > > > And since they're unhooked, there can't be any new users.
> > > > > > > >
> > > > > > > > So you're the one and only user left, and code is fine the way it is.
> > > > > > >
> > > > > > > Ok, let me make sure I understand this part of your proposal. From
> > > > > > > your earlier email:
> > > > > > >
> > > > > > > @@ -1173,6 +1173,11 @@ static void vms_complete_munmap_vmas(struct
> > > > > > > vma_munmap_struct *vms,
> > > > > > > struct vm_area_struct *vma;
> > > > > > > struct mm_struct *mm;
> > > > > > >
> > > > > > > + mas_for_each(mas_detach, vma, ULONG_MAX) {
> > > > > > > + vma_start_write(next);
> > > > > > > + vma_mark_detached(next, true);
> > > > > > > + }
> > > > > > > +
> > > > > > > mm = current->mm;
> > > > > > > mm->map_count -= vms->vma_count;
> > > > > > > mm->locked_vm -= vms->locked_vm;
> > > > > > >
> > > > > > > This would mean:
> > > > > > >
> > > > > > > vms_complete_munmap_vmas
> > > > > > > vma_start_write
> > > > > > > vma_mark_detached
> > > > > > > mmap_write_downgrade
> > > > > > > vms_clear_ptes
> > > > > > > remove_vma
> > > > > > >
> > > > > > > And remove_vma will be just freeing the vmas. Is that correct?
> > > > > > > I'm a bit confused because the original thinking was that
> > > > > > > vma_mark_detached() would drop the last refcnt and if it's 0 we would
> > > > > > > free the vma right there. If that's still what we want to do then I
> > > > > > > think the above sequence should look like this:
> > > > > > >
> > > > > > > vms_complete_munmap_vmas
> > > > > > > vms_clear_ptes
> > > > > > > remove_vma
> > > > > > > vma_start_write
> > > > > > > vma_mark_detached
> > > > > > > mmap_write_downgrade
> > > > > > >
> > > > > > > because vma_start_write+vma_mark_detached should be done under mmap_write_lock.
> > > > > > > Please let me know which way you want to move forward.
> > > > > > >
> > > > > >
> > > > > > Are we sure we're not causing issues with the MAP_FIXED path here?
> > > > > >
> > > > > > With the above change, we'd be freeing the PTEs before marking the vmas
> > > > > > as detached or vma_start_write().
> > > > >
> > > > > IIUC when we call vms_complete_munmap_vmas() all vmas inside
> > > > > mas_detach have been already write-locked, no?
> > >
> > > That's the way it is today - but I thought you were moving the lock to
> > > the complete stage, not adding a new one? (why add a new one otherwise?)
> >
> > Is my understanding correct that mas_detach is populated by
> > vms_gather_munmap_vmas() only with vmas that went through
> > __split_vma() (and were write-locked there)? I don't see any path that
> > would add any other vma into mas_detach but maybe I'm missing
> > something?
>
> No, that is not correct.
>
> vms_gather_munmap_vmas() calls split on the first vma, then adds all
> vmas that are within the range of the munmap() call. Potentially
> splitting the last vma and adding that in the
> "if (next->vm_end > vms->end)" block.
>
> Sometimes this is a single vma that gets split twice, sometimes no
> splits happen and entire vmas are unmapped, sometimes it's just one vma
> that isn't split.
>
> My observation is the common case is a single vma, but besides that we
> see 3, and sometimes 7 at a time, but it could be any number of vmas and
> not all of them are split.
>
> There is a loop for_each_vma_range() that does:
>
> vma_start_write(next);
> mas_set(mas_detach, vms->mas_count++);
> mas_store_gfp(mas_detach, next, GFP_KERNEL);
Ah, ok I see now. I completely misunderstood what for_each_vma_range()
was doing.
Then I think vma_start_write() should remain inside
vms_gather_munmap_vmas() and all vmas in mas_detach should be
write-locked, even the ones we are not modifying. Otherwise what would
prevent the race I mentioned before?
__mmap_region
__mmap_prepare
vms_gather_munmap_vmas // adds vmas to be unmapped into mas_detach,
// some locked
by __split_vma(), some not locked
lock_vma_under_rcu()
vma = mas_walk // finds
unlocked vma also in mas_detach
vma_start_read(vma) //
succeeds since vma is not locked
// vma->detached, vm_start,
vm_end checks pass
// vma is successfully read-locked
vms_clean_up_area(mas_detach)
vms_clear_ptes
// steps on a cleared PTE
__mmap_new_vma
vma_set_range // installs new vma in the range
__mmap_complete
vms_complete_munmap_vmas // vmas are write-locked and detached
but it's too late
>
>
> >
> > >
> > > >
> > > > Yeah, I think we can simply do this:
> > > >
> > > > vms_complete_munmap_vmas
> > > > vms_clear_ptes
> > > > remove_vma
> > > > vma_mark_detached
> > > > mmap_write_downgrade
> > > >
> > > > If my assumption is incorrect, assertion inside vma_mark_detached()
> > > > should trigger. I tried a quick test and so far nothing exploded.
> > > >
> > >
> > > If they are write locked, then the page faults are not a concern. There
> > > is also the rmap race that Jann found in mmap_region() [1]. This is
> > > probably also fine since you are keeping the write lock in place earlier
> > > on in the gather stage. Note the ptes will already be cleared by the
> > > time vms_complete_munmap_vmas() is called in this case.
> > >
> > > [1] https://lore.kernel.org/all/CAG48ez0ZpGzxi=-5O_uGQ0xKXOmbjeQ0LjZsRJ1Qtf2X5eOr1w@mail.gmail.com/
> > >
> > > Thanks,
> > > Liam
> > >
> > > To unsubscribe from this group and stop receiving emails from it, send an email to kernel-team+unsubscribe@android.com.
> > >
next prev parent reply other threads:[~2024-12-18 21:53 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-16 19:24 [PATCH v6 00/16] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 01/16] mm: introduce vma_start_read_locked{_nested} helpers Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 02/16] mm: move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 03/16] mm: mark vma as detached until it's added into vma tree Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 04/16] mm/nommu: fix the last places where vma is not locked before being attached Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 05/16] types: move struct rcuwait into types.h Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 06/16] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail Suren Baghdasaryan
2024-12-17 11:31 ` Lokesh Gidra
2024-12-17 15:51 ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 07/16] mm: move mmap_init_lock() out of the header file Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 08/16] mm: uninline the main body of vma_start_write() Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 09/16] refcount: introduce __refcount_{add|inc}_not_zero_limited Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count Suren Baghdasaryan
2024-12-16 20:42 ` Peter Zijlstra
2024-12-16 20:53 ` Suren Baghdasaryan
2024-12-16 21:15 ` Peter Zijlstra
2024-12-16 21:53 ` Suren Baghdasaryan
2024-12-16 22:00 ` Peter Zijlstra
2024-12-16 21:37 ` Peter Zijlstra
2024-12-16 21:44 ` Suren Baghdasaryan
2024-12-17 10:30 ` Peter Zijlstra
2024-12-17 16:27 ` Suren Baghdasaryan
2024-12-18 9:41 ` Peter Zijlstra
2024-12-18 10:06 ` Peter Zijlstra
2024-12-18 15:37 ` Liam R. Howlett
2024-12-18 15:50 ` Suren Baghdasaryan
2024-12-18 16:18 ` Peter Zijlstra
2024-12-18 17:36 ` Suren Baghdasaryan
2024-12-18 17:44 ` Peter Zijlstra
2024-12-18 17:58 ` Suren Baghdasaryan
2024-12-18 19:00 ` Liam R. Howlett
2024-12-18 19:07 ` Suren Baghdasaryan
2024-12-18 19:29 ` Suren Baghdasaryan
2024-12-18 19:38 ` Liam R. Howlett
2024-12-18 20:00 ` Suren Baghdasaryan
2024-12-18 20:38 ` Liam R. Howlett
2024-12-18 21:53 ` Suren Baghdasaryan [this message]
2024-12-18 21:55 ` Suren Baghdasaryan
2024-12-19 0:35 ` Andrew Morton
2024-12-19 0:47 ` Suren Baghdasaryan
2024-12-19 9:13 ` Peter Zijlstra
2024-12-19 11:20 ` Peter Zijlstra
2024-12-19 16:17 ` Suren Baghdasaryan
2024-12-19 17:16 ` Liam R. Howlett
2024-12-19 17:42 ` Peter Zijlstra
2024-12-19 18:18 ` Liam R. Howlett
2024-12-19 18:46 ` Peter Zijlstra
2024-12-19 18:55 ` Liam R. Howlett
2024-12-20 15:22 ` Suren Baghdasaryan
2024-12-23 3:03 ` Suren Baghdasaryan
2024-12-26 17:12 ` Suren Baghdasaryan
2024-12-19 16:14 ` Suren Baghdasaryan
2024-12-19 17:23 ` Peter Zijlstra
2024-12-19 8:55 ` Peter Zijlstra
2024-12-19 16:08 ` Suren Baghdasaryan
2024-12-19 8:53 ` Peter Zijlstra
2024-12-19 16:08 ` Suren Baghdasaryan
2024-12-18 15:57 ` Suren Baghdasaryan
2024-12-18 16:13 ` Peter Zijlstra
2024-12-18 15:42 ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 11/16] mm: enforce vma to be in detached state before freeing Suren Baghdasaryan
2024-12-16 21:16 ` Peter Zijlstra
2024-12-16 21:18 ` Peter Zijlstra
2024-12-16 21:57 ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 12/16] mm: remove extra vma_numab_state_init() call Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 13/16] mm: introduce vma_ensure_detached() Suren Baghdasaryan
2024-12-17 10:26 ` Peter Zijlstra
2024-12-17 15:58 ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 14/16] mm: prepare lock_vma_under_rcu() for vma reuse possibility Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 15/16] mm: make vma cache SLAB_TYPESAFE_BY_RCU Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 16/16] docs/mm: document latest changes to vm_lock Suren Baghdasaryan
2024-12-16 19:39 ` [PATCH v6 00/16] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-17 18:42 ` Andrew Morton
2024-12-17 18:49 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJuCfpHXRX=LLa67eWYvrK=UDxKMaOequFXfqOqDHbRrmsT9SQ@mail.gmail.com' \
--to=surenb@google.com \
--cc=Liam.Howlett@oracle.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hdanton@sina.com \
--cc=hughd@google.com \
--cc=jannh@google.com \
--cc=kernel-team@android.com \
--cc=klarasmodin@gmail.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=minchan@google.com \
--cc=mjguzik@gmail.com \
--cc=oleg@redhat.com \
--cc=oliver.sang@intel.com \
--cc=pasha.tatashin@soleen.com \
--cc=paulmck@kernel.org \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=shakeel.butt@linux.dev \
--cc=souravpanda@google.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox