From: Suren Baghdasaryan <surenb@google.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: akpm@linux-foundation.org, willy@infradead.org,
liam.howlett@oracle.com, lorenzo.stoakes@oracle.com,
mhocko@suse.com, vbabka@suse.cz, hannes@cmpxchg.org,
mjguzik@gmail.com, oliver.sang@intel.com,
mgorman@techsingularity.net, david@redhat.com,
peterx@redhat.com, oleg@redhat.com, dave@stgolabs.net,
paulmck@kernel.org, brauner@kernel.org, dhowells@redhat.com,
hdanton@sina.com, hughd@google.com, lokeshgidra@google.com,
minchan@google.com, jannh@google.com, shakeel.butt@linux.dev,
souravpanda@google.com, pasha.tatashin@soleen.com,
klarasmodin@gmail.com, corbet@lwn.net,
linux-doc@vger.kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count
Date: Wed, 18 Dec 2024 07:42:30 -0800 [thread overview]
Message-ID: <CAJuCfpHkTScPqME=++tzv=qu0aev-ewprLa=GGw8Qp4PQtTL9w@mail.gmail.com> (raw)
In-Reply-To: <20241218094104.GC2354@noisy.programming.kicks-ass.net>
On Wed, Dec 18, 2024 at 1:41 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Dec 17, 2024 at 08:27:46AM -0800, Suren Baghdasaryan wrote:
>
> > > So I just replied there, and no, I don't think it makes sense. Just put
> > > the kmem_cache_free() in vma_refcount_put(), to be done on 0.
> >
> > That's very appealing indeed and makes things much simpler. The
> > problem I see with that is the case when we detach a vma from the tree
> > to isolate it, then do some cleanup and only then free it. That's done
> > in vms_gather_munmap_vmas() here:
> > https://elixir.bootlin.com/linux/v6.12.5/source/mm/vma.c#L1240 and we
> > even might reattach detached vmas back:
> > https://elixir.bootlin.com/linux/v6.12.5/source/mm/vma.c#L1312. IOW,
> > detached state is not final and we can't destroy the object that
> > reached this state.
>
> Urgh, so that's the munmap() path, but arguably when that fails, the
> map stays in place.
>
> I think this means you're marking detached too soon; you should only
> mark detached once you reach the point of no return.
>
> That said, once you've reached the point of no return; and are about to
> go remove the page-tables, you very much want to ensure a lack of
> concurrency.
>
> So perhaps waiting for out-standing readers at this point isn't crazy.
>
> Also, I'm having a very hard time reading this maple tree stuff :/
> Afaict vms_gather_munmap_vmas() only adds the VMAs to be removed to a
> second tree, it does not in fact unlink them from the mm yet.
Yes, I think you are correct.
>
> AFAICT it's vma_iter_clear_gfp() that actually wipes the vmas from the
> mm -- and that being able to fail is mind boggling and I suppose is what
> gives rise to much of this insanity :/
>
> Anyway, I would expect remove_vma() to be the one that marks it detached
> (it's already unreachable through vma_lookup() at this point) and there
> you should wait for concurrent readers to bugger off.
There is an issue with that. Note that vms_complete_munmap_vmas()
that's calling remove_vma() might drop the mmap write lock, so
detaching without a write lock would break current rules.
>
> > We could change states to: 0=unused (we can free
> > the object), 1=detached, 2=attached, etc. but then vma_start_read()
> > should do something like refcount_inc_more_than_one() instead of
> > refcount_inc_not_zero(). Would you be ok with such an approach?
>
> Urgh, I would strongly suggest ditching refcount_t if we go this route.
> The thing is; refcount_t should remain a 'simple' straight forward
> interface and not allow people to do the wrong thing. Its not meant to
> be the kitchen sink -- we have atomic_t for that.
Ack. If we go this route I'll use atomics directly.
>
> Anyway, the more common scheme at that point is using -1 for 'free', I
> think folio->_mapcount uses that even. For that see:
> atomic_add_negative*().
Thanks for the reference.
>
> > > Additionally, having vma_end_write() would allow you to put a lockdep
> > > annotation in vma_{start,end}_write() -- which was I think the original
> > > reason I proposed it a while back, that and having improved clarity when
> > > reading the code, since explicitly marking the end of a section is
> > > helpful.
> >
> > The vma->vmlock_dep_map is tracking vma->vm_refcnt, not the
> > vma->vm_lock_seq (similar to how today vma->vm_lock has its lockdep
> > tracking that rw_semaphore). If I implement vma_end_write() then it
> > will simply be something like:
> >
> > void vma_end_write(vma)
> > {
> > vma_assert_write_locked(vma);
> > vma->vm_lock_seq = UINT_MAX;
> > }
> >
> > so, vmlock_dep_map would not be involved.
>
> That's just weird; why would you not track vma_{start,end}_write() with
> the exclusive side of the 'rwsem' dep_map ?
>
> > If you want to track vma->vm_lock_seq with a separate lockdep, that
> > would be more complicated. Specifically for vma_end_write_all() that
> > would require us to call rwsem_release() on all locked vmas, however
> > we currently do not track individual locked vmas. vma_end_write_all()
> > allows us not to worry about tracking them, knowing that once we do
> > mmap_write_unlock() they all will get unlocked with one increment of
> > mm->mm_lock_seq. If your suggestion is to replace vma_end_write_all()
> > with vma_end_write() and unlock vmas individually across the mm code,
> > that would be a sizable effort. If that is indeed your ultimate goal,
> > I can do that as a separate project: introduce vma_end_write(),
> > gradually add them in required places (not yet sure how complex that
> > would be), then retire vma_end_write_all() and add a lockdep for
> > vma->vm_lock_seq.
>
> Yeah, so ultimately I think it would be clearer if you explicitly mark
> the point where the vma modification is 'done'. But I don't suppose we
> have to do that here.
Ack.
next prev parent reply other threads:[~2024-12-18 15:42 UTC|newest]
Thread overview: 74+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-12-16 19:24 [PATCH v6 00/16] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 01/16] mm: introduce vma_start_read_locked{_nested} helpers Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 02/16] mm: move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 03/16] mm: mark vma as detached until it's added into vma tree Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 04/16] mm/nommu: fix the last places where vma is not locked before being attached Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 05/16] types: move struct rcuwait into types.h Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 06/16] mm: allow vma_start_read_locked/vma_start_read_locked_nested to fail Suren Baghdasaryan
2024-12-17 11:31 ` Lokesh Gidra
2024-12-17 15:51 ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 07/16] mm: move mmap_init_lock() out of the header file Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 08/16] mm: uninline the main body of vma_start_write() Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 09/16] refcount: introduce __refcount_{add|inc}_not_zero_limited Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 10/16] mm: replace vm_lock and detached flag with a reference count Suren Baghdasaryan
2024-12-16 20:42 ` Peter Zijlstra
2024-12-16 20:53 ` Suren Baghdasaryan
2024-12-16 21:15 ` Peter Zijlstra
2024-12-16 21:53 ` Suren Baghdasaryan
2024-12-16 22:00 ` Peter Zijlstra
2024-12-16 21:37 ` Peter Zijlstra
2024-12-16 21:44 ` Suren Baghdasaryan
2024-12-17 10:30 ` Peter Zijlstra
2024-12-17 16:27 ` Suren Baghdasaryan
2024-12-18 9:41 ` Peter Zijlstra
2024-12-18 10:06 ` Peter Zijlstra
2024-12-18 15:37 ` Liam R. Howlett
2024-12-18 15:50 ` Suren Baghdasaryan
2024-12-18 16:18 ` Peter Zijlstra
2024-12-18 17:36 ` Suren Baghdasaryan
2024-12-18 17:44 ` Peter Zijlstra
2024-12-18 17:58 ` Suren Baghdasaryan
2024-12-18 19:00 ` Liam R. Howlett
2024-12-18 19:07 ` Suren Baghdasaryan
2024-12-18 19:29 ` Suren Baghdasaryan
2024-12-18 19:38 ` Liam R. Howlett
2024-12-18 20:00 ` Suren Baghdasaryan
2024-12-18 20:38 ` Liam R. Howlett
2024-12-18 21:53 ` Suren Baghdasaryan
2024-12-18 21:55 ` Suren Baghdasaryan
2024-12-19 0:35 ` Andrew Morton
2024-12-19 0:47 ` Suren Baghdasaryan
2024-12-19 9:13 ` Peter Zijlstra
2024-12-19 11:20 ` Peter Zijlstra
2024-12-19 16:17 ` Suren Baghdasaryan
2024-12-19 17:16 ` Liam R. Howlett
2024-12-19 17:42 ` Peter Zijlstra
2024-12-19 18:18 ` Liam R. Howlett
2024-12-19 18:46 ` Peter Zijlstra
2024-12-19 18:55 ` Liam R. Howlett
2024-12-20 15:22 ` Suren Baghdasaryan
2024-12-23 3:03 ` Suren Baghdasaryan
2024-12-26 17:12 ` Suren Baghdasaryan
2024-12-19 16:14 ` Suren Baghdasaryan
2024-12-19 17:23 ` Peter Zijlstra
2024-12-19 8:55 ` Peter Zijlstra
2024-12-19 16:08 ` Suren Baghdasaryan
2024-12-19 8:53 ` Peter Zijlstra
2024-12-19 16:08 ` Suren Baghdasaryan
2024-12-18 15:57 ` Suren Baghdasaryan
2024-12-18 16:13 ` Peter Zijlstra
2024-12-18 15:42 ` Suren Baghdasaryan [this message]
2024-12-16 19:24 ` [PATCH v6 11/16] mm: enforce vma to be in detached state before freeing Suren Baghdasaryan
2024-12-16 21:16 ` Peter Zijlstra
2024-12-16 21:18 ` Peter Zijlstra
2024-12-16 21:57 ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 12/16] mm: remove extra vma_numab_state_init() call Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 13/16] mm: introduce vma_ensure_detached() Suren Baghdasaryan
2024-12-17 10:26 ` Peter Zijlstra
2024-12-17 15:58 ` Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 14/16] mm: prepare lock_vma_under_rcu() for vma reuse possibility Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 15/16] mm: make vma cache SLAB_TYPESAFE_BY_RCU Suren Baghdasaryan
2024-12-16 19:24 ` [PATCH v6 16/16] docs/mm: document latest changes to vm_lock Suren Baghdasaryan
2024-12-16 19:39 ` [PATCH v6 00/16] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-12-17 18:42 ` Andrew Morton
2024-12-17 18:49 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='CAJuCfpHkTScPqME=++tzv=qu0aev-ewprLa=GGw8Qp4PQtTL9w@mail.gmail.com' \
--to=surenb@google.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=corbet@lwn.net \
--cc=dave@stgolabs.net \
--cc=david@redhat.com \
--cc=dhowells@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hdanton@sina.com \
--cc=hughd@google.com \
--cc=jannh@google.com \
--cc=kernel-team@android.com \
--cc=klarasmodin@gmail.com \
--cc=liam.howlett@oracle.com \
--cc=linux-doc@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lokeshgidra@google.com \
--cc=lorenzo.stoakes@oracle.com \
--cc=mgorman@techsingularity.net \
--cc=mhocko@suse.com \
--cc=minchan@google.com \
--cc=mjguzik@gmail.com \
--cc=oleg@redhat.com \
--cc=oliver.sang@intel.com \
--cc=pasha.tatashin@soleen.com \
--cc=paulmck@kernel.org \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=shakeel.butt@linux.dev \
--cc=souravpanda@google.com \
--cc=vbabka@suse.cz \
--cc=willy@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox