linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Peter Zijlstra <peterz@infradead.org>
To: Suren Baghdasaryan <surenb@google.com>
Cc: Matthew Wilcox <willy@infradead.org>,
	akpm@linux-foundation.org, liam.howlett@oracle.com,
	lorenzo.stoakes@oracle.com, mhocko@suse.com, vbabka@suse.cz,
	hannes@cmpxchg.org, mjguzik@gmail.com, oliver.sang@intel.com,
	mgorman@techsingularity.net, david@redhat.com, peterx@redhat.com,
	oleg@redhat.com, dave@stgolabs.net, paulmck@kernel.org,
	brauner@kernel.org, dhowells@redhat.com, hdanton@sina.com,
	hughd@google.com, minchan@google.com, jannh@google.com,
	shakeel.butt@linux.dev, souravpanda@google.com,
	pasha.tatashin@soleen.com, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock
Date: Fri, 13 Dec 2024 10:22:23 +0100	[thread overview]
Message-ID: <20241213092223.GB2484@noisy.programming.kicks-ass.net> (raw)
In-Reply-To: <CAJuCfpHKFZ2Q1R1Knh-LFLUYcTX6CJuEsqNM5AwxRyDUAzdcVw@mail.gmail.com>

On Thu, Dec 12, 2024 at 06:17:44AM -0800, Suren Baghdasaryan wrote:
> On Thu, Dec 12, 2024 at 1:17 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Wed, Dec 11, 2024 at 07:01:16PM -0800, Suren Baghdasaryan wrote:
> >
> > > > > > I think your proposal should work. Let me try to code it and see if
> > > > > > something breaks.
> > >
> > > Ok, I tried it out and things are a bit more complex:
> > > 1. We should allow write-locking a detached VMA, IOW vma_start_write()
> > > can be called when vm_refcnt is 0.
> >
> > This sounds dodgy, refcnt being zero basically means the object is dead
> > and you shouldn't be touching it no more. Where does this happen and
> > why?
> >
> > Notably, it being 0 means it is no longer in the mas tree and can't be
> > found anymore.
> 
> It happens when a newly created vma that was not yet attached
> (vma->vm_refcnt = 0) is write-locked before being added into the vma
> tree. For example:
> mmap()
>   mmap_write_lock()
>   vma = vm_area_alloc() // vma->vm_refcnt = 0 (detached)
>   //vma attributes are initialized
>   vma_start_write() // write 0x8000 0001 into vma->vm_refcnt
>   mas_store_gfp()
>   vma_mark_attached()
>   mmap_write_lock() // vma_end_write_all()
> 
> In this scenario, we write-lock the VMA before adding it into the tree
> to prevent readers (pagefaults) from using it until we drop the
> mmap_write_lock().

Ah, but you can do that by setting vma->vm_lock_seq and setting the ref
to 1 before adding it (its not visible before adding anyway, so nobody
cares).

You'll note that the read thing checks both the msb (or other high bit
depending on the actual type you're going with) *and* the seq. That is
needed because we must not set the sequence number before all existing
readers are drained, but since this is pre-add that is not a concern.

> > > 2. Adding 0x80000000 saturates refcnt, so I have to use a lower bit
> > > 0x40000000 to denote writers.
> >
> > I'm confused, what? We're talking about atomic_t, right?
> 
> I thought you suggested using refcount_t. According to
> https://elixir.bootlin.com/linux/v6.13-rc2/source/include/linux/refcount.h#L22
> valid values would be [0..0x7fff_ffff] and 0x80000000 is outside of
> that range. What am I missing?

I was talking about atomic_t :-), but yeah, maybe we can use refcount_t,
but I hadn't initially considered that.



  parent reply	other threads:[~2024-12-13  9:22 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-11 20:55 [PATCH 0/4] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-11-11 20:55 ` [PATCH 1/4] mm: introduce vma_start_read_locked{_nested} helpers Suren Baghdasaryan
2024-11-11 20:55 ` [PATCH 2/4] mm: move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-11-12  9:36   ` kernel test robot
2024-11-12  9:47   ` kernel test robot
2024-11-12 10:18   ` kernel test robot
2024-11-12 15:57   ` Vlastimil Babka
2024-11-12 16:08     ` Suren Baghdasaryan
2024-11-12 16:58       ` Suren Baghdasaryan
2024-11-11 20:55 ` [PATCH 3/4] mm: replace rw_semaphore with atomic_t in vma_lock Suren Baghdasaryan
2024-11-12  0:06   ` Andrew Morton
2024-11-12  0:52     ` Suren Baghdasaryan
2024-11-12  0:35   ` Davidlohr Bueso
2024-11-12  0:59     ` Suren Baghdasaryan
2024-11-12  4:58   ` Matthew Wilcox
2024-11-12 15:18     ` Suren Baghdasaryan
2024-12-10 22:38       ` Peter Zijlstra
2024-12-10 23:37         ` Suren Baghdasaryan
2024-12-11  8:25           ` Peter Zijlstra
2024-12-11 15:20             ` Suren Baghdasaryan
2024-12-12  3:01               ` Suren Baghdasaryan
2024-12-12  9:16                 ` Peter Zijlstra
2024-12-12 14:17                   ` Suren Baghdasaryan
2024-12-12 14:19                     ` Suren Baghdasaryan
2024-12-13  4:48                       ` Suren Baghdasaryan
2024-12-13  9:57                         ` Peter Zijlstra
2024-12-13 17:45                           ` Suren Baghdasaryan
2024-12-13 18:19                             ` Matthew Wilcox
2024-12-13 18:35                             ` Peter Zijlstra
2024-12-13 18:37                               ` Suren Baghdasaryan
2024-12-16 19:32                                 ` Suren Baghdasaryan
2024-12-13  9:22                     ` Peter Zijlstra [this message]
2024-12-13 17:41                       ` Suren Baghdasaryan
2024-11-11 20:55 ` [PATCH 4/4] mm: move lesser used vma_area_struct members into the last cacheline Suren Baghdasaryan
2024-11-12 10:07   ` Vlastimil Babka
2024-11-12 15:45     ` Suren Baghdasaryan
2024-11-11 21:41 ` [PATCH 0/4] move per-vma lock into vm_area_struct Suren Baghdasaryan
2024-11-12  2:48   ` Liam R. Howlett
2024-11-12  3:27     ` Suren Baghdasaryan
2024-11-11 22:18 ` Davidlohr Bueso
2024-11-11 23:19   ` Suren Baghdasaryan
2024-11-12  0:03     ` Davidlohr Bueso
2024-11-12  0:43       ` Suren Baghdasaryan
2024-11-12  0:11     ` Andrew Morton
2024-11-12  0:48       ` Suren Baghdasaryan
2024-11-12  2:35 ` Liam R. Howlett
2024-11-12  3:23   ` Suren Baghdasaryan
2024-11-12  9:39 ` Lorenzo Stoakes
2024-11-12 15:38   ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20241213092223.GB2484@noisy.programming.kicks-ass.net \
    --to=peterz@infradead.org \
    --cc=akpm@linux-foundation.org \
    --cc=brauner@kernel.org \
    --cc=dave@stgolabs.net \
    --cc=david@redhat.com \
    --cc=dhowells@redhat.com \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=hughd@google.com \
    --cc=jannh@google.com \
    --cc=kernel-team@android.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lorenzo.stoakes@oracle.com \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@suse.com \
    --cc=minchan@google.com \
    --cc=mjguzik@gmail.com \
    --cc=oleg@redhat.com \
    --cc=oliver.sang@intel.com \
    --cc=pasha.tatashin@soleen.com \
    --cc=paulmck@kernel.org \
    --cc=peterx@redhat.com \
    --cc=shakeel.butt@linux.dev \
    --cc=souravpanda@google.com \
    --cc=surenb@google.com \
    --cc=vbabka@suse.cz \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox