From: Hillf Danton <hdanton@sina.com>
To: Suren Baghdasaryan <surenb@google.com>
Cc: vbabka@suse.cz, hannes@cmpxchg.org, mgorman@techsingularity.net,
 peterz@infradead.org, hughd@google.com, linux-kernel@vger.kernel.org,
 linux-mm@kvack.org
Subject: Re: [PATCH 41/41] mm: replace rw_semaphore with atomic_t in vma_lock
Date: Tue, 17 Jan 2023 11:16:32 +0800
Message-Id: <20230117031632.2321-1-hdanton@sina.com>
References: <20230109205336.3665937-42-surenb@google.com>
 <20230116140649.2012-1-hdanton@sina.com>

On Mon, 16 Jan 2023 15:08:48 -0800 Suren Baghdasaryan wrote:
> On Mon, Jan 16, 2023 at 6:07 AM Hillf Danton wrote:
> > On Mon, 9 Jan 2023 12:53:36 -0800 Suren Baghdasaryan wrote:
> > > static inline bool vma_read_trylock(struct vm_area_struct *vma)
> > > {
> > > 	/* Check before locking. A race might cause false locked result. */
> > > -	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > +	if (vma->vm_lock->lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
> > > 		return false;
> >
> > Add mb to pair with the above wmb like
>
> The wmb above is to ensure the ordering between updates of lock_seq
> and vm_lock->count (lock_seq is updated first and vm_lock->count only
> after that). The first access to vm_lock->count in this function is
> atomic_inc_unless_negative() and it's an atomic RMW operation with a
> return value. According to documentation such functions are fully
> ordered, therefore I think we already have an implicit full memory
> barrier between reads of lock_seq and vm_lock->count here. Am I wrong?

No, you are not. Revisit this only if the full mb turns out not to be
ensured.

#ifndef arch_atomic_inc_unless_negative
static __always_inline bool
arch_atomic_inc_unless_negative(atomic_t *v)
{
	int c = arch_atomic_read(v);

	do {
		if (unlikely(c < 0))
			return false;
	} while (!arch_atomic_try_cmpxchg(v, &c, c + 1));

	return true;
}
#define arch_atomic_inc_unless_negative arch_atomic_inc_unless_negative
#endif

> > 	int count = atomic_read(&vma->vm_lock->count);
> > 	for (;;) {
> > 		int new = count + 1;
> >
> > 		if (count < 0 || new < 0)
> > 			return false;
> >
> > 		new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> > 		if (new == count)
> > 			break;
> > 		count = new;
> > 		cpu_relax();
> > 	}
> >
> > 	(wake up waiting readers after taking the lock;
> > 	but the write lock owner before this read trylock should be
> > 	responsible for waking waiters up.)
> >
> > 	lock acquire for read.
>
> This schema might cause readers to wait, which is not an exact

Could you say a bit more about that wait?

> replacement for down_read_trylock(). The requirement to wake up
> waiting readers also complicates things

If the write lock owner is preempted by a reader while releasing the
lock,

	set count to zero
	<-- preempt
	wake up waiters

then the lock is owned by the reader but read waiters are still asleep.
That is buggy if write-waiter starvation is allowed in this patchset.
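To make that interleaving concrete, below is a rough userspace sketch,
for illustration only: C11 atomics, and names like vma_write_unlock()
are invented here, not the patchset's code. It only models the two-step
unlock and the lockless reader.

/*
 * Userspace model of the race above. count == -1 means write-locked,
 * count >= 0 is the number of readers holding the lock.
 */
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int count = -1;		/* start write-locked */

static void wake_up_waiters(void)
{
	puts("waiters woken");		/* stand-in for wake_up() on a waitqueue */
}

static void vma_write_unlock(void)
{
	atomic_store(&count, 0);	/* step 1: release the lock */
	/*
	 * <-- a reader preempting the writer here takes the lock via
	 *     the trylock below while earlier waiters are still asleep
	 */
	wake_up_waiters();		/* step 2: wake the waiters */
}

static bool vma_read_trylock(void)
{
	int c = atomic_load(&count);

	do {
		if (c < 0)
			return false;	/* write-locked */
	} while (!atomic_compare_exchange_weak(&count, &c, c + 1));

	return true;	/* reader owns the lock; waiters may still sleep */
}

int main(void)
{
	vma_write_unlock();
	printf("reader got lock: %d\n", vma_read_trylock());
	return 0;
}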
> and since we can always fall
> back to mmap_lock, that complication is unnecessary IMHO. I could use
> part of your suggestion like this:
>
> 	int new = count + 1;
>
> 	if (count < 0 || new < 0)
> 		return false;
>
> 	new = atomic_cmpxchg(&vma->vm_lock->count, count, new);
> 	if (new == count)
> 		return false;
>
> Compared to doing atomic_inc_unless_negative() first, like I did
> originally, this schema opens a bit wider window for the writer to get
> in the middle and cause the reader to fail locking but I don't think
> it would result in any visible regression.

It is definitely fine for a writer to acquire the lock while a reader
is doing trylock, no?
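FWIW, a minimal userspace sketch of such a one-shot trylock, again C11
atomics with invented names, and reading the snippet above as
succeeding iff the cmpxchg succeeds; on failure the caller would fall
back to mmap_lock instead of retrying.

#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

static atomic_int count;	/* < 0: write-locked, >= 0: reader count */

static bool read_trylock_once(void)
{
	int c = atomic_load(&count);
	int new = c + 1;

	if (c < 0 || new < 0)
		return false;	/* write-locked, or the count would overflow */

	/*
	 * One shot only: if a writer (or another reader) changed the
	 * count meanwhile, fail rather than loop, and let the caller
	 * take the mmap_lock fallback path.
	 */
	return atomic_compare_exchange_strong(&count, &c, new);
}

int main(void)
{
	printf("1st trylock: %d\n", read_trylock_once());	/* 1 */
	printf("2nd trylock: %d\n", read_trylock_once());	/* 1 */
	return 0;
}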