From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 11 Sep 2023 08:56:45 +1000
From: Dave Chinner <david@fromorbit.com>
To: Peter Zijlstra
Cc: Matthew Wilcox, Ingo Molnar, Will Deacon, Waiman Long,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Chandan Babu R, "Darrick J. Wong",
	linux-xfs@vger.kernel.org
Wong" , linux-xfs@vger.kernel.org Subject: Re: [PATCH 1/5] locking: Add rwsem_is_write_locked() Message-ID: References: <20230907174705.2976191-1-willy@infradead.org> <20230907174705.2976191-2-willy@infradead.org> <20230907190810.GA14243@noisy.programming.kicks-ass.net> <20230907193838.GB14243@noisy.programming.kicks-ass.net> <20230908104434.GB24372@noisy.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230908104434.GB24372@noisy.programming.kicks-ass.net> X-Stat-Signature: sy4eajafjs94bbggipecmook7x9nq6n7 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 324DD10000A X-Rspam-User: X-HE-Tag: 1694386610-34802 X-HE-Meta: U2FsdGVkX1/AO+GMjhS56gc4Eav3fTGKdhncp/aNWxW/yxBLs3rjc4LNyxDqKDUKrJvAUgrEcvaTE3gkxfK5m+OAErFZEzrrE3VqPfpGGgB6+f3FKdSBDrfu5pE9wXmhCVhV4ZcE6KicM3+e+5otVp/KRLdBeLklbCfC0Ri/woyJQgIl8QQJMNHyf+wxEAa8CaHT0BBq0Q0IJJf3vxVBG/fv1fDsUMCBT5kfvEnLxbMa0NfwMmPx7r+TCYbIYGv1TU7sUZex48y9MC9fj3KTQSEIT6ivvR0HOIjQYfeVg2jarGPGDmiQGbMhGC+m7jYMQoWNYgqaWd2hTadh1w+mSUm07XFNwps75sLNkuV/1wewHaFM5dagY0Rwy9I6R9up4Q2Vh4niAw/C8hqrxJVsYFl8LNvkQWiUrgsPDwJFPb4TrKaV2j7YV7km8xA6LV0xp2MjrQgpRTHaYmPl8M5tc7BPkVLwI2QcGrIu1c1QpuyNtS7bNp00z0Hw2Frq5j4yZIb1oVFy7zyd6WI4B6CPaT0dhhJFK/wGZw4Wg6vLT8o99S2iWc26qn4SvYsSkhmu73BBMfT5XVS3AgaeWub4X3Byq/Bi1nMDc5N4aA+uY4decO3PZuA4UWHVajkK03hYSsvMapwYbgNV4rK3lAysCM229/5HiFlEnkEzAM3tiPCH6ceqtIL4iF/epLO/dTdzQe8DBPdAPz3fwZSyLBUJXxt35mxObvS94ndSPaVESOPpnMkYqErAahB2e1D5Ubnc8nydhFHae1k2cUIGagfcCaQ8woTpGsF5uodm7JrljN9zhRcZjDNJRN8nhVFZwSTy8OM1p4BSbKe9I/x8Sepfj3V3073P5N1A/C5ve6WbRp9qbaWeDtNbNvbx486QdvgAQrbH37S0RjnoB9Xf22HqX2z4bp+oz46uiWclFSaWizvDiJMQbsai91G7wJcNVKOPOK+mRvNxHdpbzJQyF0g lkzOXC20 l9vPBlx3pM8BgjhPXXV2U37xLNl7CQkogglTbtc4stnSVxvWbaUMOqPWthEpzK70tr9yWl2aVhVmm+7CWoTCbQIjls8ZrVd0afjKdjNjdGlkWd1AvCXlSHRMdc7MSXWJ2EOQLQxFygK3+a/sBUAcg6FJUf9U3Q9KDeDFtU8NP3MUPRWsSpVI62O9DtXV7euL4sNdefTouiZY34fRKksP4wjxbgDOFdMT6GBH2bCQCpvdO+/xdvLYB1YUOcWND/wzsk1ijzciRyp6528mlWvrA9pURaxsSGix8+TddUmu0xe4cu6nJp1WyefeCWSL4QPkzxbZmSz3PDE9Ex4bVx7xgh4bc8hNqsiNcN6Dm6FWyJhxGM/y3s3OPjlxcLLZT3M3hzQ99blBJ8GacD9WnCnn6p/8EpbYMh2zKEKDSkklxF56MpLd0nKmXmiQKMN2wjI4qAymkX4V3XgLhA8X9pc+9ApWixOSLzboCKThQcB+4QyHYR0hUsV2+24FXUg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote: > On Fri, Sep 08, 2023 at 09:00:08AM +1000, Dave Chinner wrote: > > > > Right, but if you're not the lock owner, your answer to the question is > > > a dice-roll, it might be locked, it might not be. > > > > Except that the person writing the code knows the call chain that > > leads up to that code, and so they have a pretty good idea whether > > the object should be locked or not. If we are running that code, and > > the object is locked, then it's pretty much guaranteed that the > > owner of the lock is code that executed the check, because otherwise > > we have a *major lock implementation bug*. > > Agreed, and this is fine. However there's been some very creative > 'use' of the _is_locked() class of functions in the past that did not > follow 'common' sense. > > If all usage was: I should be holding this, lets check. I probably > wouldn't have this bad feeling about things. So your argument against such an interface is essentially "we can't have nice things because someone might abuse them"? 
> > > Most devs should run with lockdep on when writing new code, and I know > > > the sanitizer robots run with lockdep on. > > > > > > In general there seems to be a ton of lockdep on coverage. > > > > *cough* > > > > Bit locks, semaphores, and all sorts of other constructs for IO > > serialisation (like inode_dio_wait()) have no lockdep coverage at > > all. IOWs, large chunks of many filesystems, the VFS and the VM have > > little to no lockdep coverage at all. > > True, however I was commenting on the assertion that vm code has > duplicate asserts with the implication that was because not a lot of > people run with lockdep on. I think that implication is pretty much spot on the money for any subsystem that has significant correctness testing overhead. A single fstests regression test pass for a single configuration for XFS takes well over 4 hours to run. If I add lockdep, it's about 7 hours. If I add lockdep and KASAN, it's closer to 12 hours. It just isn't viable to run test kernels with these options day-in, day-out. Maybe once a release I'll run a weekend sanity check with them enabled, but otherwise the rare issue they find just isn't worth the cost of enabling them.... > > > > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm) > > > > to give us a good assertion when lockdep is disabled. > > > > > > Is that really worth it still? I mean, much of these assertions pre-date > > > lockdep. > > > > And we're trying to propagate them because lockdep isn't a viable > > option for day to day testing of filesystems because of it's > > overhead vs how infrequently it finds new problems. > > ... in XFS. Lockdep avoids a giant pile of broken from entering the > kernel and the robots still report plenty. Nobody is suggesting that lockdep gets replaced by these functions. They are *in addition* to lockdep, and are used to give use 99.9% certainty that locks are being used correctly without adding any runtime overhead at all. That's the whole point - it is simple introspection code that will find most gross locking mistakes that people make very quickly, without any real costs. If you're worried about people abusing introspection code, then you're forgetting that they can just dig directly at the rwsem guts directly themselves. Adding an interface that does introspection right and enables the internal imp[lementation to change without breaking anything is a no brainer; it stops people from just digging in the guts of the structure and guaranteeing that code will break if the implementation changes... > > > > XFS has a problem with using lockdep in general, which is that a worker > > > > thread can be spawned and use the fact that the spawner is holding the > > > > lock. There's no mechanism for the worker thread to ask "Does struct > > > > task_struct *p hold the lock?". > > > > > > Will be somewhat tricky to make happen -- but might be doable. It is > > > however an interface that is *very* hard to use correctly. Basically I > > > think you want to also assert that your target task 'p' is blocked, > > > right? > > > > > > That is: assert @p is blocked and holds @lock. > > > > That addresses the immediate symptom; it doesn't address the large > > problem with lockdep and needing non-owner rwsem semantics. > > > > i.e. synchronous task based locking models don't work for > > asynchronous multi-stage pipeline processing engines like XFS. 
The > > lock protects the data object and follows the data object through > > the processing pipeline, whilst the original submitter moves on to > > the next operation to processes without blocking. > > > > This is the non-blocking, async processing model that io_uring > > development is pushing filesystems towards, so assuming that we only > > hand a lock to a single worker task and then wait for it complete > > (i.e. synchronous operation) flies in the face of current > > development directions... > > I was looking at things from an interface abuse perspective. How easy is > it to do the wrong thing. As said, we've had a bunch of really dodgy > code with the _is_locked class of functions, hence my desire to find > something else. > > As to the whole non-owner locking, yes, that's problematic. I'm not > convinced async operations require non-owner locking, at the same time I > do see that IO completions pose a challence. > > Coming from the schedulability and real-time corner, non-owner locks are > a nightmare because of the inversions. So yeah, fun to be had I'm sure. I'm not sure you understand the scope of the problem with modern filesystems vs RT processing. The moment code enters a modern filesystem, it gives up all hope of real-time response guarantees. There is currently nothing a RT process can do but wait for the filesystem to finish with the locks it holds, and the wait times are effectively unbound because there may be a requirement for tens of thousands of IOs to be done before the lock is dropped and the RT task can make progress. Priority inheritance for the lock owner won't make any difference here, because the latency is not caused by something running on a CPU. IOWs, lock inversions and non-owner locks are the very least of the problems for RT priority apps when it comes to filesystem operations. The solution for RT priority apps avoiding priority inversions in filesystems is going be io_uring. i.e. the initial NOWAIT operation is done with RT priority in the RT task itself, but if that is going to be blocked it gets punted to a background worker for async processing and the RT priority task goes on to processing the next thing it needs to do. All the background async operations are performed with the same (non-RT) priority and we just don't need to care about priority inversions or the problems RT has with non-owner lock contexts. The RT tasks themselves don't care, either, because they don't ever get stuck waiting on a filesystem lock that a lower priority task might hold, or get stuck on an operation that might require unbound amounts of IO to complete (e.g. transaction reservations). IOWs, if we want to make "RT with filesystems" a reality, we need to stop worrying about constraining lock implementations and handling priority inversions. Instead, we need to look towards how to make filesystem infrastructure fully non-blocking for RT priority tasks and writing RT applications to use that infrastructure.... Cheers, Dave. -- Dave Chinner david@fromorbit.com