From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 11 Sep 2023 08:56:45 +1000
From: Dave Chinner <david@fromorbit.com>
To: Peter Zijlstra
Cc: Matthew Wilcox, Ingo Molnar, Will Deacon, Waiman Long,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Chandan Babu R, "Darrick J. Wong",
	linux-xfs@vger.kernel.org
Wong" , linux-xfs@vger.kernel.org Subject: Re: [PATCH 1/5] locking: Add rwsem_is_write_locked() Message-ID: References: <20230907174705.2976191-1-willy@infradead.org> <20230907174705.2976191-2-willy@infradead.org> <20230907190810.GA14243@noisy.programming.kicks-ass.net> <20230907193838.GB14243@noisy.programming.kicks-ass.net> <20230908104434.GB24372@noisy.programming.kicks-ass.net> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230908104434.GB24372@noisy.programming.kicks-ass.net> X-Stat-Signature: sy4eajafjs94bbggipecmook7x9nq6n7 X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 324DD10000A X-Rspam-User: X-HE-Tag: 1694386610-34802 X-HE-Meta: U2FsdGVkX1/AO+GMjhS56gc4Eav3fTGKdhncp/aNWxW/yxBLs3rjc4LNyxDqKDUKrJvAUgrEcvaTE3gkxfK5m+OAErFZEzrrE3VqPfpGGgB6+f3FKdSBDrfu5pE9wXmhCVhV4ZcE6KicM3+e+5otVp/KRLdBeLklbCfC0Ri/woyJQgIl8QQJMNHyf+wxEAa8CaHT0BBq0Q0IJJf3vxVBG/fv1fDsUMCBT5kfvEnLxbMa0NfwMmPx7r+TCYbIYGv1TU7sUZex48y9MC9fj3KTQSEIT6ivvR0HOIjQYfeVg2jarGPGDmiQGbMhGC+m7jYMQoWNYgqaWd2hTadh1w+mSUm07XFNwps75sLNkuV/1wewHaFM5dagY0Rwy9I6R9up4Q2Vh4niAw/C8hqrxJVsYFl8LNvkQWiUrgsPDwJFPb4TrKaV2j7YV7km8xA6LV0xp2MjrQgpRTHaYmPl8M5tc7BPkVLwI2QcGrIu1c1QpuyNtS7bNp00z0Hw2Frq5j4yZIb1oVFy7zyd6WI4B6CPaT0dhhJFK/wGZw4Wg6vLT8o99S2iWc26qn4SvYsSkhmu73BBMfT5XVS3AgaeWub4X3Byq/Bi1nMDc5N4aA+uY4decO3PZuA4UWHVajkK03hYSsvMapwYbgNV4rK3lAysCM229/5HiFlEnkEzAM3tiPCH6ceqtIL4iF/epLO/dTdzQe8DBPdAPz3fwZSyLBUJXxt35mxObvS94ndSPaVESOPpnMkYqErAahB2e1D5Ubnc8nydhFHae1k2cUIGagfcCaQ8woTpGsF5uodm7JrljN9zhRcZjDNJRN8nhVFZwSTy8OM1p4BSbKe9I/x8Sepfj3V3073P5N1A/C5ve6WbRp9qbaWeDtNbNvbx486QdvgAQrbH37S0RjnoB9Xf22HqX2z4bp+oz46uiWclFSaWizvDiJMQbsai91G7wJcNVKOPOK+mRvNxHdpbzJQyF0g lkzOXC20 l9vPBlx3pM8BgjhPXXV2U37xLNl7CQkogglTbtc4stnSVxvWbaUMOqPWthEpzK70tr9yWl2aVhVmm+7CWoTCbQIjls8ZrVd0afjKdjNjdGlkWd1AvCXlSHRMdc7MSXWJ2EOQLQxFygK3+a/sBUAcg6FJUf9U3Q9KDeDFtU8NP3MUPRWsSpVI62O9DtXV7euL4sNdefTouiZY34fRKksP4wjxbgDOFdMT6GBH2bCQCpvdO+/xdvLYB1YUOcWND/wzsk1ijzciRyp6528mlWvrA9pURaxsSGix8+TddUmu0xe4cu6nJp1WyefeCWSL4QPkzxbZmSz3PDE9Ex4bVx7xgh4bc8hNqsiNcN6Dm6FWyJhxGM/y3s3OPjlxcLLZT3M3hzQ99blBJ8GacD9WnCnn6p/8EpbYMh2zKEKDSkklxF56MpLd0nKmXmiQKMN2wjI4qAymkX4V3XgLhA8X9pc+9ApWixOSLzboCKThQcB+4QyHYR0hUsV2+24FXUg== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Sep 08, 2023 at 12:44:34PM +0200, Peter Zijlstra wrote: > On Fri, Sep 08, 2023 at 09:00:08AM +1000, Dave Chinner wrote: > > > > Right, but if you're not the lock owner, your answer to the question is > > > a dice-roll, it might be locked, it might not be. > > > > Except that the person writing the code knows the call chain that > > leads up to that code, and so they have a pretty good idea whether > > the object should be locked or not. If we are running that code, and > > the object is locked, then it's pretty much guaranteed that the > > owner of the lock is code that executed the check, because otherwise > > we have a *major lock implementation bug*. > > Agreed, and this is fine. However there's been some very creative > 'use' of the _is_locked() class of functions in the past that did not > follow 'common' sense. > > If all usage was: I should be holding this, lets check. I probably > wouldn't have this bad feeling about things. So your argument against such an interface is essentially "we can't have nice things because someone might abuse them"? 
> > > Most devs should run with lockdep on when writing new code, and I know > > > the sanitizer robots run with lockdep on. > > > > > > In general there seems to be a ton of lockdep on coverage. > > > > *cough* > > > > Bit locks, semaphores, and all sorts of other constructs for IO > > serialisation (like inode_dio_wait()) have no lockdep coverage at > > all. IOWs, large chunks of many filesystems, the VFS and the VM have > > little to no lockdep coverage at all. > > True, however I was commenting on the assertion that vm code has > duplicate asserts with the implication that was because not a lot of > people run with lockdep on. I think that implication is pretty much spot on the money for any subsystem that has significant correctness testing overhead. A single fstests regression test pass for a single configuration for XFS takes well over 4 hours to run. If I add lockdep, it's about 7 hours. If I add lockdep and KASAN, it's closer to 12 hours. It just isn't viable to run test kernels with these options day-in, day-out. Maybe once a release I'll run a weekend sanity check with them enabled, but otherwise the rare issue they find just isn't worth the cost of enabling them.... > > > > we also have VM_BUG_ON_MM(!rwsem_is_write_locked(&mm->mmap_lock), mm) > > > > to give us a good assertion when lockdep is disabled. > > > > > > Is that really worth it still? I mean, much of these assertions pre-date > > > lockdep. > > > > And we're trying to propagate them because lockdep isn't a viable > > option for day to day testing of filesystems because of it's > > overhead vs how infrequently it finds new problems. > > ... in XFS. Lockdep avoids a giant pile of broken from entering the > kernel and the robots still report plenty. Nobody is suggesting that lockdep gets replaced by these functions. They are *in addition* to lockdep, and are used to give use 99.9% certainty that locks are being used correctly without adding any runtime overhead at all. That's the whole point - it is simple introspection code that will find most gross locking mistakes that people make very quickly, without any real costs. If you're worried about people abusing introspection code, then you're forgetting that they can just dig directly at the rwsem guts directly themselves. Adding an interface that does introspection right and enables the internal imp[lementation to change without breaking anything is a no brainer; it stops people from just digging in the guts of the structure and guaranteeing that code will break if the implementation changes... > > > > XFS has a problem with using lockdep in general, which is that a worker > > > > thread can be spawned and use the fact that the spawner is holding the > > > > lock. There's no mechanism for the worker thread to ask "Does struct > > > > task_struct *p hold the lock?". > > > > > > Will be somewhat tricky to make happen -- but might be doable. It is > > > however an interface that is *very* hard to use correctly. Basically I > > > think you want to also assert that your target task 'p' is blocked, > > > right? > > > > > > That is: assert @p is blocked and holds @lock. > > > > That addresses the immediate symptom; it doesn't address the large > > problem with lockdep and needing non-owner rwsem semantics. > > > > i.e. synchronous task based locking models don't work for > > asynchronous multi-stage pipeline processing engines like XFS. 
The > > lock protects the data object and follows the data object through > > the processing pipeline, whilst the original submitter moves on to > > the next operation to processes without blocking. > > > > This is the non-blocking, async processing model that io_uring > > development is pushing filesystems towards, so assuming that we only > > hand a lock to a single worker task and then wait for it complete > > (i.e. synchronous operation) flies in the face of current > > development directions... > > I was looking at things from an interface abuse perspective. How easy is > it to do the wrong thing. As said, we've had a bunch of really dodgy > code with the _is_locked class of functions, hence my desire to find > something else. > > As to the whole non-owner locking, yes, that's problematic. I'm not > convinced async operations require non-owner locking, at the same time I > do see that IO completions pose a challence. > > Coming from the schedulability and real-time corner, non-owner locks are > a nightmare because of the inversions. So yeah, fun to be had I'm sure. I'm not sure you understand the scope of the problem with modern filesystems vs RT processing. The moment code enters a modern filesystem, it gives up all hope of real-time response guarantees. There is currently nothing a RT process can do but wait for the filesystem to finish with the locks it holds, and the wait times are effectively unbound because there may be a requirement for tens of thousands of IOs to be done before the lock is dropped and the RT task can make progress. Priority inheritance for the lock owner won't make any difference here, because the latency is not caused by something running on a CPU. IOWs, lock inversions and non-owner locks are the very least of the problems for RT priority apps when it comes to filesystem operations. The solution for RT priority apps avoiding priority inversions in filesystems is going be io_uring. i.e. the initial NOWAIT operation is done with RT priority in the RT task itself, but if that is going to be blocked it gets punted to a background worker for async processing and the RT priority task goes on to processing the next thing it needs to do. All the background async operations are performed with the same (non-RT) priority and we just don't need to care about priority inversions or the problems RT has with non-owner lock contexts. The RT tasks themselves don't care, either, because they don't ever get stuck waiting on a filesystem lock that a lower priority task might hold, or get stuck on an operation that might require unbound amounts of IO to complete (e.g. transaction reservations). IOWs, if we want to make "RT with filesystems" a reality, we need to stop worrying about constraining lock implementations and handling priority inversions. Instead, we need to look towards how to make filesystem infrastructure fully non-blocking for RT priority tasks and writing RT applications to use that infrastructure.... Cheers, Dave. -- Dave Chinner david@fromorbit.com