From: NeilBrown
To: Linus Torvalds, Alexander Viro, Christian Brauner, Jan Kara,
    Jeff Layton, Trond Myklebust, Anna Schumaker, Carlos Maiolino,
    Miklos Szeredi, Amir Goldstein, Jan Harkes, Hugh Dickins,
    Baolin Wang, David Howells, Marc Dionne, Steve French,
    Namjae Jeon, Sungjong Seo, Yuezhang Mo, Andreas Hindborg,
    Breno Leitao, "Theodore Ts'o", Andreas Dilger, Steven Rostedt,
    Masami Hiramatsu, Ilya Dryomov, Alex Markuze, Viacheslav Dubeyko,
    Tyler Hicks, Andreas Gruenbacher, Richard Weinberger, Anton Ivanov,
    Johannes Berg, Jeremy Kerr, Ard Biesheuvel
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
    linux-xfs@vger.kernel.org, linux-unionfs@vger.kernel.org,
    coda@cs.cmu.edu, linux-mm@kvack.org, linux-afs@lists.infradead.org,
    linux-cifs@vger.kernel.org, linux-ext4@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-trace-kernel@vger.kernel.org,
    ceph-devel@vger.kernel.org, ecryptfs@vger.kernel.org,
    gfs2@lists.linux.dev, linux-um@lists.infradead.org,
    linux-efi@vger.kernel.org
Subject: [PATCH RFC 00/53] lift lookup out of exclusive lock for dir ops
Date: Fri, 13 Mar 2026 08:11:47 +1100
Message-ID: <20260312214330.3885211-1-neilb@ownmail.net>

This patch set progresses my effort to improve concurrency of
directory operations and specifically to allow concurrent updates in a
given directory.

The series opens with a bunch of VFS patches which introduce some new
APIs and improve existing ones.  Then come a bunch of per-filesystem
changes which adjust to meet new needs, often using the new APIs.
Last is a final bunch of VFS patches which discard some APIs that are
no longer wanted, and one (the second last) which makes the big change.

Some of the fs patches don't depend on any preceding patch, and if
maintainers wanted to take those early I certainly wouldn't object!
I've put a '*' next to patches which I think can be taken at any time.

My longer term goal involves pushing the parent-directory locking down
into filesystems (which can then discard it if it isn't needed) and
using exclusive dentry locking in the VFS for all directory operations
other than readdir - which by its nature needs shared locking and will
continue to use the directory lock.

The VFS already has exclusive dentry locking for the limited case of
lookup.  Newly created dentries (when created by d_alloc_parallel())
are exclusively locked using the DCACHE_PAR_LOOKUP bit.  They remain
exclusively locked until they are hashed as negative or positive
dentries, or they are discarded.

DCACHE_PAR_LOOKUP currently depends on a shared parent lock to exclude
directory modifying operations.  This patch set removes this
dependency so that d_alloc_parallel() can be called without locking
and all directory modifying operations receive either a hashed dentry
or an in-lookup dentry (they currently receive a hashed or unhashed
dentry, or sometimes, for atomic_open only, an in-lookup dentry).

The cases where a filesystem can receive an in-lookup dentry are:
 - lookup.  Currently can receive in-lookup or unhashed.  After this
   patch set it always receives in-lookup.
 - atomic_open.  Currently can receive in-lookup or hashed-negative.
   This doesn't change with this patch set.
 - rename.  Currently can receive hashed or unhashed.  After this
   patch set it can also receive in-lookup where previously it would
   receive unhashed.
   This is only for the target of a rename over NFS.
 - link, mknod, mkdir, symlink.  Currently receive hashed-negative,
   except for NFS, which notices the implied exclusive create and
   skips the lookup, so the filesystem can receive unhashed-negative
   for the operation.

There are two particular needs to be addressed before we can use
d_alloc_parallel() outside of the directory lock.

1/ d_alloc_parallel() effectively takes a blocking lock, so lock
   ordering is important.  If we are to take the directory lock
   *after* calling d_alloc_parallel() (while still holding an
   in-lookup dentry, as happens at least when ->atomic_open is called)
   then we must never call d_alloc_parallel() while holding the
   directory lock, even a shared lock.

   This particularly affects readdir, as several filesystems prime the
   dcache with readdir results and so use d_alloc_parallel() in the
   ->iterate_shared handler, which will now have deadlock potential.
   To address this we introduce d_alloc_noblock(), which fails rather
   than blocking.

   A few other cases of potential lock inversion exist.  These are
   addressed by dropping the directory lock, when it is safe to do so,
   before calling d_alloc_parallel().  This requires the addition of
   LOOKUP_SHARED so that ->lookup knows how the parent is locked.
   This is ugly but is gone by the end of the series.  After the
   locking is rearranged in the second last patch, ->lookup is only
   ever called with a shared lock.

2/ As d_alloc_parallel() will be able to run without the directory
   lock, holding that lock exclusively is not enough to protect some
   dcache manipulations.  In particular, several filesystems d_drop()
   a dentry and (possibly) re-hash it.  This will no longer be safe,
   as d_alloc_parallel() could run while the dentry was dropped, would
   find that the name doesn't exist in the dcache, and would create a
   new dentry, leading to two uncoordinated dentries with the same
   name.

   It will still be safe to d_drop() a dentry after the operation has
   completed, whether in success or failure.
   But d_drop()ing before that is best avoided.  An early d_drop()
   that isn't followed by a rehash is not clearly problematic for a
   filesystem which still uses parent locking (as all do at present)
   but it is good to discourage that pattern now.

   This is addressed, in part, by changing d_splice_alias() to be able
   to instantiate any negative dentry, whether hashed, unhashed, or
   in-lookup.  This removes the need for d_drop() in most cases.

New APIs added are:
 - d_alloc_noblock - see patch 05 for details
 - d_duplicate - patch 06

Removed APIs:
 - d_alloc
 - d_rehash
 - d_add
 - lookup_one
 - lookup_noperm

Changed APIs:
 - d_alloc_parallel - no longer requires a waitqueue_head_t
 - d_splice_alias - now works with in-lookup dentry
 - d_alloc_name - now works with ->d_hash

d_alloc_name() should be used with d_make_persistent().  These don't
require VFS locking as the filesystem doesn't permit create/remove via
VFS calls, and provides its own locking to avoid duplicate names.

d_splice_alias() should *always* be used:
  in ->lookup
  in ->iterate_shared for cache priming
  in ->atomic_open, possibly via a call to ->lookup
  in ->mkdir unless d_instantiate_new() can be used
  in ->link ->symlink ->mknod if ->lookup skips LOOKUP_CREATE|LOOKUP_EXCL

Thanks for reading this far!  I've been testing NFS but haven't tried
anything else yet.  As well as the normal review of details I'd love
to know if I've missed any important consequences of the locking
change.  It is a big conceptual change and there could easily be
surprising implications.

Thanks,
NeilBrown

 [PATCH 01/53] VFS: fix various typos in documentation for
 [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup
 [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash
 [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel()
 [PATCH 05/53] VFS: introduce d_alloc_noblock()
 [PATCH 06/53] VFS: add d_duplicate()
 [PATCH 07/53] VFS: Add LOOKUP_SHARED flag.
 [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in
*[PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from
 [PATCH 10/53] nfs: use d_splice_alias() in nfs_link()
 [PATCH 11/53] nfs: don't d_drop() before d_splice_alias()
 [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in
 [PATCH 13/53] nfs: Use d_alloc_noblock() in nfs_prime_dcache()
 [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename
 [PATCH 15/53] nfs: use d_duplicate()
*[PATCH 16/53] ovl: drop dir lock for lookups in impure readdir
*[PATCH 17/53] coda: don't d_drop() early.
 [PATCH 18/53] shmem: use d_duplicate()
*[PATCH 19/53] afs: use d_time instead of d_fsdata
*[PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename
 [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode()
 [PATCH 22/53] afs: use d_alloc_nonblock in afs_sillyrename()
 [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock.
 [PATCH 24/53] afs: use d_duplicate()
*[PATCH 25/53] smb/client: use d_time to store a timestamp in dentry,
*[PATCH 26/53] smb/client: don't unhashed and rehash to prevent new
*[PATCH 27/53] smb/client: use d_splice_alias() in atomic_open
 [PATCH 28/53] smb/client: Use d_alloc_noblock() in
*[PATCH 29/53] exfat: simplify exfat_lookup()
*[PATCH 30/53] configfs: remove d_add() calls before
 [PATCH 31/53] configfs: stop using d_add().
*[PATCH 32/53] ext4: move dcache modifying code out of __ext4_link()
*[PATCH 33/53] ext4: use on-stack dentries in
 [PATCH 34/53] tracefs: stop using d_add().
 [PATCH 35/53] cephfs: stop using d_add().
*[PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME
 [PATCH 37/53] cephfs: Use d_alloc_noblock() in
 [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias()
 [PATCH 39/53] ecryptfs: stop using d_add().
 [PATCH 40/53] gfs2: stop using d_add().
 [PATCH 41/53] libfs: stop using d_add().
 [PATCH 42/53] fuse: don't d_drop() before d_splice_alias()
 [PATCH 43/53] fuse: Use d_alloc_noblock() in fuse_direntplus_link()
 [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in
 [PATCH 45/53] efivarfs: use d_alloc_name()
 [PATCH 46/53] Remove references to d_add() in documentation and
 [PATCH 47/53] VFS: make d_alloc() local to VFS.
 [PATCH 48/53] VFS: remove d_add()
 [PATCH 49/53] VFS: remove d_rehash()
 [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm()
 [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl().
 [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock
 [PATCH 53/53] VFS: remove LOOKUP_SHARED
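
The d_drop()/rehash window described under 2/ above can be illustrated
with a small userspace sketch.  This is a toy model only, not kernel
code: struct toy_dentry, struct toy_dcache and every toy_*() function
are hypothetical names made up for the example, a single "visible"
flag stands in for both the hashed and in-lookup states, and the
"concurrent" lookup is simulated by an ordinary call placed at the
point where the race could occur.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy model only: none of these names are real kernel APIs. */
#define TOY_MAX 8

struct toy_dentry {
	const char *name;
	int visible;	/* 1 if a by-name lookup can find this entry */
	int positive;	/* 1 once an "inode" has been attached */
};

struct toy_dcache {
	struct toy_dentry slot[TOY_MAX];
	size_t used;
};

/* Return the visible entry for name, or NULL if there is none. */
struct toy_dentry *toy_find(struct toy_dcache *dc, const char *name)
{
	for (size_t i = 0; i < dc->used; i++)
		if (dc->slot[i].visible && strcmp(dc->slot[i].name, name) == 0)
			return &dc->slot[i];
	return NULL;
}

/* Find-or-create, standing in for a lookup that runs without the dir lock. */
struct toy_dentry *toy_alloc(struct toy_dcache *dc, const char *name)
{
	struct toy_dentry *d = toy_find(dc, name);

	if (d)
		return d;
	assert(dc->used < TOY_MAX);
	d = &dc->slot[dc->used++];
	d->name = name;
	d->visible = 1;
	d->positive = 0;
	return d;
}

/* Count visible entries with this name; more than one means duplicates. */
int toy_count(const struct toy_dcache *dc, const char *name)
{
	int n = 0;

	for (size_t i = 0; i < dc->used; i++)
		if (dc->slot[i].visible && strcmp(dc->slot[i].name, name) == 0)
			n++;
	return n;
}

/* Old pattern: hide the entry, let another lookup run, then re-expose it. */
int toy_drop_then_rehash(struct toy_dcache *dc, const char *name)
{
	struct toy_dentry *d = toy_alloc(dc, name);

	d->visible = 0;			/* models an early d_drop() */
	(void)toy_alloc(dc, name);	/* other lookup finds nothing, creates one */
	d->positive = 1;
	d->visible = 1;			/* models the later rehash */
	return toy_count(dc, name);
}

/* New pattern: instantiate in place, so the name never leaves the table. */
int toy_instantiate_in_place(struct toy_dcache *dc, const char *name)
{
	struct toy_dentry *d = toy_alloc(dc, name);
	struct toy_dentry *other = toy_alloc(dc, name);	/* other lookup */

	assert(other == d);		/* it found the existing entry */
	d->positive = 1;
	return toy_count(dc, name);
}
```

Starting from an empty table, toy_drop_then_rehash() ends with two
visible entries for the same name, while toy_instantiate_in_place()
ends with one.  The real dcache is of course far more involved; the
sketch only shows why removing the unhashed window matters once
lookups no longer wait on the parent lock.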