From: NeilBrown <neilb@ownmail.net>
To: Linus Torvalds <torvalds@linux-foundation.org>,
Alexander Viro <viro@zeniv.linux.org.uk>,
Christian Brauner <brauner@kernel.org>, Jan Kara <jack@suse.cz>,
Jeff Layton <jlayton@kernel.org>,
Trond Myklebust <trondmy@kernel.org>,
Anna Schumaker <anna@kernel.org>,
Carlos Maiolino <cem@kernel.org>,
Miklos Szeredi <miklos@szeredi.hu>,
Amir Goldstein <amir73il@gmail.com>,
Jan Harkes <jaharkes@cs.cmu.edu>, Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
David Howells <dhowells@redhat.com>,
Marc Dionne <marc.dionne@auristor.com>,
Steve French <sfrench@samba.org>,
Namjae Jeon <linkinjeon@kernel.org>,
Sungjong Seo <sj1557.seo@samsung.com>,
Yuezhang Mo <yuezhang.mo@sony.com>,
Andreas Hindborg <a.hindborg@kernel.org>,
Breno Leitao <leitao@debian.org>, "Theodore Ts'o" <tytso@mit.edu>,
Andreas Dilger <adilger.kernel@dilger.ca>,
Steven Rostedt <rostedt@goodmis.org>,
Masami Hiramatsu <mhiramat@kernel.org>,
Ilya Dryomov <idryomov@gmail.com>,
Alex Markuze <amarkuze@redhat.com>,
Viacheslav Dubeyko <slava@dubeyko.com>,
Tyler Hicks <code@tyhicks.com>,
Andreas Gruenbacher <agruenba@redhat.com>,
Richard Weinberger <richard@nod.at>,
Anton Ivanov <anton.ivanov@cambridgegreys.com>,
Johannes Berg <johannes@sipsolutions.net>,
Jeremy Kerr <jk@ozlabs.org>, Ard Biesheuvel <ardb@kernel.org>
Cc: linux-fsdevel@vger.kernel.org, linux-nfs@vger.kernel.org,
linux-xfs@vger.kernel.org, linux-unionfs@vger.kernel.org,
coda@cs.cmu.edu, linux-mm@kvack.org,
linux-afs@lists.infradead.org, linux-cifs@vger.kernel.org,
linux-ext4@vger.kernel.org, linux-kernel@vger.kernel.org,
linux-trace-kernel@vger.kernel.org, ceph-devel@vger.kernel.org,
ecryptfs@vger.kernel.org, gfs2@lists.linux.dev,
linux-um@lists.infradead.org, linux-efi@vger.kernel.org
Subject: [PATCH 51/53] VFS: use d_alloc_parallel() in lookup_one_qstr_excl().
Date: Fri, 13 Mar 2026 08:12:38 +1100 [thread overview]
Message-ID: <20260312214330.3885211-52-neilb@ownmail.net> (raw)
In-Reply-To: <20260312214330.3885211-1-neilb@ownmail.net>
From: NeilBrown <neil@brown.name>
lookup_one_qstr_excl() is used for lookups prior to directory
modifications (other than open(O_CREATE)), whether create, remove,
or rename.
A future patch will lift lookup out of the i_rwsem lock that protects
the directory during these operations (only taking a shared lock if the
target name is not yet in the dcache).
To prepare for this change and particularly to allow lookup to
always be done outside the parent i_rwsem, change lookup_one_qstr_excl()
to use d_alloc_parallel().
For the target of create and rename some filesystems skip the
preliminary lookup and combine it with the main operation. This is only
safe if the operation has exclusive access to the dentry. Currently
this is guaranteed by an exclusive lock on the directory.
d_alloc_parallel() provides alternate exclusive access (in the case
where the name isn't in the dcache and ->lookup will be called).
As a result of this change, ->lookup is now only ever called with a
d_in_lookup() dentry. Consequently we can remove the d_in_lookup()
check from d_add_ci() which is only used in ->lookup.
If LOOKUP_EXCL or LOOKUP_RENAME_TARGET is passed, the caller must ensure
d_lookup_done() is called at an appropriate time, and must not assume
that it can test for positive or negative dentries without confirming
that the dentry is no longer d_in_lookup() - unless it is filesystem
code acting on itself and *knows* that ->lookup() always completes the
lookup (currently true for all filesystems other than NFS).
This is all handled in start_creating() and end_dirop() and friends.
Note that as lookup_one_qstr_excl() is called with an exclusive lock on
the directory, d_alloc_parallel() cannot race with another thread and
cannot return a non-in-lookup dentry. However that is expected to
change so that case is handled with this patch.
Signed-off-by: NeilBrown <neil@brown.name>
---
Documentation/filesystems/porting.rst | 14 ++++++++++
fs/dcache.c | 16 +++--------
fs/namei.c | 38 ++++++++++++++++++++-------
3 files changed, 46 insertions(+), 22 deletions(-)
diff --git a/Documentation/filesystems/porting.rst b/Documentation/filesystems/porting.rst
index 7e83bd3c5a12..5ddc5ecfcc64 100644
--- a/Documentation/filesystems/porting.rst
+++ b/Documentation/filesystems/porting.rst
@@ -1403,3 +1403,17 @@ you really don't want it.
lookup_one() and lookup_noperm() are no longer available. Use
start_creating() or similar instead.
+
+---
+
+**mandatory**
+
+All start_creating and start_renaming functions may return a
+d_in_lookup() dentry if passed "O_CREATE|O_EXCL" or "O_RENAME_TARGET".
+end_dirop() calls the necessary d_lookup_done(). If the caller
+*knows* which filesystem is being used, it may know that this is not
+possible. Otherwise it must be careful testing if the dentry is
+positive or negative as the lookup may not have been performed yet.
+
+inode_operations.lookup() is now only ever called with a d_in_lookup()
+dentry.
diff --git a/fs/dcache.c b/fs/dcache.c
index abb96ad8e015..f573716d1a04 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -2261,18 +2261,10 @@ struct dentry *d_add_ci(struct dentry *dentry, struct inode *inode,
inode_unlock_shared(d_inode(dentry->d_parent));
else
inode_unlock(d_inode(dentry->d_parent));
- if (d_in_lookup(dentry)) {
- found = d_alloc_parallel(dentry->d_parent, name);
- if (IS_ERR(found) || !d_in_lookup(found)) {
- iput(inode);
- return found;
- }
- } else {
- found = d_alloc(dentry->d_parent, name);
- if (!found) {
- iput(inode);
- return ERR_PTR(-ENOMEM);
- }
+ found = d_alloc_parallel(dentry->d_parent, name);
+ if (IS_ERR(found) || !d_in_lookup(found)) {
+ iput(inode);
+ return found;
}
if (shared)
inode_lock_shared(d_inode(dentry->d_parent));
diff --git a/fs/namei.c b/fs/namei.c
index cb80490a869f..bba419f2fc53 100644
--- a/fs/namei.c
+++ b/fs/namei.c
@@ -1774,13 +1774,14 @@ static struct dentry *lookup_dcache(const struct qstr *name,
}
/*
- * Parent directory has inode locked exclusive. This is one
- * and only case when ->lookup() gets called on non in-lookup
- * dentries - as the matter of fact, this only gets called
- * when directory is guaranteed to have no in-lookup children
- * at all.
- * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed.
- * Will return -EEXIST if name is found and LOOKUP_EXCL was passed.
+ * Parent directory has inode locked.
+ * If Lookup_EXCL or LOOKUP_RENAME_TARGET is set
+ * d_lookup_done() must be called before the dentry is dput()
+ * If the dentry is not d_in_lookup():
+ * Will return -ENOENT if name isn't found and LOOKUP_CREATE wasn't passed.
+ * Will return -EEXIST if name is found and LOOKUP_EXCL was passed.
+ * If it is d_in_lookup() then these conditions can only be checked by the
+ * file system when carrying out the intent (create or rename).
*/
static struct dentry *lookup_one_qstr_excl(const struct qstr *name,
struct dentry *base, unsigned int flags)
@@ -1798,18 +1799,27 @@ static struct dentry *lookup_one_qstr_excl(const struct qstr *name,
if (unlikely(IS_DEADDIR(dir)))
return ERR_PTR(-ENOENT);
- dentry = d_alloc(base, name);
- if (unlikely(!dentry))
- return ERR_PTR(-ENOMEM);
+ dentry = d_alloc_parallel(base, name);
+ if (unlikely(IS_ERR(dentry)))
+ return dentry;
+ if (unlikely(!d_in_lookup(dentry)))
+ /* Raced with another thread which did the lookup */
+ goto found;
old = dir->i_op->lookup(dir, dentry, flags);
if (unlikely(old)) {
+ d_lookup_done(dentry);
dput(dentry);
dentry = old;
}
found:
if (IS_ERR(dentry))
return dentry;
+ if (d_in_lookup(dentry))
+ /* We cannot check for errors - the caller will have to
+ * wait for any create-etc attempt to get relevant errors.
+ */
+ return dentry;
if (d_is_negative(dentry) && !(flags & LOOKUP_CREATE)) {
dput(dentry);
return ERR_PTR(-ENOENT);
@@ -2921,6 +2931,8 @@ static struct dentry *__start_dirop(struct dentry *parent, struct qstr *name,
* The lookup is performed and necessary locks are taken so that, on success,
* the returned dentry can be operated on safely.
* The qstr must already have the hash value calculated.
+ * The dentry may be d_in_lookup() if %LOOKUP_EXCL or %LOOKUP_RENAME_TARGET
+ * is given, depending on the filesystem.
*
* Returns: a locked dentry, or an error.
*
@@ -2942,6 +2954,7 @@ void end_dirop(struct dentry *de)
{
if (!IS_ERR(de)) {
inode_unlock(de->d_parent->d_inode);
+ d_lookup_done(de);
dput(de);
}
}
@@ -3854,8 +3867,10 @@ __start_renaming(struct renamedata *rd, int lookup_flags,
return 0;
out_dput_d2:
+ d_lookup_done(d2);
dput(d2);
out_dput_d1:
+ d_lookup_done(d1);
dput(d1);
out_unlock:
unlock_rename(rd->old_parent, rd->new_parent);
@@ -3950,6 +3965,7 @@ __start_renaming_dentry(struct renamedata *rd, int lookup_flags,
return 0;
out_dput_d2:
+ d_lookup_done(d2);
dput(d2);
out_unlock:
unlock_rename(old_dentry->d_parent, rd->new_parent);
@@ -4059,6 +4075,8 @@ EXPORT_SYMBOL(start_renaming_two_dentries);
void end_renaming(struct renamedata *rd)
{
+ d_lookup_done(rd->old_dentry);
+ d_lookup_done(rd->new_dentry);
unlock_rename(rd->old_parent, rd->new_parent);
dput(rd->old_dentry);
dput(rd->new_dentry);
--
2.50.0.107.gf914562f5916.dirty
next prev parent reply other threads:[~2026-03-12 21:58 UTC|newest]
Thread overview: 58+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-03-12 21:11 [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops NeilBrown
2026-03-12 21:11 ` [PATCH 01/53] VFS: fix various typos in documentation for start_creating start_removing etc NeilBrown
2026-03-12 21:11 ` [PATCH 02/53] VFS: enhance d_splice_alias() to handle in-lookup dentries NeilBrown
2026-03-12 21:11 ` [PATCH 03/53] VFS: allow d_alloc_name() to be used with ->d_hash NeilBrown
2026-03-12 21:11 ` [PATCH 04/53] VFS: use global wait-queue table for d_alloc_parallel() NeilBrown
2026-03-12 21:11 ` [PATCH 05/53] VFS: introduce d_alloc_noblock() NeilBrown
2026-03-12 21:11 ` [PATCH 06/53] VFS: add d_duplicate() NeilBrown
2026-03-12 21:11 ` [PATCH 07/53] VFS: Add LOOKUP_SHARED flag NeilBrown
2026-03-12 21:11 ` [PATCH 08/53] VFS/xfs: drop parent lock across d_alloc_parallel() in d_add_ci() NeilBrown
2026-03-12 21:11 ` [PATCH 09/53] nfs: remove d_drop()/d_alloc_parallel() from nfs_atomic_open() NeilBrown
2026-03-12 21:11 ` [PATCH 10/53] nfs: use d_splice_alias() in nfs_link() NeilBrown
2026-03-12 21:11 ` [PATCH 11/53] nfs: don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:11 ` [PATCH 12/53] nfs: don't d_drop() before d_splice_alias() in atomic_create NeilBrown
2026-03-12 21:12 ` [PATCH 14/53] nfs: use d_alloc_noblock() in silly-rename NeilBrown
2026-03-12 21:12 ` [PATCH 15/53] nfs: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 16/53] ovl: drop dir lock for lookups in impure readdir NeilBrown
2026-03-15 13:51 ` Amir Goldstein
2026-03-12 21:12 ` [PATCH 17/53] coda: don't d_drop() early NeilBrown
2026-03-12 21:12 ` [PATCH 18/53] shmem: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 19/53] afs: use d_time instead of d_fsdata NeilBrown
2026-03-12 21:12 ` [PATCH 20/53] afs: don't unhash/rehash dentries during unlink/rename NeilBrown
2026-03-12 21:12 ` [PATCH 21/53] afs: use d_splice_alias() in afs_vnode_new_inode() NeilBrown
2026-03-12 21:12 ` [PATCH 22/53] afs: use d_alloc_nonblock in afs_sillyrename() NeilBrown
2026-03-12 21:12 ` [PATCH 23/53] afs: lookup_atsys to drop and reclaim lock NeilBrown
2026-03-12 21:12 ` [PATCH 24/53] afs: use d_duplicate() NeilBrown
2026-03-12 21:12 ` [PATCH 25/53] smb/client: use d_time to store a timestamp in dentry, not d_fsdata NeilBrown
2026-03-12 21:12 ` [PATCH 26/53] smb/client: don't unhashed and rehash to prevent new opens NeilBrown
2026-03-12 21:12 ` [PATCH 27/53] smb/client: use d_splice_alias() in atomic_open NeilBrown
2026-03-12 21:12 ` [PATCH 29/53] exfat: simplify exfat_lookup() NeilBrown
2026-03-12 21:12 ` [PATCH 30/53] configfs: remove d_add() calls before configfs_attach_group() NeilBrown
2026-03-12 21:12 ` [PATCH 31/53] configfs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 32/53] ext4: move dcache modifying code out of __ext4_link() NeilBrown
2026-03-17 10:00 ` Jan Kara
2026-03-12 21:12 ` [PATCH 33/53] ext4: use on-stack dentries in ext4_fc_replay_link_internal() NeilBrown
2026-03-17 9:37 ` Jan Kara
2026-03-12 21:12 ` [PATCH 34/53] tracefs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 35/53] cephfs: " NeilBrown
2026-03-12 21:12 ` [PATCH 36/53] cephfs: remove d_alloc from CEPH_MDS_OP_LOOKUPNAME handling in ceph_fill_trace() NeilBrown
2026-03-12 21:12 ` [PATCH 37/53] cephfs: Use d_alloc_noblock() in ceph_readdir_prepopulate() NeilBrown
2026-03-12 21:12 ` [PATCH 38/53] cephfs: Don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:12 ` [PATCH 39/53] ecryptfs: stop using d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 40/53] gfs2: " NeilBrown
2026-03-12 21:12 ` [PATCH 41/53] libfs: " NeilBrown
2026-03-12 21:12 ` [PATCH 42/53] fuse: don't d_drop() before d_splice_alias() NeilBrown
2026-03-12 21:12 ` [PATCH 44/53] hostfs: don't d_drop() before d_splice_alias() in hostfs_mkdir() NeilBrown
2026-03-12 21:12 ` [PATCH 45/53] efivarfs: use d_alloc_name() NeilBrown
2026-03-12 21:12 ` [PATCH 46/53] Remove references to d_add() in documentation and comments NeilBrown
2026-03-12 21:12 ` [PATCH 47/53] VFS: make d_alloc() local to VFS NeilBrown
2026-03-12 21:12 ` [PATCH 48/53] VFS: remove d_add() NeilBrown
2026-03-12 21:12 ` [PATCH 49/53] VFS: remove d_rehash() NeilBrown
2026-03-12 21:12 ` [PATCH 50/53] VFS: remove lookup_one() and lookup_noperm() NeilBrown
2026-03-12 21:12 ` NeilBrown [this message]
2026-03-12 21:12 ` [PATCH 52/53] VFS: lift d_alloc_parallel above inode_lock NeilBrown
2026-03-12 21:12 ` [PATCH 53/53] VFS: remove LOOKUP_SHARED NeilBrown
2026-03-12 23:38 ` [PATCH RFC 00/53] lift lookup out of exclive lock for dir ops Steven Rostedt
2026-03-13 0:18 ` NeilBrown
2026-03-12 23:46 ` Linus Torvalds
2026-03-13 0:09 ` NeilBrown
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260312214330.3885211-52-neilb@ownmail.net \
--to=neilb@ownmail.net \
--cc=a.hindborg@kernel.org \
--cc=adilger.kernel@dilger.ca \
--cc=agruenba@redhat.com \
--cc=amarkuze@redhat.com \
--cc=amir73il@gmail.com \
--cc=anna@kernel.org \
--cc=anton.ivanov@cambridgegreys.com \
--cc=ardb@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=brauner@kernel.org \
--cc=cem@kernel.org \
--cc=ceph-devel@vger.kernel.org \
--cc=coda@cs.cmu.edu \
--cc=code@tyhicks.com \
--cc=dhowells@redhat.com \
--cc=ecryptfs@vger.kernel.org \
--cc=gfs2@lists.linux.dev \
--cc=hughd@google.com \
--cc=idryomov@gmail.com \
--cc=jack@suse.cz \
--cc=jaharkes@cs.cmu.edu \
--cc=jk@ozlabs.org \
--cc=jlayton@kernel.org \
--cc=johannes@sipsolutions.net \
--cc=leitao@debian.org \
--cc=linkinjeon@kernel.org \
--cc=linux-afs@lists.infradead.org \
--cc=linux-cifs@vger.kernel.org \
--cc=linux-efi@vger.kernel.org \
--cc=linux-ext4@vger.kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=linux-nfs@vger.kernel.org \
--cc=linux-trace-kernel@vger.kernel.org \
--cc=linux-um@lists.infradead.org \
--cc=linux-unionfs@vger.kernel.org \
--cc=linux-xfs@vger.kernel.org \
--cc=marc.dionne@auristor.com \
--cc=mhiramat@kernel.org \
--cc=miklos@szeredi.hu \
--cc=neil@brown.name \
--cc=richard@nod.at \
--cc=rostedt@goodmis.org \
--cc=sfrench@samba.org \
--cc=sj1557.seo@samsung.com \
--cc=slava@dubeyko.com \
--cc=torvalds@linux-foundation.org \
--cc=trondmy@kernel.org \
--cc=tytso@mit.edu \
--cc=viro@zeniv.linux.org.uk \
--cc=yuezhang.mo@sony.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox