* 6.19 tmpfs __d_lookup() lockup
@ 2025-12-12 3:56 Hugh Dickins
2025-12-12 5:02 ` Al Viro
0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2025-12-12 3:56 UTC (permalink / raw)
To: Al Viro
Cc: Christian Brauner, Andrew Morton, Baolin Wang, linux-fsdevel,
linux-kernel, linux-mm
Hi Al,
Though good for ordinary usage, your persistency mods for 6.19 cause
tmpfs to lockup in __d_lookup() during several fsstressing xfstests:
fairly reliably in generic/269, generic/476, generic/650, generic/750,
less reliably in generic/013, generic/585. Typical dmesg below.
I have not spotted what's wrong: better hand over to you.
Of course, 2313598222f9 ("convert ramfs and tmpfs") (of Feb 26 2024!)
comes out as the first failing commit, no surprise there.
I did try inserting a BUG_ON(node == node->next) on line 2438 of
fs/dcache.c, just after __d_lookup's hlist_bl_for_each_entry_rcu(),
and that BUG was immediately hit (but, for all I know, perhaps that's
an unreliable asserition, perhaps it's acceptable for a race to result
in a momentary node == node->next there).
I did try hacking on xfstests common/rc, to allow ramfs where tmpfs
is enabled (and _df_device() then needs $DF_PROG -a to show ramfs).
I did not get a lockup or crash from any of them on ramfs (and 476 and
750 actually passed). I don't draw any conclusion from that, maybe
ramfs just does not support some of the options which help fsstress
to generate the issue; but it might be useful background info.
Hoping you can soon guess what's gone wrong,
or ask for more info, thanks,
Hugh
[ 54.354788] run fstests generic/269 at 2025-12-10 19:40:16
[ 114.660512] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[ 114.670705] rcu: Tasks blocked on level-0 rcu_node (CPUs 0-7): P3185
[ 114.681006] rcu: (detected by 6, t=15007 jiffies, g=3377, q=6940 ncpus=8)
[ 114.691346] task:fsstress state:R running task stack:0 pid:3185 tgid:3185 ppid:3059 task_flags:0x400140 flags:0x00080003
[ 114.702226] Call Trace:
[ 114.712908] <TASK>
[ 114.723528] __schedule+0x67c/0x6c5
[ 114.734245] ? __d_lookup_rcu+0x7a/0x9c
[ 114.744921] ? irqentry_exit+0x27/0x35
[ 114.755547] ? sysvec_apic_timer_interrupt+0xa8/0xae
[ 114.766300] ? asm_sysvec_apic_timer_interrupt+0x1b/0x20
[ 114.776953] ? __d_lookup+0x37/0xad
[ 114.787357] ? __d_lookup+0x2f/0xad
[ 114.797700] ? d_lookup+0x2b/0x42
[ 114.807970] ? lookup_dcache+0x1f/0x60
[ 114.818265] ? lookup_one_qstr_excl+0x1a/0xbe
[ 114.828525] ? do_renameat2+0x1d1/0x3e4
[ 114.838808] ? getname_flags+0x4b/0x17a
[ 114.849093] ? __do_sys_rename+0x36/0x3e
[ 114.859363] ? __x64_sys_rename+0x11/0x13
[ 114.869634] ? x64_sys_call+0x34d/0xe2c
[ 114.879797] ? do_syscall_64+0x53/0x123
[ 114.889820] ? entry_SYSCALL_64_after_hwframe+0x4b/0x53
[ 114.899748] </TASK>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 3:56 6.19 tmpfs __d_lookup() lockup Hugh Dickins
@ 2025-12-12 5:02 ` Al Viro
2025-12-12 5:15 ` Hugh Dickins
2025-12-12 5:34 ` Al Viro
0 siblings, 2 replies; 18+ messages in thread
From: Al Viro @ 2025-12-12 5:02 UTC (permalink / raw)
To: Hugh Dickins
Cc: Christian Brauner, Andrew Morton, Baolin Wang, linux-fsdevel,
linux-kernel, linux-mm
On Thu, Dec 11, 2025 at 07:56:38PM -0800, Hugh Dickins wrote:
> Of course, 2313598222f9 ("convert ramfs and tmpfs") (of Feb 26 2024!)
> comes out as the first failing commit, no surprise there.
>
> I did try inserting a BUG_ON(node == node->next) on line 2438 of
> fs/dcache.c, just after __d_lookup's hlist_bl_for_each_entry_rcu(),
> and that BUG was immediately hit (but, for all I know, perhaps that's
> an unreliable asserition, perhaps it's acceptable for a race to result
> in a momentary node == node->next there).
Hmm... Could you check if we are somehow hitting d_in_lookup(dentry)
in d_make_persistent()?
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 5:02 ` Al Viro
@ 2025-12-12 5:15 ` Hugh Dickins
2025-12-12 5:34 ` Al Viro
1 sibling, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2025-12-12 5:15 UTC (permalink / raw)
To: Al Viro
Cc: Hugh Dickins, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm
On Fri, 12 Dec 2025, Al Viro wrote:
> On Thu, Dec 11, 2025 at 07:56:38PM -0800, Hugh Dickins wrote:
>
> > Of course, 2313598222f9 ("convert ramfs and tmpfs") (of Feb 26 2024!)
> > comes out as the first failing commit, no surprise there.
> >
> > I did try inserting a BUG_ON(node == node->next) on line 2438 of
> > fs/dcache.c, just after __d_lookup's hlist_bl_for_each_entry_rcu(),
> > and that BUG was immediately hit (but, for all I know, perhaps that's
> > an unreliable asserition, perhaps it's acceptable for a race to result
> > in a momentary node == node->next there).
>
> Hmm... Could you check if we are somehow hitting d_in_lookup(dentry)
> in d_make_persistent()?
Am I being stupid and misunderstanding you? I haven't tried
running with any printks, it seems obvious from the source that
d_make_persistent() always calls __d_instantiate(), which has a
WARN_ON(d_in_lookup(dentry)) at the start.
But I didn't see that warning fire on any occasion.
Hugh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 5:02 ` Al Viro
2025-12-12 5:15 ` Hugh Dickins
@ 2025-12-12 5:34 ` Al Viro
2025-12-12 5:57 ` Hugh Dickins
1 sibling, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-12-12 5:34 UTC (permalink / raw)
To: Hugh Dickins
Cc: Christian Brauner, Andrew Morton, Baolin Wang, linux-fsdevel,
linux-kernel, linux-mm
On Fri, Dec 12, 2025 at 05:02:25AM +0000, Al Viro wrote:
> On Thu, Dec 11, 2025 at 07:56:38PM -0800, Hugh Dickins wrote:
>
> > Of course, 2313598222f9 ("convert ramfs and tmpfs") (of Feb 26 2024!)
> > comes out as the first failing commit, no surprise there.
> >
> > I did try inserting a BUG_ON(node == node->next) on line 2438 of
> > fs/dcache.c, just after __d_lookup's hlist_bl_for_each_entry_rcu(),
> > and that BUG was immediately hit (but, for all I know, perhaps that's
> > an unreliable asserition, perhaps it's acceptable for a race to result
> > in a momentary node == node->next there).
>
> Hmm... Could you check if we are somehow hitting d_in_lookup(dentry)
> in d_make_persistent()?
Another question: is it a CONFIG_UNICODE build and are you running with
casefolding anywhere in the vicinity? If so, does this thing reproduce
without that?
Because that's one potential area of difference between shmem and ramfs
(as well as just about anything else where tree-in-dcache might be
relevant); if that's where the breakage happens, it would narrow the
things down nicely...
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 5:34 ` Al Viro
@ 2025-12-12 5:57 ` Hugh Dickins
2025-12-12 6:30 ` Al Viro
0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2025-12-12 5:57 UTC (permalink / raw)
To: Al Viro
Cc: Hugh Dickins, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm
On Fri, 12 Dec 2025, Al Viro wrote:
> On Fri, Dec 12, 2025 at 05:02:25AM +0000, Al Viro wrote:
> > On Thu, Dec 11, 2025 at 07:56:38PM -0800, Hugh Dickins wrote:
> >
> > > Of course, 2313598222f9 ("convert ramfs and tmpfs") (of Feb 26 2024!)
> > > comes out as the first failing commit, no surprise there.
> > >
> > > I did try inserting a BUG_ON(node == node->next) on line 2438 of
> > > fs/dcache.c, just after __d_lookup's hlist_bl_for_each_entry_rcu(),
> > > and that BUG was immediately hit (but, for all I know, perhaps that's
> > > an unreliable asserition, perhaps it's acceptable for a race to result
> > > in a momentary node == node->next there).
> >
> > Hmm... Could you check if we are somehow hitting d_in_lookup(dentry)
> > in d_make_persistent()?
>
> Another question: is it a CONFIG_UNICODE build and are you running with
> casefolding anywhere in the vicinity? If so, does this thing reproduce
> without that?
>
> Because that's one potential area of difference between shmem and ramfs
> (as well as just about anything else where tree-in-dcache might be
> relevant); if that's where the breakage happens, it would narrow the
> things down nicely...
No, sad to say, CONFIG_UNICODE is not set.
(I see why you're asking, I did notice from the diff that the
case-folding stuff in shmem.c used to do something different but
now the same in several places; but the case-folding people will
have to look out for themselves, it's beyond me.)
(And yes, I was being stupid in my previous response: once I looked
at how simple d_in_lookup() is, I understood your "hitting"; but at
least I gave the right answer, no, that warning does not show up.)
Hugh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 5:57 ` Hugh Dickins
@ 2025-12-12 6:30 ` Al Viro
2025-12-12 7:17 ` Hugh Dickins
0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-12-12 6:30 UTC (permalink / raw)
To: Hugh Dickins
Cc: Christian Brauner, Andrew Morton, Baolin Wang, linux-fsdevel,
linux-kernel, linux-mm
On Thu, Dec 11, 2025 at 09:57:15PM -0800, Hugh Dickins wrote:
> No, sad to say, CONFIG_UNICODE is not set.
>
> (I see why you're asking, I did notice from the diff that the
> case-folding stuff in shmem.c used to do something different but
> now the same in several places; but the case-folding people will
> have to look out for themselves, it's beyond me.)
>
> (And yes, I was being stupid in my previous response: once I looked
> at how simple d_in_lookup() is, I understood your "hitting"; but at
> least I gave the right answer, no, that warning does not show up.)
A few more things to check:
1) do we, by any chance, ever see dentry_free() called with
dentry->d_flags & DCACHE_PERSISTENT?
2) does d_make_persistent() ever call __d_rehash() when called with
dentry->d_sb->s_magic == TMPFS_MAGIC?
3) is shmem_whiteout() ever called? If that's the case, could you try
to remove that d_rehash() call in it and see what happens? Because
that's another place where shmem is playing odd games...
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 6:30 ` Al Viro
@ 2025-12-12 7:17 ` Hugh Dickins
2025-12-12 10:12 ` Hugh Dickins
0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2025-12-12 7:17 UTC (permalink / raw)
To: Al Viro
Cc: Hugh Dickins, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm
On Fri, 12 Dec 2025, Al Viro wrote:
> On Thu, Dec 11, 2025 at 09:57:15PM -0800, Hugh Dickins wrote:
>
> > No, sad to say, CONFIG_UNICODE is not set.
> >
> > (I see why you're asking, I did notice from the diff that the
> > case-folding stuff in shmem.c used to do something different but
> > now the same in several places; but the case-folding people will
> > have to look out for themselves, it's beyond me.)
> >
> > (And yes, I was being stupid in my previous response: once I looked
> > at how simple d_in_lookup() is, I understood your "hitting"; but at
> > least I gave the right answer, no, that warning does not show up.)
>
> A few more things to check:
>
> 1) do we, by any chance, ever see dentry_free() called with
> dentry->d_flags & DCACHE_PERSISTENT?
No.
>
> 2) does d_make_persistent() ever call __d_rehash() when called with
> dentry->d_sb->s_magic == TMPFS_MAGIC?
Yes, both if shmem_whiteout() does its d_rehash() and if it does not.
>
> 3) is shmem_whiteout() ever called? If that's the case, could you try
> to remove that d_rehash() call in it and see what happens? Because
> that's another place where shmem is playing odd games...
Yes, shmem_whiteout() does get called.
And when I remove that d_rehash() call from it, 269 476 650 and 750
complete without locking up. And when I remove the WARN_ON()s
inserted for 2) and 3), then they pass.
You are very much on the right lines!
Hugh
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 7:17 ` Hugh Dickins
@ 2025-12-12 10:12 ` Hugh Dickins
2025-12-13 7:22 ` Al Viro
0 siblings, 1 reply; 18+ messages in thread
From: Hugh Dickins @ 2025-12-12 10:12 UTC (permalink / raw)
To: Al Viro
Cc: Hugh Dickins, Miklos Szeredi, Christian Brauner, Andrew Morton,
Baolin Wang, linux-fsdevel, linux-kernel, linux-mm
On Thu, 11 Dec 2025, Hugh Dickins wrote:
> On Fri, 12 Dec 2025, Al Viro wrote:
> >
> > A few more things to check:
> >
> > 1) do we, by any chance, ever see dentry_free() called with
> > dentry->d_flags & DCACHE_PERSISTENT?
>
> No.
>
> >
> > 2) does d_make_persistent() ever call __d_rehash() when called with
> > dentry->d_sb->s_magic == TMPFS_MAGIC?
>
> Yes, both if shmem_whiteout() does its d_rehash() and if it does not.
>
> >
> > 3) is shmem_whiteout() ever called? If that's the case, could you try
> > to remove that d_rehash() call in it and see what happens? Because
> > that's another place where shmem is playing odd games...
>
> Yes, shmem_whiteout() does get called.
>
> And when I remove that d_rehash() call from it, 269 476 650 and 750
> complete without locking up. And when I remove the WARN_ON()s
> inserted for 2) and 3), then they pass.
>
> You are very much on the right lines!
Well, more than that: it's exactly the right thing to do, isn't it?
shmem_mknod() already called d_make_peristent() which called __d_rehash(),
calling it a second time naturally leads to the __d_lookup() lockup seen.
And I can't see a place now for shmem_whiteout()'s "Cheat and hash" comment.
Al, may I please leave you to send in the fix to Christian and/or Linus?
You may have noticed other things on the way, that you might want to add.
But if your patch resembles the below (which has now passed xfstests
auto runs on tmpfs), please feel free to add or omit any or all of
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Thanks a lot for your very quick resolution!
Hugh
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4023,18 +4023,7 @@ static int shmem_whiteout(struct mnt_idmap *idmap,
error = shmem_mknod(idmap, old_dir, whiteout,
S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
dput(whiteout);
- if (error)
- return error;
-
- /*
- * Cheat and hash the whiteout while the old dentry is still in
- * place, instead of playing games with FS_RENAME_DOES_D_MOVE.
- *
- * d_lookup() will consistently find one of them at this point,
- * not sure which one, but that isn't even important.
- */
- d_rehash(whiteout);
- return 0;
+ return error;
}
/*
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: 6.19 tmpfs __d_lookup() lockup
2025-12-12 10:12 ` Hugh Dickins
@ 2025-12-13 7:22 ` Al Viro
2025-12-14 3:27 ` shmem_rename() bugs (was Re: 6.19 tmpfs __d_lookup() lockup) Al Viro
0 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-12-13 7:22 UTC (permalink / raw)
To: Hugh Dickins
Cc: Miklos Szeredi, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm
On Fri, Dec 12, 2025 at 02:12:17AM -0800, Hugh Dickins wrote:
> Well, more than that: it's exactly the right thing to do, isn't it?
> shmem_mknod() already called d_make_peristent() which called __d_rehash(),
> calling it a second time naturally leads to the __d_lookup() lockup seen.
> And I can't see a place now for shmem_whiteout()'s "Cheat and hash" comment.
>
> Al, may I please leave you to send in the fix to Christian and/or Linus?
> You may have noticed other things on the way, that you might want to add.
>
> But if your patch resembles the below (which has now passed xfstests
> auto runs on tmpfs), please feel free to add or omit any or all of
>
> Reported-by: Hugh Dickins <hughd@google.com>
> Acked-by: Hugh Dickins <hughd@google.com>
> Tested-by: Hugh Dickins <hughd@google.com>
The problem is that the comment is not quite accurate ;-)
What it's trying to say is that we get whiteout and old_dentry
sharing parent, name and both hashed, but that won't last for
long - as soon as we get to d_move(), old_dentry will change
name and/or parent.
The trouble is, it might not _get_ to that d_move() at
all. It used to be guaranteed back when shmem_whiteout() had
been introduced (shmem_renameat2() used to have no failure
exits past shmem_whiteout() returning success), but it's no longer
true - not since a2e459555c5f "shmem: stable directory offsets"
two years ago.
Failure, AFAICS, requires severe a OOM, but it's still
a bug. What's more, simple_offset_rename() itself does not recover
from a failure, without any whiteouts being involved.
What I'm going to do is a couple of patches - one fixing
the regression in this cycle (pretty much what you'd been testing),
then a separate fix for stable offsets failure handling (present
since 2023). I'll feed them to Linus; I hoped to do that with
old regression fixed first, to reduce the PITA for backports,
but if I don't have that debugged tomorrow, I'll send the recent
regression fix first.
^ permalink raw reply [flat|nested] 18+ messages in thread
* shmem_rename() bugs (was Re: 6.19 tmpfs __d_lookup() lockup)
2025-12-13 7:22 ` Al Viro
@ 2025-12-14 3:27 ` Al Viro
2025-12-14 3:30 ` [PATCH 1/2] shmem_whiteout(): fix regression from tree-in-dcache series Al Viro
` (2 more replies)
0 siblings, 3 replies; 18+ messages in thread
From: Al Viro @ 2025-12-14 3:27 UTC (permalink / raw)
To: Hugh Dickins
Cc: Miklos Szeredi, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm, Linus Torvalds,
Chuck Lever
On Sat, Dec 13, 2025 at 07:22:41AM +0000, Al Viro wrote:
> What I'm going to do is a couple of patches - one fixing
> the regression in this cycle (pretty much what you'd been testing),
> then a separate fix for stable offsets failure handling (present
> since 2023). I'll feed them to Linus; I hoped to do that with
> old regression fixed first, to reduce the PITA for backports,
> but if I don't have that debugged tomorrow, I'll send the recent
> regression fix first.
OK, I think I've got it; see
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #fixes
individual patches in followups; the first one deals with this cycle
regression, the second - with older bug in shmem_rename() failure
exits.
Folks, please review. If nobody objects, I'll send a pull request on
Monday.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH 1/2] shmem_whiteout(): fix regression from tree-in-dcache series
2025-12-14 3:27 ` shmem_rename() bugs (was Re: 6.19 tmpfs __d_lookup() lockup) Al Viro
@ 2025-12-14 3:30 ` Al Viro
2025-12-14 3:30 ` [RFC][PATCH 2/2] shmem: fix recovery on rename failures Al Viro
2025-12-16 6:02 ` [git pull] shmem rename fixes Al Viro
2 siblings, 0 replies; 18+ messages in thread
From: Al Viro @ 2025-12-14 3:30 UTC (permalink / raw)
To: Hugh Dickins
Cc: Miklos Szeredi, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm, Linus Torvalds,
Chuck Lever
From 3010f06c52aa7da51493df59303ea733a614597b Mon Sep 17 00:00:00 2001
From: Al Viro <viro@zeniv.linux.org.uk>
Date: Sat, 13 Dec 2025 12:36:15 -0500
Subject: [PATCH 1/2] shmem_whiteout(): fix regression from tree-in-dcache
series
Now that shmem_mknod() hashes the new dentry, d_rehash() in
shmem_whiteout() should be removed.
X-paperbag: brown
Reported-by: Hugh Dickins <hughd@google.com>
Acked-by: Hugh Dickins <hughd@google.com>
Tested-by: Hugh Dickins <hughd@google.com>
Fixes: 2313598222f9 ("convert ramfs and tmpfs")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
mm/shmem.c | 14 +-------------
1 file changed, 1 insertion(+), 13 deletions(-)
diff --git a/mm/shmem.c b/mm/shmem.c
index 3f194c9842a8..d3edc809e2e7 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4019,22 +4019,10 @@ static int shmem_whiteout(struct mnt_idmap *idmap,
whiteout = d_alloc(old_dentry->d_parent, &old_dentry->d_name);
if (!whiteout)
return -ENOMEM;
-
error = shmem_mknod(idmap, old_dir, whiteout,
S_IFCHR | WHITEOUT_MODE, WHITEOUT_DEV);
dput(whiteout);
- if (error)
- return error;
-
- /*
- * Cheat and hash the whiteout while the old dentry is still in
- * place, instead of playing games with FS_RENAME_DOES_D_MOVE.
- *
- * d_lookup() will consistently find one of them at this point,
- * not sure which one, but that isn't even important.
- */
- d_rehash(whiteout);
- return 0;
+ return error;
}
/*
--
2.47.3
^ permalink raw reply [flat|nested] 18+ messages in thread
* [RFC][PATCH 2/2] shmem: fix recovery on rename failures
2025-12-14 3:27 ` shmem_rename() bugs (was Re: 6.19 tmpfs __d_lookup() lockup) Al Viro
2025-12-14 3:30 ` [PATCH 1/2] shmem_whiteout(): fix regression from tree-in-dcache series Al Viro
@ 2025-12-14 3:30 ` Al Viro
2025-12-15 7:38 ` Hugh Dickins
` (2 more replies)
2025-12-16 6:02 ` [git pull] shmem rename fixes Al Viro
2 siblings, 3 replies; 18+ messages in thread
From: Al Viro @ 2025-12-14 3:30 UTC (permalink / raw)
To: Hugh Dickins
Cc: Miklos Szeredi, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm, Linus Torvalds,
Chuck Lever
maple_tree insertions can fail if we are seriously short on memory;
simple_offset_rename() does not recover well if it runs into that.
The same goes for simple_offset_rename_exchange().
Moreover, shmem_whiteout() expects that if it succeeds, the caller will
progress to d_move(), i.e. that shmem_rename2() won't fail past the
successful call of shmem_whiteout().
Not hard to fix, fortunately - mtree_store() can't fail if the index we
are trying to store into is already present in the tree as a singleton.
For simple_offset_rename_exchange() that's enough - we just need to be
careful about the order of operations.
For simple_offset_rename() solution is to preinsert the target into the
tree for new_dir; the rest can be done without any potentially failing
operations.
That preinsertion has to be done in shmem_rename2() rather than in
simple_offset_rename() itself - otherwise we'd need to deal with the
possibility of failure after successful shmem_whiteout().
Fixes: a2e459555c5f ("shmem: stable directory offsets")
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
fs/libfs.c | 50 +++++++++++++++++++---------------------------
include/linux/fs.h | 2 +-
mm/shmem.c | 18 ++++++++++++-----
3 files changed, 35 insertions(+), 35 deletions(-)
diff --git a/fs/libfs.c b/fs/libfs.c
index 9264523be85c..591eb649ebba 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -346,22 +346,22 @@ void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
* User space expects the directory offset value of the replaced
* (new) directory entry to be unchanged after a rename.
*
- * Returns zero on success, a negative errno value on failure.
+ * Caller must have grabbed a slot for new_dentry in the maple_tree
+ * associated with new_dir, even if dentry is negative.
*/
-int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
- struct inode *new_dir, struct dentry *new_dentry)
+void simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
+ struct inode *new_dir, struct dentry *new_dentry)
{
struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
long new_offset = dentry2offset(new_dentry);
- simple_offset_remove(old_ctx, old_dentry);
+ if (WARN_ON(!new_offset))
+ return;
- if (new_offset) {
- offset_set(new_dentry, 0);
- return simple_offset_replace(new_ctx, old_dentry, new_offset);
- }
- return simple_offset_add(new_ctx, old_dentry);
+ simple_offset_remove(old_ctx, old_dentry);
+ offset_set(new_dentry, 0);
+ WARN_ON(simple_offset_replace(new_ctx, old_dentry, new_offset));
}
/**
@@ -388,31 +388,23 @@ int simple_offset_rename_exchange(struct inode *old_dir,
long new_index = dentry2offset(new_dentry);
int ret;
- simple_offset_remove(old_ctx, old_dentry);
- simple_offset_remove(new_ctx, new_dentry);
+ if (WARN_ON(!old_index || !new_index))
+ return -EINVAL;
- ret = simple_offset_replace(new_ctx, old_dentry, new_index);
- if (ret)
- goto out_restore;
+ ret = mtree_store(&new_ctx->mt, new_index, old_dentry, GFP_KERNEL);
+ if (WARN_ON(ret))
+ return ret;
- ret = simple_offset_replace(old_ctx, new_dentry, old_index);
- if (ret) {
- simple_offset_remove(new_ctx, old_dentry);
- goto out_restore;
+ ret = mtree_store(&old_ctx->mt, old_index, new_dentry, GFP_KERNEL);
+ if (WARN_ON(ret)) {
+ mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
+ return ret;
}
- ret = simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
- if (ret) {
- simple_offset_remove(new_ctx, old_dentry);
- simple_offset_remove(old_ctx, new_dentry);
- goto out_restore;
- }
+ offset_set(old_dentry, new_index);
+ offset_set(new_dentry, old_index);
+ simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
return 0;
-
-out_restore:
- (void)simple_offset_replace(old_ctx, old_dentry, old_index);
- (void)simple_offset_replace(new_ctx, new_dentry, new_index);
- return ret;
}
/**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 04ceeca12a0d..f5c9cf28c4dc 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -3247,7 +3247,7 @@ struct offset_ctx {
void simple_offset_init(struct offset_ctx *octx);
int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry);
void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry);
-int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
+void simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
struct inode *new_dir, struct dentry *new_dentry);
int simple_offset_rename_exchange(struct inode *old_dir,
struct dentry *old_dentry,
diff --git a/mm/shmem.c b/mm/shmem.c
index d3edc809e2e7..4232f8a39a43 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -4039,6 +4039,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
struct inode *inode = d_inode(old_dentry);
int they_are_dirs = S_ISDIR(inode->i_mode);
int error;
+ int had_offset = false;
if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
return -EINVAL;
@@ -4050,16 +4051,23 @@ static int shmem_rename2(struct mnt_idmap *idmap,
if (!simple_empty(new_dentry))
return -ENOTEMPTY;
+ error = simple_offset_add(shmem_get_offset_ctx(new_dir), new_dentry);
+ if (error == -EBUSY)
+ had_offset = true;
+ else if (unlikely(error))
+ return error;
+
if (flags & RENAME_WHITEOUT) {
error = shmem_whiteout(idmap, old_dir, old_dentry);
- if (error)
+ if (error) {
+ if (!had_offset)
+ simple_offset_remove(shmem_get_offset_ctx(new_dir),
+ new_dentry);
return error;
+ }
}
- error = simple_offset_rename(old_dir, old_dentry, new_dir, new_dentry);
- if (error)
- return error;
-
+ simple_offset_rename(old_dir, old_dentry, new_dir, new_dentry);
if (d_really_is_positive(new_dentry)) {
(void) shmem_unlink(new_dir, new_dentry);
if (they_are_dirs) {
--
2.47.3
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 2/2] shmem: fix recovery on rename failures
2025-12-14 3:30 ` [RFC][PATCH 2/2] shmem: fix recovery on rename failures Al Viro
@ 2025-12-15 7:38 ` Hugh Dickins
2025-12-15 11:58 ` Christian Brauner
2025-12-15 16:03 ` Chuck Lever
2 siblings, 0 replies; 18+ messages in thread
From: Hugh Dickins @ 2025-12-15 7:38 UTC (permalink / raw)
To: Al Viro
Cc: Hugh Dickins, Miklos Szeredi, Christian Brauner, Andrew Morton,
Baolin Wang, linux-fsdevel, linux-kernel, linux-mm,
Linus Torvalds, Chuck Lever
On Sun, 14 Dec 2025, Al Viro wrote:
> maple_tree insertions can fail if we are seriously short on memory;
> simple_offset_rename() does not recover well if it runs into that.
> The same goes for simple_offset_rename_exchange().
>
> Moreover, shmem_whiteout() expects that if it succeeds, the caller will
> progress to d_move(), i.e. that shmem_rename2() won't fail past the
> successful call of shmem_whiteout().
>
> Not hard to fix, fortunately - mtree_store() can't fail if the index we
> are trying to store into is already present in the tree as a singleton.
>
> For simple_offset_rename_exchange() that's enough - we just need to be
> careful about the order of operations.
>
> For simple_offset_rename() solution is to preinsert the target into the
> tree for new_dir; the rest can be done without any potentially failing
> operations.
>
> That preinsertion has to be done in shmem_rename2() rather than in
> simple_offset_rename() itself - otherwise we'd need to deal with the
> possibility of failure after successful shmem_whiteout().
>
> Fixes: a2e459555c5f ("shmem: stable directory offsets")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Well, what you say above, and what you've done below, make sense to me;
and neither I nor xfstests noticed anything wrong (aside from one
trivium - I'd prefer "bool had_offset = false" to "int ...";
maybe placed one line up to look prettier).
But please don't expect proper engagement from me on this one,
it's above my head, and I'll trust you and Chuck on it.
Thanks,
Hugh
> ---
> fs/libfs.c | 50 +++++++++++++++++++---------------------------
> include/linux/fs.h | 2 +-
> mm/shmem.c | 18 ++++++++++++-----
> 3 files changed, 35 insertions(+), 35 deletions(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 9264523be85c..591eb649ebba 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -346,22 +346,22 @@ void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
> * User space expects the directory offset value of the replaced
> * (new) directory entry to be unchanged after a rename.
> *
> - * Returns zero on success, a negative errno value on failure.
> + * Caller must have grabbed a slot for new_dentry in the maple_tree
> + * associated with new_dir, even if dentry is negative.
> */
> -int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> - struct inode *new_dir, struct dentry *new_dentry)
> +void simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry)
> {
> struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
> struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
> long new_offset = dentry2offset(new_dentry);
>
> - simple_offset_remove(old_ctx, old_dentry);
> + if (WARN_ON(!new_offset))
> + return;
>
> - if (new_offset) {
> - offset_set(new_dentry, 0);
> - return simple_offset_replace(new_ctx, old_dentry, new_offset);
> - }
> - return simple_offset_add(new_ctx, old_dentry);
> + simple_offset_remove(old_ctx, old_dentry);
> + offset_set(new_dentry, 0);
> + WARN_ON(simple_offset_replace(new_ctx, old_dentry, new_offset));
> }
>
> /**
> @@ -388,31 +388,23 @@ int simple_offset_rename_exchange(struct inode *old_dir,
> long new_index = dentry2offset(new_dentry);
> int ret;
>
> - simple_offset_remove(old_ctx, old_dentry);
> - simple_offset_remove(new_ctx, new_dentry);
> + if (WARN_ON(!old_index || !new_index))
> + return -EINVAL;
>
> - ret = simple_offset_replace(new_ctx, old_dentry, new_index);
> - if (ret)
> - goto out_restore;
> + ret = mtree_store(&new_ctx->mt, new_index, old_dentry, GFP_KERNEL);
> + if (WARN_ON(ret))
> + return ret;
>
> - ret = simple_offset_replace(old_ctx, new_dentry, old_index);
> - if (ret) {
> - simple_offset_remove(new_ctx, old_dentry);
> - goto out_restore;
> + ret = mtree_store(&old_ctx->mt, old_index, new_dentry, GFP_KERNEL);
> + if (WARN_ON(ret)) {
> + mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
> + return ret;
> }
>
> - ret = simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
> - if (ret) {
> - simple_offset_remove(new_ctx, old_dentry);
> - simple_offset_remove(old_ctx, new_dentry);
> - goto out_restore;
> - }
> + offset_set(old_dentry, new_index);
> + offset_set(new_dentry, old_index);
> + simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
> return 0;
> -
> -out_restore:
> - (void)simple_offset_replace(old_ctx, old_dentry, old_index);
> - (void)simple_offset_replace(new_ctx, new_dentry, new_index);
> - return ret;
> }
>
> /**
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 04ceeca12a0d..f5c9cf28c4dc 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3247,7 +3247,7 @@ struct offset_ctx {
> void simple_offset_init(struct offset_ctx *octx);
> int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry);
> void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry);
> -int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> +void simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> struct inode *new_dir, struct dentry *new_dentry);
> int simple_offset_rename_exchange(struct inode *old_dir,
> struct dentry *old_dentry,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d3edc809e2e7..4232f8a39a43 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -4039,6 +4039,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
> struct inode *inode = d_inode(old_dentry);
> int they_are_dirs = S_ISDIR(inode->i_mode);
> int error;
> + int had_offset = false;
>
> if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
> return -EINVAL;
> @@ -4050,16 +4051,23 @@ static int shmem_rename2(struct mnt_idmap *idmap,
> if (!simple_empty(new_dentry))
> return -ENOTEMPTY;
>
> + error = simple_offset_add(shmem_get_offset_ctx(new_dir), new_dentry);
> + if (error == -EBUSY)
> + had_offset = true;
> + else if (unlikely(error))
> + return error;
> +
> if (flags & RENAME_WHITEOUT) {
> error = shmem_whiteout(idmap, old_dir, old_dentry);
> - if (error)
> + if (error) {
> + if (!had_offset)
> + simple_offset_remove(shmem_get_offset_ctx(new_dir),
> + new_dentry);
> return error;
> + }
> }
>
> - error = simple_offset_rename(old_dir, old_dentry, new_dir, new_dentry);
> - if (error)
> - return error;
> -
> + simple_offset_rename(old_dir, old_dentry, new_dir, new_dentry);
> if (d_really_is_positive(new_dentry)) {
> (void) shmem_unlink(new_dir, new_dentry);
> if (they_are_dirs) {
> --
> 2.47.3
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 2/2] shmem: fix recovery on rename failures
2025-12-14 3:30 ` [RFC][PATCH 2/2] shmem: fix recovery on rename failures Al Viro
2025-12-15 7:38 ` Hugh Dickins
@ 2025-12-15 11:58 ` Christian Brauner
2025-12-15 16:03 ` Chuck Lever
2 siblings, 0 replies; 18+ messages in thread
From: Christian Brauner @ 2025-12-15 11:58 UTC (permalink / raw)
To: Al Viro
Cc: Hugh Dickins, Miklos Szeredi, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm, Linus Torvalds,
Chuck Lever
On Sun, Dec 14, 2025 at 03:30:49AM +0000, Al Viro wrote:
> maple_tree insertions can fail if we are seriously short on memory;
> simple_offset_rename() does not recover well if it runs into that.
> The same goes for simple_offset_rename_exchange().
>
> Moreover, shmem_whiteout() expects that if it succeeds, the caller will
> progress to d_move(), i.e. that shmem_rename2() won't fail past the
> successful call of shmem_whiteout().
>
> Not hard to fix, fortunately - mtree_store() can't fail if the index we
> are trying to store into is already present in the tree as a singleton.
>
> For simple_offset_rename_exchange() that's enough - we just need to be
> careful about the order of operations.
>
> For simple_offset_rename() solution is to preinsert the target into the
> tree for new_dir; the rest can be done without any potentially failing
> operations.
>
> That preinsertion has to be done in shmem_rename2() rather than in
> simple_offset_rename() itself - otherwise we'd need to deal with the
> possibility of failure after successful shmem_whiteout().
>
> Fixes: a2e459555c5f ("shmem: stable directory offsets")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
Looks good to me,
Reviewed-by: Christian Brauner <brauner@kernel.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 2/2] shmem: fix recovery on rename failures
2025-12-14 3:30 ` [RFC][PATCH 2/2] shmem: fix recovery on rename failures Al Viro
2025-12-15 7:38 ` Hugh Dickins
2025-12-15 11:58 ` Christian Brauner
@ 2025-12-15 16:03 ` Chuck Lever
2025-12-15 16:54 ` Al Viro
2 siblings, 1 reply; 18+ messages in thread
From: Chuck Lever @ 2025-12-15 16:03 UTC (permalink / raw)
To: Al Viro, Hugh Dickins
Cc: Miklos Szeredi, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm, Linus Torvalds
On 12/13/25 10:30 PM, Al Viro wrote:
> maple_tree insertions can fail if we are seriously short on memory;
> simple_offset_rename() does not recover well if it runs into that.
> The same goes for simple_offset_rename_exchange().
>
> Moreover, shmem_whiteout() expects that if it succeeds, the caller will
> progress to d_move(), i.e. that shmem_rename2() won't fail past the
> successful call of shmem_whiteout().
>
> Not hard to fix, fortunately - mtree_store() can't fail if the index we
> are trying to store into is already present in the tree as a singleton.
>
> For simple_offset_rename_exchange() that's enough - we just need to be
> careful about the order of operations.
>
> For simple_offset_rename() solution is to preinsert the target into the
> tree for new_dir; the rest can be done without any potentially failing
> operations.
>
> That preinsertion has to be done in shmem_rename2() rather than in
> simple_offset_rename() itself - otherwise we'd need to deal with the
> possibility of failure after successful shmem_whiteout().
>
> Fixes: a2e459555c5f ("shmem: stable directory offsets")
> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
> ---
> fs/libfs.c | 50 +++++++++++++++++++---------------------------
> include/linux/fs.h | 2 +-
> mm/shmem.c | 18 ++++++++++++-----
> 3 files changed, 35 insertions(+), 35 deletions(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 9264523be85c..591eb649ebba 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -346,22 +346,22 @@ void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
> * User space expects the directory offset value of the replaced
> * (new) directory entry to be unchanged after a rename.
> *
> - * Returns zero on success, a negative errno value on failure.
> + * Caller must have grabbed a slot for new_dentry in the maple_tree
> + * associated with new_dir, even if dentry is negative.
> */
> -int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> - struct inode *new_dir, struct dentry *new_dentry)
> +void simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> + struct inode *new_dir, struct dentry *new_dentry)
> {
> struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
> struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
> long new_offset = dentry2offset(new_dentry);
>
> - simple_offset_remove(old_ctx, old_dentry);
> + if (WARN_ON(!new_offset))
> + return;
>
> - if (new_offset) {
> - offset_set(new_dentry, 0);
> - return simple_offset_replace(new_ctx, old_dentry, new_offset);
> - }
> - return simple_offset_add(new_ctx, old_dentry);
> + simple_offset_remove(old_ctx, old_dentry);
> + offset_set(new_dentry, 0);
> + WARN_ON(simple_offset_replace(new_ctx, old_dentry, new_offset));
> }
>
> /**
> @@ -388,31 +388,23 @@ int simple_offset_rename_exchange(struct inode *old_dir,
> long new_index = dentry2offset(new_dentry);
> int ret;
>
> - simple_offset_remove(old_ctx, old_dentry);
> - simple_offset_remove(new_ctx, new_dentry);
> + if (WARN_ON(!old_index || !new_index))
> + return -EINVAL;
>
> - ret = simple_offset_replace(new_ctx, old_dentry, new_index);
> - if (ret)
> - goto out_restore;
> + ret = mtree_store(&new_ctx->mt, new_index, old_dentry, GFP_KERNEL);
> + if (WARN_ON(ret))
> + return ret;
>
> - ret = simple_offset_replace(old_ctx, new_dentry, old_index);
> - if (ret) {
> - simple_offset_remove(new_ctx, old_dentry);
> - goto out_restore;
> + ret = mtree_store(&old_ctx->mt, old_index, new_dentry, GFP_KERNEL);
> + if (WARN_ON(ret)) {
> + mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
Under extreme memory pressure, this mtree_store() might also fail?
> + return ret;
> }
>
> - ret = simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
> - if (ret) {
> - simple_offset_remove(new_ctx, old_dentry);
> - simple_offset_remove(old_ctx, new_dentry);
> - goto out_restore;
> - }
> + offset_set(old_dentry, new_index);
> + offset_set(new_dentry, old_index);
> + simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
> return 0;
> -
> -out_restore:
> - (void)simple_offset_replace(old_ctx, old_dentry, old_index);
> - (void)simple_offset_replace(new_ctx, new_dentry, new_index);
> - return ret;
> }
>
> /**
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 04ceeca12a0d..f5c9cf28c4dc 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -3247,7 +3247,7 @@ struct offset_ctx {
> void simple_offset_init(struct offset_ctx *octx);
> int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry);
> void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry);
> -int simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> +void simple_offset_rename(struct inode *old_dir, struct dentry *old_dentry,
> struct inode *new_dir, struct dentry *new_dentry);
> int simple_offset_rename_exchange(struct inode *old_dir,
> struct dentry *old_dentry,
> diff --git a/mm/shmem.c b/mm/shmem.c
> index d3edc809e2e7..4232f8a39a43 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -4039,6 +4039,7 @@ static int shmem_rename2(struct mnt_idmap *idmap,
> struct inode *inode = d_inode(old_dentry);
> int they_are_dirs = S_ISDIR(inode->i_mode);
> int error;
> + int had_offset = false;
>
> if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
> return -EINVAL;
> @@ -4050,16 +4051,23 @@ static int shmem_rename2(struct mnt_idmap *idmap,
> if (!simple_empty(new_dentry))
> return -ENOTEMPTY;
>
> + error = simple_offset_add(shmem_get_offset_ctx(new_dir), new_dentry);
> + if (error == -EBUSY)
> + had_offset = true;
> + else if (unlikely(error))
> + return error;
> +
> if (flags & RENAME_WHITEOUT) {
> error = shmem_whiteout(idmap, old_dir, old_dentry);
> - if (error)
> + if (error) {
> + if (!had_offset)
> + simple_offset_remove(shmem_get_offset_ctx(new_dir),
> + new_dentry);
> return error;
> + }
> }
>
> - error = simple_offset_rename(old_dir, old_dentry, new_dir, new_dentry);
> - if (error)
> - return error;
> -
> + simple_offset_rename(old_dir, old_dentry, new_dir, new_dentry);
> if (d_really_is_positive(new_dentry)) {
> (void) shmem_unlink(new_dir, new_dentry);
> if (they_are_dirs) {
Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
--
Chuck Lever
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [RFC][PATCH 2/2] shmem: fix recovery on rename failures
2025-12-15 16:03 ` Chuck Lever
@ 2025-12-15 16:54 ` Al Viro
0 siblings, 0 replies; 18+ messages in thread
From: Al Viro @ 2025-12-15 16:54 UTC (permalink / raw)
To: Chuck Lever
Cc: Hugh Dickins, Miklos Szeredi, Christian Brauner, Andrew Morton,
Baolin Wang, linux-fsdevel, linux-kernel, linux-mm,
Linus Torvalds
On Mon, Dec 15, 2025 at 11:03:58AM -0500, Chuck Lever wrote:
> > @@ -388,31 +388,23 @@ int simple_offset_rename_exchange(struct inode *old_dir,
> > long new_index = dentry2offset(new_dentry);
> > int ret;
> >
> > - simple_offset_remove(old_ctx, old_dentry);
> > - simple_offset_remove(new_ctx, new_dentry);
> > + if (WARN_ON(!old_index || !new_index))
> > + return -EINVAL;
> >
> > - ret = simple_offset_replace(new_ctx, old_dentry, new_index);
> > - if (ret)
> > - goto out_restore;
> > + ret = mtree_store(&new_ctx->mt, new_index, old_dentry, GFP_KERNEL);
> > + if (WARN_ON(ret))
> > + return ret;
> >
> > - ret = simple_offset_replace(old_ctx, new_dentry, old_index);
> > - if (ret) {
> > - simple_offset_remove(new_ctx, old_dentry);
> > - goto out_restore;
> > + ret = mtree_store(&old_ctx->mt, old_index, new_dentry, GFP_KERNEL);
> > + if (WARN_ON(ret)) {
> > + mtree_store(&new_ctx->mt, new_index, new_dentry, GFP_KERNEL);
>
> Under extreme memory pressure, this mtree_store() might also fail?
Neither should, really; adding after entry removal, as the mainline
does, might need allocations. But mtree_store() when entry exists
and isn't a part of a range should not allocate anything.
What happens is that mas_wr_store_type() will return wr_exact_fit to
mas_wr_preallocate(), which will shove it into ->store_type before
calling mas_prealloc_calc(), getting ->node_request set to 0 by the
latter, seeing that and buggering off without allocating anything.
So these WARN_ON() are of the "if it triggers, something's really wrong -
either lib/maple_tree.c had an odd change of behaviour, or we have
our tree in unexpected state" variety, not "warn that operation's
failing due to OOM" one.
^ permalink raw reply [flat|nested] 18+ messages in thread
* [git pull] shmem rename fixes
2025-12-14 3:27 ` shmem_rename() bugs (was Re: 6.19 tmpfs __d_lookup() lockup) Al Viro
2025-12-14 3:30 ` [PATCH 1/2] shmem_whiteout(): fix regression from tree-in-dcache series Al Viro
2025-12-14 3:30 ` [RFC][PATCH 2/2] shmem: fix recovery on rename failures Al Viro
@ 2025-12-16 6:02 ` Al Viro
2025-12-16 8:04 ` pr-tracker-bot
2 siblings, 1 reply; 18+ messages in thread
From: Al Viro @ 2025-12-16 6:02 UTC (permalink / raw)
To: Linus Torvalds
Cc: Miklos Szeredi, Christian Brauner, Andrew Morton, Baolin Wang,
linux-fsdevel, linux-kernel, linux-mm, Chuck Lever, Hugh Dickins
The following changes since commit 0048fbb4011ec55c32d3148b2cda56433f273375:
Merge tag 'locking-futex-2025-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip (2025-12-10 17:21:30 +0900)
are available in the Git repository at:
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-fixes
for you to fetch changes up to e1b4c6a58304fd490124cc2b454d80edc786665c:
shmem: fix recovery on rename failures (2025-12-16 00:57:29 -0500)
----------------------------------------------------------------
a couple of shmem rename fixes - recent regression from tree-in-dcache
series and older breakage from stable directory offsets stuff.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
----------------------------------------------------------------
Al Viro (2):
shmem_whiteout(): fix regression from tree-in-dcache series
shmem: fix recovery on rename failures
fs/libfs.c | 50 +++++++++++++++++++++-----------------------------
include/linux/fs.h | 2 +-
mm/shmem.c | 32 ++++++++++++++------------------
3 files changed, 36 insertions(+), 48 deletions(-)
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [git pull] shmem rename fixes
2025-12-16 6:02 ` [git pull] shmem rename fixes Al Viro
@ 2025-12-16 8:04 ` pr-tracker-bot
0 siblings, 0 replies; 18+ messages in thread
From: pr-tracker-bot @ 2025-12-16 8:04 UTC (permalink / raw)
To: Al Viro
Cc: Linus Torvalds, Miklos Szeredi, Christian Brauner, Andrew Morton,
Baolin Wang, linux-fsdevel, linux-kernel, linux-mm, Chuck Lever,
Hugh Dickins
The pull request you sent on Tue, 16 Dec 2025 06:02:57 +0000:
> git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git tags/pull-fixes
has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/40fbbd64bba6c6e7a72885d2f59b6a3be9991eeb
Thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2025-12-16 8:07 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-12-12 3:56 6.19 tmpfs __d_lookup() lockup Hugh Dickins
2025-12-12 5:02 ` Al Viro
2025-12-12 5:15 ` Hugh Dickins
2025-12-12 5:34 ` Al Viro
2025-12-12 5:57 ` Hugh Dickins
2025-12-12 6:30 ` Al Viro
2025-12-12 7:17 ` Hugh Dickins
2025-12-12 10:12 ` Hugh Dickins
2025-12-13 7:22 ` Al Viro
2025-12-14 3:27 ` shmem_rename() bugs (was Re: 6.19 tmpfs __d_lookup() lockup) Al Viro
2025-12-14 3:30 ` [PATCH 1/2] shmem_whiteout(): fix regression from tree-in-dcache series Al Viro
2025-12-14 3:30 ` [RFC][PATCH 2/2] shmem: fix recovery on rename failures Al Viro
2025-12-15 7:38 ` Hugh Dickins
2025-12-15 11:58 ` Christian Brauner
2025-12-15 16:03 ` Chuck Lever
2025-12-15 16:54 ` Al Viro
2025-12-16 6:02 ` [git pull] shmem rename fixes Al Viro
2025-12-16 8:04 ` pr-tracker-bot
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox