From: Christian Brauner <brauner@kernel.org>
To: Chuck Lever III <chuck.lever@oracle.com>
Cc: Jeff Layton <jlayton@kernel.org>, Chuck Lever <cel@kernel.org>,
linux-mm <linux-mm@kvack.org>,
Linux FS Devel <linux-fsdevel@vger.kernel.org>,
Hugh Dickens <hughd@google.com>,
"yukuai (C)" <yukuai3@huawei.com>,
"yangerkun@huaweicloud.com" <yangerkun@huaweicloud.com>
Subject: Re: [RFC PATCH 2/2] libfs: Improve behavior when directory offset values wrap
Date: Fri, 22 Nov 2024 13:57:43 +0100 [thread overview]
Message-ID: <20241122-bauer-willst-2d418ff7ab32@brauner> (raw)
In-Reply-To: <34F4206C-8C5F-4505-9E8F-2148E345B45E@oracle.com>
On Thu, Nov 21, 2024 at 02:54:16PM +0000, Chuck Lever III wrote:
>
>
> > On Nov 21, 2024, at 3:34 AM, Christian Brauner <brauner@kernel.org> wrote:
> >
> > On Wed, Nov 20, 2024 at 10:05:42AM -0500, Chuck Lever wrote:
> >> On Wed, Nov 20, 2024 at 09:59:54AM +0100, Christian Brauner wrote:
> >>> On Mon, Nov 18, 2024 at 03:58:09PM -0500, Chuck Lever wrote:
> >>>>
> >>>> I've been trying to imagine a solution that does not depend on the
> >>>> range of an integer value and has solidly deterministic behavior in
> >>>> the face of multiple child entry creations during one timer tick.
> >>>>
> >>>> We could partially re-use the legacy cursor/list mechanism.
> >>>>
> >>>> * When a child entry is created, it is added at the end of the
> >>>> parent's d_children list.
> >>>> * When a child entry is unlinked, it is removed from the parent's
> >>>> d_children list.
> >>>>
> >>>> This includes creation and removal of entries due to a rename.
> >>>>
> >>>>
> >>>> * When a directory is opened, mark the current end of the d_children
> >>>> list with a cursor dentry. New entries would then be added to this
> >>>> directory following this cursor dentry in the directory's
> >>>> d_children list.
> >>>> * When a directory is closed, its cursor dentry is removed from the
> >>>> d_children list and freed.
> >>>>
> >>>> Each cursor dentry would need to refer to an opendir instance
> >>>> (using, say, a pointer to the "struct file" for that open) so that
> >>>> multiple cursors in the same directory can reside in its d_chilren
> >>>> list and won't interfere with each other. Re-use the cursor dentry's
> >>>> d_fsdata field for that.
> >>>>
> >>>>
> >>>> * offset_readdir gets its starting entry using the mtree/xarray to
> >>>> map ctx->pos to a dentry.
> >>>> * offset_readdir continues iterating by following the .next pointer
> >>>> in the current dentry's d_child field.
> >>>> * offset_readdir returns EOD when it hits the cursor dentry matching
> >>>> this opendir instance.
> >>>>
> >>>>
> >>>> I think all of these operations could be O(1), but it might require
> >>>> some additional locking.
> >>>
> >>> This would be a bigger refactor of the whole stable offset logic. So
> >>> even if we end up doing that I think we should use the jiffies solution
> >>> for now.
> >>
> >> How should I mark patches so they can be posted for discussion and
> >> never applied? This series is marked RFC.
> >
> > There's no reason to not have it tested.
>
> 1/2 is reasonable to apply.
>
> 2/2 is work-in-progress. So, fair enough, if you are applying
> for testing purposes.
>
>
> > Generally I don't apply RFCs
> > but this code has caused various issues over multiple kernel releases so
> > I like to test this early.
>
> The biggest problem has been that I haven't found an
> authoritative and comprehensive explanation of how
> readdir / getdents needs to behave around renames.
>
> The previous cursor-based solution has always been a "best
> effort" implementation, and most of the other file systems
> have interesting gaps and heuristics in this area. It's
> difficult to piece all of these together to get the idealized
> design requirements, and also a get a sense of where
> compromises can be made.
>
> Any advice/guidance is welcome.
I didn't mean to make it sound like you did anything wrong or I was
blaming you. I was literally just trying to say we had weird behavior in
this area for legitimate reasons. Posix states this:
If posix_getdents() is called concurrently with an operation that
adds, deletes, or modifies a directory entry, the results from
posix_getdents() shall reflect either all of the effects of the
concurrent operation or none of them. If a sequence of calls to
posix_getdents() is made that reads from offset zero to end-of-file
and a file is removed from or added to the directory between the
first and last of those calls, whether the sequence of calls returns
an entry for that file is unspecified.
Which to me all reads like we're pretty much free in what to do as long
as we clearly document it.
> >> I am actually half-way through implementing the approach described
> >> here. It is not as big a re-write as you might think, and addresses
> >> some fundamental misunderstandings in the offset_iterate_dir() code.
> >
> > Ok, great then let's see it.
>
> I'm finishing it up now. Unfortunately I have several other
> (NFSD and not) bugs I'm working through.
No horry.
next prev parent reply other threads:[~2024-11-22 12:57 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-11-17 21:32 [RFC PATCH 0/2] Improve simple directory offset wrap behavior cel
2024-11-17 21:32 ` [RFC PATCH 1/2] libfs: Return ENOSPC when the directory offset range is exhausted cel
2024-11-18 19:55 ` Jeff Layton
2024-11-17 21:32 ` [RFC PATCH 2/2] libfs: Improve behavior when directory offset values wrap cel
2024-11-18 20:00 ` Jeff Layton
2024-11-18 20:58 ` Chuck Lever
2024-11-20 8:59 ` Christian Brauner
2024-11-20 15:05 ` Chuck Lever
2024-11-21 8:34 ` Christian Brauner
2024-11-21 14:54 ` Chuck Lever III
2024-11-21 21:18 ` Hugh Dickins
2024-11-21 21:29 ` Chuck Lever
2024-11-22 8:49 ` Amir Goldstein
2024-11-22 14:24 ` Chuck Lever III
2024-12-04 21:05 ` Chuck Lever
2024-11-22 12:57 ` Christian Brauner [this message]
2024-11-24 21:28 ` Chuck Lever III
2024-11-20 9:01 ` [RFC PATCH 0/2] Improve simple directory offset wrap behavior Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20241122-bauer-willst-2d418ff7ab32@brauner \
--to=brauner@kernel.org \
--cc=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=hughd@google.com \
--cc=jlayton@kernel.org \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=yangerkun@huaweicloud.com \
--cc=yukuai3@huawei.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox