From: Bernd Schubert <bernd.schubert@fastmail.fm>
To: Chuck Lever <cel@kernel.org>,
viro@zeniv.linux.org.uk, brauner@kernel.org, hughd@google.com,
akpm@linux-foundation.org
Cc: Chuck Lever <chuck.lever@oracle.com>,
jlayton@redhat.com, linux-mm@kvack.org,
linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v6 1/3] libfs: Add directory operations for stable offsets
Date: Fri, 30 Jun 2023 13:08:03 +0200
Message-ID: <ca09c22e-8b85-b758-e38a-df78e829f132@fastmail.fm>
In-Reply-To: <168796590904.157221.11286772826871541854.stgit@manet.1015granger.net>

On 6/28/23 17:25, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> Create a vector of directory operations in fs/libfs.c that handles
> directory seeks and readdir via stable offsets instead of the
> current cursor-based mechanism.
>
> For the moment these are unused.
>
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
> fs/libfs.c | 247 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> include/linux/fs.h | 18 ++++
> 2 files changed, 265 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 89cf614a3271..2b0d5ac472df 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -239,6 +239,253 @@ const struct inode_operations simple_dir_inode_operations = {
> };
> EXPORT_SYMBOL(simple_dir_inode_operations);
>
> +static void offset_set(struct dentry *dentry, unsigned long offset)
> +{
> + dentry->d_fsdata = (void *)offset;
> +}
> +
> +static unsigned long dentry2offset(struct dentry *dentry)
> +{
> + return (unsigned long)dentry->d_fsdata;
> +}
> +
> +/**
> + * simple_offset_init - initialize an offset_ctx
> + * @octx: directory offset map to be initialized
> + *
> + */
> +void simple_offset_init(struct offset_ctx *octx)
> +{
> + xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1);
> +
> + /* 0 is '.', 1 is '..', so always start with offset 2 */
> + octx->next_offset = 2;
> +}
> +
> +/**
> + * simple_offset_add - Add an entry to a directory's offset map
> + * @octx: directory offset ctx to be updated
> + * @dentry: new dentry being added
> + *
> + * Returns zero on success. @octx and the dentry offset are updated.
> + * Otherwise, a negative errno value is returned.
> + */
> +int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
> +{
> + static const struct xa_limit limit = XA_LIMIT(2, U32_MAX);
> + u32 offset;
> + int ret;
> +
> + if (dentry2offset(dentry) != 0)
> + return -EBUSY;
> +
> + ret = xa_alloc_cyclic(&octx->xa, &offset, dentry, limit,
> + &octx->next_offset, GFP_KERNEL);
> + if (ret < 0)
> + return ret;
> +
> + offset_set(dentry, offset);
> + return 0;
> +}
> +
> +/**
> + * simple_offset_remove - Remove an entry from a directory's offset map
> + * @octx: directory offset ctx to be updated
> + * @dentry: dentry being removed
> + *
> + */
> +void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
> +{
> + unsigned long index = dentry2offset(dentry);
> +
> + if (index == 0)
> + return;
> +
> + xa_erase(&octx->xa, index);
> + offset_set(dentry, 0);
> +}
> +
> +/**
> + * simple_offset_rename_exchange - exchange rename with directory offsets
> + * @old_dir: parent of dentry being moved
> + * @old_dentry: dentry being moved
> + * @new_dir: destination parent
> + * @new_dentry: destination dentry
> + *
> + * Returns zero on success. Otherwise a negative errno is returned and the
> + * rename is rolled back.
> + */
> +int simple_offset_rename_exchange(struct inode *old_dir,
> + struct dentry *old_dentry,
> + struct inode *new_dir,
> + struct dentry *new_dentry)
> +{
> + struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
> + struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
> + unsigned long old_index = dentry2offset(old_dentry);
> + unsigned long new_index = dentry2offset(new_dentry);
> + int ret;
> +
> + simple_offset_remove(old_ctx, old_dentry);
> + simple_offset_remove(new_ctx, new_dentry);
> +
> + ret = simple_offset_add(new_ctx, old_dentry);
> + if (ret)
> + goto out_restore;
> +
> + ret = simple_offset_add(old_ctx, new_dentry);
> + if (ret) {
> + simple_offset_remove(new_ctx, old_dentry);
> + goto out_restore;
> + }
> +
> + ret = simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
> + if (ret) {
> + simple_offset_remove(new_ctx, old_dentry);
> + simple_offset_remove(old_ctx, new_dentry);
> + goto out_restore;
> + }
> + return 0;
> +
> +out_restore:
> + offset_set(old_dentry, old_index);
> + xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
> + offset_set(new_dentry, new_index);
> + xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
> + return ret;
> +}

Thanks for the update, looks great!

> +
> +/**
> + * simple_offset_destroy - Release offset map
> + * @octx: directory offset ctx that is about to be destroyed
> + *
> + * During fs teardown (eg. umount), a directory's offset map might still
> + * contain entries. xa_destroy() cleans out anything that remains.
> + */
> +void simple_offset_destroy(struct offset_ctx *octx)
> +{
> + xa_destroy(&octx->xa);
> +}
> +
> +/**
> + * offset_dir_llseek - Advance the read position of a directory descriptor
> + * @file: an open directory whose position is to be updated
> + * @offset: a byte offset
> + * @whence: enumerator describing the starting position for this update
> + *
> + * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for directories.
> + *
> + * Returns the updated read position if successful; otherwise a
> + * negative errno is returned and the read position remains unchanged.
> + */
> +static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
> +{
> + switch (whence) {
> + case SEEK_CUR:
> + offset += file->f_pos;
> + fallthrough;
> + case SEEK_SET:
> + if (offset >= 0)
> + break;
> + fallthrough;
> + default:
> + return -EINVAL;
> + }
> +
> + return vfs_setpos(file, offset, U32_MAX);
> +}
> +
> +static struct dentry *offset_find_next(struct xa_state *xas)
> +{
> + struct dentry *child, *found = NULL;
> +
> + rcu_read_lock();
> + child = xas_next_entry(xas, U32_MAX);
> + if (!child)
> + goto out;
> + spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
> + if (simple_positive(child))
> + found = dget_dlock(child);
> + spin_unlock(&child->d_lock);
> +out:
> + rcu_read_unlock();
> + return found;
> +}
> +
> +static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
> +{
> + loff_t offset = dentry2offset(dentry);
> + struct inode *inode = d_inode(dentry);
> +
> + return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
> + inode->i_ino, fs_umode_to_dtype(inode->i_mode));
> +}
> +
> +static void offset_iterate_dir(struct dentry *dir, struct dir_context *ctx)
> +{
> + struct inode *inode = d_inode(dir);
> + struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
> + XA_STATE(xas, &so_ctx->xa, ctx->pos);
> + struct dentry *dentry;
> +
> + while (true) {
> + spin_lock(&dir->d_lock);
> + dentry = offset_find_next(&xas);
> + spin_unlock(&dir->d_lock);
> + if (!dentry)
> + break;
> +
> + if (!offset_dir_emit(ctx, dentry)) {
> + dput(dentry);
> + break;
> + }
> +
> + dput(dentry);
> + ctx->pos = xas.xa_index + 1;
> + }
> +}
> +
> +/**
> + * offset_readdir - Emit entries starting at offset @ctx->pos
> + * @file: an open directory to iterate over
> + * @ctx: directory iteration context
> + *
> + * Caller must hold @file's i_rwsem to prevent insertion or removal of
> + * entries during this call.
> + *
> + * On entry, @ctx->pos contains an offset that represents the first entry
> + * to be read from the directory.
> + *
> + * The operation continues until there are no more entries to read, or
> + * until the ctx->actor indicates there is no more space in the caller's
> + * output buffer.
> + *
> + * On return, @ctx->pos contains an offset that will read the next entry
> + * in this directory when offset_readdir() is called again with @ctx.
> + *
> + * Return values:
> + * %0 - Complete
> + */
> +static int offset_readdir(struct file *file, struct dir_context *ctx)
> +{
> + struct dentry *dir = file->f_path.dentry;
> +
> + lockdep_assert_held(&d_inode(dir)->i_rwsem);
> +
> + if (!dir_emit_dots(file, ctx))
> + return 0;
> +
> + offset_iterate_dir(dir, ctx);
> + return 0;
> +}
> +
> +const struct file_operations simple_offset_dir_operations = {
> + .llseek = offset_dir_llseek,
> + .iterate_shared = offset_readdir,
> + .read = generic_read_dir,
> + .fsync = noop_fsync,
> +};
> +
> static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev)
> {
> struct dentry *child = NULL;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 133f0640fb24..85de389e4eb8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1767,6 +1767,7 @@ struct dir_context {
>
> struct iov_iter;
> struct io_uring_cmd;
> +struct offset_ctx;
>
> struct file_operations {
> struct module *owner;
> @@ -1854,6 +1855,7 @@ struct inode_operations {
> int (*fileattr_set)(struct mnt_idmap *idmap,
> struct dentry *dentry, struct fileattr *fa);
> int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
> + struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
> } ____cacheline_aligned;

Should the new ->get_offset_ctx() method be documented in
Documentation/filesystems/vfs.rst and Documentation/filesystems/locking.rst?
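
In case it helps whoever writes that text, below is a rough userspace model of the offset map semantics as I read them from the patch. It is only a sketch under assumptions: all model_* names and the tiny MODEL_MAX_OFFSET limit are made up for illustration, a plain array stands in for the xarray, and locking, RCU and dentry refcounting are left out entirely. It tries to show three things: offsets start at 2, allocation is cyclic (a freed offset is not handed out again right away), and lookup returns the first entry at or after a given position.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace stand-ins; none of these names exist in the kernel. */
#define MODEL_MIN_OFFSET 2u	/* 0 is '.', 1 is '..' */
#define MODEL_MAX_OFFSET 7u	/* tiny on purpose; the patch uses U32_MAX */

struct model_entry {
	const char *name;
	uint32_t offset;	/* 0 means "not in any offset map" */
};

struct model_offset_ctx {
	struct model_entry *slots[MODEL_MAX_OFFSET + 1];
	uint32_t next_offset;
};

static void model_offset_init(struct model_offset_ctx *octx)
{
	for (size_t i = 0; i <= MODEL_MAX_OFFSET; i++)
		octx->slots[i] = NULL;
	octx->next_offset = MODEL_MIN_OFFSET;
}

/* Cyclic allocation: scan from next_offset, wrapping to the minimum once. */
static int model_offset_add(struct model_offset_ctx *octx,
			    struct model_entry *e)
{
	const uint32_t span = MODEL_MAX_OFFSET - MODEL_MIN_OFFSET + 1;

	if (e->offset != 0)
		return -EBUSY;	/* entry already has an offset */

	for (uint32_t i = 0; i < span; i++) {
		uint32_t idx = MODEL_MIN_OFFSET +
			(octx->next_offset - MODEL_MIN_OFFSET + i) % span;

		if (octx->slots[idx])
			continue;
		octx->slots[idx] = e;
		e->offset = idx;
		octx->next_offset = (idx == MODEL_MAX_OFFSET) ?
			MODEL_MIN_OFFSET : idx + 1;
		return 0;
	}
	return -EBUSY;		/* no free offset left in the range */
}

static void model_offset_remove(struct model_offset_ctx *octx,
				struct model_entry *e)
{
	if (e->offset == 0)
		return;
	octx->slots[e->offset] = NULL;
	e->offset = 0;
}

/* Return the first entry at or after @pos, or NULL if there is none. */
static struct model_entry *
model_offset_find_next(const struct model_offset_ctx *octx, uint32_t pos)
{
	for (uint32_t i = pos; i <= MODEL_MAX_OFFSET; i++)
		if (octx->slots[i])
			return octx->slots[i];
	return NULL;
}
```

With this model, adding "a" and "b" gives them offsets 2 and 3; removing "a" and then adding "c" gives "c" offset 4 instead of reusing 2. That is what lets a reader resume from a saved position without seeing a different entry at the same offset.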
Thanks,
Bernd