Re: [PATCH v3 0/3] shmemfs stable directory offsets

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* Re: [PATCH v3 0/3] shmemfs stable directory offsets
       [not found] <168605676256.32244.6158641147817585524.stgit@manet.1015granger.net>
@ 2023-06-21 13:12 ` Chuck Lever III
  2023-06-21 14:25   ` Christian Brauner
       [not found] ` <168605705924.32244.13384849924097654559.stgit@manet.1015granger.net>
       [not found] ` <168605707262.32244.4794425063054676856.stgit@manet.1015granger.net>
  2 siblings, 1 reply; 9+ messages in thread
From: Chuck Lever III @ 2023-06-21 13:12 UTC (permalink / raw)
  To: Al Viro, Christian Brauner, Hugh Dickins, Andrew Morton
  Cc: Chuck Lever, Jeff Layton, linux-mm, linux-fsdevel



> On Jun 6, 2023, at 9:10 AM, Chuck Lever <cel@kernel.org> wrote:
> 
> The following series is for continued discussion of the need for
> and implementation of stable directory offsets for shmemfs/tmpfs.
> 
> As discussed in Vancouver, I've re-implemented this change in libfs
> so that other "simple" filesystems can use it. There were a few
> other suggestions made during that event that I haven't tried yet.
> 
> Changes since v2:
> - Move bulk of stable offset support into fs/libfs.c
> - Replace xa_find_after with xas_find_next for efficiency
> 
> Changes since v1:
> - Break the single patch up into a series
> 
> Changes since RFC:
> - Destroy xarray in shmem_destroy_inode() instead of free_in_core_inode()
> - A few cosmetic updates
> 
> ---
> 
> Chuck Lever (3):
>      libfs: Add directory operations for stable offsets
>      shmem: Refactor shmem_symlink()
>      shmem: stable directory offsets
> 
> 
> fs/dcache.c            |   1 +
> fs/libfs.c             | 185 +++++++++++++++++++++++++++++++++++++++++
> include/linux/dcache.h |   1 +
> include/linux/fs.h     |   9 ++
> mm/shmem.c             |  58 +++++++++----
> 5 files changed, 240 insertions(+), 14 deletions(-)

The good news is that so far I have received no complaints from bots
on this series.

The bad news is I have received no human comments. Ping?


--
Chuck Lever




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 0/3] shmemfs stable directory offsets
  2023-06-21 13:12 ` [PATCH v3 0/3] shmemfs stable directory offsets Chuck Lever III
@ 2023-06-21 14:25   ` Christian Brauner
  0 siblings, 0 replies; 9+ messages in thread
From: Christian Brauner @ 2023-06-21 14:25 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Al Viro, Hugh Dickins, Andrew Morton, Chuck Lever, Jeff Layton,
	linux-mm, linux-fsdevel

On Wed, Jun 21, 2023 at 01:12:46PM +0000, Chuck Lever III wrote:
> 
> 
> > On Jun 6, 2023, at 9:10 AM, Chuck Lever <cel@kernel.org> wrote:
> > 
> > The following series is for continued discussion of the need for
> > and implementation of stable directory offsets for shmemfs/tmpfs.
> > 
> > As discussed in Vancouver, I've re-implemented this change in libfs
> > so that other "simple" filesystems can use it. There were a few
> > other suggestions made during that event that I haven't tried yet.
> > 
> > Changes since v2:
> > - Move bulk of stable offset support into fs/libfs.c
> > - Replace xa_find_after with xas_find_next for efficiency
> > 
> > Changes since v1:
> > - Break the single patch up into a series
> > 
> > Changes since RFC:
> > - Destroy xarray in shmem_destroy_inode() instead of free_in_core_inode()
> > - A few cosmetic updates
> > 
> > ---
> > 
> > Chuck Lever (3):
> >      libfs: Add directory operations for stable offsets
> >      shmem: Refactor shmem_symlink()
> >      shmem: stable directory offsets
> > 
> > 
> > fs/dcache.c            |   1 +
> > fs/libfs.c             | 185 +++++++++++++++++++++++++++++++++++++++++
> > include/linux/dcache.h |   1 +
> > include/linux/fs.h     |   9 ++
> > mm/shmem.c             |  58 +++++++++----
> > 5 files changed, 240 insertions(+), 14 deletions(-)
> 
> The good news is that so far I have received no complaints from bots
> on this series.
> 
> The bad news is I have received no human comments. Ping?

I haven't gotten around to reviewing this yet but it is still on my
radar. We should plan to get this done for v6.6. I'll aim to review
during the merge window. Sorry for the delay.


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] libfs: Add directory operations for stable offsets
       [not found] ` <168605705924.32244.13384849924097654559.stgit@manet.1015granger.net>
@ 2023-06-23 22:21   ` Bernd Schubert
  2023-06-26 13:36     ` Bernd Schubert
  0 siblings, 1 reply; 9+ messages in thread
From: Bernd Schubert @ 2023-06-23 22:21 UTC (permalink / raw)
  To: Chuck Lever, viro, brauner, hughd, akpm
  Cc: Chuck Lever, linux-mm, linux-fsdevel



On 6/6/23 15:10, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Create a vector of directory operations in fs/libfs.c that handles
> directory seeks and readdir via stable offsets instead of the
> current cursor-based mechanism.
> 
> For the moment these are unused.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   fs/dcache.c            |    1
>   fs/libfs.c             |  185 ++++++++++++++++++++++++++++++++++++++++++++++++
>   include/linux/dcache.h |    1
>   include/linux/fs.h     |    9 ++
>   4 files changed, 196 insertions(+)
> 
> diff --git a/fs/dcache.c b/fs/dcache.c
> index 52e6d5fdab6b..9c9a801f3b33 100644
> --- a/fs/dcache.c
> +++ b/fs/dcache.c
> @@ -1813,6 +1813,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>   	dentry->d_sb = sb;
>   	dentry->d_op = NULL;
>   	dentry->d_fsdata = NULL;
> +	dentry->d_offset = 0;
>   	INIT_HLIST_BL_NODE(&dentry->d_hash);
>   	INIT_LIST_HEAD(&dentry->d_lru);
>   	INIT_LIST_HEAD(&dentry->d_subdirs);
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 89cf614a3271..07317bbe1668 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -239,6 +239,191 @@ const struct inode_operations simple_dir_inode_operations = {
>   };
>   EXPORT_SYMBOL(simple_dir_inode_operations);
>   
> +/**
> + * stable_offset_init - initialize a parent directory
> + * @dir: parent directory to be initialized
> + *
> + */
> +void stable_offset_init(struct inode *dir)
> +{
> +	xa_init_flags(&dir->i_doff_map, XA_FLAGS_ALLOC1);
> +	dir->i_next_offset = 0;
> +}
> +EXPORT_SYMBOL(stable_offset_init);
> +
> +/**
> + * stable_offset_add - Add an entry to a directory's stable offset map
> + * @dir: parent directory being modified
> + * @dentry: new dentry being added
> + *
> + * Returns zero on success. Otherwise, a negative errno value is returned.
> + */
> +int stable_offset_add(struct inode *dir, struct dentry *dentry)
> +{
> +	struct xa_limit limit = XA_LIMIT(2, U32_MAX);
> +	u32 offset = 0;
> +	int ret;
> +
> +	if (dentry->d_offset)
> +		return -EBUSY;
> +
> +	ret = xa_alloc_cyclic(&dir->i_doff_map, &offset, dentry, limit,
> +			      &dir->i_next_offset, GFP_KERNEL);

Please see below at struct inode my question about i_next_offset.

> +	if (ret < 0)
> +		return ret;
> +
> +	dentry->d_offset = offset;
> +	return 0;
> +}
> +EXPORT_SYMBOL(stable_offset_add);
> +
> +/**
> + * stable_offset_remove - Remove an entry to a directory's stable offset map
> + * @dir: parent directory being modified
> + * @dentry: dentry being removed
> + *
> + */
> +void stable_offset_remove(struct inode *dir, struct dentry *dentry)
> +{
> +	if (!dentry->d_offset)
> +		return;
> +
> +	xa_erase(&dir->i_doff_map, dentry->d_offset);
> +	dentry->d_offset = 0;
> +}
> +EXPORT_SYMBOL(stable_offset_remove);
> +
> +/**
> + * stable_offset_destroy - Release offset map
> + * @dir: parent directory that is about to be destroyed
> + *
> + * During fs teardown (eg. umount), a directory's offset map might still
> + * contain entries. xa_destroy() cleans out anything that remains.
> + */
> +void stable_offset_destroy(struct inode *dir)
> +{
> +	xa_destroy(&dir->i_doff_map);
> +}
> +EXPORT_SYMBOL(stable_offset_destroy);
> +
> +/**
> + * stable_dir_llseek - Advance the read position of a directory descriptor
> + * @file: an open directory whose position is to be updated
> + * @offset: a byte offset
> + * @whence: enumerator describing the starting position for this update
> + *
> + * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for directories.
> + *
> + * Returns the updated read position if successful; otherwise a
> + * negative errno is returned and the read position remains unchanged.
> + */
> +static loff_t stable_dir_llseek(struct file *file, loff_t offset, int whence)
> +{
> +	switch (whence) {
> +	case SEEK_CUR:
> +		offset += file->f_pos;
> +		fallthrough;
> +	case SEEK_SET:
> +		if (offset >= 0)
> +			break;
> +		fallthrough;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return vfs_setpos(file, offset, U32_MAX);
> +}
> +
> +static struct dentry *stable_find_next(struct xa_state *xas)
> +{
> +	struct dentry *child, *found = NULL;
> +
> +	rcu_read_lock();
> +	child = xas_next_entry(xas, U32_MAX);
> +	if (!child)
> +		goto out;
> +	spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
> +	if (simple_positive(child))
> +		found = dget_dlock(child);
> +	spin_unlock(&child->d_lock);
> +out:
> +	rcu_read_unlock();
> +	return found;
> +}
> +
> +static bool stable_dir_emit(struct dir_context *ctx, struct dentry *dentry)
> +{
> +	struct inode *inode = d_inode(dentry);
> +
> +	return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len,
> +			  dentry->d_offset, inode->i_ino,
> +			  fs_umode_to_dtype(inode->i_mode));
> +}
> +
> +static void stable_iterate_dir(struct dentry *dir, struct dir_context *ctx)
> +{
> +	XA_STATE(xas, &((d_inode(dir))->i_doff_map), ctx->pos);
> +	struct dentry *dentry;
> +
> +	while (true) {
> +		spin_lock(&dir->d_lock);
> +		dentry = stable_find_next(&xas);
> +		spin_unlock(&dir->d_lock);
> +		if (!dentry)
> +			break;
> +
> +		if (!stable_dir_emit(ctx, dentry)) {
> +			dput(dentry);
> +			break;
> +		}
> +
> +		dput(dentry);
> +		ctx->pos = xas.xa_index + 1;
> +	}
> +}
> +
> +/**
> + * stable_readdir - Emit entries starting at offset @ctx->pos
> + * @file: an open directory to iterate over
> + * @ctx: directory iteration context
> + *
> + * Caller must hold @file's i_rwsem to prevent insertion or removal of
> + * entries during this call.
> + *
> + * On entry, @ctx->pos contains an offset that represents the first entry
> + * to be read from the directory.
> + *
> + * The operation continues until there are no more entries to read, or
> + * until the ctx->actor indicates there is no more space in the caller's
> + * output buffer.
> + *
> + * On return, @ctx->pos contains an offset that will read the next entry
> + * in this directory when shmem_readdir() is called again with @ctx.
> + *
> + * Return values:
> + *   %0 - Complete
> + */
> +static int stable_readdir(struct file *file, struct dir_context *ctx)
> +{
> +	struct dentry *dir = file->f_path.dentry;
> +
> +	lockdep_assert_held(&d_inode(dir)->i_rwsem);
> +
> +	if (!dir_emit_dots(file, ctx))
> +		return 0;
> +
> +	stable_iterate_dir(dir, ctx);
> +	return 0;
> +}
> +
> +const struct file_operations stable_dir_operations = {
> +	.llseek		= stable_dir_llseek,
> +	.iterate_shared	= stable_readdir,
> +	.read		= generic_read_dir,
> +	.fsync		= noop_fsync,
> +};
> +EXPORT_SYMBOL(stable_dir_operations);
> +
>   static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev)
>   {
>   	struct dentry *child = NULL;
> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
> index 6b351e009f59..579ce1800efe 100644
> --- a/include/linux/dcache.h
> +++ b/include/linux/dcache.h
> @@ -96,6 +96,7 @@ struct dentry {
>   	struct super_block *d_sb;	/* The root of the dentry tree */
>   	unsigned long d_time;		/* used by d_revalidate */
>   	void *d_fsdata;			/* fs-specific data */
> +	u32 d_offset;			/* directory offset in parent */
>   
>   	union {
>   		struct list_head d_lru;		/* LRU list */
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 133f0640fb24..3fc2c04ed8ff 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -719,6 +719,10 @@ struct inode {
>   #endif
>   
>   	void			*i_private; /* fs or device private pointer */
> +
> +	/* simplefs stable directory offset tracking */
> +	struct xarray		i_doff_map;
> +	u32			i_next_offset;

Hmm, I was grepping through the patches and only find that 
"i_next_offset" is initialized to 0 and then passed to xa_alloc_cyclic - 
does this really need to part of struct inode or could it be a local 
variable in stable_offset_add()?

I only managed to look a bit through the patches right now, personally I 
like v2 better as it doesn't extend struct inode with changes that can 
be used by in-memory file system only. What do others think? An 
alternative would be to have these fields in struct shmem_inode_info and 
pass it as extra argument to the stable_ functions?


Thanks,
Bernd


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/3] shmem: stable directory offsets
       [not found] ` <168605707262.32244.4794425063054676856.stgit@manet.1015granger.net>
@ 2023-06-26 13:18   ` Bernd Schubert
  2023-06-26 15:16     ` Chuck Lever III
  0 siblings, 1 reply; 9+ messages in thread
From: Bernd Schubert @ 2023-06-26 13:18 UTC (permalink / raw)
  To: Chuck Lever, viro, brauner, hughd, akpm
  Cc: Chuck Lever, linux-mm, linux-fsdevel



On 6/6/23 15:11, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> The current cursor-based directory offset mechanism doesn't work
> when a tmpfs filesystem is exported via NFS. This is because NFS
> clients do not open directories. Each server-side READDIR operation
> has to open the directory, read it, then close it. The cursor state
> for that directory, being associated strictly with the opened
> struct file, is thus discarded after each NFS READDIR operation.
> 
> Directory offsets are cached not only by NFS clients, but also by
> user space libraries on those clients. Essentially there is no way
> to invalidate those caches when directory offsets have changed on
> an NFS server after the offset-to-dentry mapping changes. Thus the
> whole application stack depends on unchanging directory offsets.
> 
> The solution we've come up with is to make the directory offset for
> each file in a tmpfs filesystem stable for the life of the directory
> entry it represents.
> 
> shmem_readdir() and shmem_dir_llseek() now use an xarray to map each
> directory offset (an loff_t integer) to the memory address of a
> struct dentry.
> 
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   mm/shmem.c |   39 +++++++++++++++++++++++++++++++++++----
>   1 file changed, 35 insertions(+), 4 deletions(-)
> 
> diff --git a/mm/shmem.c b/mm/shmem.c
> index 721f9fd064aa..fd9571056181 100644
> --- a/mm/shmem.c
> +++ b/mm/shmem.c
> @@ -2410,7 +2410,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
>   			/* Some things misbehave if size == 0 on a directory */
>   			inode->i_size = 2 * BOGO_DIRENT_SIZE;
>   			inode->i_op = &shmem_dir_inode_operations;
> -			inode->i_fop = &simple_dir_operations;
> +			inode->i_fop = &stable_dir_operations;
> +			stable_offset_init(inode);
>   			break;
>   		case S_IFLNK:
>   			/*
> @@ -2950,6 +2951,10 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
>   		if (error && error != -EOPNOTSUPP)
>   			goto out_iput;
>   
> +		error = stable_offset_add(dir, dentry);
> +		if (error)
> +			goto out_iput;
> +
>   		error = 0;

This line can be removed?

>   		dir->i_size += BOGO_DIRENT_SIZE;
>   		dir->i_ctime = dir->i_mtime = current_time(dir);
> @@ -3027,6 +3032,10 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
>   			goto out;
>   	}
>   
> +	ret = stable_offset_add(dir, dentry);
> +	if (ret)
> +		goto out;
> +

I think this should call shmem_free_inode() before goto out - reverse 
what shmem_reserve_inode() has done.

>   	dir->i_size += BOGO_DIRENT_SIZE;
>   	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
>   	inode_inc_iversion(dir);
> @@ -3045,6 +3054,8 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
>   	if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
>   		shmem_free_inode(inode->i_sb);
>   
> +	stable_offset_remove(dir, dentry);
> +
>   	dir->i_size -= BOGO_DIRENT_SIZE;
>   	inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
>   	inode_inc_iversion(dir);
> @@ -3103,24 +3114,37 @@ static int shmem_rename2(struct mnt_idmap *idmap,
>   {
>   	struct inode *inode = d_inode(old_dentry);
>   	int they_are_dirs = S_ISDIR(inode->i_mode);
> +	int error;
>   
>   	if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
>   		return -EINVAL;
>   
> -	if (flags & RENAME_EXCHANGE)
> +	if (flags & RENAME_EXCHANGE) {
> +		stable_offset_remove(old_dir, old_dentry);
> +		stable_offset_remove(new_dir, new_dentry);
> +		error = stable_offset_add(new_dir, old_dentry);
> +		if (error)
> +			return error;
> +		error = stable_offset_add(old_dir, new_dentry);
> +		if (error)
> +			return error;
>   		return simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
> +	}

Hmm, error handling issues? Everything needs to be reversed when any of 
the operations fails?

>   
>   	if (!simple_empty(new_dentry))
>   		return -ENOTEMPTY;
>   
>   	if (flags & RENAME_WHITEOUT) {
> -		int error;
> -
>   		error = shmem_whiteout(idmap, old_dir, old_dentry);
>   		if (error)
>   			return error;
>   	}
>   
> +	stable_offset_remove(old_dir, old_dentry);
> +	error = stable_offset_add(new_dir, old_dentry);
> +	if (error)
> +		return error;
> +
>   	if (d_really_is_positive(new_dentry)) {
>   		(void) shmem_unlink(new_dir, new_dentry);
>   		if (they_are_dirs) {
> @@ -3185,6 +3209,11 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
>   		folio_unlock(folio);
>   		folio_put(folio);
>   	}
> +
> +	error = stable_offset_add(dir, dentry);
> +	if (error)
> +		goto out_iput;
> +

Error handling, there is a kmemdup() above which needs to be freed? I'm 
not sure about folio, automatically released with the inode?

>   	dir->i_size += BOGO_DIRENT_SIZE;
>   	dir->i_ctime = dir->i_mtime = current_time(dir);
>   	inode_inc_iversion(dir);
> @@ -3920,6 +3949,8 @@ static void shmem_destroy_inode(struct inode *inode)
>   {
>   	if (S_ISREG(inode->i_mode))
>   		mpol_free_shared_policy(&SHMEM_I(inode)->policy);
> +	if (S_ISDIR(inode->i_mode))
> +		stable_offset_destroy(inode);
>   }
>   
>   static void shmem_init_inode(void *foo)
> 
> 

Thanks,
Bernd


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] libfs: Add directory operations for stable offsets
  2023-06-23 22:21   ` [PATCH v3 1/3] libfs: Add directory operations for stable offsets Bernd Schubert
@ 2023-06-26 13:36     ` Bernd Schubert
  2023-06-26 14:50       ` Chuck Lever III
  2023-06-26 15:22       ` Bernd Schubert
  0 siblings, 2 replies; 9+ messages in thread
From: Bernd Schubert @ 2023-06-26 13:36 UTC (permalink / raw)
  To: Chuck Lever, viro, brauner, hughd, akpm
  Cc: Chuck Lever, linux-mm, linux-fsdevel



On 6/24/23 00:21, Bernd Schubert wrote:
> 
> 
> On 6/6/23 15:10, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>>
>> Create a vector of directory operations in fs/libfs.c that handles
>> directory seeks and readdir via stable offsets instead of the
>> current cursor-based mechanism.
>>
>> For the moment these are unused.
>>
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>   fs/dcache.c            |    1
>>   fs/libfs.c             |  185 
>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>   include/linux/dcache.h |    1
>>   include/linux/fs.h     |    9 ++
>>   4 files changed, 196 insertions(+)
>>
>> diff --git a/fs/dcache.c b/fs/dcache.c
>> index 52e6d5fdab6b..9c9a801f3b33 100644
>> --- a/fs/dcache.c
>> +++ b/fs/dcache.c
>> @@ -1813,6 +1813,7 @@ static struct dentry *__d_alloc(struct 
>> super_block *sb, const struct qstr *name)
>>       dentry->d_sb = sb;
>>       dentry->d_op = NULL;
>>       dentry->d_fsdata = NULL;
>> +    dentry->d_offset = 0;
>>       INIT_HLIST_BL_NODE(&dentry->d_hash);
>>       INIT_LIST_HEAD(&dentry->d_lru);
>>       INIT_LIST_HEAD(&dentry->d_subdirs);
>> diff --git a/fs/libfs.c b/fs/libfs.c
>> index 89cf614a3271..07317bbe1668 100644
>> --- a/fs/libfs.c
>> +++ b/fs/libfs.c
>> @@ -239,6 +239,191 @@ const struct inode_operations 
>> simple_dir_inode_operations = {
>>   };
>>   EXPORT_SYMBOL(simple_dir_inode_operations);
>> +/**
>> + * stable_offset_init - initialize a parent directory
>> + * @dir: parent directory to be initialized
>> + *
>> + */
>> +void stable_offset_init(struct inode *dir)
>> +{
>> +    xa_init_flags(&dir->i_doff_map, XA_FLAGS_ALLOC1);
>> +    dir->i_next_offset = 0;
>> +}
>> +EXPORT_SYMBOL(stable_offset_init);
>> +
>> +/**
>> + * stable_offset_add - Add an entry to a directory's stable offset map
>> + * @dir: parent directory being modified
>> + * @dentry: new dentry being added
>> + *
>> + * Returns zero on success. Otherwise, a negative errno value is 
>> returned.
>> + */
>> +int stable_offset_add(struct inode *dir, struct dentry *dentry)
>> +{
>> +    struct xa_limit limit = XA_LIMIT(2, U32_MAX);
>> +    u32 offset = 0;
>> +    int ret;
>> +
>> +    if (dentry->d_offset)
>> +        return -EBUSY;
>> +
>> +    ret = xa_alloc_cyclic(&dir->i_doff_map, &offset, dentry, limit,
>> +                  &dir->i_next_offset, GFP_KERNEL);
> 
> Please see below at struct inode my question about i_next_offset.
> 
>> +    if (ret < 0)
>> +        return ret;
>> +
>> +    dentry->d_offset = offset;
>> +    return 0;
>> +}
>> +EXPORT_SYMBOL(stable_offset_add);
>> +
>> +/**
>> + * stable_offset_remove - Remove an entry to a directory's stable 
>> offset map
>> + * @dir: parent directory being modified
>> + * @dentry: dentry being removed
>> + *
>> + */
>> +void stable_offset_remove(struct inode *dir, struct dentry *dentry)
>> +{
>> +    if (!dentry->d_offset)
>> +        return;
>> +
>> +    xa_erase(&dir->i_doff_map, dentry->d_offset);
>> +    dentry->d_offset = 0;
>> +}
>> +EXPORT_SYMBOL(stable_offset_remove);
>> +
>> +/**
>> + * stable_offset_destroy - Release offset map
>> + * @dir: parent directory that is about to be destroyed
>> + *
>> + * During fs teardown (eg. umount), a directory's offset map might still
>> + * contain entries. xa_destroy() cleans out anything that remains.
>> + */
>> +void stable_offset_destroy(struct inode *dir)
>> +{
>> +    xa_destroy(&dir->i_doff_map);
>> +}
>> +EXPORT_SYMBOL(stable_offset_destroy);
>> +
>> +/**
>> + * stable_dir_llseek - Advance the read position of a directory 
>> descriptor
>> + * @file: an open directory whose position is to be updated
>> + * @offset: a byte offset
>> + * @whence: enumerator describing the starting position for this update
>> + *
>> + * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for directories.
>> + *
>> + * Returns the updated read position if successful; otherwise a
>> + * negative errno is returned and the read position remains unchanged.
>> + */
>> +static loff_t stable_dir_llseek(struct file *file, loff_t offset, int 
>> whence)
>> +{
>> +    switch (whence) {
>> +    case SEEK_CUR:
>> +        offset += file->f_pos;
>> +        fallthrough;
>> +    case SEEK_SET:
>> +        if (offset >= 0)
>> +            break;
>> +        fallthrough;
>> +    default:
>> +        return -EINVAL;
>> +    }
>> +
>> +    return vfs_setpos(file, offset, U32_MAX);
>> +}
>> +
>> +static struct dentry *stable_find_next(struct xa_state *xas)
>> +{
>> +    struct dentry *child, *found = NULL;
>> +
>> +    rcu_read_lock();
>> +    child = xas_next_entry(xas, U32_MAX);
>> +    if (!child)
>> +        goto out;
>> +    spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
>> +    if (simple_positive(child))
>> +        found = dget_dlock(child);
>> +    spin_unlock(&child->d_lock);
>> +out:
>> +    rcu_read_unlock();
>> +    return found;
>> +}

I wonder if this should try the next dentry when simple_positive() 
returns false, what is if there is a readdir/unlink race? readdir now 
abort in the middle instead of continuing with the next dentry?

>> +
>> +static bool stable_dir_emit(struct dir_context *ctx, struct dentry 
>> *dentry)
>> +{
>> +    struct inode *inode = d_inode(dentry);
>> +
>> +    return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len,
>> +              dentry->d_offset, inode->i_ino,
>> +              fs_umode_to_dtype(inode->i_mode));
>> +}
>> +
>> +static void stable_iterate_dir(struct dentry *dir, struct dir_context 
>> *ctx)
>> +{
>> +    XA_STATE(xas, &((d_inode(dir))->i_doff_map), ctx->pos);
>> +    struct dentry *dentry;
>> +
>> +    while (true) {
>> +        spin_lock(&dir->d_lock);
>> +        dentry = stable_find_next(&xas);
>> +        spin_unlock(&dir->d_lock);
>> +        if (!dentry)
>> +            break;
>> +
>> +        if (!stable_dir_emit(ctx, dentry)) {
>> +            dput(dentry);
>> +            break;
>> +        }
>> +
>> +        dput(dentry);
>> +        ctx->pos = xas.xa_index + 1;
>> +    }
>> +}
>> +
>> +/**
>> + * stable_readdir - Emit entries starting at offset @ctx->pos
>> + * @file: an open directory to iterate over
>> + * @ctx: directory iteration context
>> + *
>> + * Caller must hold @file's i_rwsem to prevent insertion or removal of
>> + * entries during this call.
>> + *
>> + * On entry, @ctx->pos contains an offset that represents the first 
>> entry
>> + * to be read from the directory.
>> + *
>> + * The operation continues until there are no more entries to read, or
>> + * until the ctx->actor indicates there is no more space in the caller's
>> + * output buffer.
>> + *
>> + * On return, @ctx->pos contains an offset that will read the next entry
>> + * in this directory when shmem_readdir() is called again with @ctx.
>> + *
>> + * Return values:
>> + *   %0 - Complete
>> + */
>> +static int stable_readdir(struct file *file, struct dir_context *ctx)
>> +{
>> +    struct dentry *dir = file->f_path.dentry;
>> +
>> +    lockdep_assert_held(&d_inode(dir)->i_rwsem);
>> +
>> +    if (!dir_emit_dots(file, ctx))
>> +        return 0;
>> +
>> +    stable_iterate_dir(dir, ctx);
>> +    return 0;
>> +}
>> +
>> +const struct file_operations stable_dir_operations = {
>> +    .llseek        = stable_dir_llseek,
>> +    .iterate_shared    = stable_readdir,
>> +    .read        = generic_read_dir,
>> +    .fsync        = noop_fsync,
>> +};
>> +EXPORT_SYMBOL(stable_dir_operations);
>> +
>>   static struct dentry *find_next_child(struct dentry *parent, struct 
>> dentry *prev)
>>   {
>>       struct dentry *child = NULL;
>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>> index 6b351e009f59..579ce1800efe 100644
>> --- a/include/linux/dcache.h
>> +++ b/include/linux/dcache.h
>> @@ -96,6 +96,7 @@ struct dentry {
>>       struct super_block *d_sb;    /* The root of the dentry tree */
>>       unsigned long d_time;        /* used by d_revalidate */
>>       void *d_fsdata;            /* fs-specific data */
>> +    u32 d_offset;            /* directory offset in parent */
>>       union {
>>           struct list_head d_lru;        /* LRU list */
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 133f0640fb24..3fc2c04ed8ff 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -719,6 +719,10 @@ struct inode {
>>   #endif
>>       void            *i_private; /* fs or device private pointer */
>> +
>> +    /* simplefs stable directory offset tracking */
>> +    struct xarray        i_doff_map;
>> +    u32            i_next_offset;
> 
> Hmm, I was grepping through the patches and only find that 
> "i_next_offset" is initialized to 0 and then passed to xa_alloc_cyclic - 
> does this really need to part of struct inode or could it be a local 
> variable in stable_offset_add()?

Hmm, at best that is used for rename, but then is this offset right when 
old_dir and new_dir differ?

> 
> I only managed to look a bit through the patches right now, personally I 
> like v2 better as it doesn't extend struct inode with changes that can 
> be used by in-memory file system only. What do others think? An 
> alternative would be to have these fields in struct shmem_inode_info and 
> pass it as extra argument to the stable_ functions?
> 
> 
> Thanks,
> Bernd


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] libfs: Add directory operations for stable offsets
  2023-06-26 13:36     ` Bernd Schubert
@ 2023-06-26 14:50       ` Chuck Lever III
  2023-06-26 15:01         ` Bernd Schubert
  2023-06-26 15:22       ` Bernd Schubert
  1 sibling, 1 reply; 9+ messages in thread
From: Chuck Lever III @ 2023-06-26 14:50 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Chuck Lever, Al Viro, Christian Brauner, Hugh Dickins,
	Andrew Morton, linux-mm, linux-fsdevel



> On Jun 26, 2023, at 9:36 AM, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> 
> 
> 
> On 6/24/23 00:21, Bernd Schubert wrote:
>> On 6/6/23 15:10, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>> 
>>> Create a vector of directory operations in fs/libfs.c that handles
>>> directory seeks and readdir via stable offsets instead of the
>>> current cursor-based mechanism.
>>> 
>>> For the moment these are unused.
>>> 
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>   fs/dcache.c            |    1
>>>   fs/libfs.c             |  185 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>   include/linux/dcache.h |    1
>>>   include/linux/fs.h     |    9 ++
>>>   4 files changed, 196 insertions(+)
>>> 
>>> diff --git a/fs/dcache.c b/fs/dcache.c
>>> index 52e6d5fdab6b..9c9a801f3b33 100644
>>> --- a/fs/dcache.c
>>> +++ b/fs/dcache.c
>>> @@ -1813,6 +1813,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>>>       dentry->d_sb = sb;
>>>       dentry->d_op = NULL;
>>>       dentry->d_fsdata = NULL;
>>> +    dentry->d_offset = 0;
>>>       INIT_HLIST_BL_NODE(&dentry->d_hash);
>>>       INIT_LIST_HEAD(&dentry->d_lru);
>>>       INIT_LIST_HEAD(&dentry->d_subdirs);
>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>> index 89cf614a3271..07317bbe1668 100644
>>> --- a/fs/libfs.c
>>> +++ b/fs/libfs.c
>>> @@ -239,6 +239,191 @@ const struct inode_operations simple_dir_inode_operations = {
>>>   };
>>>   EXPORT_SYMBOL(simple_dir_inode_operations);
>>> +/**
>>> + * stable_offset_init - initialize a parent directory
>>> + * @dir: parent directory to be initialized
>>> + *
>>> + */
>>> +void stable_offset_init(struct inode *dir)
>>> +{
>>> +    xa_init_flags(&dir->i_doff_map, XA_FLAGS_ALLOC1);
>>> +    dir->i_next_offset = 0;
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_init);
>>> +
>>> +/**
>>> + * stable_offset_add - Add an entry to a directory's stable offset map
>>> + * @dir: parent directory being modified
>>> + * @dentry: new dentry being added
>>> + *
>>> + * Returns zero on success. Otherwise, a negative errno value is returned.
>>> + */
>>> +int stable_offset_add(struct inode *dir, struct dentry *dentry)
>>> +{
>>> +    struct xa_limit limit = XA_LIMIT(2, U32_MAX);
>>> +    u32 offset = 0;
>>> +    int ret;
>>> +
>>> +    if (dentry->d_offset)
>>> +        return -EBUSY;
>>> +
>>> +    ret = xa_alloc_cyclic(&dir->i_doff_map, &offset, dentry, limit,
>>> +                  &dir->i_next_offset, GFP_KERNEL);
>> Please see below at struct inode my question about i_next_offset.
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +    dentry->d_offset = offset;
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_add);
>>> +
>>> +/**
>>> + * stable_offset_remove - Remove an entry to a directory's stable offset map
>>> + * @dir: parent directory being modified
>>> + * @dentry: dentry being removed
>>> + *
>>> + */
>>> +void stable_offset_remove(struct inode *dir, struct dentry *dentry)
>>> +{
>>> +    if (!dentry->d_offset)
>>> +        return;
>>> +
>>> +    xa_erase(&dir->i_doff_map, dentry->d_offset);
>>> +    dentry->d_offset = 0;
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_remove);
>>> +
>>> +/**
>>> + * stable_offset_destroy - Release offset map
>>> + * @dir: parent directory that is about to be destroyed
>>> + *
>>> + * During fs teardown (eg. umount), a directory's offset map might still
>>> + * contain entries. xa_destroy() cleans out anything that remains.
>>> + */
>>> +void stable_offset_destroy(struct inode *dir)
>>> +{
>>> +    xa_destroy(&dir->i_doff_map);
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_destroy);
>>> +
>>> +/**
>>> + * stable_dir_llseek - Advance the read position of a directory descriptor
>>> + * @file: an open directory whose position is to be updated
>>> + * @offset: a byte offset
>>> + * @whence: enumerator describing the starting position for this update
>>> + *
>>> + * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for directories.
>>> + *
>>> + * Returns the updated read position if successful; otherwise a
>>> + * negative errno is returned and the read position remains unchanged.
>>> + */
>>> +static loff_t stable_dir_llseek(struct file *file, loff_t offset, int whence)
>>> +{
>>> +    switch (whence) {
>>> +    case SEEK_CUR:
>>> +        offset += file->f_pos;
>>> +        fallthrough;
>>> +    case SEEK_SET:
>>> +        if (offset >= 0)
>>> +            break;
>>> +        fallthrough;
>>> +    default:
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    return vfs_setpos(file, offset, U32_MAX);
>>> +}
>>> +
>>> +static struct dentry *stable_find_next(struct xa_state *xas)
>>> +{
>>> +    struct dentry *child, *found = NULL;
>>> +
>>> +    rcu_read_lock();
>>> +    child = xas_next_entry(xas, U32_MAX);
>>> +    if (!child)
>>> +        goto out;
>>> +    spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
>>> +    if (simple_positive(child))
>>> +        found = dget_dlock(child);
>>> +    spin_unlock(&child->d_lock);
>>> +out:
>>> +    rcu_read_unlock();
>>> +    return found;
>>> +}
>>> +
>>> +static bool stable_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>>> +{
>>> +    struct inode *inode = d_inode(dentry);
>>> +
>>> +    return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len,
>>> +              dentry->d_offset, inode->i_ino,
>>> +              fs_umode_to_dtype(inode->i_mode));
>>> +}
>>> +
>>> +static void stable_iterate_dir(struct dentry *dir, struct dir_context *ctx)
>>> +{
>>> +    XA_STATE(xas, &((d_inode(dir))->i_doff_map), ctx->pos);
>>> +    struct dentry *dentry;
>>> +
>>> +    while (true) {
>>> +        spin_lock(&dir->d_lock);
>>> +        dentry = stable_find_next(&xas);
>>> +        spin_unlock(&dir->d_lock);
>>> +        if (!dentry)
>>> +            break;
>>> +
>>> +        if (!stable_dir_emit(ctx, dentry)) {
>>> +            dput(dentry);
>>> +            break;
>>> +        }
>>> +
>>> +        dput(dentry);
>>> +        ctx->pos = xas.xa_index + 1;
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * stable_readdir - Emit entries starting at offset @ctx->pos
>>> + * @file: an open directory to iterate over
>>> + * @ctx: directory iteration context
>>> + *
>>> + * Caller must hold @file's i_rwsem to prevent insertion or removal of
>>> + * entries during this call.
>>> + *
>>> + * On entry, @ctx->pos contains an offset that represents the first entry
>>> + * to be read from the directory.
>>> + *
>>> + * The operation continues until there are no more entries to read, or
>>> + * until the ctx->actor indicates there is no more space in the caller's
>>> + * output buffer.
>>> + *
>>> + * On return, @ctx->pos contains an offset that will read the next entry
>>> + * in this directory when shmem_readdir() is called again with @ctx.
>>> + *
>>> + * Return values:
>>> + *   %0 - Complete
>>> + */
>>> +static int stable_readdir(struct file *file, struct dir_context *ctx)
>>> +{
>>> +    struct dentry *dir = file->f_path.dentry;
>>> +
>>> +    lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>> +
>>> +    if (!dir_emit_dots(file, ctx))
>>> +        return 0;
>>> +
>>> +    stable_iterate_dir(dir, ctx);
>>> +    return 0;
>>> +}
>>> +
>>> +const struct file_operations stable_dir_operations = {
>>> +    .llseek        = stable_dir_llseek,
>>> +    .iterate_shared    = stable_readdir,
>>> +    .read        = generic_read_dir,
>>> +    .fsync        = noop_fsync,
>>> +};
>>> +EXPORT_SYMBOL(stable_dir_operations);
>>> +
>>>   static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev)
>>>   {
>>>       struct dentry *child = NULL;
>>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>>> index 6b351e009f59..579ce1800efe 100644
>>> --- a/include/linux/dcache.h
>>> +++ b/include/linux/dcache.h
>>> @@ -96,6 +96,7 @@ struct dentry {
>>>       struct super_block *d_sb;    /* The root of the dentry tree */
>>>       unsigned long d_time;        /* used by d_revalidate */
>>>       void *d_fsdata;            /* fs-specific data */
>>> +    u32 d_offset;            /* directory offset in parent */
>>>       union {
>>>           struct list_head d_lru;        /* LRU list */
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index 133f0640fb24..3fc2c04ed8ff 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -719,6 +719,10 @@ struct inode {
>>>   #endif
>>>       void            *i_private; /* fs or device private pointer */
>>> +
>>> +    /* simplefs stable directory offset tracking */
>>> +    struct xarray        i_doff_map;
>>> +    u32            i_next_offset;
>> Hmm, I was grepping through the patches and only find that "i_next_offset" is initialized to 0 and then passed to xa_alloc_cyclic - does this really need to part of struct inode or could it be a local variable in stable_offset_add()?

This is a per-directory value so that each directory can use
the full range of offsets (from zero to UINT_MAX). If there were
only a single next_offset field, then all tmpfs directories
would share the offset range, which is not scalable (not to
mention, it would also be unlike the behavior of other
filesystems).

Yes, we could move this to the shmem-private part of the inode,
but then the API gets a little uglier.


>> I only managed to look a bit through the patches right now, personally I like v2 better as it doesn't extend struct inode with changes that can be used by in-memory file system only. What do others think? An alternative would be to have these fields in struct shmem_inode_info and pass it as extra argument to the stable_ functions?


--
Chuck Lever




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] libfs: Add directory operations for stable offsets
  2023-06-26 14:50       ` Chuck Lever III
@ 2023-06-26 15:01         ` Bernd Schubert
  0 siblings, 0 replies; 9+ messages in thread
From: Bernd Schubert @ 2023-06-26 15:01 UTC (permalink / raw)
  To: Chuck Lever III
  Cc: Chuck Lever, Al Viro, Christian Brauner, Hugh Dickins,
	Andrew Morton, linux-mm, linux-fsdevel



On 6/26/23 16:50, Chuck Lever III wrote:
> 
> 
>> On Jun 26, 2023, at 9:36 AM, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
>>
>>
>>
>> On 6/24/23 00:21, Bernd Schubert wrote:
>>> On 6/6/23 15:10, Chuck Lever wrote:
>>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>>
>>>> Create a vector of directory operations in fs/libfs.c that handles
>>>> directory seeks and readdir via stable offsets instead of the
>>>> current cursor-based mechanism.
>>>>
>>>> For the moment these are unused.
>>>>
>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>> ---
>>>>    fs/dcache.c            |    1
>>>>    fs/libfs.c             |  185 ++++++++++++++++++++++++++++++++++++++++++++++++
>>>>    include/linux/dcache.h |    1
>>>>    include/linux/fs.h     |    9 ++
>>>>    4 files changed, 196 insertions(+)
>>>>
>>>> diff --git a/fs/dcache.c b/fs/dcache.c
>>>> index 52e6d5fdab6b..9c9a801f3b33 100644
>>>> --- a/fs/dcache.c
>>>> +++ b/fs/dcache.c
>>>> @@ -1813,6 +1813,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
>>>>        dentry->d_sb = sb;
>>>>        dentry->d_op = NULL;
>>>>        dentry->d_fsdata = NULL;
>>>> +    dentry->d_offset = 0;
>>>>        INIT_HLIST_BL_NODE(&dentry->d_hash);
>>>>        INIT_LIST_HEAD(&dentry->d_lru);
>>>>        INIT_LIST_HEAD(&dentry->d_subdirs);
>>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>>> index 89cf614a3271..07317bbe1668 100644
>>>> --- a/fs/libfs.c
>>>> +++ b/fs/libfs.c
>>>> @@ -239,6 +239,191 @@ const struct inode_operations simple_dir_inode_operations = {
>>>>    };
>>>>    EXPORT_SYMBOL(simple_dir_inode_operations);
>>>> +/**
>>>> + * stable_offset_init - initialize a parent directory
>>>> + * @dir: parent directory to be initialized
>>>> + *
>>>> + */
>>>> +void stable_offset_init(struct inode *dir)
>>>> +{
>>>> +    xa_init_flags(&dir->i_doff_map, XA_FLAGS_ALLOC1);
>>>> +    dir->i_next_offset = 0;
>>>> +}
>>>> +EXPORT_SYMBOL(stable_offset_init);
>>>> +
>>>> +/**
>>>> + * stable_offset_add - Add an entry to a directory's stable offset map
>>>> + * @dir: parent directory being modified
>>>> + * @dentry: new dentry being added
>>>> + *
>>>> + * Returns zero on success. Otherwise, a negative errno value is returned.
>>>> + */
>>>> +int stable_offset_add(struct inode *dir, struct dentry *dentry)
>>>> +{
>>>> +    struct xa_limit limit = XA_LIMIT(2, U32_MAX);
>>>> +    u32 offset = 0;
>>>> +    int ret;
>>>> +
>>>> +    if (dentry->d_offset)
>>>> +        return -EBUSY;
>>>> +
>>>> +    ret = xa_alloc_cyclic(&dir->i_doff_map, &offset, dentry, limit,
>>>> +                  &dir->i_next_offset, GFP_KERNEL);
>>> Please see below at struct inode my question about i_next_offset.
>>>> +    if (ret < 0)
>>>> +        return ret;
>>>> +
>>>> +    dentry->d_offset = offset;
>>>> +    return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(stable_offset_add);
>>>> +
>>>> +/**
>>>> + * stable_offset_remove - Remove an entry to a directory's stable offset map
>>>> + * @dir: parent directory being modified
>>>> + * @dentry: dentry being removed
>>>> + *
>>>> + */
>>>> +void stable_offset_remove(struct inode *dir, struct dentry *dentry)
>>>> +{
>>>> +    if (!dentry->d_offset)
>>>> +        return;
>>>> +
>>>> +    xa_erase(&dir->i_doff_map, dentry->d_offset);
>>>> +    dentry->d_offset = 0;
>>>> +}
>>>> +EXPORT_SYMBOL(stable_offset_remove);
>>>> +
>>>> +/**
>>>> + * stable_offset_destroy - Release offset map
>>>> + * @dir: parent directory that is about to be destroyed
>>>> + *
>>>> + * During fs teardown (eg. umount), a directory's offset map might still
>>>> + * contain entries. xa_destroy() cleans out anything that remains.
>>>> + */
>>>> +void stable_offset_destroy(struct inode *dir)
>>>> +{
>>>> +    xa_destroy(&dir->i_doff_map);
>>>> +}
>>>> +EXPORT_SYMBOL(stable_offset_destroy);
>>>> +
>>>> +/**
>>>> + * stable_dir_llseek - Advance the read position of a directory descriptor
>>>> + * @file: an open directory whose position is to be updated
>>>> + * @offset: a byte offset
>>>> + * @whence: enumerator describing the starting position for this update
>>>> + *
>>>> + * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for directories.
>>>> + *
>>>> + * Returns the updated read position if successful; otherwise a
>>>> + * negative errno is returned and the read position remains unchanged.
>>>> + */
>>>> +static loff_t stable_dir_llseek(struct file *file, loff_t offset, int whence)
>>>> +{
>>>> +    switch (whence) {
>>>> +    case SEEK_CUR:
>>>> +        offset += file->f_pos;
>>>> +        fallthrough;
>>>> +    case SEEK_SET:
>>>> +        if (offset >= 0)
>>>> +            break;
>>>> +        fallthrough;
>>>> +    default:
>>>> +        return -EINVAL;
>>>> +    }
>>>> +
>>>> +    return vfs_setpos(file, offset, U32_MAX);
>>>> +}
>>>> +
>>>> +static struct dentry *stable_find_next(struct xa_state *xas)
>>>> +{
>>>> +    struct dentry *child, *found = NULL;
>>>> +
>>>> +    rcu_read_lock();
>>>> +    child = xas_next_entry(xas, U32_MAX);
>>>> +    if (!child)
>>>> +        goto out;
>>>> +    spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
>>>> +    if (simple_positive(child))
>>>> +        found = dget_dlock(child);
>>>> +    spin_unlock(&child->d_lock);
>>>> +out:
>>>> +    rcu_read_unlock();
>>>> +    return found;
>>>> +}
>>>> +
>>>> +static bool stable_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>>>> +{
>>>> +    struct inode *inode = d_inode(dentry);
>>>> +
>>>> +    return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len,
>>>> +              dentry->d_offset, inode->i_ino,
>>>> +              fs_umode_to_dtype(inode->i_mode));
>>>> +}
>>>> +
>>>> +static void stable_iterate_dir(struct dentry *dir, struct dir_context *ctx)
>>>> +{
>>>> +    XA_STATE(xas, &((d_inode(dir))->i_doff_map), ctx->pos);
>>>> +    struct dentry *dentry;
>>>> +
>>>> +    while (true) {
>>>> +        spin_lock(&dir->d_lock);
>>>> +        dentry = stable_find_next(&xas);
>>>> +        spin_unlock(&dir->d_lock);
>>>> +        if (!dentry)
>>>> +            break;
>>>> +
>>>> +        if (!stable_dir_emit(ctx, dentry)) {
>>>> +            dput(dentry);
>>>> +            break;
>>>> +        }
>>>> +
>>>> +        dput(dentry);
>>>> +        ctx->pos = xas.xa_index + 1;
>>>> +    }
>>>> +}
>>>> +
>>>> +/**
>>>> + * stable_readdir - Emit entries starting at offset @ctx->pos
>>>> + * @file: an open directory to iterate over
>>>> + * @ctx: directory iteration context
>>>> + *
>>>> + * Caller must hold @file's i_rwsem to prevent insertion or removal of
>>>> + * entries during this call.
>>>> + *
>>>> + * On entry, @ctx->pos contains an offset that represents the first entry
>>>> + * to be read from the directory.
>>>> + *
>>>> + * The operation continues until there are no more entries to read, or
>>>> + * until the ctx->actor indicates there is no more space in the caller's
>>>> + * output buffer.
>>>> + *
>>>> + * On return, @ctx->pos contains an offset that will read the next entry
>>>> + * in this directory when shmem_readdir() is called again with @ctx.
>>>> + *
>>>> + * Return values:
>>>> + *   %0 - Complete
>>>> + */
>>>> +static int stable_readdir(struct file *file, struct dir_context *ctx)
>>>> +{
>>>> +    struct dentry *dir = file->f_path.dentry;
>>>> +
>>>> +    lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>>> +
>>>> +    if (!dir_emit_dots(file, ctx))
>>>> +        return 0;
>>>> +
>>>> +    stable_iterate_dir(dir, ctx);
>>>> +    return 0;
>>>> +}
>>>> +
>>>> +const struct file_operations stable_dir_operations = {
>>>> +    .llseek        = stable_dir_llseek,
>>>> +    .iterate_shared    = stable_readdir,
>>>> +    .read        = generic_read_dir,
>>>> +    .fsync        = noop_fsync,
>>>> +};
>>>> +EXPORT_SYMBOL(stable_dir_operations);
>>>> +
>>>>    static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev)
>>>>    {
>>>>        struct dentry *child = NULL;
>>>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>>>> index 6b351e009f59..579ce1800efe 100644
>>>> --- a/include/linux/dcache.h
>>>> +++ b/include/linux/dcache.h
>>>> @@ -96,6 +96,7 @@ struct dentry {
>>>>        struct super_block *d_sb;    /* The root of the dentry tree */
>>>>        unsigned long d_time;        /* used by d_revalidate */
>>>>        void *d_fsdata;            /* fs-specific data */
>>>> +    u32 d_offset;            /* directory offset in parent */
>>>>        union {
>>>>            struct list_head d_lru;        /* LRU list */
>>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>>> index 133f0640fb24..3fc2c04ed8ff 100644
>>>> --- a/include/linux/fs.h
>>>> +++ b/include/linux/fs.h
>>>> @@ -719,6 +719,10 @@ struct inode {
>>>>    #endif
>>>>        void            *i_private; /* fs or device private pointer */
>>>> +
>>>> +    /* simplefs stable directory offset tracking */
>>>> +    struct xarray        i_doff_map;
>>>> +    u32            i_next_offset;
>>> Hmm, I was grepping through the patches and only find that "i_next_offset" is initialized to 0 and then passed to xa_alloc_cyclic - does this really need to part of struct inode or could it be a local variable in stable_offset_add()?
> 
> This is a per-directory value so that each directory can use
> the full range of offsets (from zero to UINT_MAX). If there were
> only a single next_offset field, then all tmpfs directories
> would share the offset range, which is not scalable (not to
> mention, it would also be unlike the behavior of other
> filesystems).
> 
> Yes, we could move this to the shmem-private part of the inode,
> but then the API gets a little uglier.

Oh right, sorry, I was reading this wrong as per dentry value.

> 
> 
>>> I only managed to look a bit through the patches right now, personally I like v2 better as it doesn't extend struct inode with changes that can be used by in-memory file system only. What do others think? An alternative would be to have these fields in struct shmem_inode_info and pass it as extra argument to the stable_ functions?


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 3/3] shmem: stable directory offsets
  2023-06-26 13:18   ` [PATCH v3 3/3] shmem: stable directory offsets Bernd Schubert
@ 2023-06-26 15:16     ` Chuck Lever III
  0 siblings, 0 replies; 9+ messages in thread
From: Chuck Lever III @ 2023-06-26 15:16 UTC (permalink / raw)
  To: Bernd Schubert
  Cc: Chuck Lever, Al Viro, brauner, hughd, Andrew Morton, linux-mm,
	linux-fsdevel



> On Jun 26, 2023, at 9:18 AM, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> 
> 
> 
> On 6/6/23 15:11, Chuck Lever wrote:
>> From: Chuck Lever <chuck.lever@oracle.com>
>> The current cursor-based directory offset mechanism doesn't work
>> when a tmpfs filesystem is exported via NFS. This is because NFS
>> clients do not open directories. Each server-side READDIR operation
>> has to open the directory, read it, then close it. The cursor state
>> for that directory, being associated strictly with the opened
>> struct file, is thus discarded after each NFS READDIR operation.
>> Directory offsets are cached not only by NFS clients, but also by
>> user space libraries on those clients. Essentially there is no way
>> to invalidate those caches when directory offsets have changed on
>> an NFS server after the offset-to-dentry mapping changes. Thus the
>> whole application stack depends on unchanging directory offsets.
>> The solution we've come up with is to make the directory offset for
>> each file in a tmpfs filesystem stable for the life of the directory
>> entry it represents.
>> shmem_readdir() and shmem_dir_llseek() now use an xarray to map each
>> directory offset (an loff_t integer) to the memory address of a
>> struct dentry.
>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>> ---
>>  mm/shmem.c |   39 +++++++++++++++++++++++++++++++++++----
>>  1 file changed, 35 insertions(+), 4 deletions(-)
>> diff --git a/mm/shmem.c b/mm/shmem.c
>> index 721f9fd064aa..fd9571056181 100644
>> --- a/mm/shmem.c
>> +++ b/mm/shmem.c
>> @@ -2410,7 +2410,8 @@ static struct inode *shmem_get_inode(struct mnt_idmap *idmap, struct super_block
>>   /* Some things misbehave if size == 0 on a directory */
>>   inode->i_size = 2 * BOGO_DIRENT_SIZE;
>>   inode->i_op = &shmem_dir_inode_operations;
>> - inode->i_fop = &simple_dir_operations;
>> + inode->i_fop = &stable_dir_operations;
>> + stable_offset_init(inode);
>>   break;
>>   case S_IFLNK:
>>   /*
>> @@ -2950,6 +2951,10 @@ shmem_mknod(struct mnt_idmap *idmap, struct inode *dir,
>>   if (error && error != -EOPNOTSUPP)
>>   goto out_iput;
>>  + error = stable_offset_add(dir, dentry);
>> + if (error)
>> + goto out_iput;
>> +
>>   error = 0;
> 
> This line can be removed?
> 
>>   dir->i_size += BOGO_DIRENT_SIZE;
>>   dir->i_ctime = dir->i_mtime = current_time(dir);
>> @@ -3027,6 +3032,10 @@ static int shmem_link(struct dentry *old_dentry, struct inode *dir, struct dentr
>>   goto out;
>>   }
>>  + ret = stable_offset_add(dir, dentry);
>> + if (ret)
>> + goto out;
>> +
> 
> I think this should call shmem_free_inode() before goto out - reverse what shmem_reserve_inode() has done.
> 
>>   dir->i_size += BOGO_DIRENT_SIZE;
>>   inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
>>   inode_inc_iversion(dir);
>> @@ -3045,6 +3054,8 @@ static int shmem_unlink(struct inode *dir, struct dentry *dentry)
>>   if (inode->i_nlink > 1 && !S_ISDIR(inode->i_mode))
>>   shmem_free_inode(inode->i_sb);
>>  + stable_offset_remove(dir, dentry);
>> +
>>   dir->i_size -= BOGO_DIRENT_SIZE;
>>   inode->i_ctime = dir->i_ctime = dir->i_mtime = current_time(inode);
>>   inode_inc_iversion(dir);
>> @@ -3103,24 +3114,37 @@ static int shmem_rename2(struct mnt_idmap *idmap,
>>  {
>>   struct inode *inode = d_inode(old_dentry);
>>   int they_are_dirs = S_ISDIR(inode->i_mode);
>> + int error;
>>     if (flags & ~(RENAME_NOREPLACE | RENAME_EXCHANGE | RENAME_WHITEOUT))
>>   return -EINVAL;
>>  - if (flags & RENAME_EXCHANGE)
>> + if (flags & RENAME_EXCHANGE) {
>> + stable_offset_remove(old_dir, old_dentry);
>> + stable_offset_remove(new_dir, new_dentry);
>> + error = stable_offset_add(new_dir, old_dentry);
>> + if (error)
>> + return error;
>> + error = stable_offset_add(old_dir, new_dentry);
>> + if (error)
>> + return error;
>>   return simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
>> + }
> 
> Hmm, error handling issues? Everything needs to be reversed when any of the operations fails?
> 
>>     if (!simple_empty(new_dentry))
>>   return -ENOTEMPTY;
>>     if (flags & RENAME_WHITEOUT) {
>> - int error;
>> -
>>   error = shmem_whiteout(idmap, old_dir, old_dentry);
>>   if (error)
>>   return error;
>>   }
>>  + stable_offset_remove(old_dir, old_dentry);
>> + error = stable_offset_add(new_dir, old_dentry);
>> + if (error)
>> + return error;
>> +
>>   if (d_really_is_positive(new_dentry)) {
>>   (void) shmem_unlink(new_dir, new_dentry);
>>   if (they_are_dirs) {
>> @@ -3185,6 +3209,11 @@ static int shmem_symlink(struct mnt_idmap *idmap, struct inode *dir,
>>   folio_unlock(folio);
>>   folio_put(folio);
>>   }
>> +
>> + error = stable_offset_add(dir, dentry);
>> + if (error)
>> + goto out_iput;
>> +
> 
> Error handling, there is a kmemdup() above which needs to be freed? I'm not sure about folio, automatically released with the inode?
> 
>>   dir->i_size += BOGO_DIRENT_SIZE;
>>   dir->i_ctime = dir->i_mtime = current_time(dir);
>>   inode_inc_iversion(dir);
>> @@ -3920,6 +3949,8 @@ static void shmem_destroy_inode(struct inode *inode)
>>  {
>>   if (S_ISREG(inode->i_mode))
>>   mpol_free_shared_policy(&SHMEM_I(inode)->policy);
>> + if (S_ISDIR(inode->i_mode))
>> + stable_offset_destroy(inode);
>>  }
>>    static void shmem_init_inode(void *foo)
> 
> Thanks,
> Bernd

Thanks for the review. I think I've addressed the issues you've pointed out.
Watch for v4 of this series.


--
Chuck Lever




^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v3 1/3] libfs: Add directory operations for stable offsets
  2023-06-26 13:36     ` Bernd Schubert
  2023-06-26 14:50       ` Chuck Lever III
@ 2023-06-26 15:22       ` Bernd Schubert
  1 sibling, 0 replies; 9+ messages in thread
From: Bernd Schubert @ 2023-06-26 15:22 UTC (permalink / raw)
  To: Chuck Lever, viro, brauner, hughd, akpm
  Cc: Chuck Lever, linux-mm, linux-fsdevel



On 6/26/23 15:36, Bernd Schubert wrote:
> 
> 
> On 6/24/23 00:21, Bernd Schubert wrote:
>>
>>
>> On 6/6/23 15:10, Chuck Lever wrote:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> Create a vector of directory operations in fs/libfs.c that handles
>>> directory seeks and readdir via stable offsets instead of the
>>> current cursor-based mechanism.
>>>
>>> For the moment these are unused.
>>>
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>   fs/dcache.c            |    1
>>>   fs/libfs.c             |  185 
>>> ++++++++++++++++++++++++++++++++++++++++++++++++
>>>   include/linux/dcache.h |    1
>>>   include/linux/fs.h     |    9 ++
>>>   4 files changed, 196 insertions(+)
>>>
>>> diff --git a/fs/dcache.c b/fs/dcache.c
>>> index 52e6d5fdab6b..9c9a801f3b33 100644
>>> --- a/fs/dcache.c
>>> +++ b/fs/dcache.c
>>> @@ -1813,6 +1813,7 @@ static struct dentry *__d_alloc(struct 
>>> super_block *sb, const struct qstr *name)
>>>       dentry->d_sb = sb;
>>>       dentry->d_op = NULL;
>>>       dentry->d_fsdata = NULL;
>>> +    dentry->d_offset = 0;
>>>       INIT_HLIST_BL_NODE(&dentry->d_hash);
>>>       INIT_LIST_HEAD(&dentry->d_lru);
>>>       INIT_LIST_HEAD(&dentry->d_subdirs);
>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>> index 89cf614a3271..07317bbe1668 100644
>>> --- a/fs/libfs.c
>>> +++ b/fs/libfs.c
>>> @@ -239,6 +239,191 @@ const struct inode_operations 
>>> simple_dir_inode_operations = {
>>>   };
>>>   EXPORT_SYMBOL(simple_dir_inode_operations);
>>> +/**
>>> + * stable_offset_init - initialize a parent directory
>>> + * @dir: parent directory to be initialized
>>> + *
>>> + */
>>> +void stable_offset_init(struct inode *dir)
>>> +{
>>> +    xa_init_flags(&dir->i_doff_map, XA_FLAGS_ALLOC1);
>>> +    dir->i_next_offset = 0;
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_init);
>>> +
>>> +/**
>>> + * stable_offset_add - Add an entry to a directory's stable offset map
>>> + * @dir: parent directory being modified
>>> + * @dentry: new dentry being added
>>> + *
>>> + * Returns zero on success. Otherwise, a negative errno value is 
>>> returned.
>>> + */
>>> +int stable_offset_add(struct inode *dir, struct dentry *dentry)
>>> +{
>>> +    struct xa_limit limit = XA_LIMIT(2, U32_MAX);
>>> +    u32 offset = 0;
>>> +    int ret;
>>> +
>>> +    if (dentry->d_offset)
>>> +        return -EBUSY;
>>> +
>>> +    ret = xa_alloc_cyclic(&dir->i_doff_map, &offset, dentry, limit,
>>> +                  &dir->i_next_offset, GFP_KERNEL);
>>
>> Please see below at struct inode my question about i_next_offset.
>>
>>> +    if (ret < 0)
>>> +        return ret;
>>> +
>>> +    dentry->d_offset = offset;
>>> +    return 0;
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_add);
>>> +
>>> +/**
>>> + * stable_offset_remove - Remove an entry to a directory's stable 
>>> offset map
>>> + * @dir: parent directory being modified
>>> + * @dentry: dentry being removed
>>> + *
>>> + */
>>> +void stable_offset_remove(struct inode *dir, struct dentry *dentry)
>>> +{
>>> +    if (!dentry->d_offset)
>>> +        return;
>>> +
>>> +    xa_erase(&dir->i_doff_map, dentry->d_offset);
>>> +    dentry->d_offset = 0;
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_remove);
>>> +
>>> +/**
>>> + * stable_offset_destroy - Release offset map
>>> + * @dir: parent directory that is about to be destroyed
>>> + *
>>> + * During fs teardown (eg. umount), a directory's offset map might 
>>> still
>>> + * contain entries. xa_destroy() cleans out anything that remains.
>>> + */
>>> +void stable_offset_destroy(struct inode *dir)
>>> +{
>>> +    xa_destroy(&dir->i_doff_map);
>>> +}
>>> +EXPORT_SYMBOL(stable_offset_destroy);
>>> +
>>> +/**
>>> + * stable_dir_llseek - Advance the read position of a directory 
>>> descriptor
>>> + * @file: an open directory whose position is to be updated
>>> + * @offset: a byte offset
>>> + * @whence: enumerator describing the starting position for this update
>>> + *
>>> + * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for 
>>> directories.
>>> + *
>>> + * Returns the updated read position if successful; otherwise a
>>> + * negative errno is returned and the read position remains unchanged.
>>> + */
>>> +static loff_t stable_dir_llseek(struct file *file, loff_t offset, 
>>> int whence)
>>> +{
>>> +    switch (whence) {
>>> +    case SEEK_CUR:
>>> +        offset += file->f_pos;
>>> +        fallthrough;
>>> +    case SEEK_SET:
>>> +        if (offset >= 0)
>>> +            break;
>>> +        fallthrough;
>>> +    default:
>>> +        return -EINVAL;
>>> +    }
>>> +
>>> +    return vfs_setpos(file, offset, U32_MAX);
>>> +}
>>> +
>>> +static struct dentry *stable_find_next(struct xa_state *xas)
>>> +{
>>> +    struct dentry *child, *found = NULL;
>>> +
>>> +    rcu_read_lock();
>>> +    child = xas_next_entry(xas, U32_MAX);
>>> +    if (!child)
>>> +        goto out;
>>> +    spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
>>> +    if (simple_positive(child))
>>> +        found = dget_dlock(child);
>>> +    spin_unlock(&child->d_lock);
>>> +out:
>>> +    rcu_read_unlock();
>>> +    return found;
>>> +}
> 
> I wonder if this should try the next dentry when simple_positive() 
> returns false, what is if there is a readdir/unlink race? readdir now 
> abort in the middle instead of continuing with the next dentry?
> 
>>> +
>>> +static bool stable_dir_emit(struct dir_context *ctx, struct dentry 
>>> *dentry)
>>> +{
>>> +    struct inode *inode = d_inode(dentry);
>>> +
>>> +    return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len,
>>> +              dentry->d_offset, inode->i_ino,
>>> +              fs_umode_to_dtype(inode->i_mode));
>>> +}
>>> +
>>> +static void stable_iterate_dir(struct dentry *dir, struct 
>>> dir_context *ctx)
>>> +{
>>> +    XA_STATE(xas, &((d_inode(dir))->i_doff_map), ctx->pos);
>>> +    struct dentry *dentry;
>>> +
>>> +    while (true) {
>>> +        spin_lock(&dir->d_lock);
>>> +        dentry = stable_find_next(&xas);
>>> +        spin_unlock(&dir->d_lock);
>>> +        if (!dentry)
>>> +            break;
>>> +
>>> +        if (!stable_dir_emit(ctx, dentry)) {
>>> +            dput(dentry);
>>> +            break;
>>> +        }
>>> +
>>> +        dput(dentry);
>>> +        ctx->pos = xas.xa_index + 1;
>>> +    }
>>> +}
>>> +
>>> +/**
>>> + * stable_readdir - Emit entries starting at offset @ctx->pos
>>> + * @file: an open directory to iterate over
>>> + * @ctx: directory iteration context
>>> + *
>>> + * Caller must hold @file's i_rwsem to prevent insertion or removal of
>>> + * entries during this call.
>>> + *
>>> + * On entry, @ctx->pos contains an offset that represents the first 
>>> entry
>>> + * to be read from the directory.
>>> + *
>>> + * The operation continues until there are no more entries to read, or
>>> + * until the ctx->actor indicates there is no more space in the 
>>> caller's
>>> + * output buffer.
>>> + *
>>> + * On return, @ctx->pos contains an offset that will read the next 
>>> entry
>>> + * in this directory when shmem_readdir() is called again with @ctx.
>>> + *
>>> + * Return values:
>>> + *   %0 - Complete
>>> + */
>>> +static int stable_readdir(struct file *file, struct dir_context *ctx)
>>> +{
>>> +    struct dentry *dir = file->f_path.dentry;
>>> +
>>> +    lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>> +
>>> +    if (!dir_emit_dots(file, ctx))
>>> +        return 0;
>>> +
>>> +    stable_iterate_dir(dir, ctx);
>>> +    return 0;
>>> +}
>>> +
>>> +const struct file_operations stable_dir_operations = {
>>> +    .llseek        = stable_dir_llseek,
>>> +    .iterate_shared    = stable_readdir,
>>> +    .read        = generic_read_dir,
>>> +    .fsync        = noop_fsync,
>>> +};
>>> +EXPORT_SYMBOL(stable_dir_operations);
>>> +
>>>   static struct dentry *find_next_child(struct dentry *parent, struct 
>>> dentry *prev)
>>>   {
>>>       struct dentry *child = NULL;
>>> diff --git a/include/linux/dcache.h b/include/linux/dcache.h
>>> index 6b351e009f59..579ce1800efe 100644
>>> --- a/include/linux/dcache.h
>>> +++ b/include/linux/dcache.h
>>> @@ -96,6 +96,7 @@ struct dentry {
>>>       struct super_block *d_sb;    /* The root of the dentry tree */
>>>       unsigned long d_time;        /* used by d_revalidate */
>>>       void *d_fsdata;            /* fs-specific data */
>>> +    u32 d_offset;            /* directory offset in parent */
>>>       union {
>>>           struct list_head d_lru;        /* LRU list */
>>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>>> index 133f0640fb24..3fc2c04ed8ff 100644
>>> --- a/include/linux/fs.h
>>> +++ b/include/linux/fs.h
>>> @@ -719,6 +719,10 @@ struct inode {
>>>   #endif
>>>       void            *i_private; /* fs or device private pointer */
>>> +
>>> +    /* simplefs stable directory offset tracking */
>>> +    struct xarray        i_doff_map;
>>> +    u32            i_next_offset;
>>
>> Hmm, I was grepping through the patches and only find that 
>> "i_next_offset" is initialized to 0 and then passed to xa_alloc_cyclic 
>> - does this really need to part of struct inode or could it be a local 
>> variable in stable_offset_add()?
> 
> Hmm, at best that is used for rename, but then is this offset right when 
> old_dir and new_dir differ?

Sorry, ignore that please, as I wrote before, I had read it a per dentry 
value.

> 
>>
>> I only managed to look a bit through the patches right now, personally 
>> I like v2 better as it doesn't extend struct inode with changes that 
>> can be used by in-memory file system only. What do others think? An 
>> alternative would be to have these fields in struct shmem_inode_info 
>> and pass it as extra argument to the stable_ functions?
>>
>>
>> Thanks,
>> Bernd


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-06-26 15:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <168605676256.32244.6158641147817585524.stgit@manet.1015granger.net>
2023-06-21 13:12 ` [PATCH v3 0/3] shmemfs stable directory offsets Chuck Lever III
2023-06-21 14:25   ` Christian Brauner
     [not found] ` <168605705924.32244.13384849924097654559.stgit@manet.1015granger.net>
2023-06-23 22:21   ` [PATCH v3 1/3] libfs: Add directory operations for stable offsets Bernd Schubert
2023-06-26 13:36     ` Bernd Schubert
2023-06-26 14:50       ` Chuck Lever III
2023-06-26 15:01         ` Bernd Schubert
2023-06-26 15:22       ` Bernd Schubert
     [not found] ` <168605707262.32244.4794425063054676856.stgit@manet.1015granger.net>
2023-06-26 13:18   ` [PATCH v3 3/3] shmem: stable directory offsets Bernd Schubert
2023-06-26 15:16     ` Chuck Lever III

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox