From: tavianator@tavianator.com
To: cel@kernel.org
Cc: akpm@linux-foundation.org, brauner@kernel.org,
chuck.lever@oracle.com, hughd@google.com, jlayton@redhat.com,
linux-fsdevel@vger.kernel.org, linux-mm@kvack.org,
viro@zeniv.linux.org.uk
Subject: Re: [PATCH v7 3/3] shmem: stable directory offsets
Date: Mon, 13 Nov 2023 13:06:16 -0500 [thread overview]
Message-ID: <20231113180616.2831430-1-tavianator@tavianator.com> (raw)
In-Reply-To: <168814734331.530310.3911190551060453102.stgit@manet.1015granger.net>
On Fri, 30 Jun 2023 at 13:49:03 -0400, Chuck Lever wrote:
> From: Chuck Lever <chuck.lever@oracle.com>
>
> The current cursor-based directory offset mechanism doesn't work
> when a tmpfs filesystem is exported via NFS. This is because NFS
> clients do not open directories. Each server-side READDIR operation
> has to open the directory, read it, then close it. The cursor state
> for that directory, being associated strictly with the opened
> struct file, is thus discarded after each NFS READDIR operation.
>
> Directory offsets are cached not only by NFS clients, but also by
> user space libraries on those clients. Essentially there is no way
> to invalidate those caches when directory offsets have changed on
> an NFS server after the offset-to-dentry mapping changes. Thus the
> whole application stack depends on unchanging directory offsets.
>
> The solution we've come up with is to make the directory offset for
> each file in a tmpfs filesystem stable for the life of the directory
> entry it represents.
>
> shmem_readdir() and shmem_dir_llseek() now use an xarray to map each
> directory offset (an loff_t integer) to the memory address of a
> struct dentry.
I believe this patch is responsible for a tmpfs behaviour change when
a directory is modified while being read. The following test program
#include <dirent.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>
int main(int argc, char *argv[]) {
const char *tmp = "/tmp";
if (argc >= 2)
tmp = argv[1];
char *dir_path;
if (asprintf(&dir_path, "%s/foo.XXXXXX", tmp) < 0)
err(EXIT_FAILURE, "asprintf()");
if (!mkdtemp(dir_path))
err(EXIT_FAILURE, "mkdtemp(%s)", dir_path);
char *file_path;
if (asprintf(&file_path, "%s/bar", dir_path) < 0)
err(EXIT_FAILURE, "asprintf()");
if (creat(file_path, 0644) < 0)
err(EXIT_FAILURE, "creat(%s)", file_path);
DIR *dir = opendir(dir_path);
if (!dir)
err(EXIT_FAILURE, "opendir(%s)", dir_path);
struct dirent *de;
while ((de = readdir(dir))) {
printf("readdir(): %s/%s\n", dir_path, de->d_name);
if (de->d_name[0] == '.')
continue;
if (unlink(file_path) != 0)
err(EXIT_FAILURE, "unlink(%s)", file_path);
if (creat(file_path, 0644) < 0)
err(EXIT_FAILURE, "creat(%s)", file_path);
}
return EXIT_SUCCESS;
}
when run on Linux 6.5, doesn't print the new directory entry:
tavianator@graphene $ uname -a
Linux graphene 6.5.9-arch2-1 #1 SMP PREEMPT_DYNAMIC Thu, 26 Oct 2023 00:52:20 +0000 x86_64 GNU/Linux
tavianator@graphene $ gcc -Wall foo.c -o foo
tavianator@graphene $ ./foo
readdir(): /tmp/foo.wgmdmm/.
readdir(): /tmp/foo.wgmdmm/..
readdir(): /tmp/foo.wgmdmm/bar
But on Linux 6.6, readdir() never stops:
tavianator@tachyon $ uname -a
Linux tachyon 6.6.1-arch1-1 #1 SMP PREEMPT_DYNAMIC Wed, 08 Nov 2023 16:05:38 +0000 x86_64 GNU/Linux
tavianator@tachyon $ gcc foo.c -o foo
tavianator@tachyon $ ./foo
readdir(): /tmp/foo.XnIRqj/.
readdir(): /tmp/foo.XnIRqj/..
readdir(): /tmp/foo.XnIRqj/bar
readdir(): /tmp/foo.XnIRqj/bar
readdir(): /tmp/foo.XnIRqj/bar
readdir(): /tmp/foo.XnIRqj/bar
readdir(): /tmp/foo.XnIRqj/bar
readdir(): /tmp/foo.XnIRqj/bar
readdir(): /tmp/foo.XnIRqj/bar
readdir(): /tmp/foo.XnIRqj/bar
...
foo: creat(/tmp/foo.TTL6Fg/bar): Too many open files
POSIX says[1]
> If a file is removed from or added to the directory after the most recent
> call to opendir() or rewinddir(), whether a subsequent call to readdir()
> returns an entry for that file is unspecified.
so this isn't necessarily a *bug*, but I just wanted to point out the
behaviour change. I only noticed it because it broke one of my tests in
bfs[2] (in a non-default build configuration).
[1]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/readdir.html
[2]: https://github.com/tavianator/bfs/blob/main/tests/gnu/ignore_readdir_race_notdir.sh
next prev parent reply other threads:[~2023-11-13 18:06 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-30 17:48 [PATCH v7 0/3] shmemfs " Chuck Lever
2023-06-30 17:48 ` [PATCH v7 1/3] libfs: Add directory operations for stable offsets Chuck Lever
2023-07-03 10:56 ` Christian Brauner
2023-07-03 13:26 ` Chuck Lever III
2023-06-30 17:48 ` [PATCH v7 2/3] shmem: Refactor shmem_symlink() Chuck Lever
2023-06-30 17:49 ` [PATCH v7 3/3] shmem: stable directory offsets Chuck Lever
2023-07-13 13:30 ` kernel test robot
2023-07-17 6:46 ` kernel test robot
2023-07-22 20:33 ` Chuck Lever III
2023-07-25 15:12 ` Chuck Lever III
2023-07-25 15:54 ` Philip Li
2023-07-25 15:59 ` Christian Brauner
2023-07-26 2:55 ` Philip Li
2023-11-13 18:06 ` tavianator [this message]
2023-11-13 18:58 ` Chuck Lever
2023-07-03 15:23 ` [PATCH v7 0/3] shmemfs " Christian Brauner
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20231113180616.2831430-1-tavianator@tavianator.com \
--to=tavianator@tavianator.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=cel@kernel.org \
--cc=chuck.lever@oracle.com \
--cc=hughd@google.com \
--cc=jlayton@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=viro@zeniv.linux.org.uk \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox