linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Yu Kuai <yukuai1@huaweicloud.com>
To: Chuck Lever <chuck.lever@oracle.com>,
	yangerkun <yangerkun@huaweicloud.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>, Chuck Lever <cel@kernel.org>,
	linux-stable <stable@vger.kernel.org>,
	"harry.wentland@amd.com" <harry.wentland@amd.com>,
	"sunpeng.li@amd.com" <sunpeng.li@amd.com>,
	"Rodrigo.Siqueira@amd.com" <Rodrigo.Siqueira@amd.com>,
	"alexander.deucher@amd.com" <alexander.deucher@amd.com>,
	"christian.koenig@amd.com" <christian.koenig@amd.com>,
	"Xinhui.Pan@amd.com" <Xinhui.Pan@amd.com>,
	"airlied@gmail.com" <airlied@gmail.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	Liam Howlett <liam.howlett@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	Sasha Levin <sashal@kernel.org>,
	"srinivasan.shanmugam@amd.com" <srinivasan.shanmugam@amd.com>,
	"chiahsuan.chung@amd.com" <chiahsuan.chung@amd.com>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
	"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
	"zhangpeng.00@bytedance.com" <zhangpeng.00@bytedance.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"maple-tree@lists.infradead.org" <maple-tree@lists.infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	"yi.zhang@huawei.com" <yi.zhang@huawei.com>,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir
Date: Sat, 16 Nov 2024 15:22:09 +0800	[thread overview]
Message-ID: <5d0b46ee-ff09-f0ac-1920-6736394245ca@huaweicloud.com> (raw)
In-Reply-To: <ZzTDE+RN5d/mwUXl@tissot.1015granger.net>

Hi,

在 2024/11/13 23:17, Chuck Lever 写道:
> On Mon, Nov 11, 2024 at 11:20:17PM +0800, yangerkun wrote:
>>
>>
>> 在 2024/11/11 22:39, Chuck Lever III 写道:
>>>
>>>
>>>> On Nov 10, 2024, at 9:36 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>> I'm in the cc list ,so I assume you saw my set, then I don't know why
>>>> you're ignoring my concerns.
>>>> 1) next_offset is 32-bit and can overflow in a long-time running
>>>> machine.
>>>> 2) Once next_offset overflows, readdir will skip the files that offset
>>>> is bigger.
>>
>> I'm sorry, I'm a little busy these days, so I haven't responded to this
>> series of emails.
>>
>>> In that case, that entry won't be visible via getdents(3)
>>> until the directory is re-opened or the process does an
>>> lseek(fd, 0, SEEK_SET).
>>
>> Yes.
>>
>>>
>>> That is the proper and expected behavior. I suspect you
>>> will see exactly that behavior with ext4 and 32-bit
>>> directory offsets, for example.
>>
>> Emm...
>>
>> For this case like this:
>>
>> 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
>> 2. open /tmp/dir with fd1
>> 3. readdir and get /tmp/dir/file1
>> 4. rm /tmp/dir/file2
>> 5. touch /tmp/dir/file2
>> 4. loop 4~5 for 2^32 times
>> 5. readdir /tmp/dir with fd1
>>
>> For tmpfs now, we may see no /tmp/dir/file2, since the offset has been
>> overflow, for ext4 it is ok... So we think this will be a problem.
> 
> I constructed a simple test program using the above steps:
> 
> /*
>   * 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
>   * 2. open /tmp/dir with fd1
>   * 3. readdir and get /tmp/dir/file1
>   * 4. rm /tmp/dir/file2
>   * 5. touch /tmp/dir/file2
>   * 6. loop 4~5 for 2^32 times
>   * 7. readdir /tmp/dir with fd1
>   */
> 
> #include <sys/types.h>
> #include <sys/stat.h>
> 
> #include <dirent.h>
> #include <errno.h>
> #include <fcntl.h>
> #include <unistd.h>
> #include <stdbool.h>
> #include <stdio.h>
> #include <string.h>
> 
> static void list_directory(DIR *dirp)
> {
> 	struct dirent *de;
> 
> 	errno = 0;
> 	do {
> 		de = readdir(dirp);
> 		if (!de)
> 			break;
> 
> 		printf("d_off:  %lld\n", de->d_off);
> 		printf("d_name: %s\n", de->d_name);
> 	} while (true);
> 
> 	if (errno)
> 		perror("readdir");
> 	else
> 		printf("EOD\n");
> }
> 
> int main(int argc, char **argv)
> {
> 	unsigned long i;
> 	DIR *dirp;
> 	int ret;
> 
> 	/* 1. */
> 	ret = mkdir("/tmp/dir", 0755);
> 	if (ret < 0) {
> 		perror("mkdir");
> 		return 1;
> 	}
> 
> 	ret = creat("/tmp/dir/file1", 0644);
> 	if (ret < 0) {
> 		perror("creat");
> 		return 1;
> 	}
> 	close(ret);
> 
> 	ret = creat("/tmp/dir/file2", 0644);
> 	if (ret < 0) {
> 		perror("creat");
> 		return 1;
> 	}
> 	close(ret);
> 
> 	/* 2. */
> 	errno = 0;
> 	dirp = opendir("/tmp/dir");
> 	if (!dirp) {
> 		if (errno)
> 			perror("opendir");
> 		else
> 			fprintf(stderr, "EOD\n");
> 		closedir(dirp);
> 		return 1;
> 	}
> 
> 	/* 3. */
> 	errno = 0;
> 	do {
> 		struct dirent *de;
> 
> 		de = readdir(dirp);
> 		if (!de) {
> 			if (errno) {
> 				perror("readdir");
> 				closedir(dirp);
> 				return 1;
> 			}
> 			break;
> 		}
> 		if (strcmp(de->d_name, "file1") == 0) {
> 			printf("Found 'file1'\n");
> 			break;
> 		}
> 	} while (true);
> 
> 	/* run the test. */
> 	for (i = 0; i < 10000; i++) {
> 		/* 4. */
> 		ret = unlink("/tmp/dir/file2");
> 		if (ret < 0) {
> 			perror("unlink");
> 			closedir(dirp);
> 			return 1;
> 		}
> 
> 		/* 5. */
> 		ret = creat("/tmp/dir/file2", 0644);
> 		if (ret < 0) {
> 			perror("creat");
> 			fprintf(stderr, "i = %lu\n", i);
> 			closedir(dirp);
> 			return 1;
> 		}
> 		close(ret);
> 	}
> 
> 	/* 7. */
> 	printf("\ndirectory after test:\n");
> 	list_directory(dirp);
> 
> 	/* cel. */
> 	rewinddir(dirp);
> 	printf("\ndirectory after rewind:\n");
> 	list_directory(dirp);
> 
> 	closedir(dirp);
> 	return 0;
> }
> 
> 
>>> Does that not directly address your concern? Or do you
>>> mean that Erkun's patch introduces a new issue?
>>
>> Yes, to be honest, my personal feeling is a problem. But for 64bit, it may
>> never been trigger.
> 
> I ran the test program above on this kernel:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/cel/linux.git/log/?h=nfsd-testing
> 
> Note that it has a patch to restrict the range of directory offset
> values for tmpfs to 2..4096.
> 
> I did not observe any unexpected behavior after the offset values
> wrapped. At step 7, I can always see file2, and its offset is always
> 4. At step "cel" I can see all expected directory entries.

Then, do you investigate more or not?
> 
> I tested on v6.12-rc7 with the same range restriction but using
> Maple tree and 64-bit offsets. No unexpected behavior there either.
> 
> So either we're still missing something, or there is no problem. My
> only theory is maybe it's an issue with an implicit integer sign
> conversion, and we should restrict the offset range to 2..S32_MAX.
> 
> I can try testing with a range of (U32_MAX - 4096)..(U32_MAX).

You can try the following reproducer, it's much simpler. First, apply
following patch(on latest kernel):

diff --git a/fs/libfs.c b/fs/libfs.c
index a168ece5cc61..7c1a5982a0c8 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -291,7 +291,7 @@ int simple_offset_add(struct offset_ctx *octx, 
struct dentry *dentry)
                 return -EBUSY;

         ret = mtree_alloc_cyclic(&octx->mt, &offset, dentry, 
DIR_OFFSET_MIN,
-                                LONG_MAX, &octx->next_offset, GFP_KERNEL);
+                                256, &octx->next_offset, GFP_KERNEL);
         if (ret < 0)
                 return ret;


Then, create a new tmpfs dir, inside the dir:

[root@fedora test-libfs]# for ((i=0; i<256; ++i)); do touch $i; done
touch: cannot touch '255': Device or resource busy
[root@fedora test-libfs]# ls
0    103  109  114  12   125  130  136  141  147  152  158  163  169 
174  18   185  190  196  200  206  211  217  222  228  233  239  244  25 
   26  31  37  42  48  53  59  64  7   75  80  86  91  97
1    104  11   115  120  126  131  137  142  148  153  159  164  17 
175  180  186  191  197  201  207  212  218  223  229  234  24   245 
250  27  32  38  43  49  54  6   65  70  76  81  87  92  98
10   105  110  116  121  127  132  138  143  149  154  16   165  170 
176  181  187  192  198  202  208  213  219  224  23   235  240  246 
251  28  33  39  44  5   55  60  66  71  77  82  88  93  99
100  106  111  117  122  128  133  139  144  15   155  160  166  171 
177  182  188  193  199  203  209  214  22   225  230  236  241  247 
252  29  34  4   45  50  56  61  67  72  78  83  89  94
101  107  112  118  123  129  134  14   145  150  156  161  167  172 
178  183  189  194  2    204  21   215  220  226  231  237  242  248 
253  3   35  40  46  51  57  62  68  73  79  84  9   95
102  108  113  119  124  13   135  140  146  151  157  162  168  173 
179  184  19   195  20   205  210  216  221  227  232  238  243  249 
254  30  36  41  47  52  58  63  69  74  8   85  90  96
[root@fedora test-libfs]# rm -f 0
[root@fedora test-libfs]# touch 255
[root@fedora test-libfs]# ls
255
[root@fedora test-libfs]#

I don't think I have to explain why the second ls can only show the file
255...

Thanks,
Kuai

> 
> 
>>> If there is a problem here, please construct a reproducer
>>> against this patch set and post it.
> 
> Invitation still stands: if you have a solid reproducer, please post
> it.
> 
> 



      reply	other threads:[~2024-11-16  7:22 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-11-11  0:52 [RFC PATCH 0/6 6.6] Address rename/readdir bugs in fs/libfs.c cel
2024-11-11  0:52 ` [RFC PATCH 1/6 6.6] libfs: Define a minimum directory offset cel
2024-11-11  0:52 ` [RFC PATCH 2/6 6.6] libfs: Add simple_offset_empty() cel
2024-11-11  0:52 ` [RFC PATCH 3/6 6.6] libfs: Fix simple_offset_rename_exchange() cel
2024-11-11  0:52 ` [RFC PATCH 4/6 6.6] libfs: Add simple_offset_rename() API cel
2024-11-11  0:52 ` [RFC PATCH 5/6 6.6] shmem: Fix shmem_rename2() cel
2024-11-11  0:52 ` [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir cel
2024-11-11  2:36   ` Yu Kuai
2024-11-11 14:39     ` Chuck Lever III
2024-11-11 15:20       ` yangerkun
2024-11-11 15:34         ` Chuck Lever III
2024-11-12  3:43           ` yangerkun
2024-11-12 15:37             ` Chuck Lever III
2024-11-13 15:17         ` Chuck Lever
2024-11-16  7:22           ` Yu Kuai [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5d0b46ee-ff09-f0ac-1920-6736394245ca@huaweicloud.com \
    --to=yukuai1@huaweicloud.com \
    --cc=Rodrigo.Siqueira@amd.com \
    --cc=Xinhui.Pan@amd.com \
    --cc=airlied@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=brauner@kernel.org \
    --cc=cel@kernel.org \
    --cc=chengming.zhou@linux.dev \
    --cc=chiahsuan.chung@amd.com \
    --cc=christian.koenig@amd.com \
    --cc=chuck.lever@oracle.com \
    --cc=daniel@ffwll.ch \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=gregkh@linuxfoundation.org \
    --cc=harry.wentland@amd.com \
    --cc=hughd@google.com \
    --cc=liam.howlett@oracle.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=maple-tree@lists.infradead.org \
    --cc=mgorman@techsingularity.net \
    --cc=mingo@kernel.org \
    --cc=sashal@kernel.org \
    --cc=srinivasan.shanmugam@amd.com \
    --cc=stable@vger.kernel.org \
    --cc=sunpeng.li@amd.com \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    --cc=yangerkun@huaweicloud.com \
    --cc=yi.zhang@huawei.com \
    --cc=yukuai3@huawei.com \
    --cc=zhangpeng.00@bytedance.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox