Re: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir

linux-mm.kvack.org archive mirror

From: Chuck Lever III <chuck.lever@oracle.com>
To: yangerkun <yangerkun@huaweicloud.com>
Cc: Yu Kuai <yukuai1@huaweicloud.com>, Chuck Lever <cel@kernel.org>,
	linux-stable <stable@vger.kernel.org>,
	"harry.wentland@amd.com" <harry.wentland@amd.com>,
	"sunpeng.li@amd.com" <sunpeng.li@amd.com>,
	"Rodrigo.Siqueira@amd.com" <Rodrigo.Siqueira@amd.com>,
	"alexander.deucher@amd.com" <alexander.deucher@amd.com>,
	"christian.koenig@amd.com" <christian.koenig@amd.com>,
	"Xinhui.Pan@amd.com" <Xinhui.Pan@amd.com>,
	"airlied@gmail.com" <airlied@gmail.com>,
	Daniel Vetter <daniel@ffwll.ch>,
	Al Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>,
	Liam Howlett <liam.howlett@oracle.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Hugh Dickins <hughd@google.com>,
	"Matthew Wilcox (Oracle)" <willy@infradead.org>,
	Greg KH <gregkh@linuxfoundation.org>,
	Sasha Levin <sashal@kernel.org>,
	"srinivasan.shanmugam@amd.com" <srinivasan.shanmugam@amd.com>,
	"chiahsuan.chung@amd.com" <chiahsuan.chung@amd.com>,
	"mingo@kernel.org" <mingo@kernel.org>,
	"mgorman@techsingularity.net" <mgorman@techsingularity.net>,
	"chengming.zhou@linux.dev" <chengming.zhou@linux.dev>,
	"zhangpeng.00@bytedance.com" <zhangpeng.00@bytedance.com>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Linux FS Devel <linux-fsdevel@vger.kernel.org>,
	"maple-tree@lists.infradead.org" <maple-tree@lists.infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	"yi.zhang@huawei.com" <yi.zhang@huawei.com>,
	"yukuai (C)" <yukuai3@huawei.com>
Subject: Re: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir
Date: Tue, 12 Nov 2024 15:37:29 +0000
Message-ID: <C4E2D262-4864-45FD-A985-9C9F64EF83B5@oracle.com>
In-Reply-To: <dd6bd7f5-cf2e-3123-3017-c209d81ab290@huaweicloud.com>



> On Nov 11, 2024, at 10:43 PM, yangerkun <yangerkun@huaweicloud.com> wrote:
> 
> 
> 
> On 2024/11/11 23:34, Chuck Lever III wrote:
>>> On Nov 11, 2024, at 10:20 AM, yangerkun <yangerkun@huaweicloud.com> wrote:
>>> 
>>> 
>>> 
>>> On 2024/11/11 22:39, Chuck Lever III wrote:
>>>>> On Nov 10, 2024, at 9:36 PM, Yu Kuai <yukuai1@huaweicloud.com> wrote:
>>>>> 
>>>>> Hi,
>>>>> 
>>>>> On 2024/11/11 8:52, cel@kernel.org wrote:
>>>>>> From: yangerkun <yangerkun@huawei.com>
>>>>>> [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]
>>>>>> After we switched tmpfs dir operations from simple_dir_operations to
>>>>>> simple_offset_dir_operations, every rename inserts the new dentry into
>>>>>> the dest dir's maple tree (&SHMEM_I(inode)->dir_offsets->mt) under a free
>>>>>> key searched for starting at octx->next_offset, and then sets next_offset
>>>>>> to that free key + 1. When readdir runs concurrently with renames, this
>>>>>> leads to an infinite readdir, which fails generic/736 in xfstests
>>>>>> (details below).
>>>>>> 1. create 5000 files (1 2 3...) under one dir
>>>>>> 2. call readdir (man 3 readdir) once, and get one entry
>>>>>> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
>>>>>> 4. repeat steps 2-3 until readdir returns nothing or we have looped too
>>>>>>    many times (tmpfs fails the test on the second condition)
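The four steps above can be sketched as a small user-space C program. This is only an illustration of the access pattern, not the xfstests generic/736 source; the helper names populate() and readdir_rename_loop() and the max_loops cap are assumptions made for this sketch.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: create files "1".."nfiles" in dir. 0 on success, -1 on error. */
static int populate(const char *dir, int nfiles)
{
	char path[4096];
	int i, fd;

	for (i = 1; i <= nfiles; i++) {
		snprintf(path, sizeof(path), "%s/%d", dir, i);
		fd = open(path, O_CREAT | O_WRONLY, 0600);
		if (fd < 0)
			return -1;
		close(fd);
	}
	return 0;
}

/*
 * Hypothetical helper: read one entry at a time and rename it away and
 * back after each read. Returns the number of entries processed, or -1
 * on error. A return value equal to max_loops suggests the cap was hit.
 */
static long readdir_rename_loop(const char *dir, long max_loops)
{
	DIR *d = opendir(dir);
	struct dirent *e;
	long n = 0;
	int dfd;

	if (!d)
		return -1;
	dfd = dirfd(d);

	while (n < max_loops && (e = readdir(d)) != NULL) {
		if (!strcmp(e->d_name, ".") || !strcmp(e->d_name, ".."))
			continue;
		/* d_name stays valid until the next readdir() on this stream */
		if (renameat(dfd, e->d_name, dfd, "TEMPFILE") ||
		    renameat(dfd, "TEMPFILE", dfd, e->d_name)) {
			closedir(d);
			return -1;
		}
		n++;
	}
	closedir(d);
	return n;
}
```

On a filesystem where each rename assigns the entry a fresh, larger directory offset, the loop can keep finding "new" entries and run into the cap instead of reaching end of directory.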
>>>>>> We use the same logic as commit 9b378f6ad48cf ("btrfs: fix infinite
>>>>>> directory reads") to fix it: record last_index when the dir is opened, and
>>>>>> do not emit any entry whose index is >= last_index. The file->private_data
>>>>> 
>>>>> Please note that this requires that last_index never overflow; otherwise
>>>>> readdir will be messed up.
>>>> It would help your cause if you could be more specific
>>>> than "messed up".
>>>>>> already used by offset dirs can hold this value directly, and we also
>>>>>> update last_index when the dir file is llseek'ed.
>>>>>> Fixes: a2e459555c5f ("shmem: stable directory offsets")
>>>>>> Signed-off-by: yangerkun <yangerkun@huawei.com>
>>>>>> Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com
>>>>>> Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>> [brauner: only update last_index after seek when offset is zero like Jan suggested]
>>>>>> Signed-off-by: Christian Brauner <brauner@kernel.org>
>>>>>> Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701
>>>>>> [ cel: adjusted to apply to origin/linux-6.6.y ]
>>>>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>>>>> ---
>>>>>>  fs/libfs.c | 37 +++++++++++++++++++++++++------------
>>>>>>  1 file changed, 25 insertions(+), 12 deletions(-)
>>>>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>>>>> index a87005c89534..b59ff0dfea1f 100644
>>>>>> --- a/fs/libfs.c
>>>>>> +++ b/fs/libfs.c
>>>>>> @@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>>>>>  	xa_destroy(&octx->xa);
>>>>>>  }
>>>>>>  
>>>>>> +static int offset_dir_open(struct inode *inode, struct file *file)
>>>>>> +{
>>>>>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>>>>>> +
>>>>>> +	file->private_data = (void *)ctx->next_offset;
>>>>>> +	return 0;
>>>>>> +}
>>>>> 
>>>>> Looks like xarray is still used.
>>>> That's not going to change, as several folks have already
>>>> explained.
>>>>> I'm on the Cc list, so I assume you saw my series; I don't understand
>>>>> why you're ignoring my concerns.
>>>>> 1) next_offset is 32-bit and can overflow on a long-running
>>>>> machine.
>>>>> 2) Once next_offset overflows, readdir will skip the files whose offset
>>>>> is bigger.
>>> 
>>> I'm sorry, I'm a little busy these days, so I haven't responded to this
>>> series of emails.
>>> 
>>>> In that case, that entry won't be visible via getdents(2)
>>>> until the directory is re-opened or the process does an
>>>> lseek(fd, 0, SEEK_SET).
>>> 
>>> Yes.
>>> 
>>>> That is the proper and expected behavior. I suspect you
>>>> will see exactly that behavior with ext4 and 32-bit
>>>> directory offsets, for example.
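The user-visible side of this can be sketched from user space: an entry created after the directory was opened is looked up again after rewinding the stream. This is a minimal illustration, not a test from the thread; stream_has() and seen_after_rewind() are hypothetical helper names, and rewinddir(3) is assumed to reposition the underlying descriptor to offset 0, which is how glibc implements it.

```c
#include <dirent.h>
#include <fcntl.h>
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: scan from the current position to the end for `name`. */
static bool stream_has(DIR *d, const char *name)
{
	struct dirent *e;

	while ((e = readdir(d)) != NULL)
		if (strcmp(e->d_name, name) == 0)
			return true;
	return false;
}

/*
 * Hypothetical helper: open `dir`, create `name` in it afterwards, then
 * rewind and look for it. Returns 1 if the entry is seen after
 * rewinddir(), 0 if it is not, and -1 on error.
 */
static int seen_after_rewind(const char *dir, const char *name)
{
	DIR *d = opendir(dir);
	int fd, ret;

	if (!d)
		return -1;

	/* The file is created only after the directory has been opened. */
	fd = openat(dirfd(d), name, O_CREAT | O_WRONLY, 0600);
	if (fd < 0) {
		closedir(d);
		return -1;
	}
	close(fd);

	rewinddir(d);	/* POSIX: stream now refers to the current directory state */
	ret = stream_has(d, name) ? 1 : 0;
	closedir(d);
	return ret;
}
```

POSIX requires rewinddir() to make the stream reflect the current state of the directory, so the expected result is 1 on any conforming filesystem.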
>>> 
>>> Emm...
>>> 
>>> Consider a case like this:
>>> 
>>> 1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
>>> 2. open /tmp/dir with fd1
>>> 3. readdir and get /tmp/dir/file1
>>> 4. rm /tmp/dir/file2
>>> 5. touch /tmp/dir/file2
>>> 6. loop 4~5 for 2^32 times
>>> 7. readdir /tmp/dir with fd1
>>> 
>>> On tmpfs today we may not see /tmp/dir/file2, because the offset has overflowed; on ext4 it is fine. So we think this will be a problem.
>>> 
>>>> Does that not directly address your concern? Or do you
>>>> mean that Erkun's patch introduces a new issue?
>>> 
>>> Yes, to be honest, my personal feeling is that it is a problem. But on 64-bit, it may never be triggered.
>> Thanks for confirming.
>> In that case, the preferred way to handle it is to fix
>> the issue in upstream, and then backport that fix to LTS.
>> Dependence on 64-bit offsets to avoid a failure case
>> should be considered a workaround, not a real fix, IMHO.
> 
> Yes.
> 
>> Do you have a few moments to address it? If not, I
>> will see to it.
> 
> You can go ahead and do this, since I am quite busy until the end of this month... Sorry.

No worries!


>> I think reducing the xa_limit in simple_offset_add() to,
>> say, 2..16 would make the reproducer fire almost
>> immediately.
> 
> Yes.
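Why a narrow limit such as 2..16 should expose the problem quickly can be seen in a small user-space model. This is a deliberately simplified sketch: the model_* names are hypothetical, no collision handling is modeled, and the real xa_alloc_cyclic() bookkeeping differs in detail. It only shows how an "offset < last_index" filter behaves once the counter has wrapped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct model_ctx {
	uint32_t min, max;	/* allocation limits, e.g. 2..16 */
	uint32_t next_offset;	/* next offset to hand out */
};

/* Simplified cyclic allocation: hand out next_offset, wrap to min after max. */
static uint32_t model_alloc(struct model_ctx *c)
{
	uint32_t off = c->next_offset;

	c->next_offset = (off == c->max) ? c->min : off + 1;
	return off;
}

/* The filter the patch adds: entries at or beyond the snapshot are not emitted. */
static bool model_visible(uint32_t entry_offset, uint32_t last_index)
{
	return entry_offset < last_index;
}

/* Returns true if a file created BEFORE the open is hidden from readdir. */
static bool model_wrap_hides_old_entry(void)
{
	struct model_ctx c = { .min = 2, .max = 16, .next_offset = 2 };
	uint32_t old_file, last_index;
	int i;

	/* Consume offsets 2..15, as repeated create/unlink cycles would. */
	for (i = 0; i < 14; i++)
		model_alloc(&c);

	old_file = model_alloc(&c);	/* gets 16; next_offset wraps to 2 */
	last_index = c.next_offset;	/* what a later open would record */

	return !model_visible(old_file, last_index);
}
```

In this model the file holding offset 16 already exists when the directory is opened, yet it is filtered out because the recorded last_index is 2.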
> 
>>>> If there is a problem here, please construct a reproducer
>>>> against this patch set and post it.
>>>>> Thanks,
>>>>> Kuai
>>>>> 
>>>>>> +
>>>>>>  /**
>>>>>>   * offset_dir_llseek - Advance the read position of a directory descriptor
>>>>>>   * @file: an open directory whose position is to be updated
>>>>>> @@ -462,6 +470,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>>>>>   */
>>>>>>  static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>>>>>  {
>>>>>> +	struct inode *inode = file->f_inode;
>>>>>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>>>>>> +
>>>>>>  	switch (whence) {
>>>>>>  	case SEEK_CUR:
>>>>>>  		offset += file->f_pos;
>>>>>> @@ -475,8 +486,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>>>>>  	}
>>>>>>  
>>>>>>  	/* In this case, ->private_data is protected by f_pos_lock */
>>>>>> -	file->private_data = NULL;
>>>>>> -	return vfs_setpos(file, offset, U32_MAX);
>>>>>> +	if (!offset)
>>>>>> +		file->private_data = (void *)ctx->next_offset;
>>>>>> +	return vfs_setpos(file, offset, LONG_MAX);
>>>>>>  }
>>>>>>  
>>>>>>  static struct dentry *offset_find_next(struct xa_state *xas)
>>>>>> @@ -505,7 +517,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>>>>>>  			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>>>>>>  }
>>>>>>  
>>>>>> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>>>> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
>>>>>>  {
>>>>>>  	struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
>>>>>>  	XA_STATE(xas, &so_ctx->xa, ctx->pos);
>>>>>> @@ -514,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>>>>  	while (true) {
>>>>>>  		dentry = offset_find_next(&xas);
>>>>>>  		if (!dentry)
>>>>>> -			return ERR_PTR(-ENOENT);
>>>>>> +			return;
>>>>>> +
>>>>>> +		if (dentry2offset(dentry) >= last_index) {
>>>>>> +			dput(dentry);
>>>>>> +			return;
>>>>>> +		}
>>>>>>  
>>>>>>  		if (!offset_dir_emit(ctx, dentry)) {
>>>>>>  			dput(dentry);
>>>>>> -			break;
>>>>>> +			return;
>>>>>>  		}
>>>>>>  
>>>>>>  		dput(dentry);
>>>>>>  		ctx->pos = xas.xa_index + 1;
>>>>>>  	}
>>>>>> -	return NULL;
>>>>>>  }
>>>>>>  
>>>>>>  /**
>>>>>> @@ -551,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>>>>  static int offset_readdir(struct file *file, struct dir_context *ctx)
>>>>>>  {
>>>>>>  	struct dentry *dir = file->f_path.dentry;
>>>>>> +	long last_index = (long)file->private_data;
>>>>>>  
>>>>>>  	lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>>>>>  
>>>>>>  	if (!dir_emit_dots(file, ctx))
>>>>>>  		return 0;
>>>>>>  
>>>>>> -	/* In this case, ->private_data is protected by f_pos_lock */
>>>>>> -	if (ctx->pos == DIR_OFFSET_MIN)
>>>>>> -		file->private_data = NULL;
>>>>>> -	else if (file->private_data == ERR_PTR(-ENOENT))
>>>>>> -		return 0;
>>>>>> -	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
>>>>>> +	offset_iterate_dir(d_inode(dir), ctx, last_index);
>>>>>>  	return 0;
>>>>>>  }
>>>>>>  
>>>>>>  const struct file_operations simple_offset_dir_operations = {
>>>>>> +	.open = offset_dir_open,
>>>>>>  	.llseek = offset_dir_llseek,
>>>>>>  	.iterate_shared = offset_readdir,
>>>>>>  	.read = generic_read_dir,
>>>> --
>>>> Chuck Lever
>> --
>> Chuck Lever


--
Chuck Lever

Thread overview: 15+ messages
2024-11-11  0:52 [RFC PATCH 0/6 6.6] Address rename/readdir bugs in fs/libfs.c cel
2024-11-11  0:52 ` [RFC PATCH 1/6 6.6] libfs: Define a minimum directory offset cel
2024-11-11  0:52 ` [RFC PATCH 2/6 6.6] libfs: Add simple_offset_empty() cel
2024-11-11  0:52 ` [RFC PATCH 3/6 6.6] libfs: Fix simple_offset_rename_exchange() cel
2024-11-11  0:52 ` [RFC PATCH 4/6 6.6] libfs: Add simple_offset_rename() API cel
2024-11-11  0:52 ` [RFC PATCH 5/6 6.6] shmem: Fix shmem_rename2() cel
2024-11-11  0:52 ` [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir cel
2024-11-11  2:36   ` Yu Kuai
2024-11-11 14:39     ` Chuck Lever III
2024-11-11 15:20       ` yangerkun
2024-11-11 15:34         ` Chuck Lever III
2024-11-12  3:43           ` yangerkun
2024-11-12 15:37             ` Chuck Lever III [this message]
2024-11-13 15:17         ` Chuck Lever
2024-11-16  7:22           ` Yu Kuai
