Message-ID: <73a05cb9-569c-9b3c-3359-824e76b14461@huaweicloud.com>
Date: Mon, 11 Nov 2024 23:20:17 +0800
Subject: Re: [RFC PATCH 6/6 6.6] libfs: fix infinite directory reads for offset dir
From: yangerkun
To: Chuck Lever III, Yu Kuai
Cc: Chuck Lever, linux-stable, "harry.wentland@amd.com",
 "sunpeng.li@amd.com", "Rodrigo.Siqueira@amd.com",
 "alexander.deucher@amd.com", "christian.koenig@amd.com",
 "Xinhui.Pan@amd.com", "airlied@gmail.com", Daniel Vetter, Al Viro,
 Christian Brauner, Liam Howlett, Andrew Morton, Hugh Dickins,
 "Matthew Wilcox (Oracle)", Greg KH, Sasha Levin,
 "srinivasan.shanmugam@amd.com", "chiahsuan.chung@amd.com",
 "mingo@kernel.org", "mgorman@techsingularity.net",
 "chengming.zhou@linux.dev", "zhangpeng.00@bytedance.com",
 "amd-gfx@lists.freedesktop.org", "dri-devel@lists.freedesktop.org",
 Linux Kernel Mailing List, Linux FS Devel,
 "maple-tree@lists.infradead.org", linux-mm, "yi.zhang@huawei.com",
 "yukuai (C)"
References: <20241111005242.34654-1-cel@kernel.org>
 <20241111005242.34654-7-cel@kernel.org>
 <278433c2-611c-6c8e-7964-5c11977b68b7@huaweicloud.com>
 <96A93064-8DCE-4B78-9F2A-CF6E7EEABEB1@oracle.com>
In-Reply-To: <96A93064-8DCE-4B78-9F2A-CF6E7EEABEB1@oracle.com>
Sender: owner-linux-mm@kvack.org

On 2024/11/11 22:39, Chuck Lever III wrote:
>
>
>> On Nov 10, 2024, at 9:36 PM, Yu Kuai wrote:
>>
>> Hi,
>>
>> On 2024/11/11 8:52, cel@kernel.org wrote:
>>> From: yangerkun
>>> [ Upstream commit 64a7ce76fb901bf9f9c36cf5d681328fc0fd4b5a ]
>>> After we switch tmpfs dir operations from simple_dir_operations to
>>> simple_offset_dir_operations, every rename happened will fill new dentry
>>> to dest dir's maple tree(&SHMEM_I(inode)->dir_offsets->mt) with a free
>>> key starting with octx->newx_offset,
>>> and then set newx_offset equals to
>>> free key + 1. This will lead to infinite readdir combine with rename
>>> happened at the same time, which fail generic/736 in xfstests(detail show
>>> as below).
>>> 1. create 5000 files(1 2 3...) under one dir
>>> 2. call readdir(man 3 readdir) once, and get one entry
>>> 3. rename(entry, "TEMPFILE"), then rename("TEMPFILE", entry)
>>> 4. loop 2~3, until readdir return nothing or we loop too many
>>> times(tmpfs break test with the second condition)
>>> We choose the same logic what commit 9b378f6ad48cf ("btrfs: fix infinite
>>> directory reads") to fix it, record the last_index when we open dir, and
>>> do not emit the entry which index >= last_index. The file->private_data
>>
>> Please notice this requires last_index should never overflow, otherwise
>> readdir will be messed up.
>
> It would help your cause if you could be more specific
> than "messed up".
>
>
>>> now used in offset dir can use directly to do this, and we also update
>>> the last_index when we llseek the dir file.
>>> Fixes: a2e459555c5f ("shmem: stable directory offsets")
>>> Signed-off-by: yangerkun
>>> Link: https://lore.kernel.org/r/20240731043835.1828697-1-yangerkun@huawei.com
>>> Reviewed-by: Chuck Lever
>>> [brauner: only update last_index after seek when offset is zero like Jan suggested]
>>> Signed-off-by: Christian Brauner
>>> Link: https://nvd.nist.gov/vuln/detail/CVE-2024-46701
>>> [ cel: adjusted to apply to origin/linux-6.6.y ]
>>> Signed-off-by: Chuck Lever
>>> ---
>>>  fs/libfs.c | 37 +++++++++++++++++++++++++------------
>>>  1 file changed, 25 insertions(+), 12 deletions(-)
>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>> index a87005c89534..b59ff0dfea1f 100644
>>> --- a/fs/libfs.c
>>> +++ b/fs/libfs.c
>>> @@ -449,6 +449,14 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>>  	xa_destroy(&octx->xa);
>>>  }
>>>
>>> +static int offset_dir_open(struct inode *inode, struct file *file)
>>> +{
>>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>>> +
>>> +	file->private_data = (void *)ctx->next_offset;
>>> +	return 0;
>>> +}
>>
>> Looks like xarray is still used.
>
> That's not going to change, as several folks have already
> explained.
>
>
>> I'm in the cc list ,so I assume you saw my set, then I don't know why
>> you're ignoring my concerns.
>
>> 1) next_offset is 32-bit and can overflow in a long-time running
>> machine.
>> 2) Once next_offset overflows, readdir will skip the files that offset
>> is bigger.
>

Sorry, I have been a little busy these days, so I had not yet replied to
this series of emails.

> In that case, that entry won't be visible via getdents(3)
> until the directory is re-opened or the process does an
> lseek(fd, 0, SEEK_SET).

Yes.

>
> That is the proper and expected behavior. I suspect you
> will see exactly that behavior with ext4 and 32-bit
> directory offsets, for example.

Hmm... Consider a case like this:

1. mkdir /tmp/dir and touch /tmp/dir/file1 /tmp/dir/file2
2. open /tmp/dir with fd1
3. readdir and get /tmp/dir/file1
4. rm /tmp/dir/file2
5. touch /tmp/dir/file2
6. loop 4~5 for 2^32 times
7. readdir /tmp/dir with fd1

On tmpfs as it stands, we may not see /tmp/dir/file2, because the offset
has overflowed; on ext4 it is fine. So we think this will be a problem.

>
> Does that not directly address your concern? Or do you
> mean that Erkun's patch introduces a new issue?

Yes. To be honest, my personal feeling is that this is a problem. But for
64-bit, it may never be triggered.

>
> If there is a problem here, please construct a reproducer
> against this patch set and post it.
>
>
>> Thanks,
>> Kuai
>>
>>> +
>>>  /**
>>>   * offset_dir_llseek - Advance the read position of a directory descriptor
>>>   * @file: an open directory whose position is to be updated
>>> @@ -462,6 +470,9 @@ void simple_offset_destroy(struct offset_ctx *octx)
>>>   */
>>>  static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>>  {
>>> +	struct inode *inode = file->f_inode;
>>> +	struct offset_ctx *ctx = inode->i_op->get_offset_ctx(inode);
>>> +
>>>  	switch (whence) {
>>>  	case SEEK_CUR:
>>>  		offset += file->f_pos;
>>> @@ -475,8 +486,9 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>>>  	}
>>>
>>>  	/* In this case, ->private_data is protected by f_pos_lock */
>>> -	file->private_data = NULL;
>>> -	return vfs_setpos(file, offset, U32_MAX);
>>> +	if (!offset)
>>> +		file->private_data = (void *)ctx->next_offset;
>>> +	return vfs_setpos(file, offset, LONG_MAX);
>>>  }
>>>
>>>  static struct dentry *offset_find_next(struct xa_state *xas)
>>> @@ -505,7 +517,7 @@ static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>>>  			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>>>  }
>>>
>>> -static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>> +static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx, long last_index)
>>>  {
>>>  	struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
>>>  	XA_STATE(xas, &so_ctx->xa, ctx->pos);
>>> @@ -514,17 +526,21 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>  	while (true) {
>>>  		dentry = offset_find_next(&xas);
>>>  		if (!dentry)
>>> -			return ERR_PTR(-ENOENT);
>>> +			return;
>>> +
>>> +		if (dentry2offset(dentry) >= last_index) {
>>> +			dput(dentry);
>>> +			return;
>>> +		}
>>>
>>>  		if (!offset_dir_emit(ctx, dentry)) {
>>>  			dput(dentry);
>>> -			break;
>>> +			return;
>>>  		}
>>>
>>>  		dput(dentry);
>>>  		ctx->pos = xas.xa_index + 1;
>>>  	}
>>> -	return NULL;
>>>  }
>>>
>>>  /**
>>> @@ -551,22 +567,19 @@ static void *offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
>>>  static int offset_readdir(struct file *file, struct dir_context *ctx)
>>>  {
>>>  	struct dentry *dir = file->f_path.dentry;
>>> +	long last_index = (long)file->private_data;
>>>
>>>  	lockdep_assert_held(&d_inode(dir)->i_rwsem);
>>>
>>>  	if (!dir_emit_dots(file, ctx))
>>>  		return 0;
>>>
>>> -	/* In this case, ->private_data is protected by f_pos_lock */
>>> -	if (ctx->pos == DIR_OFFSET_MIN)
>>> -		file->private_data = NULL;
>>> -	else if (file->private_data == ERR_PTR(-ENOENT))
>>> -		return 0;
>>> -	file->private_data = offset_iterate_dir(d_inode(dir), ctx);
>>> +	offset_iterate_dir(d_inode(dir), ctx, last_index);
>>>  	return 0;
>>>  }
>>>
>>>  const struct file_operations simple_offset_dir_operations = {
>>> +	.open		= offset_dir_open,
>>>  	.llseek		= offset_dir_llseek,
>>>  	.iterate_shared	= offset_readdir,
>>>  	.read		= generic_read_dir,
>
>
> --
> Chuck Lever
>
>
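
The overflow concern discussed above can be illustrated with a small user-space toy model. This is only a sketch, not kernel code: struct model_ctx, model_alloc_offset(), model_open() and model_visible() are hypothetical names, MODEL_OFFSET_MIN = 2 is an assumed stand-in for DIR_OFFSET_MIN, and the allocator simply wraps back to the minimum after UINT32_MAX without checking whether a slot is already occupied, which the real allocator would have to do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: first offset handed to a real entry (stand-in for DIR_OFFSET_MIN). */
#define MODEL_OFFSET_MIN 2u

/* Toy stand-in for the 32-bit next_offset counter; not the kernel's struct offset_ctx. */
struct model_ctx {
	uint32_t next_offset;
};

/*
 * Hand out the next offset and advance the counter.
 * Assumption: after UINT32_MAX the counter wraps back to MODEL_OFFSET_MIN,
 * and slot occupancy is ignored for simplicity.
 */
static uint32_t model_alloc_offset(struct model_ctx *ctx)
{
	uint32_t offset;

	assert(ctx->next_offset >= MODEL_OFFSET_MIN);
	offset = ctx->next_offset;
	ctx->next_offset = (offset == UINT32_MAX) ? MODEL_OFFSET_MIN : offset + 1;
	return offset;
}

/* Models offset_dir_open() in the patch: snapshot next_offset as last_index. */
static int64_t model_open(const struct model_ctx *ctx)
{
	return (int64_t)ctx->next_offset;
}

/* Models the check added to offset_iterate_dir(): skip entries at or above last_index. */
static bool model_visible(uint32_t entry_offset, int64_t last_index)
{
	return (int64_t)entry_offset < last_index;
}
```

In this model, an entry allocated at offset UINT32_MAX - 1 is visible to a descriptor opened right afterwards. Once one more allocation wraps the counter, a freshly opened descriptor gets last_index = 2 and skips that older entry, which is the behavior Kuai's points 1) and 2) describe.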