Message-ID: <3ccf8255-dfbb-d019-d156-01edf5242c49@huaweicloud.com>
Date: Mon, 23 Dec 2024 22:21:42 +0800
Subject: Re: [PATCH v6 5/5] libfs: Use d_children list to iterate simple_offset directories
To: cel@kernel.org, Hugh Dickins, Christian Brauner, Al Viro
Cc: linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, yukuai3@huawei.com, Chuck Lever
References: <20241220153314.5237-1-cel@kernel.org> <20241220153314.5237-6-cel@kernel.org>
From: yangerkun
In-Reply-To: <20241220153314.5237-6-cel@kernel.org>

On 2024/12/20 23:33, cel@kernel.org wrote:
> From: Chuck Lever
>
> The mtree mechanism has been effective at creating directory offsets
> that are stable over multiple opendir instances. However, it has not
> been able to handle the subtleties of renames that are concurrent
> with readdir.
>
> Instead of using the mtree to emit entries in the order of their
> offset values, use it only to map incoming ctx->pos to a starting
> entry. Then use the directory's d_children list, which is already
> maintained properly by the dcache, to find the next child to emit.
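
As a rough illustration of the approach described in the paragraph above, here is a small userspace model, not kernel code. Every toy_* name is hypothetical. The sibling list is a plain singly linked list, and toy_start_at() is a linear scan that stands in for the mtree lookup, assuming only that the lookup yields the entry with the lowest offset at or after pos. The offsets on the toy list are deliberately unsorted, to show that the walk follows list order rather than offset order.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a child dentry; not the kernel's struct dentry. */
struct toy_child {
	long offset;			/* directory offset of this entry */
	struct toy_child *next;		/* next sibling, towards the tail */
};

/* Stand-in for the mtree lookup: entry with the lowest offset >= pos. */
static struct toy_child *toy_start_at(struct toy_child *head, long pos)
{
	struct toy_child *c, *best = NULL;

	for (c = head; c; c = c->next)
		if (c->offset >= pos && (!best || c->offset < best->offset))
			best = c;
	return best;
}

/*
 * Store up to max offsets in out[], starting at the entry found for pos
 * and then following sibling links only. Returns the number stored.
 */
static size_t toy_readdir(struct toy_child *head, long pos,
			  long *out, size_t max)
{
	struct toy_child *c = toy_start_at(head, pos);
	size_t n = 0;

	assert(out != NULL || max == 0);
	while (c && n < max) {
		out[n++] = c->offset;
		c = c->next;
	}
	return n;
}

/* Demo list 7 -> 3 -> 9 (head first); resume the walk at pos == 3. */
static size_t toy_demo(long *out, size_t max)
{
	struct toy_child c = { 9, NULL };
	struct toy_child b = { 3, &c };
	struct toy_child a = { 7, &b };

	return toy_readdir(&a, 3, out, max);
}
```

With room for four entries, toy_demo() stores 3 and then 9: the offset lookup is used once to pick the starting point, and everything after that comes from the list links.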
>
> One of the sneaky things about this is that when the mtree-allocated
> offset value wraps (which is very rare), looking up ctx->pos++ is
> not going to find the next entry; it will return NULL. Instead, by
> following the d_children list, the offset values can appear in any
> order but all of the entries in the directory will be visited
> eventually.
>
> Note also that the readdir() is guaranteed to reach the tail of this
> list. Entries are added only at the head of d_children, and readdir
> walks from its current position in that list towards its tail.
>
> Signed-off-by: Chuck Lever
> ---
>  fs/libfs.c | 84 +++++++++++++++++++++++++++++++++++++-----------------
>  1 file changed, 58 insertions(+), 26 deletions(-)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 5c56783c03a5..f7ead02062ad 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -247,12 +247,13 @@ EXPORT_SYMBOL(simple_dir_inode_operations);
>
>  /* simple_offset_add() allocation range */
>  enum {
> -	DIR_OFFSET_MIN		= 2,
> +	DIR_OFFSET_MIN		= 3,
>  	DIR_OFFSET_MAX		= LONG_MAX - 1,
>  };
>
>  /* simple_offset_add() never assigns these to a dentry */
>  enum {
> +	DIR_OFFSET_FIRST	= 2,		/* Find first real entry */
>  	DIR_OFFSET_EOD		= LONG_MAX,	/* Marks EOD */
>
>  };
> @@ -458,51 +459,82 @@ static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
>  	return vfs_setpos(file, offset, LONG_MAX);
>  }
>
> -static struct dentry *offset_find_next(struct offset_ctx *octx, loff_t offset)
> +static struct dentry *find_positive_dentry(struct dentry *parent,
> +					   struct dentry *dentry,
> +					   bool next)
>  {
> -	MA_STATE(mas, &octx->mt, offset, offset);
> +	struct dentry *found = NULL;
> +
> +	spin_lock(&parent->d_lock);
> +	if (next)
> +		dentry = d_next_sibling(dentry);
> +	else if (!dentry)
> +		dentry = d_first_child(parent);
> +	hlist_for_each_entry_from(dentry, d_sib) {
> +		if (!simple_positive(dentry))
> +			continue;
> +		spin_lock_nested(&dentry->d_lock, DENTRY_D_LOCK_NESTED);
> +		if (simple_positive(dentry))
> +			found = dget_dlock(dentry);
> +		spin_unlock(&dentry->d_lock);
> +		if (likely(found))
> +			break;
> +	}
> +	spin_unlock(&parent->d_lock);
> +	return found;
> +}
> +
> +static noinline_for_stack struct dentry *
> +offset_dir_lookup(struct dentry *parent, loff_t offset)
> +{
> +	struct inode *inode = d_inode(parent);
> +	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
>  	struct dentry *child, *found = NULL;
>
> -	rcu_read_lock();
> -	child = mas_find(&mas, DIR_OFFSET_MAX);
> -	if (!child)
> -		goto out;
> -	spin_lock(&child->d_lock);
> -	if (simple_positive(child))
> -		found = dget_dlock(child);
> -	spin_unlock(&child->d_lock);
> -out:
> -	rcu_read_unlock();
> +	MA_STATE(mas, &octx->mt, offset, offset);
> +
> +	if (offset == DIR_OFFSET_FIRST)
> +		found = find_positive_dentry(parent, NULL, false);
> +	else {
> +		rcu_read_lock();
> +		child = mas_find(&mas, DIR_OFFSET_MAX);

Can child be NULL here? For example, if some files are deleted after the
first readdir() call, mas_find() may find nothing at or after the given
offset. Maybe we should bail out in that case? Otherwise
find_positive_dentry() is called with a NULL dentry and starts again from
the first child, so we may rescan all the dentries and return them to
userspace again.

> +		found = find_positive_dentry(parent, child, false);
> +		rcu_read_unlock();
> +	}
>  	return found;
>  }
>
>  static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
>  {
>  	struct inode *inode = d_inode(dentry);
> -	long offset = dentry2offset(dentry);
>
> -	return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
> -			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
> +	return dir_emit(ctx, dentry->d_name.name, dentry->d_name.len,
> +			inode->i_ino, fs_umode_to_dtype(inode->i_mode));
>  }
>
> -static void offset_iterate_dir(struct inode *inode, struct dir_context *ctx)
> +static void offset_iterate_dir(struct file *file, struct dir_context *ctx)
>  {
> -	struct offset_ctx *octx = inode->i_op->get_offset_ctx(inode);
> +	struct dentry *dir = file->f_path.dentry;
>  	struct dentry *dentry;
>
> +	dentry = offset_dir_lookup(dir, ctx->pos);
> +	if (!dentry)
> +		goto out_eod;
>  	while (true) {
> -		dentry = offset_find_next(octx, ctx->pos);
> -		if (!dentry)
> -			goto out_eod;
> +		struct dentry *next;
>
> -		if (!offset_dir_emit(ctx, dentry)) {
> -			dput(dentry);
> +		ctx->pos = dentry2offset(dentry);
> +		if (!offset_dir_emit(ctx, dentry))
>  			break;
> -		}
>
> -		ctx->pos = dentry2offset(dentry) + 1;
> +		next = find_positive_dentry(dir, dentry, true);
>  		dput(dentry);
> +
> +		if (!next)
> +			goto out_eod;
> +		dentry = next;
>  	}
> +	dput(dentry);
>  	return;
>
>  out_eod:
> @@ -541,7 +573,7 @@ static int offset_readdir(struct file *file, struct dir_context *ctx)
>  	if (!dir_emit_dots(file, ctx))
>  		return 0;
>  	if (ctx->pos != DIR_OFFSET_EOD)
> -		offset_iterate_dir(d_inode(dir), ctx);
> +		offset_iterate_dir(file, ctx);
>  	return 0;
>  }
>
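
To make the question about a NULL result from mas_find() more concrete, here is a small userspace model of offset_dir_lookup(), not kernel code. All toy_* names are hypothetical, the sibling list is a plain singly linked list, and toy_mas_find() is a linear scan that assumes only that mas_find() returns the first entry at or after the given index. The guarded flag models the "bail out when nothing is found" idea purely for comparison; it is not a proposed fix.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_OFFSET_FIRST 2	/* mirrors DIR_OFFSET_FIRST in the patch */

/* Hypothetical stand-in for a child dentry; not the kernel's struct dentry. */
struct toy_dentry {
	long offset;			/* directory offset of this entry */
	int positive;			/* 0 once the entry has been deleted */
	struct toy_dentry *next;	/* next sibling, towards the tail */
};

/* Stand-in for mas_find(): live entry with the lowest offset >= pos. */
static struct toy_dentry *toy_mas_find(struct toy_dentry *head, long pos)
{
	struct toy_dentry *d, *best = NULL;

	for (d = head; d; d = d->next)
		if (d->positive && d->offset >= pos &&
		    (!best || d->offset < best->offset))
			best = d;
	return best;
}

/* Like find_positive_dentry(parent, start, false): NULL means first child. */
static struct toy_dentry *toy_find_positive(struct toy_dentry *head,
					    struct toy_dentry *start)
{
	struct toy_dentry *d;

	for (d = start ? start : head; d; d = d->next)
		if (d->positive)
			return d;
	return NULL;
}

/* Lookup as in the patch; if guarded, a failed offset lookup ends the walk. */
static struct toy_dentry *toy_lookup(struct toy_dentry *head, long pos,
				     int guarded)
{
	struct toy_dentry *child;

	if (pos == TOY_OFFSET_FIRST)
		return toy_find_positive(head, NULL);
	child = toy_mas_find(head, pos);
	if (!child && guarded)
		return NULL;
	return toy_find_positive(head, child);
}

/*
 * Children 5 -> 4 -> 3 (head first); 5 and 4 have been deleted while
 * ctx->pos is 4. Returns the offset the walk resumes at, or -1 for none.
 */
static long toy_demo_resume(int guarded)
{
	struct toy_dentry a = { 3, 1, NULL };
	struct toy_dentry b = { 4, 0, &a };
	struct toy_dentry c = { 5, 0, &b };
	struct toy_dentry *found = toy_lookup(&c, 4, guarded);

	assert(!found || found->positive);
	return found ? found->offset : -1;
}
```

In this model toy_demo_resume(0) returns 3, because the failed offset lookup falls through to a walk from the first child, while toy_demo_resume(1) returns -1. The model only shows where the walk restarts in each variant; whether the entries it then returns were already seen by userspace depends on what is on the list at that point.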