From: Bernd Schubert
Date: Fri, 30 Jun 2023 13:08:03 +0200
Subject: Re: [PATCH v6 1/3] libfs: Add directory operations for stable offsets
To: Chuck Lever, viro@zeniv.linux.org.uk, brauner@kernel.org, hughd@google.com, akpm@linux-foundation.org
Cc: Chuck Lever, jlayton@redhat.com, linux-mm@kvack.org, linux-fsdevel@vger.kernel.org
References: <168796579723.157221.1988816921257656153.stgit@manet.1015granger.net> <168796590904.157221.11286772826871541854.stgit@manet.1015granger.net>
In-Reply-To: <168796590904.157221.11286772826871541854.stgit@manet.1015granger.net>

On 6/28/23 17:25, Chuck Lever wrote:
> From: Chuck Lever
>
> Create a
> vector of directory operations in fs/libfs.c that handles
> directory seeks and readdir via stable offsets instead of the
> current cursor-based mechanism.
>
> For the moment these are unused.
>
> Signed-off-by: Chuck Lever
> ---
>  fs/libfs.c         |  247 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/fs.h |   18 ++++
>  2 files changed, 265 insertions(+)
>
> diff --git a/fs/libfs.c b/fs/libfs.c
> index 89cf614a3271..2b0d5ac472df 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -239,6 +239,253 @@ const struct inode_operations simple_dir_inode_operations = {
>  };
>  EXPORT_SYMBOL(simple_dir_inode_operations);
>
> +static void offset_set(struct dentry *dentry, unsigned long offset)
> +{
> +	dentry->d_fsdata = (void *)offset;
> +}
> +
> +static unsigned long dentry2offset(struct dentry *dentry)
> +{
> +	return (unsigned long)dentry->d_fsdata;
> +}
> +
> +/**
> + * simple_offset_init - initialize an offset_ctx
> + * @octx: directory offset map to be initialized
> + *
> + */
> +void simple_offset_init(struct offset_ctx *octx)
> +{
> +	xa_init_flags(&octx->xa, XA_FLAGS_ALLOC1);
> +
> +	/* 0 is '.', 1 is '..', so always start with offset 2 */
> +	octx->next_offset = 2;
> +}
> +
> +/**
> + * simple_offset_add - Add an entry to a directory's offset map
> + * @octx: directory offset ctx to be updated
> + * @dentry: new dentry being added
> + *
> + * Returns zero on success. @so_ctx and the dentry offset are updated.
> + * Otherwise, a negative errno value is returned.
> + */
> +int simple_offset_add(struct offset_ctx *octx, struct dentry *dentry)
> +{
> +	static const struct xa_limit limit = XA_LIMIT(2, U32_MAX);
> +	u32 offset;
> +	int ret;
> +
> +	if (dentry2offset(dentry) != 0)
> +		return -EBUSY;
> +
> +	ret = xa_alloc_cyclic(&octx->xa, &offset, dentry, limit,
> +			      &octx->next_offset, GFP_KERNEL);
> +	if (ret < 0)
> +		return ret;
> +
> +	offset_set(dentry, offset);
> +	return 0;
> +}
> +
> +/**
> + * simple_offset_remove - Remove an entry to a directory's offset map
> + * @octx: directory offset ctx to be updated
> + * @dentry: dentry being removed
> + *
> + */
> +void simple_offset_remove(struct offset_ctx *octx, struct dentry *dentry)
> +{
> +	unsigned long index = dentry2offset(dentry);
> +
> +	if (index == 0)
> +		return;
> +
> +	xa_erase(&octx->xa, index);
> +	offset_set(dentry, 0);
> +}
> +
> +/**
> + * simple_offset_rename_exchange - exchange rename with directory offsets
> + * @old_dir: parent of dentry being moved
> + * @old_dentry: dentry being moved
> + * @new_dir: destination parent
> + * @new_dentry: destination dentry
> + *
> + * Returns zero on success. Otherwise a negative errno is returned and the
> + * rename is rolled back.
> + */
> +int simple_offset_rename_exchange(struct inode *old_dir,
> +				  struct dentry *old_dentry,
> +				  struct inode *new_dir,
> +				  struct dentry *new_dentry)
> +{
> +	struct offset_ctx *old_ctx = old_dir->i_op->get_offset_ctx(old_dir);
> +	struct offset_ctx *new_ctx = new_dir->i_op->get_offset_ctx(new_dir);
> +	unsigned long old_index = dentry2offset(old_dentry);
> +	unsigned long new_index = dentry2offset(new_dentry);
> +	int ret;
> +
> +	simple_offset_remove(old_ctx, old_dentry);
> +	simple_offset_remove(new_ctx, new_dentry);
> +
> +	ret = simple_offset_add(new_ctx, old_dentry);
> +	if (ret)
> +		goto out_restore;
> +
> +	ret = simple_offset_add(old_ctx, new_dentry);
> +	if (ret) {
> +		simple_offset_remove(new_ctx, old_dentry);
> +		goto out_restore;
> +	}
> +
> +	ret = simple_rename_exchange(old_dir, old_dentry, new_dir, new_dentry);
> +	if (ret) {
> +		simple_offset_remove(new_ctx, old_dentry);
> +		simple_offset_remove(old_ctx, new_dentry);
> +		goto out_restore;
> +	}
> +	return 0;
> +
> +out_restore:
> +	offset_set(old_dentry, old_index);
> +	xa_store(&old_ctx->xa, old_index, old_dentry, GFP_KERNEL);
> +	offset_set(new_dentry, new_index);
> +	xa_store(&new_ctx->xa, new_index, new_dentry, GFP_KERNEL);
> +	return ret;
> +}

Thanks for the update, looks great!

> +
> +/**
> + * simple_offset_destroy - Release offset map
> + * @octx: directory offset ctx that is about to be destroyed
> + *
> + * During fs teardown (eg. umount), a directory's offset map might still
> + * contain entries. xa_destroy() cleans out anything that remains.
> + */
> +void simple_offset_destroy(struct offset_ctx *octx)
> +{
> +	xa_destroy(&octx->xa);
> +}
> +
> +/**
> + * offset_dir_llseek - Advance the read position of a directory descriptor
> + * @file: an open directory whose position is to be updated
> + * @offset: a byte offset
> + * @whence: enumerator describing the starting position for this update
> + *
> + * SEEK_END, SEEK_DATA, and SEEK_HOLE are not supported for directories.
> + *
> + * Returns the updated read position if successful; otherwise a
> + * negative errno is returned and the read position remains unchanged.
> + */
> +static loff_t offset_dir_llseek(struct file *file, loff_t offset, int whence)
> +{
> +	switch (whence) {
> +	case SEEK_CUR:
> +		offset += file->f_pos;
> +		fallthrough;
> +	case SEEK_SET:
> +		if (offset >= 0)
> +			break;
> +		fallthrough;
> +	default:
> +		return -EINVAL;
> +	}
> +
> +	return vfs_setpos(file, offset, U32_MAX);
> +}
> +
> +static struct dentry *offset_find_next(struct xa_state *xas)
> +{
> +	struct dentry *child, *found = NULL;
> +
> +	rcu_read_lock();
> +	child = xas_next_entry(xas, U32_MAX);
> +	if (!child)
> +		goto out;
> +	spin_lock_nested(&child->d_lock, DENTRY_D_LOCK_NESTED);
> +	if (simple_positive(child))
> +		found = dget_dlock(child);
> +	spin_unlock(&child->d_lock);
> +out:
> +	rcu_read_unlock();
> +	return found;
> +}
> +
> +static bool offset_dir_emit(struct dir_context *ctx, struct dentry *dentry)
> +{
> +	loff_t offset = dentry2offset(dentry);
> +	struct inode *inode = d_inode(dentry);
> +
> +	return ctx->actor(ctx, dentry->d_name.name, dentry->d_name.len, offset,
> +			  inode->i_ino, fs_umode_to_dtype(inode->i_mode));
> +}
> +
> +static void offset_iterate_dir(struct dentry *dir, struct dir_context *ctx)
> +{
> +	struct inode *inode = d_inode(dir);
> +	struct offset_ctx *so_ctx = inode->i_op->get_offset_ctx(inode);
> +	XA_STATE(xas, &so_ctx->xa, ctx->pos);
> +	struct dentry *dentry;
> +
> +	while (true) {
> +		spin_lock(&dir->d_lock);
> +		dentry = offset_find_next(&xas);
> +		spin_unlock(&dir->d_lock);
> +		if (!dentry)
> +			break;
> +
> +		if (!offset_dir_emit(ctx, dentry)) {
> +			dput(dentry);
> +			break;
> +		}
> +
> +		dput(dentry);
> +		ctx->pos = xas.xa_index + 1;
> +	}
> +}
> +
> +/**
> + * offset_readdir - Emit entries starting at offset @ctx->pos
> + * @file: an open directory to iterate over
> + * @ctx: directory iteration context
> + *
> + * Caller must hold @file's i_rwsem to prevent insertion or removal of
> + * entries during this call.
> + *
> + * On entry, @ctx->pos contains an offset that represents the first entry
> + * to be read from the directory.
> + *
> + * The operation continues until there are no more entries to read, or
> + * until the ctx->actor indicates there is no more space in the caller's
> + * output buffer.
> + *
> + * On return, @ctx->pos contains an offset that will read the next entry
> + * in this directory when shmem_readdir() is called again with @ctx.
> + *
> + * Return values:
> + *   %0 - Complete
> + */
> +static int offset_readdir(struct file *file, struct dir_context *ctx)
> +{
> +	struct dentry *dir = file->f_path.dentry;
> +
> +	lockdep_assert_held(&d_inode(dir)->i_rwsem);
> +
> +	if (!dir_emit_dots(file, ctx))
> +		return 0;
> +
> +	offset_iterate_dir(dir, ctx);
> +	return 0;
> +}
> +
> +const struct file_operations simple_offset_dir_operations = {
> +	.llseek		= offset_dir_llseek,
> +	.iterate_shared	= offset_readdir,
> +	.read		= generic_read_dir,
> +	.fsync		= noop_fsync,
> +};
> +
>  static struct dentry *find_next_child(struct dentry *parent, struct dentry *prev)
>  {
>  	struct dentry *child = NULL;
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 133f0640fb24..85de389e4eb8 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -1767,6 +1767,7 @@ struct dir_context {
>
>  struct iov_iter;
>  struct io_uring_cmd;
> +struct offset_ctx;
>
>  struct file_operations {
>  	struct module *owner;
> @@ -1854,6 +1855,7 @@ struct inode_operations {
>  	int (*fileattr_set)(struct mnt_idmap *idmap,
>  			    struct dentry *dentry, struct fileattr *fa);
>  	int (*fileattr_get)(struct dentry *dentry, struct fileattr *fa);
> +	struct offset_ctx *(*get_offset_ctx)(struct inode *inode);
>  } ____cacheline_aligned;

Should this be documented in filesystems/vfs.rst and
filesystems/locking.rst?

Thanks,
Bernd