From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9FF76C71153 for ; Mon, 4 Sep 2023 01:02:40 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id A208B8E001D; Sun, 3 Sep 2023 21:02:39 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 9D02C8E001C; Sun, 3 Sep 2023 21:02:39 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8703C8E001D; Sun, 3 Sep 2023 21:02:39 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 743DE8E001C for ; Sun, 3 Sep 2023 21:02:39 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id 4617AB32AC for ; Mon, 4 Sep 2023 01:02:39 +0000 (UTC) X-FDA: 81197114838.09.1247FBF Received: from mail-pl1-f181.google.com (mail-pl1-f181.google.com [209.85.214.181]) by imf27.hostedemail.com (Postfix) with ESMTP id 4463F40012 for ; Mon, 4 Sep 2023 01:02:37 +0000 (UTC) Authentication-Results: imf27.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b="ExvJL/70"; dmarc=pass (policy=quarantine) header.from=fromorbit.com; spf=pass (imf27.hostedemail.com: domain of david@fromorbit.com designates 209.85.214.181 as permitted sender) smtp.mailfrom=david@fromorbit.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1693789357; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=7SkjlWpTOnFF4QpgAAxUR7lqv3IHtXsX9Ov+YWVrmXw=; b=l7m2jnNPQb7Ez62xYElmD08EOKtvrSXmIw5iVPacwkzEJpiQoHoLb4dEls8k71sZ2hLBWw QsczCBZGoRh2RtqV8Zox0fFQ09GtiTM5eCHrTiPCr7qFE3Ue3+m/1Nnel46w/4UGEJV0rP Iy0k6aLMNh4DfY/7S4cIb96MXIHwlWQ= ARC-Authentication-Results: i=1; imf27.hostedemail.com; dkim=pass header.d=fromorbit-com.20230601.gappssmtp.com header.s=20230601 header.b="ExvJL/70"; dmarc=pass (policy=quarantine) header.from=fromorbit.com; spf=pass (imf27.hostedemail.com: domain of david@fromorbit.com designates 209.85.214.181 as permitted sender) smtp.mailfrom=david@fromorbit.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1693789357; a=rsa-sha256; cv=none; b=bwO5EixI3LUBmhBTudLVbi3jzepFFsKrEfKXlRDFEdH60yi7LOKFVai/301bFSGdg4zO5N 7B0kJdGDN9g/5xcwG3qnZJ/oE1anzy/6ongK4JyXcw9hrmriKpu27ihxaFEJ8N3tfCuGff sakk7HCCX2mipSXyq270zmCEnf1rmMs= Received: by mail-pl1-f181.google.com with SMTP id d9443c01a7336-1c09673b006so1999055ad.1 for ; Sun, 03 Sep 2023 18:02:36 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=fromorbit-com.20230601.gappssmtp.com; s=20230601; t=1693789356; x=1694394156; darn=kvack.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=7SkjlWpTOnFF4QpgAAxUR7lqv3IHtXsX9Ov+YWVrmXw=; b=ExvJL/70IWPlCVmmj+nG11PjxTjbFaEIlCk4rcuxG4sLnJEsHSUj/KhpeZd1lXMmCZ tr/Hib/gvt9wLGOX6ZRdr18Zcq2JO9Rf1X6pD4edtMB+h44N/yHKCe5Mp8FL7qLTLsN3 2b3K0QeOI/pyv6M6RvpHqTXQxcseyRpxekeQAzH2/0fNI7J26TZRz2MGvyEo5ZGgCeKL CgtRLUV5gQYLVWtSp9fMovl1oC2PeCTKW1utHS7KStKVxBGXZUIFWQZaB/FQ9rpuMkXF qO0dS6yQj3rbJBtbQwckZASSVoWW5jxpnEzC2iK7LHeg1Hf0EmbqguBMlCOQfNK0aBez k+Ew== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1693789356; x=1694394156; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=7SkjlWpTOnFF4QpgAAxUR7lqv3IHtXsX9Ov+YWVrmXw=; b=P2+97cPUsT4/NszqdEtVdfSk/tXNxyxIjUj2nn+xF63UbG6Xadz13172KRZL5AIJdf SzJ7sdWPQWYTrNWds67+BTIB9Vy0TFHzubnFT0yGBkzeJY+V60WXNgqdW/YJg/uJR+s1 uvBZSLztKXZkIp23fFjRj3rqAC6HuV2dcwzyZpICl1QXh3KpPO6BVRKqE76MeqjHX9+R RIFpb6HN+Vd6tJsisr7JCXaODUNjczXt48ithUgKxx4XfGySZmH5m0IGTJsVoS9yMscK zMMqLJuVf4lPTsUSf3Z9S9OWNBBCPep5Q88YPAuV2ypGPKTOSLNpSiVl+QhuscWKlFN9 hVag== X-Gm-Message-State: AOJu0YwTjBJ95+GDzue3EZjjSYnXtMGZiAE75nBpt5nJkHyuBy2fm61H SjOIuT+oYJXIZt3+VJXmWJ5yIQ== X-Google-Smtp-Source: AGHT+IGGrky2P+bBuH4AKgWBeYdqXOP6u+qaOJyeKMBTbWb7BRdsdnk0jjxmGVwySX0sZe8eaRNDcw== X-Received: by 2002:a17:902:ecc8:b0:1c1:fe97:bf34 with SMTP id a8-20020a170902ecc800b001c1fe97bf34mr8040994plh.24.1693789355853; Sun, 03 Sep 2023 18:02:35 -0700 (PDT) Received: from dread.disaster.area (pa49-195-66-88.pa.nsw.optusnet.com.au. [49.195.66.88]) by smtp.gmail.com with ESMTPSA id d4-20020a170902c18400b001bdcafcf8d3sm6351806pld.69.2023.09.03.18.02.34 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 03 Sep 2023 18:02:35 -0700 (PDT) Received: from dave by dread.disaster.area with local (Exim 4.96) (envelope-from ) id 1qcxz6-00AVA9-2L; Mon, 04 Sep 2023 11:02:32 +1000 Date: Mon, 4 Sep 2023 11:02:32 +1000 From: Dave Chinner To: Hao Xu Cc: io-uring@vger.kernel.org, Jens Axboe , Dominique Martinet , Pavel Begunkov , Christian Brauner , Alexander Viro , Stefan Roesch , Clay Harris , "Darrick J . Wong" , linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-ext4@vger.kernel.org, linux-cachefs@redhat.com, ecryptfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-unionfs@vger.kernel.org, bpf@vger.kernel.org, netdev@vger.kernel.org, linux-s390@vger.kernel.org, linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, linux-btrfs@vger.kernel.org, codalist@coda.cs.cmu.edu, linux-f2fs-devel@lists.sourceforge.net, cluster-devel@redhat.com, linux-mm@kvack.org, linux-nilfs@vger.kernel.org, devel@lists.orangefs.org, linux-cifs@vger.kernel.org, samba-technical@lists.samba.org, linux-mtd@lists.infradead.org, Wanpeng Li Subject: Re: [PATCH 02/11] xfs: add NOWAIT semantics for readdir Message-ID: References: <20230827132835.1373581-1-hao.xu@linux.dev> <20230827132835.1373581-3-hao.xu@linux.dev> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20230827132835.1373581-3-hao.xu@linux.dev> X-Rspamd-Server: rspam09 X-Rspamd-Queue-Id: 4463F40012 X-Stat-Signature: s3jgtaputbjqao16r91sihyyiwjxhxqx X-Rspam-User: X-HE-Tag: 1693789357-139202 X-HE-Meta: U2FsdGVkX1/6mfl3f/+UKmr1R27eDdSPgGeTv2NnLAY87+aTni/h1JPxnVIArD42XsMhVgjAEorxwGCdwEsn+BCiE0neQcDEKnBgq4VK/EviR81yFn4KjibPYaGeE7XaLnmpK/0U01NiyEM5/Pg52iqfgmcqr0N/G1qRaGrfA9VnzkNFfxNSp/cZOQRWmaH6IKAPW3A++kAwpZtOVx739ML88e+TFdj3CXq0LrJVkZVnqayI25bnMlON855QLkCAaps5RMLUm19+CW1Y24NW2kK8W0lMaEL9IgHedIc5138YA8vI/9MgdiZ2LFMGTE9frOEQXUMXHQcklSf/MUOSNesZtHLWdvAPqtr2c3OCTUzzJXwMiQhJAX8EkS0KnidFvlrCbwHDTccwzMa1CSldDJomRKoriSOIEM2igq6/7gogYP78svy4nB5zA0lUuvZdxLsBKjv3tj4evlhgQyLeQm5by1lQypv3KurD4YXcmFG3JeqLN0q7Uwwf0W6kGgtPswrkZMif9XJyt8SynL9PjMFEzZtm563JaWGO3m2zbZ6+mUD4Kyq86yOsfKf9qK/So9pEmtUVBEGWKK+bjo+lGWLRoJP/yVtQvSfnT9RCnw2BQBm4UslXDj5PMnAYNLikIplkTd0cW8YKyrS7CttVsVHtDetylfWsbCcqt6KAnYOst89VDZv2681OegKMChx79moThdLWNwyT6K0M8mQzDT6CuAbWZPbs/dMGYGbBaynCY/cCZa0oefi3lr7z/s27+J50vBZCYyo93H3VRSOvA+6lJyZwD9jMVGWm6SzvLObLsvDaLBNodgOarF2oDJKWRmpzEVHsu7AJGhicbksz2bpkAIRBx0XvyV46PqHvb6cdV+x+PMqTSXTWIVwYWRP+sN8RrTDo9q2pP8Rma2JIGgedamxFfN3TrmMPiQBvhFWTW3tHi2h34s1mQGHYSRYBGEp/GVUfy/bS4aUZve4 GrhWmKyr MXrLAGbqDr3/Ei00XUaGNM5R4xvUk7xW9myJVkks5IXyavecBiXMnTZMKFt0ehNFHSIGa6VrH1z/Z9n8AQ3vmDc3SCntA+5mtCH0hziKpXkRiwjhIAKGdPQjuDpTa2aGZMw3hubWAey3AY2Az5t6y0b53jwltrQRS5h90mB6WFMyjUF+26sQIjwcJ4qVlzWVb2YT2DfUr18MsIRuPglZzb8flMqAf0SBF5UZtaL9EE5KTuvygTxBKMDFKzh/c+3LXg/lzN3Tvx6WzEM2YAHm3RP6hnr6f+C4dreGLkuN58Jh6J8q2CXVylSkhykMRlrnsfFecSS3vdmdzvG3WkXAqeC4awAY7DG+hnQE8vV3bLGoqDdS7CeHViCGBW4Rjeadq+QcLRYid0D0+XCR1uKCh3F4APRaJjhf4u1IIkC1+BN7MjowEUP1TBLL+AT+9HOhyKhT0mnS8VMan7Mve+HcFAJkBsXT/r8iSrZCL+9WivoncSksvd6tYGvw+qhpYYLP+qHvbP3O5OHBl7dq8/L3m339/BGWI+P3/u1xbJerdXxlOovP9sp9p6+8faXBB5OIOqPYRr8pRGmaidNBWVfsWrKja+Pul03YG5E4D X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, Aug 27, 2023 at 09:28:26PM +0800, Hao Xu wrote: > From: Hao Xu > > Implement NOWAIT semantics for readdir. Return EAGAIN error to the > caller if it would block, like failing to get locks, or going to > do IO. > > Co-developed-by: Dave Chinner Not really. "Co-developed" implies equal development input between all the parties, which is not the case here - this patch is based on prototype I wrote, whilst you're doing the refining, testing and correctness work. In these cases with XFS code, we add a line in the commit message to say: "This is based on a patch originally written by Dave Chinner." > Signed-off-by: Dave Chinner > Signed-off-by: Hao Xu > [fixes deadlock issue, tweak code style] With a signoff chain like you already have. In the end you'll also get a RVB from me, which seems rather wrong to me if I've apparently been "co-developing" the code.... .... > @@ -156,7 +157,9 @@ xfs_dir2_block_getdents( > if (xfs_dir2_dataptr_to_db(geo, ctx->pos) > geo->datablk) > return 0; > > - error = xfs_dir3_block_read(args->trans, dp, &bp); > + if (ctx->flags & DIR_CONTEXT_F_NOWAIT) > + flags |= XFS_DABUF_NOWAIT; > + error = xfs_dir3_block_read(args->trans, dp, flags, &bp); > if (error) > return error; > Given we do this same check in both block and leaf formats to set XFS_DABUF_NOWAIT, and we do the DIR_CONTEXT_F_NOWAIT check in xfs_readdir() as well, we should probably do this check once at the higher level and pass flags down from there with XFS_DABUF_NOWAIT already set. > @@ -240,6 +243,7 @@ xfs_dir2_block_getdents( > STATIC int > xfs_dir2_leaf_readbuf( > struct xfs_da_args *args, > + struct dir_context *ctx, > size_t bufsize, > xfs_dir2_off_t *cur_off, > xfs_dablk_t *ra_blk, > @@ -258,10 +262,15 @@ xfs_dir2_leaf_readbuf( > struct xfs_iext_cursor icur; > int ra_want; > int error = 0; > - > - error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK); > - if (error) > - goto out; > + unsigned int flags = 0; > + > + if (ctx->flags & DIR_CONTEXT_F_NOWAIT) { > + flags |= XFS_DABUF_NOWAIT; > + } else { > + error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK); > + if (error) > + goto out; > + } Especially as, in hindsight, this doesn't make a whole lot of sense. If XFS_DABUF_NOWAIT is set, we keep going until xfs_ilock_data_map_shared_nowait() where we call xfs_need_iread_extents() to see if we need to read the extents in and abort at that point. So, really, we shouldn't get this far with nowait semantics if we haven't read the extents in yet - we're supposed to already have the inode locked here and so we should have already checked this condition before we bother locking the inode... i.e. all we should be doing here is this: if (!(flags & XFS_DABUF_NOWAIT)) { error = xfs_iread_extents(args->trans, dp, XFS_DATA_FORK); if (error) goto out; } And then we don't need to pass the VFS dir_context down into low level XFS functions unnecessarily. > > /* > * Look for mapped directory blocks at or above the current offset. > @@ -280,7 +289,7 @@ xfs_dir2_leaf_readbuf( > new_off = xfs_dir2_da_to_byte(geo, map.br_startoff); > if (new_off > *cur_off) > *cur_off = new_off; > - error = xfs_dir3_data_read(args->trans, dp, map.br_startoff, 0, &bp); > + error = xfs_dir3_data_read(args->trans, dp, map.br_startoff, flags, &bp); > if (error) > goto out; > > @@ -360,6 +369,7 @@ xfs_dir2_leaf_getdents( > int byteoff; /* offset in current block */ > unsigned int offset = 0; > int error = 0; /* error return value */ > + int written = 0; > > /* > * If the offset is at or past the largest allowed value, > @@ -391,10 +401,17 @@ xfs_dir2_leaf_getdents( > bp = NULL; > } > > - if (*lock_mode == 0) > - *lock_mode = xfs_ilock_data_map_shared(dp); > - error = xfs_dir2_leaf_readbuf(args, bufsize, &curoff, > - &rablk, &bp); > + if (*lock_mode == 0) { > + *lock_mode = > + xfs_ilock_data_map_shared_generic(dp, > + ctx->flags & DIR_CONTEXT_F_NOWAIT); > + if (!*lock_mode) { > + error = -EAGAIN; > + break; > + } > + } > + error = xfs_dir2_leaf_readbuf(args, ctx, bufsize, > + &curoff, &rablk, &bp); int xfs_ilock_readdir( struct xfs_inode *ip, int flags) { if (flags & XFS_DABUF_NOWAIT) { if (!xfs_ilock_nowait(dp, XFS_ILOCK_SHARED)) return -EAGAIN; return XFS_ILOCK_SHARED; } return xfs_ilock_data_map_shared(dp); } And then this code simply becomes: if (*lock_mode == 0) *lock_mode = xfs_ilock_readdir(ip, flags); > if (error || !bp) > break; > > @@ -479,6 +496,7 @@ xfs_dir2_leaf_getdents( > */ > offset += length; > curoff += length; > + written += length; > /* bufsize may have just been a guess; don't go negative */ > bufsize = bufsize > length ? bufsize - length : 0; > } > @@ -492,6 +510,8 @@ xfs_dir2_leaf_getdents( > ctx->pos = xfs_dir2_byte_to_dataptr(curoff) & 0x7fffffff; > if (bp) > xfs_trans_brelse(args->trans, bp); > + if (error == -EAGAIN && written > 0) > + error = 0; > return error; > } > > @@ -514,6 +534,7 @@ xfs_readdir( > unsigned int lock_mode; > bool isblock; > int error; > + bool nowait; > > trace_xfs_readdir(dp); > > @@ -531,7 +552,11 @@ xfs_readdir( > if (dp->i_df.if_format == XFS_DINODE_FMT_LOCAL) > return xfs_dir2_sf_getdents(&args, ctx); > > - lock_mode = xfs_ilock_data_map_shared(dp); > + nowait = ctx->flags & DIR_CONTEXT_F_NOWAIT; > + lock_mode = xfs_ilock_data_map_shared_generic(dp, nowait); > + if (!lock_mode) > + return -EAGAIN; > + Given what I said above: if (ctx->flags & DIR_CONTEXT_F_NOWAIT) { /* * If we need to read extents, then we must do IO * and we must use exclusive locking. We don't want * to do either of those things, so just bail if we * have to read extents. Doing this check explicitly * here means we don't have to do it anywhere else * in the XFS_DABUF_NOWAIT path. */ if (xfs_need_iread_extents(&ip->i_df)) return -EAGAIN; flags |= XFS_DABUF_NOWAIT; } lock_mode = xfs_ilock_readdir(dp, flags); And with this change, we probably should be marking the entire operation as having nowait semantics. i.e. using args->op_flags here and only use XFS_DABUF_NOWAIT for the actual IO. ie. args->op_flags |= XFS_DA_OP_NOWAIT; This makes it clear that the entire directory op should run under NOWAIT constraints, and it avoids needing to pass an extra flag through the stack. That then makes the readdir locking function look like this: /* * When we are locking an inode for readdir, we need to ensure that * the extents have been read in first. This requires the inode to * be locked exclusively across the extent read, but otherwise we * want to use shared locking. * * For XFS_DA_OP_NOWAIT operations, we do an up-front check to see * if the extents have been read in, so all we need to do in this * case is a shared try-lock as we never need exclusive locking in * this path. */ static int xfs_ilock_readdir( struct xfs_da_args *args) { if (args->op_flags & XFS_DA_OP_NOWAIT) { if (!xfs_ilock_nowait(args->dp, XFS_ILOCK_SHARED)) return -EAGAIN; return XFS_ILOCK_SHARED; } return xfs_ilock_data_map_shared(args->dp); } > diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c > index 9e62cc500140..d088f7d0c23a 100644 > --- a/fs/xfs/xfs_inode.c > +++ b/fs/xfs/xfs_inode.c > @@ -120,6 +120,33 @@ xfs_ilock_data_map_shared( > return lock_mode; > } > > +/* > + * Similar to xfs_ilock_data_map_shared(), except that it will only try to lock > + * the inode in shared mode if the extents are already in memory. If it fails to > + * get the lock or has to do IO to read the extent list, fail the operation by > + * returning 0 as the lock mode. > + */ > +uint > +xfs_ilock_data_map_shared_nowait( > + struct xfs_inode *ip) > +{ > + if (xfs_need_iread_extents(&ip->i_df)) > + return 0; > + if (!xfs_ilock_nowait(ip, XFS_ILOCK_SHARED)) > + return 0; > + return XFS_ILOCK_SHARED; > +} > + > +int > +xfs_ilock_data_map_shared_generic( > + struct xfs_inode *dp, > + bool nowait) > +{ > + if (nowait) > + return xfs_ilock_data_map_shared_nowait(dp); > + return xfs_ilock_data_map_shared(dp); > +} And all this "generic" locking stuff goes away. FWIW, IMO, "generic" is a poor name for an XFS function as there's nothing "generic" in XFS. We tend name the functions after what they do, not some abstract concept. Leave "generic" as a keyword for widely used core infrastructure functions, not niche, one-off use cases like this. Cheers, Dave. -- Dave Chinner david@fromorbit.com