From: Jan Kara
To: John Hubbard
Cc: Jan Kara, Al Viro, Andrew Morton, Jens Axboe, Miklos Szeredi,
    Christoph Hellwig, "Darrick J . Wong", Trond Myklebust,
    Anna Schumaker, Logan Gunthorpe, linux-block@vger.kernel.org,
    linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org,
    linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML
Date: Wed, 31 Aug 2022 11:43:49 +0200
Subject: Re: [PATCH 5/6] NFS: direct-io: convert to FOLL_PIN pages
Message-ID: <20220831094349.boln4jjajkdtykx3@quack3>
On Mon 29-08-22 12:59:26, John Hubbard wrote:
> On 8/29/22 09:08, Jan Kara wrote:
> >> However, the core block/bio conversion in patch 4 still does depend upon
> >> a key assumption, which I got from a 2019 email discussion with
> >> Christoph Hellwig and others here [1], which says:
> >>
> >> "All pages released by bio_release_pages should come from
> >> get_get_user_pages...".
> >>
> >> I really hope that still holds true. Otherwise this whole thing is in
> >> trouble.
> >>
> >> [1] https://lore.kernel.org/kvm/20190724053053.GA18330@infradead.org/
> >
> > Well as far as I've checked that discussion, Christoph was aware of pipe
> > pages etc. (i.e., bvecs) entering direct IO code. But he had some patches
> > [2] which enabled GUP to work for bvecs as well (using the kernel mapping
> > under the hood AFAICT from a quick glance at the series). I suppose we
> > could also handle this in __iov_iter_get_pages_alloc() by grabbing pin
> > reference instead of plain get_page() for the case of bvec iter. That way
> > we should have only pinned pages in bio_release_pages() even for the bvec
> > case.
>
> OK, thanks, that looks viable. So, that approach assumes that the
> remaining two cases in __iov_iter_get_pages_alloc() will never end up
> being released via bio_release_pages():
>
>     iov_iter_is_pipe(i)
>     iov_iter_is_xarray(i)
>
> I'm actually a little worried about ITER_XARRAY, which is a recent addition.
> It seems to be used in ways that are similar to ITER_BVEC, and cephfs is
> using it.
> It's probably OK for now, for this series, which doesn't yet
> convert cephfs.

So after looking into that a bit more, I think a clean approach would be
to provide iov_iter_pin_pages2() and iov_iter_pin_pages_alloc2(), and
under the hood in __iov_iter_get_pages_alloc() make sure we use
pin_user_page() instead of get_page() in all cases (using it in
pipe_get_pages() and iter_xarray_get_pages() is easy), and then make all
bio handling use the pinning variants for iters. I think at least the
iov_iter_is_pipe() case needs to be handled as well because, as I wrote
above, pipe pages can enter the direct IO code, e.g. via splice(2).

Also, I think that all users of iov_iter_get_pages2() (or the _alloc2
variant) actually want "pin page" semantics in the end (they access page
contents), so eventually we should convert them all to
iov_iter_pin_pages2() and remove iov_iter_get_pages2() altogether. But
this will take more conversion work in networking etc., so I'd start by
converting bios only.

								Honza
-- 
Jan Kara
SUSE Labs, CR