From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7E428C6FA82 for ; Fri, 23 Sep 2022 16:14:05 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 126AE80008; Fri, 23 Sep 2022 12:14:05 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 0B0EC80007; Fri, 23 Sep 2022 12:14:05 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E6AE880008; Fri, 23 Sep 2022 12:14:04 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id D0AD380007 for ; Fri, 23 Sep 2022 12:14:04 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id AD1961C66FE for ; Fri, 23 Sep 2022 16:14:04 +0000 (UTC) X-FDA: 79943846808.09.11609C7 Received: from zeniv.linux.org.uk (zeniv.linux.org.uk [62.89.141.173]) by imf26.hostedemail.com (Postfix) with ESMTP id 2EFF414001F for ; Fri, 23 Sep 2022 16:14:03 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=linux.org.uk; s=zeniv-20220401; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=YgHH6pk3nJLXW2jbXtGoxm+4FKcaifQJYVKj+aRwkxg=; b=ELnSTSol/sfK6s31ClYuS8gszF sgXyMtpRrY266URjhhF2+1SDFZPQ78gjuIuhX2IKz28nBNhRVMvO+lvdYB0u9QfxT77iP5VaWVadQ XQ7c8C2XILfSmvEUEZNndcp74iadwyK5fGrJ2BFrCVFtzqPwlVzV8idNgG0D9vuRVL5c60jrYWDzq 7t0rwxdFdeOKfLhnBFBDnyEJQ9KRzGyGnf7Y6ovolAFvEL3wYKawkB7tGYFSqpHRkMjTbebYoWz82 P+56yyxTj49Ngg8b0QDC7L5mid0UQH+ffCszwa9c1DnG3p+HPNkrve7pWm/ViH77+sQ5wQ2HwLSDK 1uOznKEw==; Received: from viro by zeniv.linux.org.uk with local (Exim 4.96 #2 (Red Hat Linux)) id 1oblJ8-002uO3-0s; Fri, 23 Sep 2022 16:13:42 +0000 Date: Fri, 23 Sep 2022 17:13:42 +0100 From: Al Viro To: Christoph Hellwig Cc: Jan Kara , John Hubbard , Andrew Morton , Jens Axboe , Miklos Szeredi , "Darrick J . Wong" , Trond Myklebust , Anna Schumaker , David Hildenbrand , Logan Gunthorpe , linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines Message-ID: References: <20220906102106.q23ovgyjyrsnbhkp@quack3> <20220914145233.cyeljaku4egeu4x2@quack3> <20220915081625.6a72nza6yq4l5etp@quack3> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1663949644; a=rsa-sha256; cv=none; b=yOiJ+0SrjgeZUqleTjNvOi/g5uOpYejddtn8g2czXjIFPqaXdukywIHrdpgqbiMQ2nfHVt xnhYfU0NBMZDjhRAbtjX9uytpRGim/kfmQxpKA6HWkZ/CE/7tt4gegyZYJgUiFAvY8B3Y+ 85ZXx60qbXAb81QOd4KC8TpFlNbOdl8= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=linux.org.uk header.s=zeniv-20220401 header.b=ELnSTSol; spf=none (imf26.hostedemail.com: domain of viro@ftp.linux.org.uk has no SPF policy when checking 62.89.141.173) smtp.mailfrom=viro@ftp.linux.org.uk; dmarc=pass (policy=none) header.from=zeniv.linux.org.uk ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1663949644; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=YgHH6pk3nJLXW2jbXtGoxm+4FKcaifQJYVKj+aRwkxg=; b=sQS32YHCMHD6HjvL53/krtkW6ResBalfHvda6DFEJxVUTJ7OFzuAYNqd1ancrGZSahOe5v i2Mfr0J+vCm4M0zkn05S7BNzL+lXoQh+XpzHrKQzyGa+nUzHo4+TEbbMnwC5bfyC8D8fsB QxGrcYY5/GH6mtxt3qvZvhfQjYHxOVo= X-Rspamd-Server: rspam10 X-Rspamd-Queue-Id: 2EFF414001F Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=linux.org.uk header.s=zeniv-20220401 header.b=ELnSTSol; spf=none (imf26.hostedemail.com: domain of viro@ftp.linux.org.uk has no SPF policy when checking 62.89.141.173) smtp.mailfrom=viro@ftp.linux.org.uk; dmarc=pass (policy=none) header.from=zeniv.linux.org.uk X-Rspam-User: X-Stat-Signature: pd6n4kdxaixa9ogh7z57oyzhsey9fct9 X-HE-Tag: 1663949643-415253 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Fri, Sep 23, 2022 at 01:44:33AM -0700, Christoph Hellwig wrote: > Why would I? We generall do have or should have the iov_iter around. Not for async IO. > And for the common case where we don't (bios) we can carry that > information in the bio as it needs a special unmap helper anyway. *Any* async read_iter is like that. > > Where are they getting > > dropped and what guarantees that IO is complete by that point? > > The exact place depens on the exact taaraget frontend of which we have > a few. But it happens from the end_io callback that is triggered > through a call to target_complete_cmd. OK... > > The reason I'm asking is that here you have an ITER_BVEC possibly fed to > > __blkdev_direct_IO_async(), with its > > if (iov_iter_is_bvec(iter)) { > > /* > > * Users don't rely on the iterator being in any particular > > * state for async I/O returning -EIOCBQUEUED, hence we can > > * avoid expensive iov_iter_advance(). Bypass > > * bio_iov_iter_get_pages() and set the bvec directly. > > */ > > bio_iov_bvec_set(bio, iter); > > which does *not* grab the page referneces. Sure, bio_release_pages() knows > > to leave those alone and doesn't drop anything. However, what is the > > mechanism preventing the pages getting freed before the IO completion > > in this case? > > The contract that callers of bvec iters need to hold their own > references as without that doing I/O do them would be unsafe. It they > did not hold references the pages could go away before even calling > bio_iov_iter_get_pages (or this open coded bio_iov_bvec_set). You are mixing two issues here - holding references to pages while using iov_iter instance is obvious; holding them until async IO is complete, even though struct iov_iter might be long gone by that point is a different story. And originating iov_iter instance really can be long-gone by the time of IO completion - requirement to keep it around would be very hard to satisfy. I've no objections to requiring the pages in ITER_BVEC to be preserved at least until the IO completion by means independent of whatever ->read_iter/->write_iter does to them, but * that needs to be spelled out very clearly and * we need to verify that it is, indeed, the case for all existing iov_iter_bvec callers, preferably with comments next to non-obvious ones (something that is followed only by the sync IO is obvious) That goes not just for bio - if we make get_pages *NOT* grab references on ITER_BVEC (and I'm all for it), we need to make sure that those pages won't be retained after the original protection runs out. Which includes the reference put into struct nfs_request, for example, as well as whatever ceph transport is doing, etc. Another thing that needs to be sorted out is __zerocopy_sg_from_iter() and its callers - AFAICS, all of those are in ->sendmsg() with MSG_ZEROCOPY in flags. It's a non-trivial amount of code audit - we have about 40 iov_iter_bvec() callers in the tree, and while many are clearly sync-only... the ones that are not tend to balloon into interesting analysis of call chains, etc. Don't get me wrong - that analysis needs to be done, just don't expect it to be trivial. And it will require quite a bit of cooperation from the folks familiar with e.g. drivers/target, or with ceph transport layer, etc. FWIW, my preference would be along the lines of Backing memory for any non user-backed iov_iter should be protected from reuse by creator of iov_iter and that protection should continue through not just all synchronous operations with iov_iter in question - it should last until all async operations involving that memory are finished. That continued protection must not depend upon taking extra page references, etc. while we are working with iov_iter. But getting there will take quite a bit of code audit and probably some massage as well.