From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id A6964C32771 for ; Thu, 22 Sep 2022 02:23:12 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 1A26E940007; Wed, 21 Sep 2022 22:23:12 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 12B376B0072; Wed, 21 Sep 2022 22:23:12 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F34B6940007; Wed, 21 Sep 2022 22:23:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id DE62F6B0071 for ; Wed, 21 Sep 2022 22:23:11 -0400 (EDT) Received: from smtpin09.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id A9ABB12058C for ; Thu, 22 Sep 2022 02:23:11 +0000 (UTC) X-FDA: 79938124182.09.4FB31A0 Received: from zeniv.linux.org.uk (zeniv.linux.org.uk [62.89.141.173]) by imf01.hostedemail.com (Postfix) with ESMTP id 24F6840012 for ; Thu, 22 Sep 2022 02:23:10 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=linux.org.uk; s=zeniv-20220401; h=Sender:In-Reply-To:Content-Type: MIME-Version:References:Message-ID:Subject:Cc:To:From:Date:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=ZiPBxKPeBhyg1ydjp9NE+T7YJ8ZxnXylq6Qe68BttcM=; b=X5nKHSb5zMgYZl+FhXsP6DCn72 bEglB3uwj1u2ckoCMYriAyG5kPjUhImLESuKJd2vtNNiLCcrYBHeBdt3F6cQh6Wm3StOrkOcjQeJu 4qrCsHH5WIU26q176+DcvQKvcrbj0BY9yaqWeThAwanr0qzriUoC+LqfYejfH/Fnsff8pbPPZ71Qb NjRsD6GZMQI/wiIBI5colWhci+DcZ9INZTqBN7Niz8E0f9Fh+HHg3qeHhqTiWGfHStzVjtBSLeI0K W8V4q7B54U3lalSJFlSdAHcj1YZ49zmVV2ju998nI+9rCL2JEeqKV0rTw/xbrhrKpjGLwJQMEI0p/ suiYbjiA==; Received: from viro by zeniv.linux.org.uk with local (Exim 4.96 #2 (Red Hat Linux)) id 1obBrU-002FEi-1w; Thu, 22 Sep 2022 02:22:48 +0000 Date: Thu, 22 Sep 2022 03:22:48 +0100 From: Al Viro To: Jan Kara Cc: Christoph Hellwig , John Hubbard , Andrew Morton , Jens Axboe , Miklos Szeredi , "Darrick J . Wong" , Trond Myklebust , Anna Schumaker , David Hildenbrand , Logan Gunthorpe , linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-xfs@vger.kernel.org, linux-nfs@vger.kernel.org, linux-mm@kvack.org, LKML Subject: Re: [PATCH v2 4/7] iov_iter: new iov_iter_pin_pages*() routines Message-ID: References: <20220831041843.973026-5-jhubbard@nvidia.com> <103fe662-3dc8-35cb-1a68-dda8af95c518@nvidia.com> <20220906102106.q23ovgyjyrsnbhkp@quack3> <20220914145233.cyeljaku4egeu4x2@quack3> <20220915081625.6a72nza6yq4l5etp@quack3> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20220915081625.6a72nza6yq4l5etp@quack3> ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1663813391; a=rsa-sha256; cv=none; b=AdZVy44MQ0/TcaujE5awnuNYMpf9itx6ourhymHi/EURhutUDKMUFpyWBjJm4gPX9ff2zj hqwdHUys43q8+NNw7bX434OowfKsKZgx2cBeMlhLEQ9ua6yL4eNsWok5KMMdoEW1SM21VQ Lc/KOt1Txjhykk9z18rDUVcaFTdr7ks= ARC-Authentication-Results: i=1; imf01.hostedemail.com; dkim=pass header.d=linux.org.uk header.s=zeniv-20220401 header.b=X5nKHSb5; spf=none (imf01.hostedemail.com: domain of viro@ftp.linux.org.uk has no SPF policy when checking 62.89.141.173) smtp.mailfrom=viro@ftp.linux.org.uk; dmarc=pass (policy=none) header.from=zeniv.linux.org.uk ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1663813391; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=ZiPBxKPeBhyg1ydjp9NE+T7YJ8ZxnXylq6Qe68BttcM=; b=JQvph5cwHZvksaJWVhURNOZ+5YgOCe4NYeE7SJ4ZKq+lP19GaYePdzESA6qaTvw2BPUCSD jnvJlc1XKuXSzeNmyjRF1PrKuH0TzaZc+ssvtt5Dj57ECqQjjC/WKtdF+kqLfoj1gxKwjg weFkeAR2jYmdoL4VbEshPLDGP8L5hkQ= X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: 24F6840012 X-Rspam-User: Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=linux.org.uk header.s=zeniv-20220401 header.b=X5nKHSb5; spf=none (imf01.hostedemail.com: domain of viro@ftp.linux.org.uk has no SPF policy when checking 62.89.141.173) smtp.mailfrom=viro@ftp.linux.org.uk; dmarc=pass (policy=none) header.from=zeniv.linux.org.uk X-Stat-Signature: s4s65h464h1dtcw67jrx3u4enk6pn6sa X-HE-Tag: 1663813390-727059 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Sep 15, 2022 at 10:16:25AM +0200, Jan Kara wrote: > > How would that work? What protects the area where you want to avoid running > > into pinned pages from previously acceptable page getting pinned? If "they > > must have been successfully unmapped" is a part of what you are planning, we > > really do have a problem... > > But this is a very good question. So far the idea was that we lock the > page, unmap (or writeprotect) the page, and then check pincount == 0 and > that is a reliable method for making sure page data is stable (until we > unlock the page & release other locks blocking page faults and writes). But > once suddently ordinary page references can be used to create pins this > does not work anymore. Hrm. > > Just brainstorming ideas now: So we'd either need to obtain the pins early > when we still have the virtual address (but I guess that is often not > practical but should work e.g. for normal direct IO path) or we need some > way to "simulate" the page fault when pinning the page, just don't map it > into page tables in the end. This simulated page fault could be perhaps > avoided if rmap walk shows that the page is already mapped somewhere with > suitable permissions. OK. As far as I can see, the rules are along the lines of * creator of ITER_BVEC/ITER_XARRAY is responsible for pages being safe. That includes * page known to be locked by caller * page being privately allocated and not visible to anyone else * iterator being data source * page coming from pin_user_pages(), possibly as the result of iov_iter_pin_pages() on ITER_IOVEC/ITER_UBUF. * ITER_PIPE pages are always safe * pages found in ITER_BVEC/ITER_XARRAY are safe, since the iterator had been created with such. My preference would be to have iov_iter_get_pages() and friends pin if and only if we have data-destination iov_iter that is user-backed. For data-source user-backed we only need FOLL_GET, and for all other flavours (ITER_BVEC, etc.) we only do get_page(), if we need to grab any references at all. What I'd like to have is the understanding of the places where we drop the references acquired by iov_iter_get_pages(). How do we decide whether to unpin? E.g. pipe_buffer carries a reference to page and no way to tell whether it's a pinned one; results of iov_iter_get_pages() on ITER_IOVEC *can* end up there, but thankfully only from data-source (== WRITE, aka. ITER_SOURCE) iov_iter. So for those we don't care. Then there's nfs_request; AFAICS, we do need to pin the references in those if they are coming from nfs_direct_read_schedule_iovec(), but not if they come from readpage_async_filler(). How do we deal with coalescence, etc.? It's been a long time since I really looked at that code... Christoph, could you give any comments on that one? Note, BTW, that nfs_request coming from readpage_async_filler() have pages locked by caller; the ones from nfs_direct_read_schedule_iovec() do not, and that's where we want them pinned. Resulting page references end up (after quite a trip through data structures) stuffed into struct rpc_rqst ->rc_recv_buf.pages[] and when a response arrives from server, they get picked by xs_read_bvec() and fed to iov_iter_bvec(). In one case it's safe since the pages are locked; in another - since they would come from pin_user_pages(). The call chain at the time they are used has nothing to do with the originator - sunrpc is looking at the arrived response to READ that matches an rpc_rqst that had been created by sender of that request and safety is the sender's responsibility.