Date: Mon, 26 Oct 2020 04:21:58 +0000
From: Matthew Wilcox
To: John Hubbard
Shutemov" , Dave Hansen , Andy Lutomirski , Peter Zijlstra , Paolo Bonzini , Sean Christopherson , Vitaly Kuznetsov , Wanpeng Li , Jim Mattson , Joerg Roedel , David Rientjes , Andrea Arcangeli , Kees Cook , Will Drewry , "Edgecombe, Rick P" , "Kleen, Andi" , Liran Alon , Mike Rapoport , x86@kernel.org, kvm@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, "Kirill A. Shutemov" Subject: Re: [RFCv2 08/16] KVM: Use GUP instead of copy_from/to_user() to access guest memory Message-ID: <20201026042158.GN20115@casper.infradead.org> References: <20201020061859.18385-1-kirill.shutemov@linux.intel.com> <20201020061859.18385-9-kirill.shutemov@linux.intel.com> <20201022114946.GR20115@casper.infradead.org> <30ce6691-fd70-76a2-8b61-86d207c88713@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <30ce6691-fd70-76a2-8b61-86d207c88713@nvidia.com> X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Oct 22, 2020 at 12:58:14PM -0700, John Hubbard wrote: > On 10/22/20 4:49 AM, Matthew Wilcox wrote: > > On Tue, Oct 20, 2020 at 01:25:59AM -0700, John Hubbard wrote: > > > Should copy_to_guest() use pin_user_pages_unlocked() instead of gup_unlocked? > > > We wrote a "Case 5" in Documentation/core-api/pin_user_pages.rst, just for this > > > situation, I think: > > > > > > > > > CASE 5: Pinning in order to write to the data within the page > > > ------------------------------------------------------------- > > > Even though neither DMA nor Direct IO is involved, just a simple case of "pin, > > > write to a page's data, unpin" can cause a problem. Case 5 may be considered a > > > superset of Case 1, plus Case 2, plus anything that invokes that pattern. In > > > other words, if the code is neither Case 1 nor Case 2, it may still require > > > FOLL_PIN, for patterns like this: > > > > > > Correct (uses FOLL_PIN calls): > > > pin_user_pages() > > > write to the data within the pages > > > unpin_user_pages() > > > > Case 5 is crap though. That bug should have been fixed by getting > > the locking right. ie: > > > > get_user_pages_fast(); > > lock_page(); > > kmap(); > > set_bit(); > > kunmap(); > > set_page_dirty() > > unlock_page(); > > > > I should have vetoed that patch at the time, but I was busy with other things. > > It does seem like lock_page() is better, for now at least, because it > forces the kind of synchronization with file system writeback that is > still yet to be implemented for pin_user_pages(). > > Long term though, Case 5 provides an alternative way to do this > pattern--without using lock_page(). Also, note that Case 5, *in > general*, need not be done page-at-a-time, unlike the lock_page() > approach. Therefore, Case 5 might potentially help at some call sites, > either for deadlock avoidance or performance improvements. > > In other words, once the other half of the pin_user_pages() plan is > implemented, either of these approaches should work. > > Or, are you thinking that there is never a situation in which Case 5 is > valid? I don't think the page pinning approach is ever valid. For file mappings, the infiniband registration should take a lease on the inode, blocking truncation. DAX shouldn't be using struct pages at all, so there shouldn't be anything there to pin. It's been five years since DAX was merged, and page pinning still doesn't work. 
How much longer before the people who are pushing it realise that it's fundamentally flawed?
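
For readers following the thread, here is a minimal sketch of the two
single-page patterns being compared above, written against roughly
5.9-era APIs. The helper names write_page_locked() and
write_page_pinned(), and the simplifying assumptions (a single page,
len staying within that page, no retry or partial-copy handling), are
illustrative only and are not taken from the RFC series or from this
thread.

	/*
	 * Illustrative sketch only: the two ways to write into a page of
	 * user/guest memory discussed above.  Assumes len does not cross
	 * a page boundary.
	 */
	#include <linux/errno.h>
	#include <linux/highmem.h>
	#include <linux/mm.h>
	#include <linux/pagemap.h>
	#include <linux/string.h>

	/* Pattern suggested above: gup, then lock_page() around the write. */
	static int write_page_locked(unsigned long uaddr, const void *src,
				     size_t len)
	{
		struct page *page;
		void *kaddr;
		int ret;

		ret = get_user_pages_fast(uaddr, 1, FOLL_WRITE, &page);
		if (ret != 1)
			return ret < 0 ? ret : -EFAULT;

		lock_page(page);		/* serialise against writeback */
		kaddr = kmap(page);
		memcpy(kaddr + offset_in_page(uaddr), src, len);
		kunmap(page);
		set_page_dirty(page);		/* called with the page locked */
		unlock_page(page);

		put_page(page);
		return 0;
	}

	/* "Case 5" pattern from pin_user_pages.rst: pin, write, unpin. */
	static int write_page_pinned(unsigned long uaddr, const void *src,
				     size_t len)
	{
		struct page *page;
		void *kaddr;
		long ret;

		ret = pin_user_pages_unlocked(uaddr, 1, &page, FOLL_WRITE);
		if (ret != 1)
			return ret < 0 ? ret : -EFAULT;

		kaddr = kmap(page);
		memcpy(kaddr + offset_in_page(uaddr), src, len);
		kunmap(page);

		/* marks the page dirty and drops the FOLL_PIN reference */
		unpin_user_pages_dirty_lock(&page, 1, true);
		return 0;
	}

The difference that matters for the argument above is what keeps the
page stable while the memcpy() runs and the dirty bit is set: the page
lock in the first helper, the FOLL_PIN reference in the second.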