From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 327D0C433F5 for ; Thu, 2 Dec 2021 04:11:50 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8EA7D6B0074; Wed, 1 Dec 2021 23:11:39 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 8999A6B0075; Wed, 1 Dec 2021 23:11:39 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 760996B007B; Wed, 1 Dec 2021 23:11:39 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0238.hostedemail.com [216.40.44.238]) by kanga.kvack.org (Postfix) with ESMTP id 671B86B0074 for ; Wed, 1 Dec 2021 23:11:39 -0500 (EST) Received: from smtpin07.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 2115086322 for ; Thu, 2 Dec 2021 04:11:29 +0000 (UTC) X-FDA: 78871529898.07.8F498CD Received: from casper.infradead.org (casper.infradead.org [90.155.50.34]) by imf15.hostedemail.com (Postfix) with ESMTP id 3CF7DD000081 for ; Thu, 2 Dec 2021 04:11:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; q=dns/txt; c=relaxed/relaxed; d=infradead.org; s=casper.20170209; h=In-Reply-To:Content-Type:MIME-Version: References:Message-ID:Subject:Cc:To:From:Date:Sender:Reply-To: Content-Transfer-Encoding:Content-ID:Content-Description; bh=OaBMbitmgEZNmXAV8f1LwzxghmeIu1lnXqfudDNWxHQ=; b=dgh3VQawoWcliXFV6oysLyGov1 1KRPdaINHDE/kTfyc7+HOgBroutE4t8I8JLCyjqEKCZB46dGhXbfj2G0mH5B6i/2LbslkCItf3MeB zmFMwoponOBvaMaOrmlY02XkID88tpLgCkP6Ni/R7LFhS6wkhqCplDeLefJA1bJz3SvIyjl4pVQnm XOVS0qv/kuD1zvT+uKsWOvh0uFjdL4xsYBgDjgTY6zr8cBOk0vIUa1JTN8LXTlm8yOnq579vf6a7C BofUjuC1ZOmdVXEpOjvBF7zNiHhxtQd6Lnv2F+dHFA5V4a1g3prlf1une5UzgW1AL8dxn3ZnomOHu Wr5DpEEg==; Received: from willy by casper.infradead.org with local (Exim 4.94.2 #2 (Red Hat Linux)) id 1msdRE-002KU4-Pj; Thu, 02 Dec 2021 04:11:16 +0000 Date: Thu, 2 Dec 2021 04:11:16 +0000 From: Matthew Wilcox To: Jann Horn Cc: Linus Torvalds , Jan Kara , Kirill Shutemov , Oleg Nesterov , Christoph Hellwig , linux-mm@kvack.org, linux-kernel@vger.kernel.org, Mike Kravetz Subject: Re: [5.4 PATCH] mm/gup: Do not force a COW break on file-backed memory Message-ID: References: <20211201231757.332199-1-willy@infradead.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Rspamd-Server: rspam11 X-Rspamd-Queue-Id: 3CF7DD000081 X-Stat-Signature: me9jwtp7ueg7g76frnraub4435cwf6sd Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=infradead.org header.s=casper.20170209 header.b=dgh3VQaw; spf=none (imf15.hostedemail.com: domain of willy@infradead.org has no SPF policy when checking 90.155.50.34) smtp.mailfrom=willy@infradead.org; dmarc=none X-HE-Tag: 1638418288-400066 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Thu, Dec 02, 2021 at 04:51:47AM +0100, Jann Horn wrote: > On Thu, Dec 2, 2021 at 12:18 AM Matthew Wilcox (Oracle) > wrote: > > Commit 17839856fd58 ("gup: document and work around "COW can break either > > way" issue") forces a COW break, even for read-only GUP. This interacts > > badly with CONFIG_READ_ONLY_THP_FOR_FS as it tries to write to a read-only > > PMD and follow_trans_huge_pmd() returns NULL which induces an endless > > loop as __get_user_pages() interprets that as page-not-present, tries > > to fault it in and retries the follow_page_mask(). > > > > The issues fixed by 17839856fd58 don't apply to files. We know which way > > the COW breaks; the page cache keeps the original and any modifications > > are private to that process. There's no optimisation that allows a > > process to reuse a file-backed MAP_PRIVATE page. So we can skip the > > breaking of the COW for file-backed mappings. > > > > This problem only exists in v5.4.y; other stable kernels either predate > > CONFIG_READ_ONLY_THP_FOR_FS or they include commit a308c71bf1e6 ("mm/gup: > > Remove enfornced COW mechanism"). > > > > Signed-off-by: Matthew Wilcox (Oracle) > > --- > > mm/gup.c | 3 ++- > > 1 file changed, 2 insertions(+), 1 deletion(-) > > > > diff --git a/mm/gup.c b/mm/gup.c > > index 3ef769529548..d55e02411010 100644 > > --- a/mm/gup.c > > +++ b/mm/gup.c > > @@ -176,7 +176,8 @@ static inline bool can_follow_write_pte(pte_t pte, unsigned int flags) > > */ > > static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags) > > { > > - return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET); > > + return is_cow_mapping(vma->vm_flags) && vma_is_anonymous(vma) && > > + (flags & FOLL_GET); > > } > > To be fully correct, the check would have to check for PageAnon(), not > whether the mapping is anonymous, right? Since a private file mapping > can still contain anonymous pages from a prior CoW? Oh, right. So parent process maps a file with MAP_PRIVATE, writes to it, gets an anon page, forks. Child stuffs the page into a pipe, unmaps page. Parent writes to page again, now child can read() the modification? The problem is that we don't even get to seeing the struct page with the current code paths. And we're looking for a fix for RO THP that's less intrusive for v5.4 than backporting 09854ba94c6a ("mm: do_wp_page() simplification") 1a0cf26323c8 ("mm/ksm: Remove reuse_ksm_page()") a308c71bf1e6 ("mm/gup: Remove enfornced COW mechanism") The other patch we've been kicking around (and works) is: static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags) { - return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET); + return is_cow_mapping(vma->vm_flags) && + (!(vma->vm_flags & VM_DENYWRITE)) && (flags & FOLL_GET); } That limits the change to be only text pages. Generally programs do not write to their text pages, and they certainly don't write *secrets* to their text pages; if somebody else can read it, that's probably not a problem in the same way as writing to a page of heap.