From: Matthew Wilcox <willy@infradead.org>
To: Jann Horn <jannh@google.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
Jan Kara <jack@suse.cz>, Kirill Shutemov <kirill@shutemov.name>,
Oleg Nesterov <oleg@redhat.com>, Christoph Hellwig <hch@lst.de>,
linux-mm@kvack.org, linux-kernel@vger.kernel.org,
Mike Kravetz <mike.kravetz@oracle.com>
Subject: Re: [5.4 PATCH] mm/gup: Do not force a COW break on file-backed memory
Date: Thu, 2 Dec 2021 04:11:16 +0000 [thread overview]
Message-ID: <YahHZOnT1Uh41XnP@casper.infradead.org> (raw)
In-Reply-To: <CAG48ez3YNCNZB7AktmRoYLsBQjwBdwueRUXbkFgNVMsgjmCTGA@mail.gmail.com>
On Thu, Dec 02, 2021 at 04:51:47AM +0100, Jann Horn wrote:
> On Thu, Dec 2, 2021 at 12:18 AM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> > Commit 17839856fd58 ("gup: document and work around "COW can break either
> > way" issue") forces a COW break, even for read-only GUP. This interacts
> > badly with CONFIG_READ_ONLY_THP_FOR_FS as it tries to write to a read-only
> > PMD and follow_trans_huge_pmd() returns NULL which induces an endless
> > loop as __get_user_pages() interprets that as page-not-present, tries
> > to fault it in and retries the follow_page_mask().
> >
> > The issues fixed by 17839856fd58 don't apply to files. We know which way
> > the COW breaks; the page cache keeps the original and any modifications
> > are private to that process. There's no optimisation that allows a
> > process to reuse a file-backed MAP_PRIVATE page. So we can skip the
> > breaking of the COW for file-backed mappings.
> >
> > This problem only exists in v5.4.y; other stable kernels either predate
> > CONFIG_READ_ONLY_THP_FOR_FS or they include commit a308c71bf1e6 ("mm/gup:
> > Remove enfornced COW mechanism").
> >
> > Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> > ---
> > mm/gup.c | 3 ++-
> > 1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/mm/gup.c b/mm/gup.c
> > index 3ef769529548..d55e02411010 100644
> > --- a/mm/gup.c
> > +++ b/mm/gup.c
> > @@ -176,7 +176,8 @@ static inline bool can_follow_write_pte(pte_t pte, unsigned int flags)
> > */
> > static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned int flags)
> > {
> > - return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET);
> > + return is_cow_mapping(vma->vm_flags) && vma_is_anonymous(vma) &&
> > + (flags & FOLL_GET);
> > }
>
> To be fully correct, the check would have to check for PageAnon(), not
> whether the mapping is anonymous, right? Since a private file mapping
> can still contain anonymous pages from a prior CoW?
Oh, right. So parent process maps a file with MAP_PRIVATE, writes to
it, gets an anon page, forks. Child stuffs the page into a pipe,
unmaps page. Parent writes to page again, now child can read() the
modification?
The problem is that we don't even get to seeing the struct page with
the current code paths. And we're looking for a fix for RO THP that's
less intrusive for v5.4 than backporting
09854ba94c6a ("mm: do_wp_page() simplification")
1a0cf26323c8 ("mm/ksm: Remove reuse_ksm_page()")
a308c71bf1e6 ("mm/gup: Remove enfornced COW mechanism")
The other patch we've been kicking around (and works) is:
static inline bool should_force_cow_break(struct vm_area_struct *vma, unsigned
int flags)
{
- return is_cow_mapping(vma->vm_flags) && (flags & FOLL_GET);
+ return is_cow_mapping(vma->vm_flags) &&
+ (!(vma->vm_flags & VM_DENYWRITE)) && (flags & FOLL_GET);
}
That limits the change to be only text pages. Generally programs do
not write to their text pages, and they certainly don't write *secrets*
to their text pages; if somebody else can read it, that's probably not
a problem in the same way as writing to a page of heap.
next prev parent reply other threads:[~2021-12-02 4:11 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-12-01 23:17 Matthew Wilcox (Oracle)
2021-12-02 3:51 ` Jann Horn
2021-12-02 4:11 ` Matthew Wilcox [this message]
2021-12-02 4:33 ` Jann Horn
2021-12-02 18:54 ` Linus Torvalds
2021-12-02 19:59 ` Matthew Wilcox
2021-12-02 22:33 ` Linus Torvalds
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YahHZOnT1Uh41XnP@casper.infradead.org \
--to=willy@infradead.org \
--cc=hch@lst.de \
--cc=jack@suse.cz \
--cc=jannh@google.com \
--cc=kirill@shutemov.name \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mike.kravetz@oracle.com \
--cc=oleg@redhat.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox