From mboxrd@z Thu Jan 1 00:00:00 1970
References: <20210131001132.3368247-1-namit@vmware.com> <20210131001132.3368247-4-namit@vmware.com>
In-Reply-To: <20210131001132.3368247-4-namit@vmware.com>
From: Andy Lutomirski
Date: Sat, 30 Jan 2021 17:07:06 -0800
Subject: Re: [RFC 03/20] mm/mprotect: do not flush on permission promotion
To: Nadav Amit, Andrew Cooper
Cc: Linux-MM, LKML, Nadav Amit, Andrea Arcangeli, Andrew Morton,
 Andy Lutomirski, Dave Hansen, Peter Zijlstra, Thomas Gleixner,
 Will Deacon, Yu Zhao, Nick Piggin, X86 ML

Adding Andrew Cooper, who has a distressingly extensive understanding
of the x86 PTE magic.

On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>
> From: Nadav Amit
>
> Currently, using mprotect() to unprotect a memory region or uffd to
> unprotect a memory region causes a TLB flush. At least on x86, as
> protection is promoted, no TLB flush is needed.
>
> Add an arch-specific pte_may_need_flush() which tells whether a TLB
> flush is needed based on the old PTE and the new one. Implement an x86
> pte_may_need_flush().
>
> For x86, besides the simple logic that PTE protection promotion or
> changes of software bits does require a flush, also add logic that
> considers the dirty-bit. If the dirty-bit is clear and write-protect is
> set, no TLB flush is needed, as x86 updates the dirty-bit atomically
> on write, and if the bit is clear, the PTE is reread.
>
> Signed-off-by: Nadav Amit
> Cc: Andrea Arcangeli
> Cc: Andrew Morton
> Cc: Andy Lutomirski
> Cc: Dave Hansen
> Cc: Peter Zijlstra
> Cc: Thomas Gleixner
> Cc: Will Deacon
> Cc: Yu Zhao
> Cc: Nick Piggin
> Cc: x86@kernel.org
> ---
>  arch/x86/include/asm/tlbflush.h | 44 +++++++++++++++++++++++++++++++++
>  include/asm-generic/tlb.h       |  4 +++
>  mm/mprotect.c                   |  3 ++-
>  3 files changed, 50 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
> index 8c87a2e0b660..a617dc0a9b06 100644
> --- a/arch/x86/include/asm/tlbflush.h
> +++ b/arch/x86/include/asm/tlbflush.h
> @@ -255,6 +255,50 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
>
>  extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>
> +static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte)
> +{
> +        const pteval_t ignore_mask = _PAGE_SOFTW1 | _PAGE_SOFTW2 |
> +                                     _PAGE_SOFTW3 | _PAGE_ACCESSED;

Why is accessed ignored? Surely clearing the accessed bit needs a
flush if the old PTE is present.

> +        const pteval_t enable_mask = _PAGE_RW | _PAGE_DIRTY | _PAGE_GLOBAL;
> +        pteval_t oldval = pte_val(oldpte);
> +        pteval_t newval = pte_val(newpte);
> +        pteval_t diff = oldval ^ newval;
> +        pteval_t disable_mask = 0;
> +
> +        if (IS_ENABLED(CONFIG_X86_64) || IS_ENABLED(CONFIG_X86_PAE))
> +                disable_mask = _PAGE_NX;
> +
> +        /* new is non-present: need only if old is present */
> +        if (pte_none(newpte))
> +                return !pte_none(oldpte);
> +
> +        /*
> +         * If, excluding the ignored bits, only RW and dirty are cleared and the
> +         * old PTE does not have the dirty-bit set, we can avoid a flush. This
> +         * is possible since x86 architecture set the dirty bit atomically while

s/set/sets/

> +         * it caches the PTE in the TLB.
> +         *
> +         * The condition considers any change to RW and dirty as not requiring
> +         * flush if the old PTE is not dirty or not writable for simplification
> +         * of the code and to consider (unlikely) cases of changing dirty-bit of
> +         * write-protected PTE.
> +         */
> +        if (!(diff & ~(_PAGE_RW | _PAGE_DIRTY | ignore_mask)) &&
> +            (!(pte_dirty(oldpte) || !pte_write(oldpte))))
> +                return false;

This logic seems confusing to me. Is your goal to say that, if the
old PTE was clean and writable and the new PTE is write-protected,
then no flush is needed? If so, I would believe you're right, but I'm
not convinced you've actually implemented this. Also, there may be
other things going on that need flushing, e.g. a change of the
address or an accessed bit or NX change. Also, CET makes this extra
bizarre.

> +
> +        /*
> +         * Any change of PFN and any flag other than those that we consider
> +         * requires a flush (e.g., PAT, protection keys). To save flushes we do
> +         * not consider the access bit as it is considered by the kernel as
> +         * best-effort.
> +         */
> +        return diff & ((oldval & enable_mask) |
> +                       (newval & disable_mask) |
> +                       ~(enable_mask | disable_mask | ignore_mask));
> +}
> +#define pte_may_need_flush pte_may_need_flush
> +
>  #endif /* !MODULE */
>
>  #endif /* _ASM_X86_TLBFLUSH_H */
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index eea113323468..c2deec0b6919 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -654,6 +654,10 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
>          } while (0)
>  #endif
>
> +#ifndef pte_may_need_flush
> +static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte) { return true; }
> +#endif
> +
>  #endif /* CONFIG_MMU */
>
>  #endif /* _ASM_GENERIC__TLB_H */
> diff --git a/mm/mprotect.c b/mm/mprotect.c
> index 632d5a677d3f..b7473d2c9a1f 100644
> --- a/mm/mprotect.c
> +++ b/mm/mprotect.c
> @@ -139,7 +139,8 @@ static unsigned long change_pte_range(struct mmu_gather *tlb,
>                          ptent = pte_mkwrite(ptent);
>                  }
>                  ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent);
> -                tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
> +                if (pte_may_need_flush(oldpte, ptent))
> +                        tlb_flush_pte_range(tlb, addr, PAGE_SIZE);
>                  pages++;
>          } else if (is_swap_pte(oldpte)) {
>                  swp_entry_t entry = pte_to_swp_entry(oldpte);
> --
> 2.25.1
>
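
To spell out the one case I'm asking about above, here is a minimal,
untested userspace sketch of the "only RW is being cleared and the old
PTE was clean" condition. It is only an illustration, not kernel code:
the PTE_* constants below are stand-ins for the kernel's _PAGE_* bits
(same x86 bit positions), the helper name is made up, and the real
pte_may_need_flush() of course also has to consider the PFN, NX, PAT,
protection keys and the rest of the bits the patch deals with.

/* Standalone model of the write-protect-a-clean-PTE case; not kernel code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define PTE_PRESENT  (1ULL << 0)   /* x86 _PAGE_PRESENT  */
#define PTE_RW       (1ULL << 1)   /* x86 _PAGE_RW       */
#define PTE_ACCESSED (1ULL << 5)   /* x86 _PAGE_ACCESSED */
#define PTE_DIRTY    (1ULL << 6)   /* x86 _PAGE_DIRTY    */

/*
 * Skip the TLB flush only when the sole change is clearing RW on a PTE
 * whose dirty bit is not set. The assumption taken from the patch is
 * that the CPU sets the dirty bit atomically on the first write and
 * rechecks the in-memory PTE when the cached entry is clean, so a stale
 * clean+writable TLB entry cannot be used to complete a write without
 * the new write protection being noticed.
 */
static bool can_skip_flush_on_wrprotect(uint64_t oldpte, uint64_t newpte)
{
        uint64_t diff = oldpte ^ newpte;

        return diff == PTE_RW &&          /* only the RW bit changed...  */
               (oldpte & PTE_RW) &&       /* ...and it is being cleared  */
               !(oldpte & PTE_DIRTY);     /* ...on a PTE that was clean  */
}

int main(void)
{
        uint64_t old = PTE_PRESENT | PTE_ACCESSED | PTE_RW;   /* clean, writable  */
        uint64_t new = old & ~PTE_RW;                         /* write-protected  */

        printf("clean PTE, skip flush: %s\n",
               can_skip_flush_on_wrprotect(old, new) ? "yes" : "no");
        printf("dirty PTE, skip flush: %s\n",
               can_skip_flush_on_wrprotect(old | PTE_DIRTY, new | PTE_DIRTY) ? "yes" : "no");
        return 0;
}

The second call shows the dirty case: once the old PTE is dirty, a
stale writable entry may already be cached, so the flush cannot be
skipped.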