Subject: Re: [RFC 03/20] mm/mprotect: do not flush on permission promotion
From: Nadav Amit
Date: Sat, 30 Jan 2021 17:17:31 -0800
To: Andy Lutomirski
Cc: Andrew Cooper, Linux-MM, LKML, Andrea Arcangeli, Andrew Morton,
 Dave Hansen, Peter Zijlstra, Thomas Gleixner, Will Deacon, Yu Zhao,
 Nick Piggin, X86 ML
Message-Id: <68D3C593-A88C-4100-90E9-E90F7733344F@gmail.com>
References: <20210131001132.3368247-1-namit@vmware.com> <20210131001132.3368247-4-namit@vmware.com>

> On Jan 30, 2021, at 5:07 PM, Andy Lutomirski wrote:
>
> Adding Andrew Cooper, who has a distressingly extensive understanding
> of the x86 PTE magic.
>
> On Sat, Jan 30, 2021 at 4:16 PM Nadav Amit wrote:
>> From: Nadav Amit
>>
>> Currently, using mprotect() to unprotect a memory region or uffd to
>> unprotect a memory region causes a TLB flush. At least on x86, as
>> protection is promoted, no TLB flush is needed.
>>
>> Add an arch-specific pte_may_need_flush() which tells whether a TLB
>> flush is needed based on the old PTE and the new one. Implement an x86
>> pte_may_need_flush().
>>
>> For x86, besides the simple logic that PTE protection promotion or
>> changes of software bits do not require a flush, also add logic that
>> considers the dirty-bit. If the dirty-bit is clear and write-protect is
>> set, no TLB flush is needed, as x86 updates the dirty-bit atomically
>> on write, and if the bit is clear, the PTE is reread.
>>
>> Signed-off-by: Nadav Amit
>> Cc: Andrea Arcangeli
>> Cc: Andrew Morton
>> Cc: Andy Lutomirski
>> Cc: Dave Hansen
>> Cc: Peter Zijlstra
>> Cc: Thomas Gleixner
>> Cc: Will Deacon
>> Cc: Yu Zhao
>> Cc: Nick Piggin
>> Cc: x86@kernel.org
>> ---
>> arch/x86/include/asm/tlbflush.h | 44 +++++++++++++++++++++++++++++++++
>> include/asm-generic/tlb.h       |  4 +++
>> mm/mprotect.c                   |  3 ++-
>> 3 files changed, 50 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/include/asm/tlbflush.h b/arch/x86/include/asm/tlbflush.h
>> index 8c87a2e0b660..a617dc0a9b06 100644
>> --- a/arch/x86/include/asm/tlbflush.h
>> +++ b/arch/x86/include/asm/tlbflush.h
>> @@ -255,6 +255,50 @@ static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
>>
>> extern void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch);
>>
>> +static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte)
>> +{
>> +        const pteval_t ignore_mask = _PAGE_SOFTW1 | _PAGE_SOFTW2 |
>> +                                     _PAGE_SOFTW3 | _PAGE_ACCESSED;
>
> Why is accessed ignored?  Surely clearing the accessed bit needs a
> flush if the old PTE is present.

I am just following the current scheme in the kernel (x86):

int ptep_clear_flush_young(struct vm_area_struct *vma,
                           unsigned long address, pte_t *ptep)
{
        /*
         * On x86 CPUs, clearing the accessed bit without a TLB flush
         * doesn't cause data corruption. [ It could cause incorrect
         * page aging and the (mistaken) reclaim of hot pages, but the
         * chance of that should be relatively low. ]
         *
         * So as a performance optimization don't flush the TLB when
         * clearing the accessed bit, it will eventually be flushed by
         * a context switch or a VM operation anyway. [ In the rare
         * event of it not getting flushed for a long time the delay
         * shouldn't really matter because there's no real memory
         * pressure for swapout to react to. ]
         */
        return ptep_test_and_clear_young(vma, address, ptep);
}

>
>> +        const pteval_t enable_mask = _PAGE_RW | _PAGE_DIRTY | _PAGE_GLOBAL;
>> +        pteval_t oldval = pte_val(oldpte);
>> +        pteval_t newval = pte_val(newpte);
>> +        pteval_t diff = oldval ^ newval;
>> +        pteval_t disable_mask = 0;
>> +
>> +        if (IS_ENABLED(CONFIG_X86_64) || IS_ENABLED(CONFIG_X86_PAE))
>> +                disable_mask = _PAGE_NX;
>> +
>> +        /* new is non-present: need only if old is present */
>> +        if (pte_none(newpte))
>> +                return !pte_none(oldpte);
>> +
>> +        /*
>> +         * If, excluding the ignored bits, only RW and dirty are cleared
>> +         * and the old PTE does not have the dirty-bit set, we can avoid a
>> +         * flush. This is possible since x86 architecture set the dirty
>> +         * bit atomically while
>
> s/set/sets/
>
>> +         * it caches the PTE in the TLB.
>> +         *
>> +         * The condition considers any change to RW and dirty as not
>> +         * requiring flush if the old PTE is not dirty or not writable for
>> +         * simplification of the code and to consider (unlikely) cases of
>> +         * changing dirty-bit of write-protected PTE.
>> +         */
>> +        if (!(diff & ~(_PAGE_RW | _PAGE_DIRTY | ignore_mask)) &&
>> +            (!(pte_dirty(oldpte) || !pte_write(oldpte))))
>> +                return false;
>
> This logic seems confusing to me.  Is your goal to say that, if the
> old PTE was clean and writable and the new PTE is write-protected,
> then no flush is needed?

Yes.

> If so, I would believe you're right, but I'm
> not convinced you've actually implemented this.  Also, there may be
> other things going on that need flushing, e.g. a change of the address
> or an accessed bit or NX change.
The first part, (diff & ~(_PAGE_RW | _PAGE_DIRTY | ignore_mask)), is
supposed to capture changes of address, NX-bit, etc.

The second part is indeed wrong. It should have been:

        (!pte_dirty(oldpte) || !pte_write(oldpte))

>
> Also, CET makes this extra bizarre.

I saw something about not-writable-and-dirty PTEs being considered
differently. I need to have a look, but I am not sure it affects anything.
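For reference, with that fix folded in, the check would look roughly like
the sketch below. This is untested and only illustrative: it reuses the
masks and helpers from the patch above, and it omits the
enable_mask/disable_mask handling, whose use was snipped from the quoted
hunk, by conservatively flushing for any other bit change:

static inline bool pte_may_need_flush(pte_t oldpte, pte_t newpte)
{
        const pteval_t ignore_mask = _PAGE_SOFTW1 | _PAGE_SOFTW2 |
                                     _PAGE_SOFTW3 | _PAGE_ACCESSED;
        pteval_t diff = pte_val(oldpte) ^ pte_val(newpte);

        /* New PTE is non-present: flush only if the old one was present. */
        if (pte_none(newpte))
                return !pte_none(oldpte);

        /*
         * Only RW/dirty (plus ignored bits) changed, and the old PTE was
         * either clean or write-protected: a stale TLB entry cannot be
         * used for a silent write, since the CPU sets the dirty-bit
         * atomically and rereads the PTE when the cached entry does not
         * permit the access.
         */
        if (!(diff & ~(_PAGE_RW | _PAGE_DIRTY | ignore_mask)) &&
            (!pte_dirty(oldpte) || !pte_write(oldpte)))
                return false;

        /* Any other change (address, NX, etc.) conservatively flushes. */
        return true;
}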