From: Nadav Amit <nadav.amit@gmail.com>
To: Nadav Amit <namit@vmware.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Xu <peterx@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
Nick Piggin <npiggin@gmail.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Tue, 26 Oct 2021 10:44:15 -0700
Message-ID: <29E7E8A4-C400-40A5-ACEC-F15C976DDEE0@gmail.com>
In-Reply-To: <E38AEB97-DE1B-4C91-A959-132EC24812AE@vmware.com>
> On Oct 26, 2021, at 9:47 AM, Nadav Amit <namit@vmware.com> wrote:
>
>
>
>> On Oct 26, 2021, at 9:06 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 10/21/21 5:21 AM, Nadav Amit wrote:
>>> The first TLB flush is only needed to prevent the dirty bit (and, of
>>> lesser importance, the access bit) from changing while the PTE is
>>> modified. However, this flush is not necessary on x86, since x86 CPUs
>>> set the dirty bit atomically, together with a check that the PTE is
>>> (still) present. One caveat is Intel's Knights Landing, which has a
>>> bug and does not do so.
>>
>> First, did I miss the check in this patch for X86_BUG_PTE_LEAK? I don't
>> see it anywhere.
>
> No, it is me who missed it. It should have been in pmdp_invalidate_ad():
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 3481b35cb4ec..f14f64cc17b5 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -780,6 +780,30 @@ int pmd_clear_huge(pmd_t *pmd)
>  	return 0;
>  }
>
> +/*
> + * pmdp_invalidate_ad() - prevents the access and dirty bits from being
> + * further updated by the CPU.
> + *
> + * Returns the original PMD entry.
> + *
> + * On an access to a page, x86 CPUs set the dirty and access bits atomically,
> + * with an additional check that the entry is (still) present. It is therefore
> + * possible to avoid the TLB flush if we change the PMD atomically, as
> + * pmdp_establish() does.
> + *
> + * We do not make this optimization on certain CPUs that have a bug violating
> + * this behavior (specifically, Knights Landing).
> + */
> +pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
> +			 pmd_t *pmdp)
> +{
> +	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
> +
> +	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
> +		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> +	return old;
> +}
>
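
For reference, the generic pmdp_invalidate() that this would override
flushes unconditionally. From memory, it looks roughly like this; this is
only a sketch, so the exact code may differ between kernel versions:

pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
		      pmd_t *pmdp)
{
	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));

	/* The generic code cannot assume that A/D updates are atomic with
	 * the present check, so it always flushes. */
	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
	return old;
}

The x86 override above would keep the flush only for the buggy case.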
>>
>>> - * pmdp_invalidate() is required to make sure we don't miss
>>> - * dirty/young flags set by hardware.
>>
>> This got me thinking... In here:
>>
>>> https://lore.kernel.org/lkml/20160708001909.FB2443E2@viggo.jf.intel.com/
>>
>> I wrote:
>>
>>> These bits are truly "stray". In the case of the Dirty bit, the
>>> thread associated with the stray set was *not* allowed to write to
>>> the page. This means that we do not have to launder the bit(s); we
>>> can simply ignore them.
>>
>> Is the goal of your proposed patch here to ensure that the dirty bit is
>> not set at *all*? Or, is it to ensure that a dirty bit which we need to
>> *launder* is never set?
>
> At *all*.
>
> Err… I remembered from our previous discussions that the dirty bit
> cannot be set once the R/W bit has been atomically cleared. But going
> back to the SDM, I see this (relatively new?) note:
>
> "If software on one logical processor writes to a page while software on
> another logical processor concurrently clears the R/W flag in the
> paging-structure entry that maps the page, execution on some processors may
> result in the entry’s dirty flag being set (due to the write on the first
> logical processor) and the entry’s R/W flag being clear (due to the update
> to the entry on the second logical processor). This will never occur on a
> processor that supports control-flow enforcement technology (CET)."
>
> So I guess that this optimization can only be enabled when CET is enabled.
>
> :(
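
If the optimization indeed has to be gated on CET, I suppose the flush
would be kept for non-CET parts. A hypothetical sketch (X86_FEATURE_SHSTK
is only my stand-in for "CPU implements CET"; I am not sure what the
right check would be):

pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
			 pmd_t *pmdp)
{
	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));

	/* Hypothetical gating: keep the flush unless the CPU guarantees
	 * atomic R/W-clear vs. dirty-set behavior (CET-capable parts, per
	 * the SDM note above), and always flush on Knights Landing. */
	if (cpu_feature_enabled(X86_BUG_PTE_LEAK) ||
	    !cpu_feature_enabled(X86_FEATURE_SHSTK))
		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
	return old;
}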
I still wonder whether the SDM comment applies to present bit vs dirty
bit atomicity as well.
On AMD’s APM I find:
"The processor never sets the Accessed bit or the Dirty bit for a not
present page (P = 0). The ordering of Accessed and Dirty bit updates
with respect to surrounding loads and stores is discussed below."
(The APM's later discussion concerns ordering with respect to WC memory.)
I don’t know if I read it too creatively...
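
To spell out how I read the two manuals side by side (my interpretation,
not wording from either vendor):

  SDM race (possible on non-CET parts):

    CPU 0 (hardware walk)          CPU 1 (software)
    write to the page starts
                                   atomically clear R/W in the entry
    set Dirty in the entry         -> Dirty=1 with R/W=0 is possible

  Our case, pmdp_establish(pmd_mkinvalid(...)):

    CPU 0 (hardware walk)          CPU 1 (software)
    write to the page starts
                                   atomically clear Present (P=0)
    set Dirty?                     -> never for P=0, per the APM quote

If the present-bit case is also safe on Intel, which is exactly what I am
unsure about, the optimization would hold regardless of CET.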