From: Nadav Amit <nadav.amit@gmail.com>
To: Nadav Amit <namit@vmware.com>
Cc: Dave Hansen <dave.hansen@intel.com>,
Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Xu <peterx@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
Nick Piggin <npiggin@gmail.com>,
"x86@kernel.org" <x86@kernel.org>
Subject: Re: [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Tue, 26 Oct 2021 10:44:15 -0700
Message-ID: <29E7E8A4-C400-40A5-ACEC-F15C976DDEE0@gmail.com>
In-Reply-To: <E38AEB97-DE1B-4C91-A959-132EC24812AE@vmware.com>
> On Oct 26, 2021, at 9:47 AM, Nadav Amit <namit@vmware.com> wrote:
>
>
>
>> On Oct 26, 2021, at 9:06 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>>
>> On 10/21/21 5:21 AM, Nadav Amit wrote:
>>> The first TLB flush is only needed to prevent the dirty bit (and, of
>>> lesser importance, the access bit) from changing while the PTE is
>>> modified. However, this flush is not necessary on x86, since x86 CPUs
>>> set the dirty bit atomically, together with a check that the PTE is
>>> (still) present. One caveat is Intel's Knights Landing, which has a
>>> bug and does not do so.
>>
>> First, did I miss the check in this patch for X86_BUG_PTE_LEAK? I don't
>> see it anywhere.
>
> No, it is me who missed it. It should have been in pmdp_invalidate_ad():
>
> diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
> index 3481b35cb4ec..f14f64cc17b5 100644
> --- a/arch/x86/mm/pgtable.c
> +++ b/arch/x86/mm/pgtable.c
> @@ -780,6 +780,30 @@ int pmd_clear_huge(pmd_t *pmd)
>  	return 0;
>  }
>
> +/*
> + * pmdp_invalidate_ad() - prevents the access and dirty bits from being
> + * further updated by the CPU.
> + *
> + * Returns the original PMD entry.
> + *
> + * On an access to a page, x86 CPUs set the dirty and access bits atomically,
> + * with an additional check that the entry is (still) present. It is therefore
> + * possible to avoid the TLB flush if we change the PMD atomically, as
> + * pmdp_establish() does.
> + *
> + * We do not make this optimization on certain CPUs that have a bug violating
> + * this behavior (specifically, Knights Landing).
> + */
> +pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
> +			 pmd_t *pmdp)
> +{
> +	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));
> +
> +	if (cpu_feature_enabled(X86_BUG_PTE_LEAK))
> +		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
> +	return old;
> +}
>
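
For reference, the generic pmdp_invalidate() that this would override
flushes unconditionally. From memory, it looks roughly like this; this is
only a sketch, so the exact code may differ between kernel versions:

pmd_t pmdp_invalidate(struct vm_area_struct *vma, unsigned long address,
		      pmd_t *pmdp)
{
	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));

	/* The generic code cannot assume that A/D updates are atomic with
	 * the present check, so it always flushes. */
	flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
	return old;
}

The x86 override above would keep the flush only for the buggy case.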
>>
>>> - * pmdp_invalidate() is required to make sure we don't miss
>>> - * dirty/young flags set by hardware.
>>
>> This got me thinking... In here:
>>
>>> https://lore.kernel.org/lkml/20160708001909.FB2443E2@viggo.jf.intel.com/
>>
>> I wrote:
>>
>>> These bits are truly "stray". In the case of the Dirty bit, the
>>> thread associated with the stray set was *not* allowed to write to
>>> the page. This means that we do not have to launder the bit(s); we
>>> can simply ignore them.
>>
>> Is the goal of your proposed patch here to ensure that the dirty bit is
>> not set at *all*? Or, is it to ensure that a dirty bit which we need to
>> *launder* is never set?
>
> At *all*.
>
> Err… I remembered from our previous discussions that the dirty bit
> cannot be set once the R/W bit has been atomically cleared. But going
> back to the SDM, I see this (relatively new?) note:
>
> "If software on one logical processor writes to a page while software on
> another logical processor concurrently clears the R/W flag in the
> paging-structure entry that maps the page, execution on some processors may
> result in the entry’s dirty flag being set (due to the write on the first
> logical processor) and the entry’s R/W flag being clear (due to the update
> to the entry on the second logical processor). This will never occur on a
> processor that supports control-flow enforcement technology (CET)."
>
> So I guess that this optimization can only be enabled when CET is enabled.
>
> :(
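
If the optimization indeed has to be gated on CET, I suppose the flush
would be kept for non-CET parts. A hypothetical sketch (X86_FEATURE_SHSTK
is only my stand-in for "CPU implements CET"; I am not sure what the
right check would be):

pmd_t pmdp_invalidate_ad(struct vm_area_struct *vma, unsigned long address,
			 pmd_t *pmdp)
{
	pmd_t old = pmdp_establish(vma, address, pmdp, pmd_mkinvalid(*pmdp));

	/* Hypothetical gating: keep the flush unless the CPU guarantees
	 * atomic R/W-clear vs. dirty-set behavior (CET-capable parts, per
	 * the SDM note above), and always flush on Knights Landing. */
	if (cpu_feature_enabled(X86_BUG_PTE_LEAK) ||
	    !cpu_feature_enabled(X86_FEATURE_SHSTK))
		flush_pmd_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
	return old;
}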
I still wonder whether the SDM comment applies to present bit vs dirty
bit atomicity as well.
On AMD’s APM I find:
"The processor never sets the Accessed bit or the Dirty bit for a not
present page (P = 0). The ordering of Accessed and Dirty bit updates
with respect to surrounding loads and stores is discussed below."
(The APM's later discussion concerns ordering with respect to WC memory.)
I don’t know if I read it too creatively...
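
To spell out how I read the two manuals side by side (my interpretation,
not wording from either vendor):

  SDM race (possible on non-CET parts):

    CPU 0 (hardware walk)          CPU 1 (software)
    write to the page starts
                                   atomically clear R/W in the entry
    set Dirty in the entry         -> Dirty=1 with R/W=0 is possible

  Our case, pmdp_establish(pmd_mkinvalid(...)):

    CPU 0 (hardware walk)          CPU 1 (software)
    write to the page starts
                                   atomically clear Present (P=0)
    set Dirty?                     -> never for P=0, per the APM quote

If the present-bit case is also safe on Intel, which is exactly what I am
unsure about, the optimization would hold regardless of CET.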