linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
From: Nadav Amit <nadav.amit@gmail.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Nick Piggin <npiggin@gmail.com>,
	x86@kernel.org
Subject: Re: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Fri, 11 Mar 2022 12:53:30 -0800	[thread overview]
Message-ID: <A2252098-2EE1-4F8D-A1A2-A49EE8FD18A7@gmail.com> (raw)
In-Reply-To: <e5f84691-3475-1cbd-e46c-163bf594a4bc@intel.com>



> On Mar 11, 2022, at 12:41 PM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 3/11/22 11:07, Nadav Amit wrote:
>> From: Nadav Amit <namit@vmware.com>
>> 
>> Calls to change_protection_range() on THP can trigger, at least on x86,
>> two TLB flushes for one page: one immediately, when pmdp_invalidate() is
>> called by change_huge_pmd(), and then another one later (that can be
>> batched) when change_protection_range() finishes.
>> 
>> The first TLB flush is only necessary to prevent the dirty bit (and with
>> a lesser importance the access bit) from changing while the PTE is
>> modified. However, this is not necessary as the x86 CPUs set the
>> dirty-bit atomically with an additional check that the PTE is (still)
>> present. One caveat is Intel's Knights Landing that has a bug and does
>> not do so.
> 
> First of all, thank you for your diligence here.  This is a super
> obscure issue.  I think I put handling for it in the kernel and I'm not
> sure I would have even thought about this angle.
> 
> That said, I'm not sure this is all necessary.
> 
> Yes, the Dirty bit can get set unexpectedly in some PTEs.  But, the
> question is whether it is *VALUABLE* and needs to be preserved.  The
> current kernel code pretty much just lets the hardware set the Dirty bit
> and then ignores it.  If it were valuable, ignoring it would have been a
> bad thing.  We'd be losing data on today's kernels because the hardware
> told us about a write that happened but that the kernel ignored.
> 
> My mental model of what the microcode responsible for the erratum does
> is something along these lines:
> 
> 	if (write)
> 		pte |= _PAGE_DIRTY;
> 	if (!pte_present(pte))
> 		#PF
> 
> The PTE is marked dirty, but the write never actually executes.  The
> thread that triggered the A/D setting *also* gets a fault.
> 

This makes perfect sense. I guess I misunderstood or forgot the erratum.
But feel free to recheck. It would allow to remove the KNL check, and
probably the first patch in this series. But I don’t think it would
allow to get rid of pmdp_invalidate_ad() since I do not fell comfortable
just to use pmdp_establish() directly: I do not know about other
architectures well enough to say that they have the same atomicity
guarantees when it comes to A/D bits.

> I'll double-check with some Intel folks to make sure I'm not missing
> something.  But, either way, I don't think we should be going to this
> much trouble for the good ol' Xeon Phi.  I doubt there are many still
> around and I *REALLY* doubt they're running new kernels.
> 
> *If* we need this (and I'm not convinced we do), my first instinct would
> be to just do this instead:
> 
> 	clear_cpu_cap(c, X86_FEATURE_PSE);
> 
> on KNL systems.  If anyone cares, they know where to find us.

I think that it is not necessary and your understanding of the erratum
is the right one. Let me know if you find it is not the case.

BTW: Thanks for the quick response, and sorry for the time it took me
to send v3.



  reply	other threads:[~2022-03-11 20:53 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-11 19:07 [RESEND PATCH v3 0/5] mm/mprotect: avoid unnecessary TLB flushes Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 1/5] x86: Detection of Knights Landing A/D leak Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 4/5] mm/mprotect: do not flush on permission promotion Nadav Amit
2022-03-11 22:45   ` Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd() Nadav Amit
2022-03-11 20:41   ` Dave Hansen
2022-03-11 20:53     ` Nadav Amit [this message]
     [not found] ` <20220311190749.338281-3-namit@vmware.com>
2022-03-11 19:41   ` [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault Dave Hansen
2022-03-11 20:38     ` Nadav Amit
2022-03-11 20:59       ` Dave Hansen
2022-03-11 21:16         ` Nadav Amit
2022-03-11 21:23           ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=A2252098-2EE1-4F8D-A1A2-A49EE8FD18A7@gmail.com \
    --to=nadav.amit@gmail.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=andrew.cooper3@citrix.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=luto@kernel.org \
    --cc=npiggin@gmail.com \
    --cc=peterx@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    --cc=yuzhao@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox