From: Dave Hansen <dave.hansen@intel.com>
To: Nadav Amit <nadav.amit@gmail.com>, linux-mm@kvack.org
Cc: linux-kernel@vger.kernel.org,
Andrew Morton <akpm@linux-foundation.org>,
Nadav Amit <namit@vmware.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Xu <peterx@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
Nick Piggin <npiggin@gmail.com>,
x86@kernel.org
Subject: Re: [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd()
Date: Fri, 11 Mar 2022 12:41:41 -0800 [thread overview]
Message-ID: <e5f84691-3475-1cbd-e46c-163bf594a4bc@intel.com> (raw)
In-Reply-To: <20220311190749.338281-6-namit@vmware.com>
On 3/11/22 11:07, Nadav Amit wrote:
> From: Nadav Amit <namit@vmware.com>
>
> Calls to change_protection_range() on THP can trigger, at least on x86,
> two TLB flushes for one page: one immediately, when pmdp_invalidate() is
> called by change_huge_pmd(), and then another one later (that can be
> batched) when change_protection_range() finishes.
>
> The first TLB flush is only necessary to prevent the dirty bit (and with
> a lesser importance the access bit) from changing while the PTE is
> modified. However, this is not necessary as the x86 CPUs set the
> dirty-bit atomically with an additional check that the PTE is (still)
> present. One caveat is Intel's Knights Landing that has a bug and does
> not do so.
First of all, thank you for your diligence here. This is a super
obscure issue. I think I put handling for it in the kernel and I'm not
sure I would have even thought about this angle.
That said, I'm not sure this is all necessary.
Yes, the Dirty bit can get set unexpectedly in some PTEs. But, the
question is whether it is *VALUABLE* and needs to be preserved. The
current kernel code pretty much just lets the hardware set the Dirty bit
and then ignores it. If it were valuable, ignoring it would have been a
bad thing. We'd be losing data on today's kernels because the hardware
told us about a write that happened but that the kernel ignored.
My mental model of what the microcode responsible for the erratum does
is something along these lines:
if (write)
pte |= _PAGE_DIRTY;
if (!pte_present(pte))
#PF
The PTE is marked dirty, but the write never actually executes. The
thread that triggered the A/D setting *also* gets a fault.
I'll double-check with some Intel folks to make sure I'm not missing
something. But, either way, I don't think we should be going to this
much trouble for the good ol' Xeon Phi. I doubt there are many still
around and I *REALLY* doubt they're running new kernels.
*If* we need this (and I'm not convinced we do), my first instinct would
be to just do this instead:
clear_cpu_cap(c, X86_FEATURE_PSE);
on KNL systems. If anyone cares, they know where to find us.
next prev parent reply other threads:[~2022-03-11 20:41 UTC|newest]
Thread overview: 12+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-03-11 19:07 [RESEND PATCH v3 0/5] mm/mprotect: avoid unnecessary TLB flushes Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 1/5] x86: Detection of Knights Landing A/D leak Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 4/5] mm/mprotect: do not flush on permission promotion Nadav Amit
2022-03-11 22:45 ` Nadav Amit
2022-03-11 19:07 ` [RESEND PATCH v3 5/5] mm: avoid unnecessary flush on change_huge_pmd() Nadav Amit
2022-03-11 20:41 ` Dave Hansen [this message]
2022-03-11 20:53 ` Nadav Amit
[not found] ` <20220311190749.338281-3-namit@vmware.com>
2022-03-11 19:41 ` [RESEND PATCH v3 2/5] x86/mm: check exec permissions on fault Dave Hansen
2022-03-11 20:38 ` Nadav Amit
2022-03-11 20:59 ` Dave Hansen
2022-03-11 21:16 ` Nadav Amit
2022-03-11 21:23 ` Dave Hansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e5f84691-3475-1cbd-e46c-163bf594a4bc@intel.com \
--to=dave.hansen@intel.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=andrew.cooper3@citrix.com \
--cc=dave.hansen@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@kernel.org \
--cc=nadav.amit@gmail.com \
--cc=namit@vmware.com \
--cc=npiggin@gmail.com \
--cc=peterx@redhat.com \
--cc=peterz@infradead.org \
--cc=tglx@linutronix.de \
--cc=will@kernel.org \
--cc=x86@kernel.org \
--cc=yuzhao@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox