From: Nadav Amit <nadav.amit@gmail.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linux-MM <linux-mm@kvack.org>,
LKML <linux-kernel@vger.kernel.org>,
Andrea Arcangeli <aarcange@redhat.com>,
Andrew Cooper <andrew.cooper3@citrix.com>,
Andrew Morton <akpm@linux-foundation.org>,
Andy Lutomirski <luto@kernel.org>,
Dave Hansen <dave.hansen@linux.intel.com>,
Peter Xu <peterx@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Thomas Gleixner <tglx@linutronix.de>,
Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
Nick Piggin <npiggin@gmail.com>,
x86@kernel.org
Subject: Re: [PATCH v2 3/5] x86/mm: check exec permissions on fault
Date: Mon, 25 Oct 2021 09:19:35 -0700
Message-ID: <00C2DC4B-A77D-4B32-B7F7-2291830BC2D2@gmail.com>
In-Reply-To: <e55875fa-1264-7e08-3bb8-ed984f6ea5b3@intel.com>
> On Oct 25, 2021, at 7:20 AM, Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 10/21/21 5:21 AM, Nadav Amit wrote:
>> access_error() currently does not check for execution permission
>> violation.
> Ye
>
>> As a result, spurious page-faults due to execution permission
>> violation cause SIGSEGV.
>
> While I could totally believe that something is goofy when VMAs are
> being changed underneath a page fault, I'm having trouble figuring out
> why the "if (error_code & X86_PF_WRITE)" code is being modified.
In the scenario I mentioned the VMAs are not changed underneath the
page-fault. They change *before* the page-fault, but there are
residues of the old PTE in the TLB.
>
>> It appears not to be an issue so far, but the next patches avoid TLB
>> flushes on permission promotion, which can lead to this scenario. nodejs
>> for instance crashes when TLB flush is avoided on permission promotion.
>
> Just to be clear, "promotion" is going from something like:
>
> W=0->W=1
> or
> NX=1->NX=0
>
> right? I tend to call that "relaxing" permissions.
I specifically talk about NX=1->NX=0.
I can change the language to “relaxing”.
>
> Currently, X86_PF_WRITE faults are considered an access error unless the
> VMA to which the write occurred allows writes. Returning "no access
> error" permits continuing and handling the copy-on-write.
>
> It sounds like you want to expand that. You want to add a whole class
> of new faults that can be ignored: not just that some COW handling might
> be necessary, but that the PTE itself might be out of date. Just like
> a "COW fault" may just result in setting the PTE.W=1 and moving on with
> our day, an instruction fault might now just end up with setting
> PTE.NX=0 and also moving on with our day.
You raise an interesting idea (which can easily be implemented with uffd),
but no - I had none of that in my mind.
My only purpose is to deal with actual spurious page-faults that I
encountered when I removed the TLB flush that happens after NX=1->NX=0.
I am actually surprised that the kernel makes such a strong assumption
that every change of NX=1->NX=0 would be followed by a TLB flush, and
that during these changes the mm is locked for write. But that is the
case. Without this patch, if a PTE is changed from NX=1->NX=0 and you
access the page *later*, you can get a page-fault due to a stale TLB
entry, and then a SIGSEGV, since access_error() wrongly assumes that
this is an invalid access.
I do not change the VMA, and there are no other changes to it during the
page-fault. The page-fault handler would do pretty much nothing and
return to user-space which would retry the instruction. [ page-fault
triggers an implicit TLB flush of the offending PTE ]
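To make the sequence concrete, here is a rough user-space toy model of it. It is only a sketch, not kernel code: struct model_mmu and all the model_* helpers are hypothetical names, a single PTE and a single TLB entry stand in for the whole MMU, and "check_exec" stands for whether access_error() consults VM_EXEC on an instruction-fetch fault.

```c
#include <stdbool.h>

/* Hypothetical one-entry model: a PTE in memory and its cached TLB copy. */
struct model_mmu {
	bool pte_nx;     /* NX bit in the in-memory PTE */
	bool tlb_valid;  /* is a translation currently cached? */
	bool tlb_nx;     /* NX bit as cached in the TLB */
	bool vma_exec;   /* VM_EXEC set on the covering VMA */
};

enum fetch_result { FETCH_OK, FETCH_FAULT };

/* Instruction fetch: uses the TLB copy if present, else walks the PTE. */
static enum fetch_result model_fetch(struct model_mmu *m)
{
	if (!m->tlb_valid) {
		m->tlb_nx = m->pte_nx;
		m->tlb_valid = true;
	}
	return m->tlb_nx ? FETCH_FAULT : FETCH_OK;
}

/* Permission relaxation NX=1->NX=0 that skips the TLB flush. */
static void model_relax_nx_no_flush(struct model_mmu *m)
{
	m->pte_nx = false;
	m->vma_exec = true;
}

/*
 * Fault path for an instruction fetch on a present page.
 * Returns true if the task would get SIGSEGV.
 */
static bool model_insn_fault(struct model_mmu *m, bool check_exec)
{
	/* The fault itself drops the stale TLB entry. */
	m->tlb_valid = false;

	if (check_exec)
		return !m->vma_exec;  /* judge the access by the VMA */

	/* No exec check: a non-write fault on a present page is an error. */
	return true;
}
```

In this model, starting from a cached NX=1 entry and calling model_relax_nx_no_flush(), the next model_fetch() faults. With check_exec set the fault is not an error and the retried fetch succeeds; without it the same fault ends in SIGSEGV.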
>
> I'm really confused why the "error_code & X86_PF_WRITE" case is getting
> modified. I would have expected it to be something like just adding:
>
> 	/* read, instruction fetch */
> 	if (error_code & X86_PF_INSN) {
> 		/* Avoid enforcing access error if spurious: */
> 		if (unlikely(!(vma->vm_flags & VM_EXEC)))
> 			return 1;
> 		return 0;
> 	}
>
> I'm really confused what X86_PF_WRITE and X86_PF_INSN have in common
> other than both being able to (now) be generated spuriously.
That was my first version, but I was concerned that perhaps there is
some strange scenario in which both X86_PF_WRITE and X86_PF_INSN can
be set. That is the reason that Peter asked you whether this is
something that might happen.
If you confirm they cannot be both set, I would use the version you just
mentioned.
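For reference, a minimal user-space sketch of how such a check might sit next to the existing write check. This is an illustration under assumptions, not the actual patch: model_access_error() and struct model_vma are hypothetical stand-ins, the error-code bits and VM_* flags mirror the kernel's values, and everything else access_error() deals with (protection keys and so on) is left out.

```c
#include <stdbool.h>

/* x86 page-fault error code bits (subset). */
#define X86_PF_PROT  (1UL << 0)
#define X86_PF_WRITE (1UL << 1)
#define X86_PF_USER  (1UL << 2)
#define X86_PF_INSN  (1UL << 4)

/* VMA permission flags (subset). */
#define VM_READ  0x1UL
#define VM_WRITE 0x2UL
#define VM_EXEC  0x4UL

/* Hypothetical stand-in for struct vm_area_struct. */
struct model_vma {
	unsigned long vm_flags;
};

/*
 * Returns true if the fault is a real access error (SIGSEGV), false if
 * the VMA permits the access, so the fault is spurious or fixable.
 */
static bool model_access_error(unsigned long error_code,
			       const struct model_vma *vma)
{
	/* Instruction fetch: only the VMA's exec permission matters. */
	if (error_code & X86_PF_INSN)
		return !(vma->vm_flags & VM_EXEC);

	/* Write: only the VMA's write permission matters. */
	if (error_code & X86_PF_WRITE)
		return !(vma->vm_flags & VM_WRITE);

	/* Read of a present page: protection violation. */
	if (error_code & X86_PF_PROT)
		return true;

	/* Read of a non-present page: error only if the VMA is inaccessible. */
	return !(vma->vm_flags & (VM_READ | VM_WRITE | VM_EXEC));
}
```

With this ordering, an error code that had both X86_PF_INSN and X86_PF_WRITE set would be judged by VM_EXEC alone, which is why it matters whether that combination can occur.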