Re: [PATCH v2 3/5] x86/mm: check exec permissions on fault

linux-mm.kvack.org archive mirror

From: Nadav Amit <nadav.amit@gmail.com>
To: Dave Hansen <dave.hansen@intel.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andy Lutomirski <luto@kernel.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Peter Xu <peterx@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Will Deacon <will@kernel.org>, Yu Zhao <yuzhao@google.com>,
	Nick Piggin <npiggin@gmail.com>,
	x86@kernel.org
Subject: Re: [PATCH v2 3/5] x86/mm: check exec permissions on fault
Date: Mon, 25 Oct 2021 09:19:35 -0700
Message-ID: <00C2DC4B-A77D-4B32-B7F7-2291830BC2D2@gmail.com>
In-Reply-To: <e55875fa-1264-7e08-3bb8-ed984f6ea5b3@intel.com>

> On Oct 25, 2021, at 7:20 AM, Dave Hansen <dave.hansen@intel.com> wrote:
> 
> On 10/21/21 5:21 AM, Nadav Amit wrote:
>> access_error() currently does not check for execution permission
>> violation. 
> Ye
> 
>> As a result, spurious page-faults due to execution permission
>> violation cause SIGSEGV.
> 
> While I could totally believe that something is goofy when VMAs are
> being changed underneath a page fault, I'm having trouble figuring out
> why the "if (error_code & X86_PF_WRITE)" code is being modified.

In the scenario I mentioned, the VMAs are not changed underneath the
page-fault. They change *before* the page-fault, but stale entries for
the old PTE remain in the TLB.

> 
>> It appears not to be an issue so far, but the next patches avoid TLB
>> flushes on permission promotion, which can lead to this scenario. nodejs
>> for instance crashes when TLB flush is avoided on permission promotion.
> 
> Just to be clear, "promotion" is going from something like:
> 
> 	W=0->W=1
> or
> 	NX=1->NX=0
> 
> right?  I tend to call that "relaxing" permissions.

I am specifically talking about NX=1->NX=0.

I can change the language to “relaxing”.
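To make the terminology concrete, here is a minimal sketch (not code from this series) that classifies a PTE update as "relaxing" in the sense discussed above: W=0->W=1 or NX=1->NX=0. It assumes the architectural x86-64 bit positions (Present = bit 0, R/W = bit 1, NX = bit 63) and treats any change outside those two permission bits as "not a pure relaxation". The helper name pte_change_is_relaxing() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Architectural x86-64 PTE bits used in this sketch. */
#define PTE_PRESENT	(1ULL << 0)
#define PTE_RW		(1ULL << 1)	/* 1 = writes allowed */
#define PTE_NX		(1ULL << 63)	/* 1 = instruction fetches forbidden */
#define PTE_PERM_MASK	(PTE_RW | PTE_NX)

/*
 * Hypothetical helper: true if new_pte grants the same or more
 * permissions than old_pte and differs from it only in RW/NX.
 */
bool pte_change_is_relaxing(uint64_t old_pte, uint64_t new_pte)
{
	/* Frame, present bit, etc. must be unchanged. */
	if ((old_pte ^ new_pte) & ~PTE_PERM_MASK)
		return false;

	/* W=1->W=0 removes write permission: not relaxing. */
	if ((old_pte & PTE_RW) && !(new_pte & PTE_RW))
		return false;

	/* NX=0->NX=1 removes execute permission: not relaxing. */
	if (!(old_pte & PTE_NX) && (new_pte & PTE_NX))
		return false;

	/* Remaining cases: unchanged, W=0->W=1, NX=1->NX=0, or both. */
	return true;
}
```

Only updates for which such a check returns true would be candidates for skipping the flush; any update that removes a permission still needs one.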

> 
> Currently, X86_PF_WRITE faults are considered an access error unless the
> VMA to which the write occurred allows writes.  Returning "no access
> error" permits continuing and handling the copy-on-write.
> 
> It sounds like you want to expand that.  You want to add a whole class
> of new faults that can be ignored: not just that some COW handling might
> be necessary, but that the PTE itself might be out of date.    Just like
> a "COW fault" may just result in setting the PTE.W=1 and moving on with
> our day, an instruction fault might now just end up with setting
> PTE.NX=0 and also moving on with our day.

You raise an interesting idea (which could easily be implemented with
uffd), but no, I had nothing like that in mind.

My only purpose is to deal with actual spurious page-faults that I
encountered when I removed the TLB flush that happens after NX=1->NX=0.

I am actually surprised that the kernel makes such a strong assumption:
that every NX=1->NX=0 change is followed by a TLB flush, and that the mm
is locked for write during these changes. But that is the case. Without
this change, if a PTE is changed from NX=1 to NX=0 and you access the
page *later*, you can get a page-fault due to the stale PTE, followed by
a SIGSEGV, because access_error() wrongly assumes that this is an
invalid access.

I did not change the VMA, and there are no changes to the VMA during the
page-fault. The page-fault handler would do pretty much nothing and
return to user-space, which would retry the instruction. [ A page-fault
triggers an implicit TLB flush of the offending PTE. ]
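The sequence described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not kernel code: a single page, a single cached TLB entry, only the NX bit, and all names (struct toy_mmu, fetch_faults(), run_fetch(), etc.) are hypothetical. After an NX=1->NX=0 change without a flush, the first instruction fetch faults on the stale entry; whether the process survives depends only on whether the handler recognizes the fault as spurious, and the retry then succeeds because the fault dropped the stale entry.

```c
#include <stdbool.h>

/*
 * Toy model (not kernel code): one page, its in-memory PTE, and a
 * single-entry "TLB" caching that PTE. Only the NX bit is modelled.
 */
struct toy_mmu {
	bool vma_exec;	/* VM_EXEC set in the VMA */
	bool pte_nx;	/* NX bit in the in-memory PTE */
	bool tlb_valid;	/* is a translation cached? */
	bool tlb_nx;	/* NX bit as cached in the TLB */
};

/* mprotect(PROT_EXEC)-like relaxation that skips the TLB flush. */
void relax_nx_without_flush(struct toy_mmu *m)
{
	m->vma_exec = true;
	m->pte_nx = false;	/* the cached NX=1 entry survives */
}

void flush_tlb(struct toy_mmu *m)
{
	m->tlb_valid = false;
}

/* Instruction fetch; returns true if it raises a page-fault. */
bool fetch_faults(struct toy_mmu *m)
{
	if (!m->tlb_valid) {		/* miss: walk the page tables */
		m->tlb_nx = m->pte_nx;
		m->tlb_valid = true;
	}
	if (m->tlb_nx) {		/* checked against the cached copy */
		m->tlb_valid = false;	/* the fault drops the stale entry */
		return true;
	}
	return false;
}

/*
 * With check_exec == false, a present-page fetch fault is always treated
 * as an access error. With check_exec == true, the VMA is consulted and,
 * if exec is allowed, the fault is treated as spurious.
 */
bool fault_is_spurious(const struct toy_mmu *m, bool check_exec)
{
	return check_exec && m->vma_exec;
}

/* One fetch plus at most one retry; false where SIGSEGV would be sent. */
bool run_fetch(struct toy_mmu *m, bool check_exec)
{
	if (!fetch_faults(m))
		return true;
	if (!fault_is_spurious(m, check_exec))
		return false;
	return !fetch_faults(m);	/* retry the instruction */
}
```

For example, starting from a cached NX=1 entry, calling relax_nx_without_flush() and then run_fetch(m, false) returns false (the SIGSEGV case), while run_fetch(m, true) returns true. Calling flush_tlb() after the relaxation avoids the fault altogether.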

> 
> I'm really confused why the "error_code & X86_PF_WRITE" case is getting
> modified.  I would have expected it to be something like just adding:
> 
> 	/* read, instruction fetch */
> 	if (error_code & X86_PF_INSN) {
> 		/* Avoid enforcing access error if spurious: */
> 		if (unlikely(!(vma->vm_flags & VM_EXEC)))
> 			return 1;
> 		return 0;
> 	}
> 
> I'm really confused what X86_PF_WRITE and X86_PF_INSN have in common
> other than both being able to (now) be generated spuriously.

That was my first version, but I was concerned that there might be some
strange scenario in which both X86_PF_WRITE and X86_PF_INSN are set.
That is the reason Peter asked you whether this is something that might
happen.

If you confirm that they cannot both be set, I would use the version you
just suggested.
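A simplified user-space model of access_error() may help to compare the two variants. It is a sketch, not the kernel function: protection-key and other checks are omitted, and the name toy_access_error() and its check_insn parameter are hypothetical. The X86_PF_* values follow the x86 page-fault error-code bits and the VM_* values follow the kernel's VMA flags. Because the X86_PF_WRITE branch is tested first, a fault with both X86_PF_WRITE and X86_PF_INSN set would only be checked against VM_WRITE, which is the ordering question raised above.

```c
#include <stdbool.h>

/* x86 page-fault error-code bits. */
#define X86_PF_PROT	(1UL << 0)	/* 0: not-present page, 1: protection violation */
#define X86_PF_WRITE	(1UL << 1)	/* 0: read access, 1: write access */
#define X86_PF_USER	(1UL << 2)	/* fault from user mode */
#define X86_PF_INSN	(1UL << 4)	/* fault was an instruction fetch */

/* VMA permission flags (same values as the kernel's VM_* flags). */
#define VM_READ		0x1UL
#define VM_WRITE	0x2UL
#define VM_EXEC		0x4UL

/* Returns 1 for a genuine access error (SIGSEGV), 0 otherwise. */
int toy_access_error(unsigned long error_code, unsigned long vm_flags,
		     bool check_insn)
{
	if (error_code & X86_PF_WRITE) {
		/* write, present and write, not present: */
		return (vm_flags & VM_WRITE) ? 0 : 1;
	}

	/* instruction fetch: the branch suggested above */
	if (check_insn && (error_code & X86_PF_INSN))
		return (vm_flags & VM_EXEC) ? 0 : 1;

	/* read, present (a fetch also lands here without check_insn): */
	if (error_code & X86_PF_PROT)
		return 1;

	/* read, not present: */
	return (vm_flags & (VM_READ | VM_WRITE | VM_EXEC)) ? 0 : 1;
}
```

With check_insn == false, the spurious fetch fault caused by a stale NX=1 entry, toy_access_error(X86_PF_PROT | X86_PF_USER | X86_PF_INSN, VM_READ | VM_EXEC, false), returns 1, i.e. the SIGSEGV described above; with check_insn == true it returns 0.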


Thread overview: 34+ messages
2021-10-21 12:21 [PATCH v2 0/5] mm/mprotect: avoid unnecessary TLB flushes Nadav Amit
2021-10-21 12:21 ` [PATCH v2 1/5] x86: Detection of Knights Landing A/D leak Nadav Amit
2021-10-26 15:54   ` Dave Hansen
2021-10-26 15:57     ` Nadav Amit
2021-10-21 12:21 ` [PATCH v2 2/5] mm: avoid unnecessary flush on change_huge_pmd() Nadav Amit
2021-10-25 10:52   ` Peter Zijlstra
2021-10-25 16:29     ` Nadav Amit
2021-10-26 16:06   ` Dave Hansen
2021-10-26 16:47     ` Nadav Amit
2021-10-26 16:53       ` Nadav Amit
2021-10-26 17:44       ` Nadav Amit
2021-10-26 18:44         ` Dave Hansen
2021-10-26 19:06           ` Nadav Amit
2021-10-26 19:40             ` Dave Hansen
2021-10-26 20:07               ` Nadav Amit
2021-10-26 20:47                 ` Dave Hansen
2021-10-21 12:21 ` [PATCH v2 3/5] x86/mm: check exec permissions on fault Nadav Amit
2021-10-25 10:59   ` Peter Zijlstra
2021-10-25 11:13     ` Andrew Cooper
2021-10-25 14:23     ` Dave Hansen
2021-10-25 14:20   ` Dave Hansen
2021-10-25 16:19     ` Nadav Amit [this message]
2021-10-25 17:45       ` Dave Hansen
2021-10-25 17:51         ` Nadav Amit
2021-10-25 18:00           ` Dave Hansen
2021-10-21 12:21 ` [PATCH v2 4/5] mm/mprotect: use mmu_gather Nadav Amit
2021-10-21 12:21 ` [PATCH v2 5/5] mm/mprotect: do not flush on permission promotion Nadav Amit
2021-10-25 11:12   ` Peter Zijlstra
2021-10-25 16:27     ` Nadav Amit
2021-10-22  3:04 ` [PATCH v2 0/5] mm/mprotect: avoid unnecessary TLB flushes Andrew Morton
2021-10-22 21:58   ` Nadav Amit
2021-10-26 16:09     ` Dave Hansen
2021-10-25 10:50   ` Peter Zijlstra
2021-10-25 16:42     ` Nadav Amit
