Re: [PATCH v1 mmotm] mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection

From: David Hildenbrand <david@redhat.com>
To: Nadav Amit <nadav.amit@gmail.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Dave Hansen <dave.hansen@intel.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Peter Xu <peterx@redhat.com>, Yang Shi <shy828301@gmail.com>,
	Hugh Dickins <hughd@google.com>,
	Mel Gorman <mgorman@techsingularity.net>,
	"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH v1 mmotm] mm/mprotect: try avoiding write faults for exclusive anonymous pages when changing protection
Date: Mon, 4 Apr 2022 15:04:28 +0200
Message-ID: <99cf9e14-7608-8e72-0c8e-0dd9b0047319@redhat.com>
In-Reply-To: <8A6AF878-D5D7-4D88-A736-0FEF71439D44@gmail.com>

On 01.04.22 21:15, Nadav Amit wrote:
> [ +Rick ]
> 
>> On Apr 1, 2022, at 3:13 AM, David Hildenbrand <david@redhat.com> wrote:
>>
>> Similar to our MM_CP_DIRTY_ACCT handling for shared, writable mappings, we
>> can try mapping anonymous pages writable if they are exclusive,
>> the PTE is already dirty, and no special handling applies. Mapping the
>> PTE writable is essentially the same thing the write fault handler would do
>> in this case.
> 
> In general I am all supportive of such a change.
> 
> I do have some mostly-minor concerns.

Hi Nadav,

thanks a lot for your review!

> 
>>
>> +static inline bool can_change_pte_writable(struct vm_area_struct *vma,
>> +					   unsigned long addr, pte_t pte,
>> +					   unsigned long cp_flags)
>> +{
>> +	struct page *page;
>> +
>> +	if ((vma->vm_flags & VM_SHARED) && !(cp_flags & MM_CP_DIRTY_ACCT))
>> +		/*
>> +		 * MM_CP_DIRTY_ACCT is only expressive for shared mappings;
>> +		 * without MM_CP_DIRTY_ACCT, there is nothing to do.
>> +		 */
>> +		return false;
>> +
>> +	if (!(vma->vm_flags & VM_WRITE))
>> +		return false;
>> +
>> +	if (pte_write(pte) || pte_protnone(pte) || !pte_dirty(pte))
>> +		return false;
> 
> If pte_write() is already true then return false? I understand you want
> to do so because the page is already writable, but it is confusing.


I thought about just doing these checks outside of the function:

if ((vma->vm_flags & VM_WRITE) && !pte_write(pte) &&
    can_change_pte_writable()...

I refrained from doing so because the sequence of checks might be
sub-optimal. But most probably we don't really care about that, and it
might make the code easier to grasp.

Would that make it clearer?
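
As a rough illustration of that restructuring, here is a userspace model, not kernel code. The `fake_*` types, the `FAKE_VM_WRITE` value, and the helper names are hypothetical stand-ins, and the softdirty, uffd-wp, and anon-exclusive checks are left out for brevity. The point is only that the caller tests VM_WRITE and !pte_write() first, so the helper no longer returns false for an already-writable PTE:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel types; not the real definitions. */
#define FAKE_VM_WRITE 0x2UL

struct fake_vma { unsigned long vm_flags; };
struct fake_pte { bool write, protnone, dirty; };

/* Models the helper once VM_WRITE and pte_write() are checked by the caller. */
static bool fake_can_change_pte_writable(const struct fake_pte *pte)
{
	/* Remaining checks in this model: skip protnone and clean PTEs. */
	return !pte->protnone && pte->dirty;
}

/* Models the call site: the two self-explanatory checks come first. */
static bool fake_should_map_writable(const struct fake_vma *vma,
				     const struct fake_pte *pte)
{
	return (vma->vm_flags & FAKE_VM_WRITE) && !pte->write &&
	       fake_can_change_pte_writable(pte);
}

/* Small usage example for the model above. */
void fake_run_examples(void)
{
	struct fake_vma rw_vma = { FAKE_VM_WRITE };
	struct fake_pte ro_dirty = { false, false, true };
	struct fake_pte already_rw = { true, false, true };

	assert(fake_should_map_writable(&rw_vma, &ro_dirty));
	assert(!fake_should_map_writable(&rw_vma, &already_rw));
}
```

With this split, "already writable" is visibly a caller-side precondition rather than a reason for the helper to say "cannot change".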

> 
> In addition, I am not sure the pte_dirty() check is really robust.
> I mean I think it is ok, but is there any issue with shadow-stack?

Judging by the fact that it's already used that way for VMAs with dirty
tracking, I assume it's OK. Without checking that the PTE is dirty, we'd
have to do a:

pte_mkwrite(pte_mkdirty(ptent));

which would set the PTE, and consequently the page, dirty, although there
might not even be a write access. That's what we want to avoid here.
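
To make the difference concrete, here is a toy userspace model, not kernel code. The `toy_*` names and the bit positions are hypothetical (real PTE layouts are architecture-specific). It contrasts an unconditional upgrade, which has to mark the PTE dirty, with the guarded upgrade that only touches PTEs that are already dirty:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout for illustration only. */
#define TOY_PTE_WRITE (1u << 1)
#define TOY_PTE_DIRTY (1u << 6)

typedef uint32_t toy_pte_t;

static toy_pte_t toy_pte_mkwrite(toy_pte_t pte) { return pte | TOY_PTE_WRITE; }
static toy_pte_t toy_pte_mkdirty(toy_pte_t pte) { return pte | TOY_PTE_DIRTY; }
static bool toy_pte_write(toy_pte_t pte) { return (pte & TOY_PTE_WRITE) != 0; }
static bool toy_pte_dirty(toy_pte_t pte) { return (pte & TOY_PTE_DIRTY) != 0; }

/* Unconditional upgrade: every PTE ends up dirty, written to or not. */
static toy_pte_t toy_upgrade_unconditional(toy_pte_t pte)
{
	return toy_pte_mkwrite(toy_pte_mkdirty(pte));
}

/* Guarded upgrade: clean PTEs stay read-only, so a later write still faults. */
static toy_pte_t toy_upgrade_if_dirty(toy_pte_t pte)
{
	return toy_pte_dirty(pte) ? toy_pte_mkwrite(pte) : pte;
}

/* Small usage example for the model above. */
void toy_pte_run_examples(void)
{
	toy_pte_t clean = 0;

	/* The unconditional variant dirties a page nobody wrote to. */
	assert(toy_pte_dirty(toy_upgrade_unconditional(clean)));
	/* The guarded variant leaves the clean PTE untouched. */
	assert(toy_upgrade_if_dirty(clean) == clean);
}
```

In this model the guarded variant never introduces a dirty bit on its own; it only adds write permission where the dirty bit is already set.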

> 
> And this also assumes the kernel does not clear the dirty bit without
> clearing the present bit, as otherwise the note in Intel SDM section 4.8
> ("Accessed and Dirty Flags") will be relevant and the dirty bit might be
> set unnecessarily. I think it is ok.

Yeah, I think so as well.

> 
>> +
>> +	/* Do we need write faults for softdirty tracking? */
>> +	if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !pte_soft_dirty(pte) &&
>> +	    (vma->vm_flags & VM_SOFTDIRTY))
> 
> If !IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) then VM_SOFTDIRTY == 0. So I do not
> think the IS_ENABLED() is necessary (unless you think it is clearer this
> way).

Right, we can just do

if ((vma->vm_flags & VM_SOFTDIRTY) && !pte_soft_dirty(pte))

and it should get fully optimized out. Thanks!
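
A small userspace sketch of why the explicit IS_ENABLED() is redundant. The `TOY_*` names are hypothetical, and the non-zero flag value is only an assumed placeholder. When the flag macro is defined as 0, the first operand of `&&` is a compile-time constant zero, so the whole condition folds to false:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical stand-in: build with -DTOY_SOFT_DIRTY to model
 * CONFIG_MEM_SOFT_DIRTY=y; otherwise the flag is 0, as described above.
 */
#ifdef TOY_SOFT_DIRTY
#define TOY_VM_SOFTDIRTY 0x08000000UL
#else
#define TOY_VM_SOFTDIRTY 0UL
#endif

/* Do we still need a write fault for softdirty tracking (toy model)? */
static bool toy_needs_softdirty_fault(unsigned long vm_flags,
				      bool pte_soft_dirty)
{
	/* With TOY_VM_SOFTDIRTY == 0 this is always false. */
	return (vm_flags & TOY_VM_SOFTDIRTY) && !pte_soft_dirty;
}

/* Small usage example for the model above. */
void toy_softdirty_run_examples(void)
{
#ifdef TOY_SOFT_DIRTY
	assert(toy_needs_softdirty_fault(TOY_VM_SOFTDIRTY, false));
	assert(!toy_needs_softdirty_fault(TOY_VM_SOFTDIRTY, true));
#else
	/* Flag compiled out: no VMA can ever request the fault. */
	assert(!toy_needs_softdirty_fault(~0UL, false));
#endif
}
```

Whether a given compiler really drops the check entirely is not shown here; the sketch only demonstrates that the result is constant when the flag is 0.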

> 
>> +		return false;
>> +
>> +	/* Do we need write faults for uffd-wp tracking? */
>> +	if (userfaultfd_pte_wp(vma, pte))
>> +		return false;
>> +
>> +	if (!(vma->vm_flags & VM_SHARED)) {
>> +		/*
>> +		 * We can only special-case on exclusive anonymous pages,
>> +		 * because we know that our write-fault handler similarly would
>> +		 * map them writable without any additional checks while holding
>> +		 * the PT lock.
>> +		 */
>> +		page = vm_normal_page(vma, addr, pte);
> 
> I guess we cannot call vm_normal_page() twice, once for prot_numa and once
> here, in practice...

I guess we could, but it doesn't necessarily make the code easier to
read :) And we want to skip protnone either way.

> 
>> +		if (!page || !PageAnon(page) || !PageAnonExclusive(page))
>> +			return false;
>> +	}
>> +
>> +	return true;
>> +}
> 
> Note that there is a small downside to all of that. Assume you mprotect()
> a single page from RO to RW and you have many threads.
> 
> With my pending patch you would avoid the TLB shootdown (and get a PF).
> With this patch you would get a TLB shootdown and save the PF. IOW, I
> think it is worth skipping the shootdown as well in such a case and
> instead flushing the TLB on spurious page-faults. But I guess that's for
> another patch.

Just so I understand correctly: your optimization avoids the flush when,
effectively, nothing changed (R/O -> R/O).

And the optimization for this case here would be to avoid the TLB flush
when it is similarly not required (R/O -> R/W).

Correct?

-- 
Thanks,

David / dhildenb
