From: Nadav Amit <nadav.amit@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: LKML <linux-kernel@vger.kernel.org>,
linux-mm <linux-mm@kvack.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
Andrew Morton <akpm@linux-foundation.org>,
Dave Hansen <dave.hansen@intel.com>,
Andrea Arcangeli <aarcange@redhat.com>,
Peter Xu <peterx@redhat.com>, Yang Shi <shy828301@gmail.com>,
Hugh Dickins <hughd@google.com>,
Mel Gorman <mgorman@techsingularity.net>,
"Edgecombe, Rick P" <rick.p.edgecombe@intel.com>
Subject: Re: [PATCH v1 mmotm] mm/mprotect: try avoiding write faults for exclusive anonynmous pages when changing protection
Date: Fri, 1 Apr 2022 12:15:13 -0700 [thread overview]
Message-ID: <8A6AF878-D5D7-4D88-A736-0FEF71439D44@gmail.com> (raw)
In-Reply-To: <20220401101334.68859-1-david@redhat.com>
[ +Rick ]
> On Apr 1, 2022, at 3:13 AM, David Hildenbrand <david@redhat.com> wrote:
>
> Similar to our MM_CP_DIRTY_ACCT handling for shared, writable mappings, we
> can try mapping anonymous pages writable if they are exclusive,
> the PTE is already dirty, and no special handling applies. Mapping the
> PTE writable is essentially the same thing the write fault handler would do
> in this case.
In general I am all supportive for such a change.
I do have some mostly-minor concerns.
>
> +static inline bool can_change_pte_writable(struct vm_area_struct *vma,
> + unsigned long addr, pte_t pte,
> + unsigned long cp_flags)
> +{
> + struct page *page;
> +
> + if ((vma->vm_flags & VM_SHARED) && !(cp_flags & MM_CP_DIRTY_ACCT))
> + /*
> + * MM_CP_DIRTY_ACCT is only expressive for shared mappings;
> + * without MM_CP_DIRTY_ACCT, there is nothing to do.
> + */
> + return false;
> +
> + if (!(vma->vm_flags & VM_WRITE))
> + return false;
> +
> + if (pte_write(pte) || pte_protnone(pte) || !pte_dirty(pte))
> + return false;
If pte_write() is already try then return false? I understand you want
to do so because the page is already writable, but it is confusing.
In addition, I am not sure about the pte_dirty() check is really robust.
I mean I think it is ok, but is there any issue with shadow-stack?
And this also assumes the kernel does not clear the dirty bit without
clearing the present, as otherwise the note in Intel SDM section 4.8
("Accessed and Dirty Flags”) will be relevant and dirty bit might be
set unnecessarily. I think it is ok.
> +
> + /* Do we need write faults for softdirty tracking? */
> + if (IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) && !pte_soft_dirty(pte) &&
> + (vma->vm_flags & VM_SOFTDIRTY))
If !IS_ENABLED(CONFIG_MEM_SOFT_DIRTY) then VM_SOFTDIRTY == 0. So I do not
think the IS_ENABLED() is necessary (unless you think it is clearer this
way).
> + return false;
> +
> + /* Do we need write faults for uffd-wp tracking? */
> + if (userfaultfd_pte_wp(vma, pte))
> + return false;
> +
> + if (!(vma->vm_flags & VM_SHARED)) {
> + /*
> + * We can only special-case on exclusive anonymous pages,
> + * because we know that our write-fault handler similarly would
> + * map them writable without any additional checks while holding
> + * the PT lock.
> + */
> + page = vm_normal_page(vma, addr, pte);
I guess we cannot call vm_normal_page() twice, once for prot_numa and once
here, in practice...
> + if (!page || !PageAnon(page) || !PageAnonExclusive(page))
> + return false;
> + }
> +
> + return true;
> +}
Note that there is a small downside to all of that. Assume you mprotect()
a single page from RO to RW and you have many threads.
With my pending patch you would avoid the TLB shootdown (and get a PF).
With this patch you would get a TLB shootdown and save the PF. IOW, I
think it is worthy to skip the shootdown as well in such a case and
instead flush the TLB on spurious page-faults. But I guess that’s for
another patch.
next prev parent reply other threads:[~2022-04-01 19:15 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-01 10:13 David Hildenbrand
2022-04-01 19:15 ` Nadav Amit [this message]
2022-04-04 13:04 ` David Hildenbrand
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8A6AF878-D5D7-4D88-A736-0FEF71439D44@gmail.com \
--to=nadav.amit@gmail.com \
--cc=aarcange@redhat.com \
--cc=akpm@linux-foundation.org \
--cc=dave.hansen@intel.com \
--cc=david@redhat.com \
--cc=hughd@google.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=mgorman@techsingularity.net \
--cc=peterx@redhat.com \
--cc=rick.p.edgecombe@intel.com \
--cc=shy828301@gmail.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox