linux-mm.kvack.org archive mirror
From: Nadav Amit <namit@vmware.com>
To: David Hildenbrand <david@redhat.com>
Cc: kernel list <linux-kernel@vger.kernel.org>,
	Linux-MM <linux-mm@kvack.org>,
	linuxppc-dev <linuxppc-dev@lists.ozlabs.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Dave Chinner <david@fromorbit.com>, Peter Xu <peterx@redhat.com>,
	Andrea Arcangeli <aarcange@redhat.com>,
	Hugh Dickins <hughd@google.com>, Vlastimil Babka <vbabka@suse.cz>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Nicholas Piggin <npiggin@gmail.com>,
	Mike Rapoport <rppt@kernel.org>,
	Anshuman Khandual <anshuman.khandual@arm.com>
Subject: Re: [PATCH v1 4/6] mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite
Date: Wed, 2 Nov 2022 21:22:39 +0000	[thread overview]
Message-ID: <F3022E28-3BB5-40F5-B33A-9BDBD69ACC78@vmware.com> (raw)
In-Reply-To: <20221102191209.289237-5-david@redhat.com>

On Nov 2, 2022, at 12:12 PM, David Hildenbrand <david@redhat.com> wrote:

> 
> commit b191f9b106ea ("mm: numa: preserve PTE write permissions across a
> NUMA hinting fault") added remembering write permissions using ordinary
> pte_write() for PROT_NONE mapped pages to avoid write faults when
> remapping the page !PROT_NONE on NUMA hinting faults.
> 

[ snip ]

Here’s a very shallow review with some minor points...

> ---
> include/linux/mm.h |  2 ++
> mm/huge_memory.c   | 28 +++++++++++++++++-----------
> mm/ksm.c           |  9 ++++-----
> mm/memory.c        | 19 ++++++++++++++++---
> mm/mprotect.c      |  7 ++-----
> 5 files changed, 41 insertions(+), 24 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 25ff9a14a777..a0deeece5e87 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1975,6 +1975,8 @@ extern unsigned long move_page_tables(struct vm_area_struct *vma,
> #define  MM_CP_UFFD_WP_ALL                 (MM_CP_UFFD_WP | \
>                                            MM_CP_UFFD_WP_RESOLVE)
> 
> +bool can_change_pte_writable(struct vm_area_struct *vma, unsigned long addr,
> +                            pte_t pte);

It might not be customary, but how about marking it as __pure?

> extern unsigned long change_protection(struct mmu_gather *tlb,
>                              struct vm_area_struct *vma, unsigned long start,
>                              unsigned long end, pgprot_t newprot,
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 2ad68e91896a..45abd27d75a0 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1462,8 +1462,7 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>        unsigned long haddr = vmf->address & HPAGE_PMD_MASK;
>        int page_nid = NUMA_NO_NODE;
>        int target_nid, last_cpupid = (-1 & LAST_CPUPID_MASK);
> -       bool migrated = false;
> -       bool was_writable = pmd_savedwrite(oldpmd);
> +       bool try_change_writable, migrated = false;
>        int flags = 0;
> 
>        vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
> @@ -1472,13 +1471,22 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>                goto out;
>        }
> 
> +       /* See mprotect_fixup(). */
> +       if (vma->vm_flags & VM_SHARED)
> +               try_change_writable = vma_wants_writenotify(vma, vma->vm_page_prot);
> +       else
> +               try_change_writable = !!(vma->vm_flags & VM_WRITE);

Do you find it better to copy the code instead of extracting it to a
separate function?

> +
>        pmd = pmd_modify(oldpmd, vma->vm_page_prot);
>        page = vm_normal_page_pmd(vma, haddr, pmd);
>        if (!page)
>                goto out_map;
> 
>        /* See similar comment in do_numa_page for explanation */
> -       if (!was_writable)
> +       if (try_change_writable && !pmd_write(pmd) &&
> +            can_change_pmd_writable(vma, vmf->address, pmd))
> +               pmd = pmd_mkwrite(pmd);
> +       if (!pmd_write(pmd))
>                flags |= TNF_NO_GROUP;
> 
>        page_nid = page_to_nid(page);
> @@ -1523,8 +1531,12 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf)
>        /* Restore the PMD */
>        pmd = pmd_modify(oldpmd, vma->vm_page_prot);
>        pmd = pmd_mkyoung(pmd);
> -       if (was_writable)
> +
> +       /* Similar to mprotect() protection updates, avoid write faults. */
> +       if (try_change_writable && !pmd_write(pmd) &&
> +            can_change_pmd_writable(vma, vmf->address, pmd))

Why do I have a deja-vu? :)

There must be a way to avoid the redundant code and specifically the call to
can_change_pmd_writable(), no?

>                pmd = pmd_mkwrite(pmd);
> +
>        set_pmd_at(vma->vm_mm, haddr, vmf->pmd, pmd);
>        update_mmu_cache_pmd(vma, vmf->address, vmf->pmd);
>        spin_unlock(vmf->ptl);
> @@ -1764,11 +1776,10 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>        struct mm_struct *mm = vma->vm_mm;
>        spinlock_t *ptl;
>        pmd_t oldpmd, entry;
> -       bool preserve_write;
> -       int ret;
>        bool prot_numa = cp_flags & MM_CP_PROT_NUMA;
>        bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
>        bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
> +       int ret = 1;
> 
>        tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
> 
> @@ -1779,9 +1790,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>        if (!ptl)
>                return 0;
> 
> -       preserve_write = prot_numa && pmd_write(*pmd);
> -       ret = 1;
> -
> #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
>        if (is_swap_pmd(*pmd)) {
>                swp_entry_t entry = pmd_to_swp_entry(*pmd);
> @@ -1861,8 +1869,6 @@ int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
>        oldpmd = pmdp_invalidate_ad(vma, addr, pmd);
> 
>        entry = pmd_modify(oldpmd, newprot);
> -       if (preserve_write)
> -               entry = pmd_mk_savedwrite(entry);
>        if (uffd_wp) {
>                entry = pmd_wrprotect(entry);
>                entry = pmd_mkuffd_wp(entry);
> diff --git a/mm/ksm.c b/mm/ksm.c
> index dc15c4a2a6ff..dd02780c387f 100644
> --- a/mm/ksm.c
> +++ b/mm/ksm.c
> @@ -1069,7 +1069,6 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
> 
>        anon_exclusive = PageAnonExclusive(page);
>        if (pte_write(*pvmw.pte) || pte_dirty(*pvmw.pte) ||
> -           (pte_protnone(*pvmw.pte) && pte_savedwrite(*pvmw.pte)) ||

Not related to your code, but it makes me uncomfortable that the PTE’s
status bits (which are volatile) are accessed with plain dereferences
in this manner.

Especially since the PTE is later saved into orig_pte. It would feel safer
to do READ_ONCE(*pvmw.pte) and work on it (probably in a separate patch).

>            anon_exclusive || mm_tlb_flush_pending(mm)) {
>                pte_t entry;
> 
> @@ -1107,11 +1106,11 @@ static int write_protect_page(struct vm_area_struct *vma, struct page *page,
> 
>                if (pte_dirty(entry))
>                        set_page_dirty(page);
> +               entry = pte_mkclean(entry);
> +
> +               if (pte_write(entry))
> +                       entry = pte_wrprotect(entry);
> 
> -               if (pte_protnone(entry))
> -                       entry = pte_mkclean(pte_clear_savedwrite(entry));
> -               else
> -                       entry = pte_mkclean(pte_wrprotect(entry));
>                set_pte_at_notify(mm, pvmw.address, pvmw.pte, entry);
>        }
>        *orig_pte = *pvmw.pte;
> diff --git a/mm/memory.c b/mm/memory.c
> index c5599a9279b1..286c29ee3aba 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4672,12 +4672,12 @@ int numa_migrate_prep(struct page *page, struct vm_area_struct *vma,
> static vm_fault_t do_numa_page(struct vm_fault *vmf)
> {
>        struct vm_area_struct *vma = vmf->vma;
> +       bool try_change_writable;
>        struct page *page = NULL;
>        int page_nid = NUMA_NO_NODE;
>        int last_cpupid;
>        int target_nid;
>        pte_t pte, old_pte;
> -       bool was_writable = pte_savedwrite(vmf->orig_pte);
>        int flags = 0;
> 
>        /*
> @@ -4692,6 +4692,12 @@ static vm_fault_t do_numa_page(struct vm_fault *vmf)
>                goto out;
>        }
> 
> +       /* See mprotect_fixup(). */
> +       if (vma->vm_flags & VM_SHARED)
> +               try_change_writable = vma_wants_writenotify(vma, vma->vm_page_prot);
> +       else
> +               try_change_writable = !!(vma->vm_flags & VM_WRITE);

It really cannot be extracted into a separate function?



  reply	other threads:[~2022-11-02 21:22 UTC|newest]

Thread overview: 10+ messages
2022-11-02 19:12 [PATCH v1 0/6] mm/autonuma: replace savedwrite infrastructure David Hildenbrand
2022-11-02 19:12 ` [PATCH v1 1/6] mm/mprotect: allow clean exclusive anon pages to be writable David Hildenbrand
2022-11-02 19:12 ` [PATCH v1 2/6] mm/mprotect: minor can_change_pte_writable() cleanups David Hildenbrand
2022-11-02 19:12 ` [PATCH v1 3/6] mm/huge_memory: try avoiding write faults when changing PMD protection David Hildenbrand
2022-11-02 19:12 ` [PATCH v1 4/6] mm/autonuma: use can_change_(pte|pmd)_writable() to replace savedwrite David Hildenbrand
2022-11-02 21:22   ` Nadav Amit [this message]
2022-11-03 10:45     ` David Hildenbrand
2022-11-03 10:51       ` David Hildenbrand
2022-11-02 19:12 ` [PATCH v1 5/6] mm: remove unused savedwrite infrastructure David Hildenbrand
2022-11-02 19:12 ` [PATCH v1 6/6] selftests/vm: anon_cow: add mprotect() optimization tests David Hildenbrand
