linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Yin Fengwei <fengwei.yin@intel.com>,
	willy@infradead.org, linux-mm@kvack.org
Cc: dave.hansen@intel.com, tim.c.chen@intel.com, ying.huang@intel.com
Subject: Re: [RFC PATCH v3 3/4] mm: add do_set_pte_range()
Date: Fri, 3 Feb 2023 14:25:04 +0100	[thread overview]
Message-ID: <5c85d6bb-f4bf-1969-8ec3-c16399e5d6f2@redhat.com> (raw)
In-Reply-To: <20230203131636.1648662-4-fengwei.yin@intel.com>

On 03.02.23 14:16, Yin Fengwei wrote:
> do_set_pte_range() allows setting up page table entries for a
> specific range. It calls folio_add_file_rmap_range() to take
> advantage of batched rmap updates for large folios.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> ---
>   include/linux/mm.h |  3 +++
>   mm/filemap.c       |  1 -
>   mm/memory.c        | 59 ++++++++++++++++++++++++++++++----------------
>   3 files changed, 42 insertions(+), 21 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index d6f8f41514cc..93192f04b276 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -1162,6 +1162,9 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
>   
>   vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page);
>   void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr);
> +void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		unsigned long addr, pte_t *pte,
> +		unsigned long start, unsigned int nr);
>   
>   vm_fault_t finish_fault(struct vm_fault *vmf);
>   vm_fault_t finish_mkwrite_fault(struct vm_fault *vmf);
> diff --git a/mm/filemap.c b/mm/filemap.c
> index f444684db9f2..74046a3a0ff5 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3386,7 +3386,6 @@ static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>   
>   		ref_count++;
>   		do_set_pte(vmf, page, addr);
> -		update_mmu_cache(vma, addr, vmf->pte);
>   	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < nr_pages);
>   
>   	/* Restore the vmf->pte */
> diff --git a/mm/memory.c b/mm/memory.c
> index 7a04a1130ec1..3754b2ef166a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -4257,36 +4257,58 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>   }
>   #endif
>   
> -void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
> +void do_set_pte_range(struct vm_fault *vmf, struct folio *folio,
> +		unsigned long addr, pte_t *pte,
> +		unsigned long start, unsigned int nr)
>   {
>   	struct vm_area_struct *vma = vmf->vma;
>   	bool uffd_wp = pte_marker_uffd_wp(vmf->orig_pte);
>   	bool write = vmf->flags & FAULT_FLAG_WRITE;
> +	bool cow = write && !(vma->vm_flags & VM_SHARED);
>   	bool prefault = vmf->address != addr;
> +	struct page *page = folio_page(folio, start);
>   	pte_t entry;
>   
> -	flush_icache_page(vma, page);
> -	entry = mk_pte(page, vma->vm_page_prot);
> +	if (!cow) {
> +		folio_add_file_rmap_range(folio, start, nr, vma, false);
> +		add_mm_counter(vma->vm_mm, mm_counter_file(page), nr);
> +	}
>   
> -	if (prefault && arch_wants_old_prefaulted_pte())
> -		entry = pte_mkold(entry);
> -	else
> -		entry = pte_sw_mkyoung(entry);
> +	do {
> +		flush_icache_page(vma, page);
> +		entry = mk_pte(page, vma->vm_page_prot);
>   
> -	if (write)
> -		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> -	if (unlikely(uffd_wp))
> -		entry = pte_mkuffd_wp(entry);
> -	/* copy-on-write page */
> -	if (write && !(vma->vm_flags & VM_SHARED)) {
> +		if (prefault && arch_wants_old_prefaulted_pte())
> +			entry = pte_mkold(entry);
> +		else
> +			entry = pte_sw_mkyoung(entry);
> +
> +		if (write)
> +			entry = maybe_mkwrite(pte_mkdirty(entry), vma);
> +		if (unlikely(uffd_wp))
> +			entry = pte_mkuffd_wp(entry);
> +		set_pte_at(vma->vm_mm, addr, pte, entry);
> +
> +		/* no need to invalidate: a not-present page won't be cached */
> +		update_mmu_cache(vma, addr, pte);
> +	} while (pte++, page++, addr += PAGE_SIZE, --nr > 0);
> +}
> +
> +void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr)
> +{
> +	struct folio *folio = page_folio(page);
> +	struct vm_area_struct *vma = vmf->vma;
> +	bool cow = (vmf->flags & FAULT_FLAG_WRITE) &&
> +			!(vma->vm_flags & VM_SHARED);
> +
> +	if (cow) {
>   		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
>   		page_add_new_anon_rmap(page, vma, addr);

As raised, we cannot PTE-map a multi-page folio that way.

This function only supports single-page anon folios.

page_add_new_anon_rmap() -> folio_add_new_anon_rmap(). As that documents:

"If the folio is large, it is accounted as a THP" -- for example, we 
would only increment the "entire mapcount" and set the PageAnonExclusive 
bit only on the head page.

So this really doesn't work for multi-page folios, and if the function 
were used for that, we'd be in trouble.

We'd want some fence here to detect that and bail out if we'd be 
instructed to do that. At least a WARN_ON_ONCE() I guess.

Right now the function looks like it might just handle that.
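
Something like the following at the top of the cow branch might do
(completely untested, and whether do_set_pte() can simply return here
is my assumption):

	if (cow) {
		/*
		 * We cannot PTE-map a multi-page anon folio here:
		 * page_add_new_anon_rmap() would account it as a THP and
		 * only set PageAnonExclusive on the head page.
		 */
		if (WARN_ON_ONCE(folio_test_large(folio)))
			return;
		inc_mm_counter(vma->vm_mm, MM_ANONPAGES);
		page_add_new_anon_rmap(page, vma, addr);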

-- 
Thanks,

David / dhildenb




Thread overview: 21+ messages
2023-02-03 13:16 [RFC PATCH v3 0/4] folio based filemap_map_pages() Yin Fengwei
2023-02-03 13:16 ` [RFC PATCH v3 1/4] filemap: add function filemap_map_folio_range() Yin Fengwei
2023-02-03 13:53   ` Matthew Wilcox
2023-02-04  3:25     ` Yin, Fengwei
2023-02-03 14:17   ` Kirill A. Shutemov
2023-02-04  3:31     ` Yin, Fengwei
2023-02-03 13:16 ` [RFC PATCH v3 2/4] rmap: add folio_add_file_rmap_range() Yin Fengwei
2023-02-03 14:02   ` Matthew Wilcox
2023-02-04  3:34     ` Yin, Fengwei
2023-02-03 14:19   ` Kirill A. Shutemov
2023-02-04  3:35     ` Yin, Fengwei
2023-02-03 13:16 ` [RFC PATCH v3 3/4] mm: add do_set_pte_range() Yin Fengwei
2023-02-03 13:25   ` David Hildenbrand [this message]
2023-02-03 13:30     ` Yin, Fengwei
2023-02-03 13:34       ` David Hildenbrand
2023-02-03 13:39         ` Yin, Fengwei
2023-02-03 13:32   ` Chih-En Lin
2023-02-03 13:38     ` Yin, Fengwei
2023-02-03 14:30       ` Chih-En Lin
2023-02-04  5:47         ` Yin, Fengwei
2023-02-03 13:16 ` [RFC PATCH v3 4/4] filemap: batched update mm counter,rmap when map file folio Yin Fengwei
