linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: Lance Yang <ioworker0@gmail.com>, Barry Song <21cnbao@gmail.com>
Cc: akpm@linux-foundation.org, ryan.roberts@arm.com,
	dev.jain@arm.com, shy828301@gmail.com, ziy@nvidia.com,
	libang.li@antgroup.com, baolin.wang@linux.alibaba.com,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	Mingzhe Yang <mingzhe.yang@ly.com>
Subject: Re: [RFC 2/2] mm/mthp: relax anon mTHP PTE Mapping restrictions
Date: Mon, 20 Jan 2025 14:38:35 +0100	[thread overview]
Message-ID: <a23f82c8-a190-40f0-80d0-e476287f610b@redhat.com> (raw)
In-Reply-To: <CAK1f24kOVqCGp9i=QLtTd_VBVaBc1pJvLxKsW_XAyFw-C+t05A@mail.gmail.com>

On 20.01.25 14:20, Lance Yang wrote:
> On Mon, Jan 20, 2025 at 10:15 AM Barry Song <21cnbao@gmail.com> wrote:
>>
>> On Mon, Jan 20, 2025 at 2:23 PM Lance Yang <ioworker0@gmail.com> wrote:
>>>
>>> Previously, an mTHP could only be mapped into a PTE range where all
>>> entries were none. With this change, PTEs in that range which map the
>>> shared zero page can also be treated as `pte_none` and remapped to a
>>> new mTHP, providing more opportunities to take advantage of mTHP.
>>>
>>> Signed-off-by: Mingzhe Yang <mingzhe.yang@ly.com>
>>> Signed-off-by: Lance Yang <ioworker0@gmail.com>
>>> ---
>>>   mm/memory.c | 11 +++++++++--
>>>   1 file changed, 9 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/mm/memory.c b/mm/memory.c
>>> index 4e148309b3e0..99ec75c6f0fe 100644
>>> --- a/mm/memory.c
>>> +++ b/mm/memory.c
>>> @@ -4815,7 +4815,8 @@ static struct folio *alloc_anon_folio(struct vm_fault *vmf)
>>>          order = highest_order(orders);
>>>          while (orders) {
>>>                  addr = ALIGN_DOWN(vmf->address, PAGE_SIZE << order);
>>> -               if (pte_range_none(pte + pte_index(addr), 1 << order))
>>> +               if (pte_range_none_or_zeropfn(pte + pte_index(addr), 1 << order,
>>> +                                             NULL))
>>>                          break;
>>>                  order = next_order(&orders, order);
>>>          }
>>> @@ -4867,6 +4868,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>>   {
>>>          struct vm_area_struct *vma = vmf->vma;
>>>          unsigned long addr = vmf->address;
>>> +       bool any_zeropfn = false;
>>>          struct folio *folio;
>>>          vm_fault_t ret = 0;
>>>          int nr_pages = 1;
>>> @@ -4939,7 +4941,8 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>>          if (nr_pages == 1 && vmf_pte_changed(vmf)) {
>>>                  update_mmu_tlb(vma, addr, vmf->pte);
>>>                  goto release;
>>> -       } else if (nr_pages > 1 && !pte_range_none(vmf->pte, nr_pages)) {
>>> +       } else if (nr_pages > 1 && !pte_range_none_or_zeropfn(
>>> +                                          vmf->pte, nr_pages, &any_zeropfn)) {
>>>                  update_mmu_tlb_range(vma, addr, vmf->pte, nr_pages);
>>>                  goto release;
>>>          }
>>> @@ -4965,6 +4968,10 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
>>>                  entry = pte_mkuffd_wp(entry);
>>>          set_ptes(vma->vm_mm, addr, vmf->pte, entry, nr_pages);
>>>
>>> +       /* At least one PTE was mapped to the zero page */
>>> +       if (nr_pages > 1 && any_zeropfn)
>>> +               flush_tlb_range(vma, addr, addr + (nr_pages * PAGE_SIZE));
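
Patch 1/2, which adds the pte_range_none_or_zeropfn() helper used in the
hunks above, is not quoted here. Modeled on the existing pte_range_none()
in mm/memory.c, its shape is presumably along these lines (a sketch for
context, not the actual submission):

static bool pte_range_none_or_zeropfn(pte_t *pte, int nr_pages,
				      bool *any_zeropfn)
{
	pte_t ptent;
	int i;

	for (i = 0; i < nr_pages; i++) {
		ptent = ptep_get_lockless(pte + i);
		if (pte_none(ptent))
			continue;
		/* A present PTE mapping the shared zero page also qualifies. */
		if (pte_present(ptent) && is_zero_pfn(pte_pfn(ptent))) {
			if (any_zeropfn)
				*any_zeropfn = true;
			continue;
		}
		return false;
	}
	return true;
}

The any_zeropfn out-parameter is what lets do_anonymous_page() decide
whether a TLB flush is needed after zero-page PTEs were overwritten.
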
>>
> 
> Thanks for taking time to review!
> 
>> Do we also need mmu_notifier?
>>
>>          mmu_notifier_range_init(...)
>>          mmu_notifier_invalidate_range_start(&range);
>>
>> By the way, this is getting much more complex, but are we seeing any real
>> benefits? I’ve tested this before, and it seems that zeropfn-mapped
>> anonymous folios are quite rare.
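
For readers unfamiliar with the pattern referred to above: when
established PTEs that secondary MMUs may have mirrored are cleared or
overwritten, the modification is normally bracketed by notifier calls,
roughly as sketched below. This is a minimal sketch only; whether and
where it would be needed in do_anonymous_page() is exactly the open
question, and the mmu_notifier_range_init() signature has varied across
kernel versions.

struct mmu_notifier_range range;

mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma->vm_mm,
			addr, addr + nr_pages * PAGE_SIZE);
mmu_notifier_invalidate_range_start(&range);

/* ... clear the zeropfn PTEs and set_ptes() the new folio ... */

mmu_notifier_invalidate_range_end(&range);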
> 
> Hmm... Agreed that it's getting more complex. I don't have any data
> showing real benefits yet, so let's put this patch aside for now until I do.

The motivation in the huge zeropage case was that we install the huge 
zeropage on a read fault and then, on a write fault, PTE-map it and COW 
only a single PTE. Getting a THP again would then require khugepaged to 
collapse the range.

So read-then-immediately-write would not give us a PMD-mapped THP with 
the huge shared zeropage enabled, but would give us a PMD-mapped THP 
with the huge shared zeropage disabled, which is rather inconsistent.
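
A minimal userspace sketch of that read-then-immediately-write pattern
(illustration only; whether the range ends up backed by a PMD-mapped THP
afterwards depends on the huge zeropage setting, as described above):

#define _DEFAULT_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define PMD_SZ	(2UL << 20)

int main(void)
{
	/* Over-allocate so a PMD-aligned 2 MiB range certainly exists. */
	char *p = mmap(NULL, 2 * PMD_SZ, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	char *hp;

	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	madvise(p, 2 * PMD_SZ, MADV_HUGEPAGE);
	hp = (char *)(((unsigned long)p + PMD_SZ - 1) & ~(PMD_SZ - 1));

	volatile char c = hp[0];  /* read fault: may install the (huge) zeropage */
	(void)c;
	hp[0] = 1;                /* write fault immediately afterwards */

	/* Inspect AnonHugePages in /proc/self/smaps before the process exits. */
	getchar();
	return 0;
}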

In the case of mTHP, the behavior is at least always the same, and the 
performance gain is likely not as large as in the PMD case. Maybe we 
can just wait for khugepaged to fix it up; a lot of that complexity 
will reside there either way.

So I agree that it might be useful, but also that there must be a good 
reason to have that complexity here.


It is also a valid question whether we should even always use the 
shared zeropage on read faults with mTHP enabled: it might be more 
efficient to avoid the TLB flush involved in ripping out the shared 
zeropage and simply hand out an mTHP right away, assuming our shrinker 
works as expected.

Using the huge zeropage is rather "obviously good" because it also 
reduces TLB pressure and page table walk overhead, and it can be zapped 
fairly easily and efficiently to install a fresh PMD-mapped THP.

-- 
Cheers,

David / dhildenb




Thread overview: 7+ messages
2025-01-20  1:22 [RFC 0/2] " Lance Yang
2025-01-20  1:22 ` [RFC 1/2] mm/mthp: add pte_range_none_or_zeropfn() helper Lance Yang
2025-01-20  1:22 ` [RFC 2/2] mm/mthp: relax anon mTHP PTE Mapping restrictions Lance Yang
2025-01-20  2:15   ` Barry Song
2025-01-20 13:20     ` Lance Yang
2025-01-20 13:38       ` David Hildenbrand [this message]
2025-01-21 12:57         ` Lance Yang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=a23f82c8-a190-40f0-80d0-e476287f610b@redhat.com \
    --to=david@redhat.com \
    --cc=21cnbao@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=baolin.wang@linux.alibaba.com \
    --cc=dev.jain@arm.com \
    --cc=ioworker0@gmail.com \
    --cc=libang.li@antgroup.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mingzhe.yang@ly.com \
    --cc=ryan.roberts@arm.com \
    --cc=shy828301@gmail.com \
    --cc=ziy@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html
