linux-mm.kvack.org archive mirror
From: Nico Pache <npache@redhat.com>
To: Dev Jain <dev.jain@arm.com>
Cc: linux-mm@kvack.org, linux-doc@vger.kernel.org,
	 linux-kernel@vger.kernel.org,
	linux-trace-kernel@vger.kernel.org,  david@redhat.com,
	ziy@nvidia.com, baolin.wang@linux.alibaba.com,
	 lorenzo.stoakes@oracle.com, Liam.Howlett@oracle.com,
	ryan.roberts@arm.com,  corbet@lwn.net, rostedt@goodmis.org,
	mhiramat@kernel.org,  mathieu.desnoyers@efficios.com,
	akpm@linux-foundation.org, baohua@kernel.org,
	 willy@infradead.org, peterx@redhat.com,
	wangkefeng.wang@huawei.com,  usamaarif642@gmail.com,
	sunnanyong@huawei.com, vishal.moola@gmail.com,
	 thomas.hellstrom@linux.intel.com, yang@os.amperecomputing.com,
	 kirill.shutemov@linux.intel.com, aarcange@redhat.com,
	raquini@redhat.com,  anshuman.khandual@arm.com,
	catalin.marinas@arm.com, tiwai@suse.de,  will@kernel.org,
	dave.hansen@linux.intel.com, jack@suse.cz, cl@gentwo.org,
	 jglisse@google.com, surenb@google.com, zokeefe@google.com,
	hannes@cmpxchg.org,  rientjes@google.com, mhocko@suse.com,
	rdunlap@infradead.org
Subject: Re: [PATCH v7 07/12] khugepaged: add mTHP support
Date: Sat, 7 Jun 2025 06:55:57 -0600
Message-ID: <CAA1CXcBUCZ+UGsE-9xHzgi0nmzcbzt_oKQWxP8=PJyp0W+iD1A@mail.gmail.com>
In-Reply-To: <6f061c65-f3aa-42bb-ab70-b45afdcf2baf@arm.com>

On Sat, Jun 7, 2025 at 12:24 AM Dev Jain <dev.jain@arm.com> wrote:
>
>
> On 15/05/25 8:52 am, Nico Pache wrote:
> > Introduce the ability for khugepaged to collapse to different mTHP sizes.
> > While scanning PMD ranges for potential collapse candidates, keep track
> > of pages in KHUGEPAGED_MIN_MTHP_ORDER chunks via a bitmap. Each bit
> > represents a utilized region of order KHUGEPAGED_MIN_MTHP_ORDER ptes. If
> > mTHPs are enabled we remove the restriction of max_ptes_none during the
> > scan phase so we don't bail out early and miss potential mTHP candidates.
> >
> > After the scan is complete we will perform binary recursion on the
> > bitmap to determine which mTHP size would be most efficient to collapse
> > to. max_ptes_none will be scaled by the attempted collapse order to
> > determine how full a THP must be to be eligible.
> >
> > If an mTHP collapse is attempted but the range contains swapped-out or
> > shared pages, we don't perform the collapse.
> >
> > For non-PMD collapse we must leave the anon VMA write locked until after
> > we collapse the mTHP.
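
For readers following the thread, the scan and selection steps in the
description above can be sketched roughly as below. This is only an
illustrative userspace sketch, not the code from the patch: the names
(scale_max_ptes_none, pick_orders, struct region), the constant values,
and the choice to scale max_ptes_none with a right shift are all
assumptions made for the example. It also treats a set bit as a fully
used chunk, which is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed values for illustration: 512 PTEs per PMD, 4 PTEs per bitmap bit. */
#define PMD_ORDER      9u
#define MIN_MTHP_ORDER 2u
#define NBITS          (1u << (PMD_ORDER - MIN_MTHP_ORDER))

/* Hypothetical record of one region chosen for collapse. */
struct region {
	unsigned int start_pte;
	unsigned int order;
};

/* Assumption: scale the PMD-level limit down by the difference in order. */
static unsigned int scale_max_ptes_none(unsigned int max_ptes_none,
					unsigned int order)
{
	assert(order >= MIN_MTHP_ORDER && order <= PMD_ORDER);
	return max_ptes_none >> (PMD_ORDER - order);
}

/* Count the set bits in bitmap[start, start + len). */
static unsigned int count_bits(const bool *bitmap, unsigned int start,
			       unsigned int len)
{
	unsigned int n = 0;

	for (unsigned int i = start; i < start + len; i++)
		n += bitmap[i] ? 1u : 0u;
	return n;
}

/*
 * Binary recursion over the bitmap: try the current order first; if the
 * region has too many empty PTEs for the scaled limit, split it in half
 * and retry each half one order lower. Selected regions are appended to
 * out[] starting at index n; the new count is returned.
 */
static unsigned int pick_orders(const bool *bitmap, unsigned int start_bit,
				unsigned int order, unsigned int max_ptes_none,
				struct region *out, unsigned int n)
{
	unsigned int nbits = 1u << (order - MIN_MTHP_ORDER);
	unsigned int used = count_bits(bitmap, start_bit, nbits) << MIN_MTHP_ORDER;
	unsigned int none = (1u << order) - used;

	if (none <= scale_max_ptes_none(max_ptes_none, order)) {
		out[n].start_pte = start_bit << MIN_MTHP_ORDER;
		out[n].order = order;
		return n + 1;
	}
	if (order == MIN_MTHP_ORDER)
		return n;

	n = pick_orders(bitmap, start_bit, order - 1, max_ptes_none, out, n);
	return pick_orders(bitmap, start_bit + nbits / 2, order - 1,
			   max_ptes_none, out, n);
}
```

Under these assumptions, a caller would pass a bitmap of NBITS entries,
start at bit 0 with order PMD_ORDER, and provide an out[] array with
room for NBITS regions. With max_ptes_none = 0 and only the first 16
PTEs marked as used, the recursion settles on a single order-4 region
at PTE 0.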
>
> Why? I know that Hugh pointed out locking errors; I am yet to catch up
> on that thread, but you need to explain in the description why you do
> what you do.
I will add a better description in the next version. The reasoning is
that in the PMD case all the pages are isolated, but in the non-PMD
case they are not, so we must keep holding the lock until the collapse
is done; otherwise changes could occur once we drop it.

Another potential solution is to isolate all the pages in the PMD,
then undo it after we collapse the mTHP.

-- Nico
>
> [--snip---]
>
> >
> > -
> > -     spin_lock(pmd_ptl);
> > -     BUG_ON(!pmd_none(*pmd));
> > -     folio_add_new_anon_rmap(folio, vma, address, RMAP_EXCLUSIVE);
> > -     folio_add_lru_vma(folio, vma);
> > -     pgtable_trans_huge_deposit(mm, pmd, pgtable);
> > -     set_pmd_at(mm, address, pmd, _pmd);
> > -     update_mmu_cache_pmd(vma, address, pmd);
> > -     deferred_split_folio(folio, false);
> > -     spin_unlock(pmd_ptl);
> > +     if (order == HPAGE_PMD_ORDER) {
> > +             pgtable = pmd_pgtable(_pmd);
> > +             _pmd = folio_mk_pmd(folio, vma->vm_page_prot);
> > +             _pmd = maybe_pmd_mkwrite(pmd_mkdirty(_pmd), vma);
> > +
> > +             spin_lock(pmd_ptl);
> > +             BUG_ON(!pmd_none(*pmd));
> > +             folio_add_new_anon_rmap(folio, vma, _address, RMAP_EXCLUSIVE);
> > +             folio_add_lru_vma(folio, vma);
> > +             pgtable_trans_huge_deposit(mm, pmd, pgtable);
> > +             set_pmd_at(mm, address, pmd, _pmd);
> > +             update_mmu_cache_pmd(vma, address, pmd);
> > +             deferred_split_folio(folio, false);
> > +             spin_unlock(pmd_ptl);
> > +     } else { /* mTHP collapse */
> > +             mthp_pte = mk_pte(&folio->page, vma->vm_page_prot);
> > +             mthp_pte = maybe_mkwrite(pte_mkdirty(mthp_pte), vma);
> > +
> > +             spin_lock(pmd_ptl);
>
> Nico,
>
> I've noticed a few occasions where my review comments have not been acknowledged -
> for example, [1]. It makes it difficult to follow up and contributes to some
> frustration on my end. I'd appreciate if you could make sure to respond to
> feedback, even if you are disagreeing with my comments. Thanks!
>
>
> [1] https://lore.kernel.org/all/08d13445-5ed1-42ea-8aee-c1dbde24407e@arm.com/
>
>
> [---snip---]
>


