From: Lance Yang <ioworker0@gmail.com>
To: David Hildenbrand <david@redhat.com>
Cc: akpm@linux-foundation.org, willy@infradead.org,
maskray@google.com, ziy@nvidia.com, ryan.roberts@arm.com,
21cnbao@gmail.com, mhocko@suse.com, fengwei.yin@intel.com,
zokeefe@google.com, shy828301@gmail.com, xiehuan09@gmail.com,
wangkefeng.wang@huawei.com, songmuchun@bytedance.com,
peterx@redhat.com, minchan@kernel.org, linux-mm@kvack.org,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH 1/1] mm/vmscan: avoid split PMD-mapped THP during shrink_folio_list()
Date: Thu, 18 Apr 2024 14:40:51 +0800 [thread overview]
Message-ID: <CAK1f24kDtOVRC67khxazQw1fS9LUyRrTzzf_ewRqYHQQu_r6AQ@mail.gmail.com> (raw)
In-Reply-To: <2062c2d1-4ebb-4a40-89f9-3083e6912301@redhat.com>
Hey David,
Thanks for taking the time to review.
On Wed, Apr 17, 2024 at 11:02 PM David Hildenbrand <david@redhat.com> wrote:
>
> On 17.04.24 16:11, Lance Yang wrote:
> > When the user no longer requires the pages, they would use madvise(MADV_FREE)
> > to mark the pages as lazy free. IMO, they would not typically write to the
> > given range again.
> >
> > At present, a PMD-mapped THP marked as lazyfree during shrink_folio_list()
> > is unconditionally split, which may be unnecessary. If the THP is exclusively
> > mapped and clean, and the PMD associated with it is also clean, then we can
> > attempt to remove the PMD mapping from it. This change will improve the
> > efficiency of memory reclamation in this case.
> >
> > On an Intel i5 CPU, reclaiming 1GiB of PMD-mapped THPs using
> > mem_cgroup_force_empty() results in the following runtimes in seconds
> > (shorter is better):
> >
> > --------------------------------------------
> > | Old | New | Change |
> > --------------------------------------------
> > | 0.683426 | 0.049197 | -92.80% |
> > --------------------------------------------
> >
> > Signed-off-by: Lance Yang <ioworker0@gmail.com>
> > ---
> > include/linux/huge_mm.h | 1 +
> > include/linux/rmap.h | 1 +
> > mm/huge_memory.c | 2 +-
> > mm/rmap.c | 81 +++++++++++++++++++++++++++++++++++++++++
> > mm/vmscan.c | 7 ++++
> > 5 files changed, 91 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index 7cd07b83a3d0..02a71c05f68a 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -36,6 +36,7 @@ bool move_huge_pmd(struct vm_area_struct *vma, unsigned long old_addr,
> > int change_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > pmd_t *pmd, unsigned long addr, pgprot_t newprot,
> > unsigned long cp_flags);
> > +inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd);
> >
> > vm_fault_t vmf_insert_pfn_pmd(struct vm_fault *vmf, pfn_t pfn, bool write);
> > vm_fault_t vmf_insert_pfn_pud(struct vm_fault *vmf, pfn_t pfn, bool write);
> > diff --git a/include/linux/rmap.h b/include/linux/rmap.h
> > index 0f906dc6d280..8c2f45713351 100644
> > --- a/include/linux/rmap.h
> > +++ b/include/linux/rmap.h
> > @@ -100,6 +100,7 @@ enum ttu_flags {
> > * do a final flush if necessary */
> > TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock:
> > * caller holds it */
> > + TTU_LAZYFREE_THP = 0x100, /* avoid split PMD-mapped THP */
> > };
> >
> > #ifdef CONFIG_MMU
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index 58f2c4745d80..309fba9624c2 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -1801,7 +1801,7 @@ bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
> > return ret;
> > }
> >
> > -static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
> > +inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
> > {
> > pgtable_t pgtable;
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 2608c40dffad..4994f9e402d4 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -77,6 +77,7 @@
> > #include <linux/mm_inline.h>
> >
> > #include <asm/tlbflush.h>
> > +#include <asm/tlb.h>
> >
> > #define CREATE_TRACE_POINTS
> > #include <trace/events/tlb.h>
> > @@ -1606,6 +1607,80 @@ void folio_remove_rmap_pmd(struct folio *folio, struct page *page,
> > #endif
> > }
> >
> > +static bool __try_to_unmap_lazyfree_thp(struct vm_area_struct *vma,
> > + unsigned long address,
> > + struct folio *folio)
> > +{
> > + spinlock_t *ptl;
> > + pmd_t *pmdp, orig_pmd;
> > + struct mmu_notifier_range range;
> > + struct mmu_gather tlb;
> > + struct mm_struct *mm = vma->vm_mm;
> > + struct page *page;
> > + bool ret = false;
> > +
> > + VM_WARN_ON_FOLIO(!folio_test_anon(folio), folio);
> > + VM_WARN_ON_FOLIO(folio_test_swapbacked(folio), folio);
> > + VM_WARN_ON_FOLIO(!folio_test_pmd_mappable(folio), folio);
> > + VM_WARN_ON_FOLIO(!folio_test_locked(folio), folio);
> > +
> > + /*
> > + * If we encounter a PMD-mapped THP that is marked as lazyfree, we
> > + * will try to unmap it without splitting.
> > + *
> > + * An exclusively mapped folio should have only two refs:
> > + * one from the isolation and one from the rmap.
> > + */
> > + if (folio_entire_mapcount(folio) != 1 || folio_test_dirty(folio) ||
> > + folio_ref_count(folio) != 2)
>
> folio_mapcount() == 1 is a bit nicer. But I assume you can drop that
> completely and only check the refcount?
Thanks for your suggestion!
+ if (folio_test_dirty(folio) || folio_ref_count(folio) != 2)
I'm not sure it's safe without also checking folio_mapcount().
>
> > + return false;
> > +
> > + pmdp = mm_find_pmd(mm, address);
> > + if (unlikely(!pmdp))
> > + return false;
> > + if (pmd_dirty(*pmdp))
> > + return false;
> > +
> > + tlb_gather_mmu(&tlb, mm);
> > + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, mm,
> > + address & HPAGE_PMD_MASK,
> > + (address & HPAGE_PMD_MASK) + HPAGE_PMD_SIZE);
> > + mmu_notifier_invalidate_range_start(&range);
> > +
> > + ptl = pmd_lock(mm, pmdp);
> > + orig_pmd = *pmdp;
> > + if (unlikely(!pmd_present(orig_pmd) || !pmd_trans_huge(orig_pmd)))
> > + goto out;
> > +
> > + page = pmd_page(orig_pmd);
> > + if (unlikely(page_folio(page) != folio))
> > + goto out;
> > +
> > + orig_pmd = pmdp_huge_get_and_clear(mm, address, pmdp);
> > + tlb_remove_pmd_tlb_entry(&tlb, pmdp, address);
>
> Until this point, the page could have been pinned (including GUP-fast)
> and we might be in trouble if we drop it.
Thanks for pointing that out!
+ if (pmd_dirty(orig_pmd) || folio_maybe_dma_pinned(folio) ||
folio_ref_count(folio) != 2) {
+ set_pmd_at(mm, address, pmdp, orig_pmd);
+ } else {
Could I check the folio->_pincount using folio_maybe_dma_pinned() and
then re-check the refcount here? Or should I just re-check the refcount?
IIUC, this folio has already been unlinked from the PMD, so the process
cannot take an additional pin on it.
Thanks again for the review!
Lance
>
> --
> Cheers,
>
> David / dhildenb
>
Thread overview: 7+ messages
2024-04-17 14:11 Lance Yang
2024-04-17 15:02 ` David Hildenbrand
2024-04-18 6:40 ` Lance Yang [this message]
2024-04-17 15:08 ` Matthew Wilcox
2024-04-20 4:59 ` Lance Yang
2024-04-20 15:04 ` Lance Yang
2024-04-20 16:31 ` David Hildenbrand