From: Lance Yang <ioworker0@gmail.com>
To: Barry Song <21cnbao@gmail.com>
Cc: Linux-MM <linux-mm@kvack.org>,
	Ryan Roberts <ryan.roberts@arm.com>,
	 David Hildenbrand <david@redhat.com>,
	Baolin Wang <baolin.wang@linux.alibaba.com>,
	 Andrew Morton <akpm@linux-foundation.org>
Subject: Re: All MADV_FREE mTHPs are fully subjected to deferred_split_folio()
Date: Mon, 30 Dec 2024 10:14:04 +0800
Message-ID: <CAK1f24=aY3n72EgQR6MAXPVwabNMhbKzT=b2zVEG68MwHy1BCw@mail.gmail.com>
In-Reply-To: <CAGsJ_4wOL6TLa3FKQASdrGfuqqu=14EuxAtpKmnebiGLm0dnfA@mail.gmail.com>

Hi Barry,

On Mon, Dec 30, 2024 at 5:13 AM Barry Song <21cnbao@gmail.com> wrote:
>
> Hi Lance,
>
> Along with Ryan, David, Baolin, and anyone else who might be interested,
>
> We’ve noticed an unexpectedly high number of deferred splits. The root
> cause appears to be the changes introduced in commit dce7d10be4bbd3
> ("mm/madvise: optimize lazyfreeing with mTHP in madvise_free"). Since
> that commit, split_folio is no longer called in mm/madvise.c.
>
> However, we are still performing deferred_split_folio for all
> MADV_FREE mTHPs, even for those that are fully aligned with mTHP.
> This happens because we execute a goto discard in
> try_to_unmap_one(), which eventually leads to
> folio_remove_rmap_pte() adding all folios to deferred_split when we
> scan the 1st pte in try_to_unmap_one().
>
> discard:
>                 if (unlikely(folio_test_hugetlb(folio)))
>                         hugetlb_remove_rmap(folio);
>                 else
>                         folio_remove_rmap_pte(folio, subpage, vma);
>
> This could lead to a race condition with shrinker - deferred_split_scan().
> The shrinker might call folio_try_get(folio), and while we are scanning
> the second PTE of this folio in try_to_unmap_one(), the entire mTHP
> could be transitioned back to swap-backed because the reference count
> is incremented.
>
>                                 /*
>                                  * The only page refs must be one from isolation
>                                  * plus the rmap(s) (dropped by discard:).
>                                  */
>                                 if (ref_count == 1 + map_count &&
>                                     (!folio_test_dirty(folio) ||
>                                      ...
>                                      (vma->vm_flags & VM_DROPPABLE))) {
>                                         dec_mm_counter(mm, MM_ANONPAGES);
>                                         goto discard;
>                                 }
>
> It also significantly increases contention on ds_queue->split_queue_lock during
> memory reclamation and could potentially introduce other race conditions with
> shrinker as well.

Good catch!

>
> I’m curious if anyone has suggestions for resolving this issue. My
> idea is to use folio_remove_rmap_ptes to drop all PTEs at once, rather
> than folio_remove_rmap_pte, which processes PTEs one by one for an
> mTHP. This approach would require some changes, such as checking the
> dirty state of PTEs and performing a TLB flush for the entire mTHP as
> a whole in try_to_unmap_one().

Yeah, IMHO, it would also be beneficial to reclaim entire mTHPs as a whole
in real-world scenarios where MADV_FREE mTHPs are typically no longer
written to ;)
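
To make the per-PTE vs. batched difference concrete, here is a toy
userspace model. It is only a sketch, not kernel code: the toy_* names
are made up for illustration, and it assumes that a large folio counts
as partially mapped whenever one removal call unmaps some pages while
other pages stay mapped, and that each such call ends up in the
deferred split path.

```c
/* Toy userspace model, NOT kernel code. All toy_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>

struct toy_folio {
	int nr_pages;             /* pages in the mTHP, e.g. 16 for 64KiB */
	int nr_mapped;            /* pages still mapped by a PTE */
	int deferred_split_calls; /* calls that would hit the split queue */
};

/*
 * Drop 'nr' PTE mappings in one call.
 * Assumption: the folio looks partially mapped if this call unmapped
 * something while other pages remain mapped, and every such call would
 * go through the deferred split path (and its queue lock).
 */
static void toy_remove_rmap_ptes(struct toy_folio *f, int nr)
{
	assert(nr > 0 && nr <= f->nr_mapped);
	f->nr_mapped -= nr;

	bool partially_mapped = f->nr_mapped > 0;
	if (partially_mapped)
		f->deferred_split_calls++;
}

/* One PTE per call, modelling the per-PTE discard: path. */
static int toy_unmap_one_by_one(int nr_pages)
{
	struct toy_folio f = { nr_pages, nr_pages, 0 };

	while (f.nr_mapped > 0)
		toy_remove_rmap_ptes(&f, 1);
	return f.deferred_split_calls;
}

/* All PTEs in a single call, modelling the batched idea. */
static int toy_unmap_batched(int nr_pages)
{
	struct toy_folio f = { nr_pages, nr_pages, 0 };

	toy_remove_rmap_ptes(&f, f.nr_mapped);
	return f.deferred_split_calls;
}
```

In this model every per-PTE call except the last one sees a partially
mapped folio, while the batched call goes straight from fully mapped to
unmapped and never does. The real code also has to deal with dirty PTEs
and the TLB flush, which the model ignores.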

>
> Please let me know if you have any objections or alternative suggestions.

Let's hear suggestions from other folks as well ~

Thanks,
Lance

>
> Thanks
> Barry

