From: David Hildenbrand <david@redhat.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Zi Yan <ziy@nvidia.com>,
linux-mm@kvack.org, Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
Ryan Roberts <ryan.roberts@arm.com>,
Barry Song <21cnbao@gmail.com>,
linux-kernel@vger.kernel.org
Subject: Re: [PATCH] mm/rmap: do not add fully unmapped large folio to deferred split list
Date: Mon, 15 Apr 2024 21:19:25 +0200 [thread overview]
Message-ID: <c76c78ab-7146-4963-97aa-980b767e84a5@redhat.com> (raw)
In-Reply-To: <CAHbLzkpMjSUpB2gsYH+4kkEtPuyS4bP7ord+nSgR9xcp3fyVFQ@mail.gmail.com>
>>
>> We could have
>> * THP_DEFERRED_SPLIT_PAGE
>> * THP_UNDO_DEFERRED_SPLIT_PAGE
>> * THP_PERFORM_DEFERRED_SPLIT_PAGE
>>
>> Maybe that would catch more cases (not sure if all, though). Then, you
>> could tell how many are still on that list. THP_DEFERRED_SPLIT_PAGE -
>> THP_UNDO_DEFERRED_SPLIT_PAGE - THP_PERFORM_DEFERRED_SPLIT_PAGE.
>>
>> That could give one a clearer picture of how deferred splitting
>> interacts with actual splitting (possibly under memory pressure), the
>> whole reason why deferred splitting was added after all.
>
> I'm not quite sure whether there is a solid use case or not. If there
> is, we could consider this. But a simpler counter may be preferable.
Yes.
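
(Just to illustrate, a rough userspace sketch of the arithmetic above.
thp_deferred_split_page is the existing /proc/vmstat name; the "undo"
and "perform" names below are only the proposed counters and do not
exist today, so this is hypothetical, not something you can run as-is:)

/*
 * Sketch only: "folios still on the deferred split list" approximated
 * as THP_DEFERRED_SPLIT_PAGE - THP_UNDO_DEFERRED_SPLIT_PAGE
 *                            - THP_PERFORM_DEFERRED_SPLIT_PAGE.
 */
#include <stdio.h>
#include <string.h>

static long vmstat_read(const char *key)
{
        char line[128];
        long val = 0;
        FILE *f = fopen("/proc/vmstat", "r");

        if (!f)
                return 0;
        while (fgets(line, sizeof(line), f)) {
                char name[96];
                long v;

                if (sscanf(line, "%95s %ld", name, &v) == 2 &&
                    !strcmp(name, key)) {
                        val = v;
                        break;
                }
        }
        fclose(f);
        return val;
}

int main(void)
{
        long queued = vmstat_read("thp_deferred_split_page");
        long undone = vmstat_read("thp_undo_deferred_split_page");    /* proposed only */
        long split  = vmstat_read("thp_perform_deferred_split_page"); /* proposed only */

        printf("still on deferred split list (approx): %ld\n",
               queued - undone - split);
        return 0;
}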
>
>>
>>> It may be useful. However, the counter is typically used to estimate
>>> how many THPs are partially unmapped during a period of time.
>>
>> I'd say it's a bit of an abuse of that counter; well, or reading
>> something into the counter that it never reliably represented.
>
> It was way more reliable than now.
Correct me if I am wrong: now that we only adjust the counter for
PMD-sized THP, it is as (un)reliable as it always was.
Or was there another unintended change by some of my cleanups or
previous patches?
>
>>
>> I can easily write a program that keeps sending your counter to infinity
>> simply by triggering that behavior in a loop, so it's all a bit shaky.
>
> I don't doubt that. But let's get back to reality. The counter used to
> stay reasonable and reliable with most real-life workloads before
> mTHP. There may be over-counting, for example, when unmapping a
> PTE-mapped THP which was not on a deferred split queue before. But
> such a case is not common for real-life workloads, because in most
> cases the huge PMD has to be split by a partial unmap, and that
> partial unmap will add the THP to the deferred split queue.
>
> But now a common workload, for example just process exit, can easily
> send the counter to infinity.
Agreed, that's stupid.
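
(To make the earlier "program in a loop" point concrete, a minimal
sketch, assuming 2 MiB PMD-sized THPs and that madvise()+memset()
actually gets us a THP: each iteration partially unmaps one THP, which
queues it for deferred split and bumps the counter, and then unmaps the
rest, so the counter can grow forever although no partially mapped
folio is ever left behind:)

#include <stdint.h>
#include <string.h>
#include <sys/mman.h>

#define SZ_2M (2UL << 20)

int main(void)
{
        for (;;) {
                char *raw = mmap(NULL, 2 * SZ_2M, PROT_READ | PROT_WRITE,
                                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
                char *thp;

                if (raw == MAP_FAILED)
                        return 1;
                /* 2 MiB-align so the region can be backed by one PMD-sized THP */
                thp = (char *)(((uintptr_t)raw + SZ_2M - 1) & ~(SZ_2M - 1));
                madvise(thp, SZ_2M, MADV_HUGEPAGE);
                memset(thp, 1, SZ_2M);          /* fault in, ideally as one THP */
                munmap(thp, SZ_2M / 2);         /* partial unmap -> deferred split */
                munmap(raw, 2 * SZ_2M);         /* tear down the rest again */
        }
}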
>
>>
>> Something like Ryan's script makes more sense, where you get a clearer
>> picture of what's mapped where and how, because that information can be
>> much more valuable than just knowing whether it's mapped fully or
>> partially (again, relevant for dealing with memory waste).
>
> Ryan's script is very helpful. But the counter has existed and been
> used for years, and it is a quick indicator and much easier to monitor
> in a large-scale fleet.
>
> If we think the reliability of the counter is not worth fixing, why
> don't we just remove it? No counter is better than a broken counter.
Again, if counting only the PMD-sized THPs "fixes" the old use cases,
then it should just stick around. And we can even optimize it for some
more cases, as proposed in this patch. But there is no easy way to "get
it completely right", I'm afraid.
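
(As an aside on the "what's mapped where" point: the coarse,
long-standing per-process view can be read from /proc/<pid>/smaps_rollup,
where AnonHugePages only counts PMD-mapped anonymous memory. Ryan's
script walks the page tables and gives a much finer-grained picture;
this is just a sketch of the quick check:)

#include <stdio.h>
#include <string.h>

int main(int argc, char **argv)
{
        char path[64], line[256];
        FILE *f;

        if (argc < 2) {
                fprintf(stderr, "usage: %s <pid>\n", argv[0]);
                return 1;
        }
        snprintf(path, sizeof(path), "/proc/%s/smaps_rollup", argv[1]);
        f = fopen(path, "r");
        if (!f) {
                perror(path);
                return 1;
        }
        while (fgets(line, sizeof(line), f))
                if (!strncmp(line, "AnonHugePages:", 14))
                        fputs(line, stdout);    /* PMD-mapped anon memory */
        fclose(f);
        return 0;
}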
--
Cheers,
David / dhildenb