From: Zi Yan <ziy@nvidia.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Usama Arif <usamaarif642@gmail.com>,
Yang Shi <shy828301@gmail.com>,
Wei Yang <richard.weiyang@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Nhat Pham <nphamcs@gmail.com>, Chris Li <chrisl@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH hotfix 1/2] mm/thp: fix deferred split queue not partially_mapped
Date: Thu, 24 Oct 2024 18:37:08 -0400
Message-ID: <3A1E5353-D8C5-4D38-A3FF-BFC671FC25CE@nvidia.com>
In-Reply-To: <760237a3-69d6-9197-432d-0306d52c048a@google.com>
On 24 Oct 2024, at 0:10, Hugh Dickins wrote:
> Recent changes are putting more pressure on THP deferred split queues:
> under load revealing long-standing races, causing list_del corruptions,
> "Bad page state"s and worse (I keep BUGs in both of those, so usually
> don't get to see how badly they end up without). The relevant recent
> changes being 6.8's mTHP, 6.10's mTHP swapout, and 6.12's mTHP swapin,
> improved swap allocation, and underused THP splitting.
>
> The new unlocked list_del_init() in deferred_split_scan() is buggy.
> I gave bad advice, it looks plausible since that's a local on-stack
> list, but the fact is that it can race with a third party freeing or
> migrating the preceding folio (properly unqueueing it with refcount 0
> while holding split_queue_lock), thereby corrupting the list linkage.
>
> The obvious answer would be to take split_queue_lock there: but it has
> a long history of contention, so I'm reluctant to add to that. Instead,
> make sure that there is always one safe (raised refcount) folio before,
> by delaying its folio_put(). (And of course I was wrong to suggest
> updating split_queue_len without the lock: leave that until the splice.)
I feel like this is not the right approach, since it breaks the existing
condition for changing folio->_deferred_list, namely taking
ds_queue->split_queue_lock for serialization. The contention might not be
as high as you think: if a folio is split, split_queue_lock has to be
taken during the split anyway. So the worst case is the same as when
all folios are split. Do you see significant perf degradation due to
taking the lock when doing list_del_init()?
I am afraid if we take this route, we might hit hard-to-debug bugs
in the future when someone touches the code.
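To illustrate the rule I mean, here is a minimal userspace sketch, not
kernel code. It assumes a pthread mutex standing in for
split_queue_lock, and struct split_queue, struct fake_folio and the
helper names are hypothetical. The point it tries to show is that
list_del_init() writes to both neighbours of the entry, so even an entry
sitting on a local list is only safe to unlink while holding the lock
that every other unqueue path also takes:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head { struct list_head *next, *prev; };

static void init_list_head(struct list_head *h) { h->next = h; h->prev = h; }
static int list_empty(const struct list_head *h) { return h->next == h; }

static void list_add_tail(struct list_head *n, struct list_head *head)
{
    n->prev = head->prev;
    n->next = head;
    head->prev->next = n;
    head->prev = n;
}

/* Unlink and reinitialise: note that this writes to BOTH neighbours. */
static void list_del_init(struct list_head *e)
{
    e->prev->next = e->next;
    e->next->prev = e->prev;
    init_list_head(e);
}

/* Hypothetical stand-in for struct deferred_split. */
struct split_queue {
    pthread_mutex_t lock;   /* plays the role of split_queue_lock */
    struct list_head head;
    unsigned long len;      /* counts queued AND isolated entries */
};

/* Hypothetical stand-in for a folio with its _deferred_list. */
struct fake_folio { struct list_head deferred_list; };

static void queue_init(struct split_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    init_list_head(&q->head);
    q->len = 0;
}

static void queue_folio(struct split_queue *q, struct fake_folio *f)
{
    pthread_mutex_lock(&q->lock);
    if (list_empty(&f->deferred_list)) {
        list_add_tail(&f->deferred_list, &q->head);
        q->len++;
    }
    pthread_mutex_unlock(&q->lock);
}

/* Move every queued entry to a caller-owned local list, under the lock. */
static void isolate_all(struct split_queue *q, struct list_head *local)
{
    pthread_mutex_lock(&q->lock);
    while (!list_empty(&q->head)) {
        struct list_head *e = q->head.next;

        list_del_init(e);
        list_add_tail(e, local);
    }
    pthread_mutex_unlock(&q->lock);
}

/*
 * Unqueue from whichever list the entry is on. Taking the lock here,
 * even for entries on a local list, is what keeps a concurrent unqueue
 * of a neighbouring entry from corrupting the linkage.
 */
static void unqueue_folio(struct split_queue *q, struct fake_folio *f)
{
    pthread_mutex_lock(&q->lock);
    if (!list_empty(&f->deferred_list)) {
        list_del_init(&f->deferred_list);
        assert(q->len > 0);
        q->len--;
    }
    pthread_mutex_unlock(&q->lock);
}
```

In this sketch a scanner would call isolate_all() and then
unqueue_folio() for each entry it handles, so every change to
deferred_list goes through the same lock.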
Thanks.
>
> And remove two over-eager partially_mapped checks, restoring those tests
> to how they were before: if uncharge_folio() or free_tail_page_prepare()
> finds _deferred_list non-empty, it's in trouble whether or not that folio
> is partially_mapped (and the flag was already cleared in the latter case).
>
> Fixes: dafff3f4c850 ("mm: split underused THPs")
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> mm/huge_memory.c | 21 +++++++++++++++++----
> mm/memcontrol.c | 3 +--
> mm/page_alloc.c | 5 ++---
> 3 files changed, 20 insertions(+), 9 deletions(-)
Best Regards,
Yan, Zi