From: Yang Shi <shy828301@gmail.com>
To: Hugh Dickins <hughd@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Usama Arif <usamaarif642@gmail.com>,
Wei Yang <richard.weiyang@gmail.com>,
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>,
Matthew Wilcox <willy@infradead.org>,
David Hildenbrand <david@redhat.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Barry Song <baohua@kernel.org>,
Kefeng Wang <wangkefeng.wang@huawei.com>,
Ryan Roberts <ryan.roberts@arm.com>,
Nhat Pham <nphamcs@gmail.com>, Zi Yan <ziy@nvidia.com>,
Chris Li <chrisl@kernel.org>,
linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH hotfix 2/2] mm/thp: fix deferred split unqueue naming and locking
Date: Fri, 25 Oct 2024 09:34:29 -0700 [thread overview]
Message-ID: <CAHbLzkqjO6X_k91xFFRG+5FLkzxvc0UKsUomW0_oYMv68TCHQg@mail.gmail.com> (raw)
In-Reply-To: <b3e88e26-5bda-50fc-cc55-a62b4b2a4e24@google.com>
On Thu, Oct 24, 2024 at 11:57 PM Hugh Dickins <hughd@google.com> wrote:
>
> On Thu, 24 Oct 2024, Yang Shi wrote:
> > On Wed, Oct 23, 2024 at 9:13 PM Hugh Dickins <hughd@google.com> wrote:
> > >
> > > That goes back to 5.4 commit 87eaceb3faa5 ("mm: thp: make deferred split
> > > shrinker memcg aware"): which included a check on swapcache before adding
> > > to deferred queue (which can now be removed), but no check on deferred
> > > queue before adding THP to swapcache (maybe other circumstances prevented
> > > it at that time, but not now).
> >
> > If I remember correctly, THP just can be added to deferred list when
> > there is no PMD map before mTHP swapout, so shrink_page_list() did
> > check THP's compound_mapcount (called _entire_mapcount now) before
> > adding it to swap cache.
> >
> > Now the code just checked whether the large folio is on deferred list or not.
>
> I've continued to find it hard to think about, hard to be convinced by
> that sequence of checks, without an actual explicit _deferred_list check.
You meant the swap cache check? I was trying to recall the reason. If
I remember correctly (sorry, memory is still vague), if the THP was
PMD-mapped and PTE-mapped by two processes, the THP may be added to
swap cache since just compound_mapcount was checked. Then
try_to_unmap() in shrink_page_list() may add it to deferred list if
PMD mapping was unmapped first. The potential list corruption fixed by
you now may be triggered.
But this was based on the assumption that there can't be PMD-mapped
THP on deferred list. If this happens (as David's migration fail
example), the swap cache check should be not enough. This case was
overlooked.
>
> David has brilliantly come up with the failed THP migration example;
> and I think now perhaps 5.8's 5503fbf2b0b8 ("khugepaged: allow to
> collapse PTE-mapped compound pages") introduced another way?
>
> But I certainly need to reword that wagging finger pointing to your
> commit: these are much more exceptional cases than I was thinking there.
>
> I have this evening tried running swapping load on 5.10 and 6.6 and 6.11,
> each with just a BUG_ON(!list_empty(the deferred list)) before resetting
> memcg in mem_cgroup_swapout() - it would of course be much easier to hit
> such a BUG_ON() than for the consequent wrong locking to be so unlucky
> as to actually result in list corruption.
>
> None of those BUG_ONs hit; though I was only running each for 1.5 hour,
> and looking at vmstats at the end, see the were really not exercising
> deferred split very much at all. I'd been hoping for an immediate hit
> (as on 6.12-rc) to confirm my doubt, but no. That doesn't *prove* you're
> right, but (excepting David's and my weird cases) I bet you are right.
Maybe it just happened rarely and was hard to hit. But still
theoretically possible. Your fix is more reliable.
>
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 4b21a368b4e2..57f64b5d0004 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -2681,7 +2681,9 @@ void free_unref_folios(struct folio_batch *folios)
> > > unsigned long pfn = folio_pfn(folio);
> > > unsigned int order = folio_order(folio);
> > >
> > > - folio_undo_large_rmappable(folio);
> > > + if (mem_cgroup_disabled())
> > > + folio_unqueue_deferred_split(folio);
> >
> > This looks confusing. It looks all callsites of free_unref_folios()
> > have folio_unqueue_deferred_split() and memcg uncharge called before
> > it. If there is any problem, memcg uncharge should catch it. Did I
> > miss something?
>
> I don't understand what you're suggesting there. But David remarked
> on it too, so it seems that I do need at least to add some comment.
>
> I'd better re-examine the memcg/non-memcg forking paths again: but by
> strange coincidence (or suggestion?), I'm suddenly now too tired here,
> precisely where David stopped too. I'll have to come back to this
> tomorrow, sorry.
I perhaps misunderstood this code. Just feel free to correct me if it
doesn't make sense to you. But, yes, some comments are definitely
welcome and helpful for understanding the code and review.
>
> Hugh
next prev parent reply other threads:[~2024-10-25 16:34 UTC|newest]
Thread overview: 20+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-10-24 4:10 [PATCH hotfix 1/2] mm/thp: fix deferred split queue not partially_mapped Hugh Dickins
2024-10-24 4:13 ` [PATCH hotfix 2/2] mm/thp: fix deferred split unqueue naming and locking Hugh Dickins
2024-10-24 20:00 ` Yang Shi
2024-10-25 1:21 ` Yang Shi
2024-10-25 6:57 ` Hugh Dickins
2024-10-25 16:34 ` Yang Shi [this message]
2024-10-27 5:35 ` Hugh Dickins
2024-10-24 20:52 ` David Hildenbrand
2024-10-25 1:25 ` Yang Shi
2024-10-27 7:07 ` Hugh Dickins
2024-10-24 10:20 ` [PATCH hotfix 1/2] mm/thp: fix deferred split queue not partially_mapped Usama Arif
2024-10-24 20:39 ` David Hildenbrand
2024-10-24 22:37 ` Zi Yan
2024-10-25 5:41 ` Hugh Dickins
2024-10-25 15:32 ` Zi Yan
2024-10-25 18:36 ` Yang Shi
2024-10-27 5:08 ` Hugh Dickins
2024-10-28 18:36 ` Yang Shi
2024-10-27 4:43 ` Hugh Dickins
2024-10-25 1:56 ` Baolin Wang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAHbLzkqjO6X_k91xFFRG+5FLkzxvc0UKsUomW0_oYMv68TCHQg@mail.gmail.com \
--to=shy828301@gmail.com \
--cc=akpm@linux-foundation.org \
--cc=baohua@kernel.org \
--cc=baolin.wang@linux.alibaba.com \
--cc=chrisl@kernel.org \
--cc=david@redhat.com \
--cc=hannes@cmpxchg.org \
--cc=hughd@google.com \
--cc=kirill.shutemov@linux.intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=nphamcs@gmail.com \
--cc=richard.weiyang@gmail.com \
--cc=ryan.roberts@arm.com \
--cc=usamaarif642@gmail.com \
--cc=wangkefeng.wang@huawei.com \
--cc=willy@infradead.org \
--cc=ziy@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox