From: Kairui Song <ryncsn@gmail.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: linux-mm@kvack.org, Hugh Dickins <hughd@google.com>,
Baolin Wang <baolin.wang@linux.alibaba.com>,
Matthew Wilcox <willy@infradead.org>,
Kemeng Shi <shikemeng@huaweicloud.com>,
Chris Li <chrisl@kernel.org>, Nhat Pham <nphamcs@gmail.com>,
Baoquan He <bhe@redhat.com>, Barry Song <baohua@kernel.org>,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH 1/4] mm/shmem, swap: improve cached mTHP handling and fix potential hung
Date: Wed, 18 Jun 2025 10:11:21 +0800
Message-ID: <CAMgjq7BLKv8d5+TNbEqSiPSteJvjTBsbphwDsxdR4Mk0gj7C7g@mail.gmail.com>
In-Reply-To: <20250617155857.589c3e700b06af7dff085166@linux-foundation.org>
On Wed, Jun 18, 2025 at 6:58 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Wed, 18 Jun 2025 02:35:00 +0800 Kairui Song <ryncsn@gmail.com> wrote:
>
> > From: Kairui Song <kasong@tencent.com>
> >
> > The current swap-in code assumes that, when a swap entry in the shmem
> > mapping is order 0, its cached folio (if present) must be order 0
> > too, which turns out not to always be correct.
> >
> > The problem is that shmem_split_large_entry is called before verifying
> > that the folio will eventually be swapped in. One possible race is:
> >
> > CPU1                                CPU2
> > shmem_swapin_folio
> > /* swap in of order > 0 swap entry S1 */
> >   folio = swap_cache_get_folio
> >   /* folio = NULL */
> >   order = xa_get_order
> >   /* order > 0 */
> >   folio = shmem_swap_alloc_folio
> >   /* mTHP alloc failure, folio = NULL */
> >   <... Interrupted ...>
> >                                     shmem_swapin_folio
> >                                     /* S1 is swapped in */
> >                                     shmem_writeout
> >                                     /* S1 is swapped out, folio cached */
> >   shmem_split_large_entry(..., S1)
> >   /* S1 is split, but the folio covering it has order > 0 now */
> >
> > Now any following swapin of S1 will hang: `xa_get_order` returns 0,
> > while the folio lookup returns a folio with order > 0. The check
> > `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` is
> > therefore always true, causing every swap-in attempt to return -EEXIST.
> >
> > This is fragile. Fix it by allowing a larger folio to be seen in the
> > swap cache, and by checking, upon inserting the folio, that the whole
> > shmem mapping range covered by the swapin still has the right swap
> > value. Also drop the redundant tree walks before the insertion.
> >
> > This actually improves performance, as it avoids two redundant XArray
> > tree walks in the hot path. The only side effect is that, in the
> > failure path, shmem may redundantly reallocate a few folios, causing
> > slight, temporary memory pressure.
> >
> > It is also worth noting that, although checking the order and value
> > before inserting may seem to help reduce lock contention, it does not.
> > The swap cache layer ensures that a raced swapin will either see a swap
> > cache folio or fail to do a swapin (the SWAP_HAS_CACHE bit is set even
> > when the swap cache is bypassed), so holding the folio lock and checking
> > the folio flag is already enough to avoid lock contention. The chance
> > that a folio passes the swap entry value check while the shmem mapping
> > slot has changed should be very low.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 058313515d5a ("mm: shmem: fix potential data corruption during shmem swapin")
> > Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
>
> The Fixes: tells -stable maintainers (and others) which kernel versions
> need the fix. So having two Fixes: against different kernel versions is
> very confusing! Are we recommending that kernels which contain
> 809bc86517cc but not 058313515d5a be patched?
809bc86517cc introduced mTHP support for shmem, but it's buggy, and
058313515d5a tried to fix that but is also buggy. I thought listing both
could help people backport this.
I think keeping either one is OK. I'll keep 809bc86517cc then; any branch
that has 809bc86517cc should already have 058313515d5a backported.