From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
Date: Mon, 23 Jun 2025 11:35:06 +0800
Subject: Re: [PATCH v2 1/4] mm/shmem, swap: improve cached mTHP handling and fix potential hung
To: Baolin Wang
Cc: linux-mm@kvack.org, Andrew Morton, Hugh Dickins, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20250619175538.15799-1-ryncsn@gmail.com> <20250619175538.15799-2-ryncsn@gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
On Mon, Jun 23, 2025 at 11:26 AM Baolin Wang wrote:
>
> Hi Kairui,
>
> On 2025/6/20 01:55, Kairui Song wrote:
> > From: Kairui Song
> >
> > The current swap-in code assumes that, when a swap entry in a shmem
> > mapping is order 0, its cached folios (if present) must be order 0
> > too, which turns out to not always be correct.
> >
> > The problem is that shmem_split_large_entry is called before verifying
> > the folio will eventually be swapped in. One possible race is:
> >
> > CPU1                            CPU2
> > shmem_swapin_folio
> > /* swap in of order > 0 swap entry S1 */
> > folio = swap_cache_get_folio
> > /* folio = NULL */
> > order = xa_get_order
> > /* order > 0 */
> > folio = shmem_swap_alloc_folio
> > /* mTHP alloc failure, folio = NULL */
> > <... Interrupted ...>
> >                                 shmem_swapin_folio
> >                                 /* S1 is swapped in */
> >                                 shmem_writeout
> >                                 /* S1 is swapped out, folio cached */
> > shmem_split_large_entry(..., S1)
> > /* S1 is split, but the folio covering it has order > 0 now */
> >
> > Now any following swapin of S1 will hang: `xa_get_order` returns 0,
> > while folio lookup returns a folio with order > 0. The
> > `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` check
> > will always be true, so swap-in keeps returning -EEXIST.
> >
> > This is fragile. Fix it by allowing a larger folio to be found in the
> > swap cache, and by checking that the whole shmem mapping range covered
> > by the swap-in holds the expected swap values upon inserting the
> > folio. Also drop the now-redundant tree walks before the insertion.
> >
> > This actually improves performance, as it avoids two redundant Xarray
> > tree walks in the hot path. The only side effect is that in the
> > failure path, shmem may redundantly reallocate a few folios, causing
> > temporary slight memory pressure.
> >
> > Worth noting, it may seem that the order and value check before
> > inserting would help reduce lock contention, but that is not true. The
> > swap cache layer ensures a raced swapin will either see a swap cache
> > folio or fail to do a swapin (we have the SWAP_HAS_CACHE bit even if
> > the swap cache is bypassed), so holding the folio lock and checking
> > the folio flag is already good enough for avoiding the lock
> > contention.
> > The chance that a folio passes the swap entry value check but the
> > shmem mapping slot has changed should be very low.
>
> Thanks for fixing the issue. Sadly, I haven't reproduced this issue from
> my previous test cases :(
>
> And I have a question below.
>
> > Cc: stable@vger.kernel.org
> > Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
> > Signed-off-by: Kairui Song
> > Reviewed-by: Kemeng Shi
> > ---
> >  mm/shmem.c | 30 +++++++++++++++++++++---------
> >  1 file changed, 21 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index eda35be2a8d9..4e7ef343a29b 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -884,7 +884,9 @@ static int shmem_add_to_page_cache(struct folio *folio,
> >                                  pgoff_t index, void *expected, gfp_t gfp)
> >  {
> >       XA_STATE_ORDER(xas, &mapping->i_pages, index, folio_order(folio));
> > -     long nr = folio_nr_pages(folio);
> > +     unsigned long nr = folio_nr_pages(folio);
> > +     swp_entry_t iter, swap;
> > +     void *entry;
> >
> >       VM_BUG_ON_FOLIO(index != round_down(index, nr), folio);
> >       VM_BUG_ON_FOLIO(!folio_test_locked(folio), folio);
> > @@ -896,14 +898,24 @@ static int shmem_add_to_page_cache(struct folio *folio,
> >
> >       gfp &= GFP_RECLAIM_MASK;
> >       folio_throttle_swaprate(folio, gfp);
> > +     swap = iter = radix_to_swp_entry(expected);
> >
> >       do {
> >               xas_lock_irq(&xas);
> > -             if (expected != xas_find_conflict(&xas)) {
> > -                     xas_set_err(&xas, -EEXIST);
> > -                     goto unlock;
> > +             xas_for_each_conflict(&xas, entry) {
> > +                     /*
> > +                      * The range must either be empty, or filled with
> > +                      * expected swap entries. Shmem swap entries are never
> > +                      * partially freed without split of both entry and
> > +                      * folio, so there shouldn't be any holes.
> > +                      */
> > +                     if (!expected || entry != swp_to_radix_entry(iter)) {
> > +                             xas_set_err(&xas, -EEXIST);
> > +                             goto unlock;
> > +                     }
> > +                     iter.val += 1 << xas_get_order(&xas);
> >               }
> > -             if (expected && xas_find_conflict(&xas)) {
> > +             if (expected && iter.val - nr != swap.val) {
> >                       xas_set_err(&xas, -EEXIST);
> >                       goto unlock;
> >               }
> > @@ -2323,7 +2335,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >                       error = -ENOMEM;
> >                       goto failed;
> >               }
> > -     } else if (order != folio_order(folio)) {
> > +     } else if (order > folio_order(folio)) {
> >               /*
> >                * Swap readahead may swap in order 0 folios into swapcache
> >                * asynchronously, while the shmem mapping can still stores
> > @@ -2348,15 +2360,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >
> >                       swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
> >               }
> > +     } else if (order < folio_order(folio)) {
> > +             swap.val = round_down(swp_type(swap), folio_order(folio));
>
> Why rounding down the swap type? do you mean rounding down the swap offset?

Ouch, right, it should be the value:

swap.val = round_down(swap.val, folio_order(folio));

I messed up the code here during a rebase, let me send a V3 then.