From: Kairui Song
Date: Wed, 18 Jun 2025 10:11:21 +0800
Subject: Re: [PATCH 1/4] mm/shmem, swap: improve cached mTHP handling and fix potential hung
To: Andrew Morton
Cc: linux-mm@kvack.org, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org, stable@vger.kernel.org
References: <20250617183503.10527-1-ryncsn@gmail.com> <20250617183503.10527-2-ryncsn@gmail.com> <20250617155857.589c3e700b06af7dff085166@linux-foundation.org>
In-Reply-To: <20250617155857.589c3e700b06af7dff085166@linux-foundation.org>

On Wed, Jun 18, 2025 at 6:58 AM Andrew Morton wrote:
>
> On Wed, 18 Jun 2025 02:35:00 +0800 Kairui Song wrote:
>
> > From: Kairui Song
> >
> > The current swap-in code assumes that, when a swap entry in shmem
> > mapping is order 0, its cached folios (if present) must be order 0
> > too, which turns out not always correct.
> >
> > The problem is shmem_split_large_entry is called before verifying the
> > folio will eventually be swapped in, one possible race is:
> >
> >     CPU1                              CPU2
> > shmem_swapin_folio
> > /* swap in of order > 0 swap entry S1 */
> >   folio = swap_cache_get_folio
> >   /* folio = NULL */
> >   order = xa_get_order
> >   /* order > 0 */
> >   folio = shmem_swap_alloc_folio
> >   /* mTHP alloc failure, folio = NULL */
> >  <... Interrupted ...>
> >                                  shmem_swapin_folio
> >                                  /* S1 is swapped in */
> >                                  shmem_writeout
> >                                  /* S1 is swapped out, folio cached */
> >   shmem_split_large_entry(..., S1)
> >   /* S1 is split, but the folio covering it has order > 0 now */
> >
> > Now any following swapin of S1 will hang: `xa_get_order` returns 0,
> > and folio lookup will return a folio with order > 0. The
> > `xa_get_order(&mapping->i_pages, index) != folio_order(folio)` will
> > always return false causing swap-in to return -EEXIST.
> >
> > And this looks fragile. So fix this up by allowing seeing a larger folio
> > in swap cache, and check the whole shmem mapping range covered by the
> > swapin have the right swap value upon inserting the folio. And drop
> > the redundant tree walks before the insertion.
> >
> > This will actually improve the performance, as it avoided two redundant
> > Xarray tree walks in the hot path, and the only side effect is that in
> > the failure path, shmem may redundantly reallocate a few folios
> > causing temporary slight memory pressure.
> >
> > And worth noting, it may seems the order and value check before
> > inserting might help reducing the lock contention, which is not true.
> > The swap cache layer ensures raced swapin will either see a swap cache
> > folio or failed to do a swapin (we have SWAP_HAS_CACHE bit even if
> > swap cache is bypassed), so holding the folio lock and checking the
> > folio flag is already good enough for avoiding the lock contention.
> > The chance that a folio passes the swap entry value check but the
> > shmem mapping slot has changed should be very low.
> >
> > Cc: stable@vger.kernel.org
> > Fixes: 058313515d5a ("mm: shmem: fix potential data corruption during shmem swapin")
> > Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
>
> The Fixes: tells -stable maintainers (and others) which kernel versions
> need the fix. So having two Fixes: against different kernel versions is
> very confusing! Are we recommending that kernels which contain
> 809bc86517cc but not 058313515d5a be patched?

809bc86517cc introduced mTHP support for shmem, but it is buggy.
058313515d5a tried to fix that and is also buggy. I thought listing
both would help people backport this fix.

Keeping either one is OK, so I'll keep 809bc86517cc: any branch that
has 809bc86517cc should already have 058313515d5a backported.
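
For readers following the thread, the idea in the quoted commit message
(check that the whole range covered by the swapin still holds the right
swap value at insertion time, instead of walking the tree beforehand)
can be illustrated with a toy userspace model. This is only a sketch
under loud assumptions: struct toy_mapping, toy_insert_folio and
TOY_FOLIO are hypothetical names, a flat array stands in for the XArray,
every slot is order 0, and there is no locking. It is not the code from
the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_NSLOTS 16
#define TOY_FOLIO  (~0UL)	/* marker: slot now holds a folio, not a swap value */

/* Hypothetical stand-in for a shmem mapping: one swap value per index. */
struct toy_mapping {
	unsigned long slot[TOY_NSLOTS];
};

/*
 * Install a folio covering nr slots starting at index, but only if every
 * covered slot still holds the expected consecutive swap value. A slot
 * changed by a racing swapin/swapout makes the insert fail with -EEXIST
 * and leaves the mapping untouched, so the caller can retry the lookup.
 */
static int toy_insert_folio(struct toy_mapping *m, size_t index,
			    size_t nr, unsigned long swap)
{
	size_t i;

	assert(nr > 0 && index + nr <= TOY_NSLOTS);

	/* Single pass: verify the whole range before modifying anything. */
	for (i = 0; i < nr; i++)
		if (m->slot[index + i] != swap + i)
			return -EEXIST;

	/* Range verified: replace the swap values with the folio marker. */
	for (i = 0; i < nr; i++)
		m->slot[index + i] = TOY_FOLIO;

	return 0;
}
```

In this model a caller that fills slots 4..7 with swap values 100..103
and calls toy_insert_folio(&m, 4, 4, 100) succeeds, while the same call
fails with -EEXIST if any of those slots was changed in between. The
point of the design is that the check happens once, at the moment of
insertion, rather than in separate walks ahead of it.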