From: Kairui Song
Date: Mon, 7 Jul 2025 16:04:42 +0800
Subject: Re: [PATCH v4 5/9] mm/shmem, swap: avoid false positive swap cache lookup
To: Baolin Wang
Cc: linux-mm@kvack.org, Andrew Morton, Hugh Dickins, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org
In-Reply-To: <17d23ed0-3b12-42a5-a5de-994f570b1bca@linux.alibaba.com>
References: <20250704181748.63181-1-ryncsn@gmail.com> <20250704181748.63181-6-ryncsn@gmail.com> <17d23ed0-3b12-42a5-a5de-994f570b1bca@linux.alibaba.com>

On Mon, Jul 7, 2025 at 3:53 PM Baolin Wang wrote:
>
> Hi Kairui,
>
> On 2025/7/5 02:17, Kairui Song wrote:
> > From: Kairui Song
> >
> > If a shmem read request's index points to the middle of a large swap
> > entry, shmem swap in will try the swap cache lookup using the large
> > swap entry's starting value (which is the first sub swap entry of this
> > large entry). This will lead to false positive lookup results, if only
> > the first few swap entries are cached but the actual requested swap
> > entry pointed by index is uncached. This is not a rare event as swap
> > readahead always try to cache order 0 folios when possible.
> >
> > Currently, shmem will do a large entry split when it occurs, aborts
> > due to a mismatching folio swap value, then retry the swapin from
> > the beginning, which is a waste of CPU and adds wrong info to
> > the readahead statistics.
> >
> > This can be optimized easily by doing the lookup using the right
> > swap entry value.
> >
> > Signed-off-by: Kairui Song
> > ---
> >  mm/shmem.c | 31 +++++++++++++++----------------
> >  1 file changed, 15 insertions(+), 16 deletions(-)
> >
> > diff --git a/mm/shmem.c b/mm/shmem.c
> > index 217264315842..2ab214e2771c 100644
> > --- a/mm/shmem.c
> > +++ b/mm/shmem.c
> > @@ -2274,14 +2274,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >  	pgoff_t offset;
> >
> >  	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
> > -	swap = index_entry = radix_to_swp_entry(*foliop);
> > +	index_entry = radix_to_swp_entry(*foliop);
> > +	swap = index_entry;
> >  	*foliop = NULL;
> >
> > -	if (is_poisoned_swp_entry(swap))
> > +	if (is_poisoned_swp_entry(index_entry))
> >  		return -EIO;
> >
> > -	si = get_swap_device(swap);
> > -	order = shmem_confirm_swap(mapping, index, swap);
> > +	si = get_swap_device(index_entry);
> > +	order = shmem_confirm_swap(mapping, index, index_entry);
> >  	if (unlikely(!si)) {
> >  		if (order < 0)
> >  			return -EEXIST;
> > @@ -2293,6 +2294,12 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
> >  		return -EEXIST;
> >  	}
> >
> > +	/* index may point to the middle of a large entry, get the sub entry */
> > +	if (order) {
> > +		offset = index - round_down(index, 1 << order);
> > +		swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
> > +	}
> > +
> >  	/* Look it up and read it in.. */
> >  	folio = swap_cache_get_folio(swap, NULL, 0);
>
> Please drop this patch, which will cause a swapin fault dead loop.
>
> Assume an order-4 shmem folio has been swapped out, and the swap cache
> holds this order-4 folio (assuming index == 0, swap.val == 0x4000).
>
> During swapin, if the index is 1, and the recalculation of the swap
> value here will result in 'swap.val == 0x4001'. This will cause the
> subsequent 'folio->swap.val != swap.val' check to fail, continuously
> triggering a dead-loop swapin fault, ultimately causing the CPU to hang.
>

Oh, thanks for catching that. Clearly I wasn't thinking carefully
enough about this.

The problem goes away if we calculate `swap.val` based on folio_order
rather than split_order, which is what patch 8 currently does.
Previously there were only 4 patches, so I never expected this
problem...

I can try to reorganize the patch order. I was hoping they could be
merged as one patch: some of the designs are meant to work together,
so splitting them up can cause intermediate problems like this one.

Perhaps you could take a look at the later patches and see whether we
can just merge them into one, e.g. merge or move patch 8 into this
one? Or maybe I need to move this patch later in the series. The
performance, object size, and stack usage improvements are shown in
the commit message.