From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <49feb9d6-36d0-414b-b56e-29dc106596d8@linux.alibaba.com>
Date: Tue, 8 Jul 2025 14:00:37 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH v4 5/9] mm/shmem, swap: avoid false positive swap cache lookup
From: Baolin Wang <baolin.wang@linux.alibaba.com>
To: Kairui Song
Cc: linux-mm@kvack.org, Andrew Morton, Hugh Dickins, Matthew Wilcox, Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song, linux-kernel@vger.kernel.org
References: <20250704181748.63181-1-ryncsn@gmail.com> <20250704181748.63181-6-ryncsn@gmail.com> <17d23ed0-3b12-42a5-a5de-994f570b1bca@linux.alibaba.com>
In-Reply-To:
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2025/7/7 16:04, Kairui Song wrote:
> On Mon, Jul 7, 2025 at 3:53 PM Baolin Wang wrote:
>>
>> Hi Kairui,
>>
>> On 2025/7/5 02:17, Kairui Song wrote:
>>> From: Kairui Song
>>>
>>> If a shmem read request's index points to the middle of a large swap
>>> entry, the shmem swapin path will try the swap cache lookup using the
>>> large swap entry's starting value (which is the first sub swap entry
>>> of this large entry). This leads to false positive lookup results if
>>> only the first few swap entries are cached but the actual requested
>>> swap entry pointed to by the index is uncached. This is not a rare
>>> event, as swap readahead always tries to cache order 0 folios when
>>> possible.
>>>
>>> Currently, when this occurs, shmem splits the large entry, aborts
>>> due to a mismatching folio swap value, then retries the swapin from
>>> the beginning, which wastes CPU and adds wrong info to the readahead
>>> statistics.
>>>
>>> This can be optimized easily by doing the lookup using the right
>>> swap entry value.
>>>
>>> Signed-off-by: Kairui Song
>>> ---
>>>  mm/shmem.c | 31 +++++++++++++++----------------
>>>  1 file changed, 15 insertions(+), 16 deletions(-)
>>>
>>> diff --git a/mm/shmem.c b/mm/shmem.c
>>> index 217264315842..2ab214e2771c 100644
>>> --- a/mm/shmem.c
>>> +++ b/mm/shmem.c
>>> @@ -2274,14 +2274,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>>  	pgoff_t offset;
>>>
>>>  	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
>>> -	swap = index_entry = radix_to_swp_entry(*foliop);
>>> +	index_entry = radix_to_swp_entry(*foliop);
>>> +	swap = index_entry;
>>>  	*foliop = NULL;
>>>
>>> -	if (is_poisoned_swp_entry(swap))
>>> +	if (is_poisoned_swp_entry(index_entry))
>>>  		return -EIO;
>>>
>>> -	si = get_swap_device(swap);
>>> -	order = shmem_confirm_swap(mapping, index, swap);
>>> +	si = get_swap_device(index_entry);
>>> +	order = shmem_confirm_swap(mapping, index, index_entry);
>>>  	if (unlikely(!si)) {
>>>  		if (order < 0)
>>>  			return -EEXIST;
>>> @@ -2293,6 +2294,12 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
>>>  		return -EEXIST;
>>>  	}
>>>
>>> +	/* index may point to the middle of a large entry, get the sub entry */
>>> +	if (order) {
>>> +		offset = index - round_down(index, 1 << order);
>>> +		swap = swp_entry(swp_type(swap), swp_offset(swap) + offset);
>>> +	}
>>> +
>>>  	/* Look it up and read it in.. */
>>>  	folio = swap_cache_get_folio(swap, NULL, 0);
>>
>> Please drop this patch, as it causes a swapin fault dead loop.
>>
>> Assume an order-4 shmem folio has been swapped out, and the swap cache
>> holds this order-4 folio (assuming index == 0, swap.val == 0x4000).
>>
>> During swapin, if the index is 1, the recalculation of the swap value
>> here results in 'swap.val == 0x4001'. This makes the subsequent
>> 'folio->swap.val != swap.val' check fail, continuously triggering a
>> dead-loop swapin fault and ultimately hanging the CPU.
>
> Oh, thanks for catching that.
>
> Clearly I wasn't thinking carefully enough about this. The problem
> goes away if we calculate `swap.val` based on folio_order rather than
> split_order, which is currently done in patch 8.

OK, I saw patch 8. After patch 8, the logic seems correct. (For
reference, I have appended a standalone sketch of the dead-loop
arithmetic above at the end of this mail.)

> Previously there were only 4 patches, so I never expected this
> problem... I can try to reorganize the patch order. I was hoping they
> could be merged as one patch; some designs are supposed to work
> together, so splitting them may cause intermediate problems like this.

Again, please do not combine different changes into one huge patch,
which is _really_ hard to review and discuss. Please split your
patches properly and ensure each patch has been tested.

> Perhaps you can help take a look at the later patches, to see if we
> can just merge some of them? E.g., merge or move patch 8 into this
> one. Or maybe I need to move this patch later in the series.

It seems that patch 5 depends on the cleanup in patch 8. If there's no
better way to split them, I suggest merging patch 5 into patch 8.

> The performance / object size / stack usage improvements are
> shown in the commit message.
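
To make the dead-loop scenario easy to check on paper, below is a
minimal standalone sketch in plain user-space C (not kernel code). It
assumes a simplified swp_entry_t whose .val is the bare swap offset,
re-uses the constants from the example above (order 4, start value
0x4000, faulting index 1), and the helper names only loosely mirror the
kernel ones:

#include <stdio.h>

/* Simplified stand-ins for the kernel types and helpers. */
typedef unsigned long pgoff_t;
typedef struct { unsigned long val; } swp_entry_t; /* .val == swap offset here */

static unsigned long round_down_pow2(unsigned long x, unsigned long m)
{
	return x & ~(m - 1); /* m must be a power of two */
}

int main(void)
{
	int order = 4;                        /* order-4 large swap entry */
	swp_entry_t index_entry = { 0x4000 }; /* start of the large entry */
	pgoff_t index = 1;                    /* faulting index inside it */

	/* The sub-entry calculation from the patch hunk above. */
	pgoff_t offset = index - round_down_pow2(index, 1UL << order);
	swp_entry_t swap = { index_entry.val + offset }; /* == 0x4001 */

	/* The order-4 folio in the swap cache carries the start value. */
	swp_entry_t folio_swap = { 0x4000 };

	printf("lookup swap.val = 0x%lx, folio->swap.val = 0x%lx\n",
	       swap.val, folio_swap.val);
	if (folio_swap.val != swap.val)
		printf("mismatch -> swapin aborts and retries: dead loop\n");
	return 0;
}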