From mboxrd@z Thu Jan 1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox,
	Kemeng Shi, Chris Li, Nhat Pham, Baoquan He, Barry Song,
	linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 4/4] mm/shmem, swap: avoid false positive swap cache lookup
Date: Wed, 18 Jun 2025 02:35:03 +0800
Message-ID: <20250617183503.10527-5-ryncsn@gmail.com>
X-Mailer: git-send-email 2.50.0
In-Reply-To: <20250617183503.10527-1-ryncsn@gmail.com>
References: <20250617183503.10527-1-ryncsn@gmail.com>
Reply-To: Kairui Song <ryncsn@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kairui Song <ryncsn@gmail.com>

If the shmem read request's index points to the middle of a large swap
entry, shmem swapin does the swap cache lookup using the large swap
entry's starting value (the first sub swap entry of this large entry).
This leads to a false positive lookup result if only the first few swap
entries are cached, while the requested swap entry pointed to by the
index is uncached. Currently shmem will then split the large entry and
retry the swapin from the beginning, which is a waste of CPU and
fragile.

Handle this correctly by resolving the swap entry that the index
actually refers to before doing the swap cache lookup. Also add some
sanity checks to help understand the code and ensure things won't go
wrong.

Signed-off-by: Kairui Song <ryncsn@gmail.com>
---
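The sketch below is not part of the diff; it is a minimal standalone
illustration (plain userspace C, made-up example values, round_down()
reimplemented) of the offset arithmetic this patch performs before the
swap cache lookup, and of the coverage invariant the new
VM_WARN_ON_ONCE() checks express:

/* Standalone illustration only, not kernel code; all values are examples. */
#include <assert.h>
#include <stdio.h>

#define round_down(x, y) ((x) & ~((unsigned long)(y) - 1))

int main(void)
{
	unsigned long order = 4;		/* large swap entry covers 1 << 4 = 16 pages */
	unsigned long nr = 1UL << order;
	unsigned long index = 35;		/* faulting page cache index (example value) */
	unsigned long entry_start = 0x1200;	/* first sub entry of the large swap entry */

	/* Resolve the sub entry that @index really refers to */
	unsigned long offset = index - round_down(index, nr);	/* 35 - 32 = 3 */
	unsigned long swap_val = entry_start + offset;		/* 0x1203, not 0x1200 */

	/*
	 * Looking up the swap cache with entry_start (0x1200) can hit a
	 * folio that caches only the first few sub entries and does not
	 * contain index 35; looking up 0x1203 avoids that false positive.
	 */
	printf("lookup value: %#lx (was %#lx)\n", swap_val, entry_start);

	/* Coverage invariant asserted by the new VM_WARN_ON_ONCE() checks */
	unsigned long folio_nr = 4;	/* suppose swapin brought in a 4-page folio */
	unsigned long folio_start = round_down(swap_val, folio_nr);
	assert(folio_start <= entry_start + offset);
	assert(entry_start + offset < folio_start + folio_nr);
	return 0;
}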
 mm/shmem.c | 61 +++++++++++++++++++++++++++--------------------------------
 1 file changed, 29 insertions(+), 32 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 46dea2fa1b43..0bc30dafad90 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1977,12 +1977,12 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 
 static struct folio *shmem_swapin_direct(struct inode *inode,
 		struct vm_area_struct *vma, pgoff_t index,
-		swp_entry_t entry, int *order, gfp_t gfp)
+		swp_entry_t swap_entry, swp_entry_t swap,
+		int *order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int nr_pages = 1 << *order;
 	struct folio *new;
-	pgoff_t offset;
 	void *shadow;
 
 	/*
@@ -2003,13 +2003,11 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 	 */
 	if ((vma && userfaultfd_armed(vma)) ||
 	    !zswap_never_enabled() ||
-	    non_swapcache_batch(entry, nr_pages) != nr_pages) {
-		offset = index - round_down(index, nr_pages);
-		entry = swp_entry(swp_type(entry),
-				  swp_offset(entry) + offset);
+	    non_swapcache_batch(swap_entry, nr_pages) != nr_pages) {
 		*order = 0;
 		nr_pages = 1;
 	} else {
+		swap.val = swap_entry.val;
 		gfp_t huge_gfp = vma_thp_gfp_mask(vma);
 
 		gfp = limit_gfp_mask(huge_gfp, gfp);
@@ -2021,7 +2019,7 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 		return ERR_PTR(-ENOMEM);
 
 	if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
-					   gfp, entry)) {
+					   gfp, swap)) {
 		folio_put(new);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -2036,17 +2034,17 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 	 * In this case, shmem_add_to_page_cache() will help identify the
 	 * concurrent swapin and return -EEXIST.
 	 */
-	if (swapcache_prepare(entry, nr_pages)) {
+	if (swapcache_prepare(swap, nr_pages)) {
 		folio_put(new);
 		return ERR_PTR(-EEXIST);
 	}
 
 	__folio_set_locked(new);
 	__folio_set_swapbacked(new);
-	new->swap = entry;
+	new->swap = swap;
 
-	memcg1_swapin(entry, nr_pages);
-	shadow = get_shadow_from_swap_cache(entry);
+	memcg1_swapin(swap, nr_pages);
+	shadow = get_shadow_from_swap_cache(swap);
 	if (shadow)
 		workingset_refault(new, shadow);
 	folio_add_lru(new);
@@ -2278,20 +2276,21 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int error, nr_pages, order, swap_order;
+	swp_entry_t swap, swap_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
-	swp_entry_t swap;
+	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
-	swap = radix_to_swp_entry(*foliop);
+	swap_entry = radix_to_swp_entry(*foliop);
 	*foliop = NULL;
 
-	if (is_poisoned_swp_entry(swap))
+	if (is_poisoned_swp_entry(swap_entry))
 		return -EIO;
 
-	si = get_swap_device(swap);
-	order = shmem_swap_check_entry(mapping, index, swap);
+	si = get_swap_device(swap_entry);
+	order = shmem_swap_check_entry(mapping, index, swap_entry);
 	if (unlikely(!si)) {
 		if (order < 0)
 			return -EEXIST;
@@ -2303,7 +2302,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EEXIST;
 	}
 
-	/* Look it up and read it in.. */
+	/* @index may point to the middle of a large entry, get the real swap value first */
+	offset = index - round_down(index, 1 << order);
+	swap.val = swap_entry.val + offset;
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
 		/* Or update major stats only when swapin succeeds?? */
@@ -2315,7 +2316,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		/* Try direct mTHP swapin bypassing swap cache and readahead */
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			swap_order = order;
-			folio = shmem_swapin_direct(inode, vma, index,
+			folio = shmem_swapin_direct(inode, vma, index, swap_entry,
 						    swap, &swap_order, gfp);
 			if (!IS_ERR(folio)) {
 				skip_swapcache = true;
@@ -2338,28 +2339,25 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		}
 	}
 alloced:
+	swap_order = folio_order(folio);
+	nr_pages = folio_nr_pages(folio);
+
+	/* The swap-in should cover both @swap and @index */
+	swap.val = round_down(swap.val, nr_pages);
+	VM_WARN_ON_ONCE(swap.val > swap_entry.val + offset);
+	VM_WARN_ON_ONCE(swap.val + nr_pages <= swap_entry.val + offset);
+
 	/*
 	 * We need to split an existing large entry if swapin brought in a
 	 * smaller folio due to various of reasons.
-	 *
-	 * And worth noting there is a special case: if there is a smaller
-	 * cached folio that covers @swap, but not @index (it only covers
-	 * first few sub entries of the large entry, but @index points to
-	 * later parts), the swap cache lookup will still see this folio,
-	 * And we need to split the large entry here. Later checks will fail,
-	 * as it can't satisfy the swap requirement, and we will retry
-	 * the swapin from beginning.
 	 */
-	swap_order = folio_order(folio);
+	index = round_down(index, nr_pages);
 	if (order > swap_order) {
-		error = shmem_split_swap_entry(inode, index, swap, gfp);
+		error = shmem_split_swap_entry(inode, index, swap_entry, gfp);
 		if (error)
 			goto failed_nolock;
 	}
 
-	index = round_down(index, 1 << swap_order);
-	swap.val = round_down(swap.val, 1 << swap_order);
-
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
@@ -2372,7 +2370,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 	folio_wait_writeback(folio);
-	nr_pages = folio_nr_pages(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
-- 
2.50.0