From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
	Chris Li, Nhat Pham, Baoquan He, Barry Song,
	linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v2 4/4] mm/shmem, swap: avoid false positive swap cache lookup
Date: Fri, 20 Jun 2025 01:55:38 +0800
Message-ID: <20250619175538.15799-5-ryncsn@gmail.com>
X-Mailer: git-send-email 2.50.0
In-Reply-To: <20250619175538.15799-1-ryncsn@gmail.com>
References: <20250619175538.15799-1-ryncsn@gmail.com>
Reply-To: Kairui Song <ryncsn@gmail.com>
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kairui Song <ryncsn@gmail.com>

If the shmem read request's index points to the middle of a large swap
entry, shmem swapin does the swap cache lookup using the large swap
entry's starting value (the first sub swap entry of this large entry).
This will lead to a false positive lookup result if only the first few
swap entries are cached but the requested swap entry pointed to by
index is uncached.

Currently shmem will do a large entry split and then retry the swapin
from the beginning, which is a waste of CPU and fragile. Handle this
correctly instead.

Also add some sanity checks to help understand the code and ensure
things won't go wrong.
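
As an illustration for readers, here is a minimal userspace sketch of
the index-to-entry translation this fix performs before the swap cache
lookup (illustration only, not part of the patch; the helper name and
example values are made up, and the kernel side uses swp_entry_t plus
round_down() as in the hunk below):

  /* Userspace sketch only -- mirrors the offset math, not the kernel API. */
  #include <assert.h>
  #include <stdio.h>

  /*
   * Swap value of the sub-entry backing @index, given a large swap entry
   * of 1 << order sub-entries whose first sub-entry value is @index_entry.
   */
  static unsigned long swap_val_for_index(unsigned long index_entry,
                                          unsigned long index, int order)
  {
          unsigned long nr_pages = 1UL << order;
          /* equivalent to: index - round_down(index, nr_pages) */
          unsigned long offset = index & (nr_pages - 1);
          return index_entry + offset;
  }

  int main(void)
  {
          /*
           * Order-4 (16 page) entry starting at swap value 0x100; a fault
           * 4 pages into it must look up 0x104. Looking up 0x100 instead
           * can hit a cached folio that covers only the head of the large
           * entry, which is the false positive described above.
           */
          unsigned long swap = swap_val_for_index(0x100, 0x234, 4);
          assert(swap == 0x104);
          printf("swap value to look up: 0x%lx\n", swap);
          return 0;
  }

With the lookup keyed on the correct sub-entry, the split-and-retry
dance mentioned above is no longer needed just to get past a false
positive.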
Signed-off-by: Kairui Song
Reviewed-by: Kemeng Shi
---
 mm/shmem.c | 61 ++++++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 32 deletions(-)

diff --git a/mm/shmem.c b/mm/shmem.c
index 721f5aa68572..128b92486f2e 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1977,12 +1977,12 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 
 static struct folio *shmem_swapin_direct(struct inode *inode,
 		struct vm_area_struct *vma, pgoff_t index,
-		swp_entry_t entry, int *order, gfp_t gfp)
+		swp_entry_t index_entry, swp_entry_t swap,
+		int *order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int nr_pages = 1 << *order;
 	struct folio *new;
-	pgoff_t offset;
 	void *shadow;
 
 	/*
@@ -2003,13 +2003,11 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 	 */
 	if ((vma && userfaultfd_armed(vma)) ||
 	    !zswap_never_enabled() ||
-	    non_swapcache_batch(entry, nr_pages) != nr_pages) {
-		offset = index - round_down(index, nr_pages);
-		entry = swp_entry(swp_type(entry),
-				  swp_offset(entry) + offset);
+	    non_swapcache_batch(index_entry, nr_pages) != nr_pages) {
 		*order = 0;
 		nr_pages = 1;
 	} else {
+		swap.val = index_entry.val;
 		gfp_t huge_gfp = vma_thp_gfp_mask(vma);
 
 		gfp = limit_gfp_mask(huge_gfp, gfp);
@@ -2021,7 +2019,7 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 		return ERR_PTR(-ENOMEM);
 
 	if (mem_cgroup_swapin_charge_folio(new, vma ? vma->vm_mm : NULL,
-					   gfp, entry)) {
+					   gfp, swap)) {
 		folio_put(new);
 		return ERR_PTR(-ENOMEM);
 	}
@@ -2036,17 +2034,17 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 	 * In this case, shmem_add_to_page_cache() will help identify the
 	 * concurrent swapin and return -EEXIST.
 	 */
-	if (swapcache_prepare(entry, nr_pages)) {
+	if (swapcache_prepare(swap, nr_pages)) {
 		folio_put(new);
 		return ERR_PTR(-EEXIST);
 	}
 
 	__folio_set_locked(new);
 	__folio_set_swapbacked(new);
-	new->swap = entry;
+	new->swap = swap;
 
-	memcg1_swapin(entry, nr_pages);
-	shadow = get_shadow_from_swap_cache(entry);
+	memcg1_swapin(swap, nr_pages);
+	shadow = get_shadow_from_swap_cache(swap);
 	if (shadow)
 		workingset_refault(new, shadow);
 	folio_add_lru(new);
@@ -2278,20 +2276,21 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int error, nr_pages, order, swap_order;
+	swp_entry_t swap, index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
-	swp_entry_t swap;
+	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
-	swap = radix_to_swp_entry(*foliop);
+	index_entry = radix_to_swp_entry(*foliop);
 	*foliop = NULL;
 
-	if (is_poisoned_swp_entry(swap))
+	if (is_poisoned_swp_entry(index_entry))
 		return -EIO;
 
-	si = get_swap_device(swap);
-	order = shmem_confirm_swap(mapping, index, swap);
+	si = get_swap_device(index_entry);
+	order = shmem_confirm_swap(mapping, index, index_entry);
 	if (unlikely(!si)) {
 		if (order < 0)
 			return -EEXIST;
@@ -2303,7 +2302,9 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EEXIST;
 	}
 
-	/* Look it up and read it in.. */
+	/* @index may point to the middle of a large entry, get the real swap value first */
+	offset = index - round_down(index, 1 << order);
+	swap.val = index_entry.val + offset;
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
 		/* Or update major stats only when swapin succeeds?? */
@@ -2315,7 +2316,7 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		/* Try direct mTHP swapin bypassing swap cache and readahead */
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			swap_order = order;
-			folio = shmem_swapin_direct(inode, vma, index,
+			folio = shmem_swapin_direct(inode, vma, index, index_entry,
 						    swap, &swap_order, gfp);
 			if (!IS_ERR(folio)) {
 				skip_swapcache = true;
@@ -2338,28 +2339,25 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		}
 	}
 alloced:
+	swap_order = folio_order(folio);
+	nr_pages = folio_nr_pages(folio);
+
+	/* The swap-in should cover both @swap and @index */
+	swap.val = round_down(swap.val, nr_pages);
+	VM_WARN_ON_ONCE(swap.val > index_entry.val + offset);
+	VM_WARN_ON_ONCE(swap.val + nr_pages <= index_entry.val + offset);
+
 	/*
 	 * We need to split an existing large entry if swapin brought in a
 	 * smaller folio due to various of reasons.
-	 *
-	 * And worth noting there is a special case: if there is a smaller
-	 * cached folio that covers @swap, but not @index (it only covers
-	 * first few sub entries of the large entry, but @index points to
-	 * later parts), the swap cache lookup will still see this folio,
-	 * And we need to split the large entry here. Later checks will fail,
-	 * as it can't satisfy the swap requirement, and we will retry
-	 * the swapin from beginning.
 	 */
-	swap_order = folio_order(folio);
+	index = round_down(index, nr_pages);
 	if (order > swap_order) {
-		error = shmem_split_swap_entry(inode, index, swap, gfp);
+		error = shmem_split_swap_entry(inode, index, index_entry, gfp);
 		if (error)
 			goto failed_nolock;
 	}
 
-	index = round_down(index, 1 << swap_order);
-	swap.val = round_down(swap.val, 1 << swap_order);
-
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
@@ -2372,7 +2370,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 	folio_wait_writeback(folio);
-	nr_pages = folio_nr_pages(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
-- 
2.50.0