From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Hugh Dickins, Baolin Wang, Matthew Wilcox, Kemeng Shi,
	Chris Li, Nhat Pham, Baoquan He, Barry Song,
	linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH v3 7/7] mm/shmem, swap: avoid false positive swap cache lookup
Date: Fri, 27 Jun 2025 14:20:20 +0800
Message-ID: <20250627062020.534-8-ryncsn@gmail.com>
X-Mailer: git-send-email 2.50.0
In-Reply-To: <20250627062020.534-1-ryncsn@gmail.com>
References: <20250627062020.534-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit
From: Kairui Song

If the shmem read request's index points to the middle of a large swap
entry, shmem swapin does the swap cache lookup using the large swap
entry's starting value (the first sub swap entry of this large entry).
This leads to a false positive lookup result if only the first few swap
entries are cached but the requested swap entry, pointed to by the
index, is uncached. Currently shmem will do a large entry split and
then retry the swapin from the beginning, which wastes CPU and is
fragile. Handle this correctly.

Also add some sanity checks to help understand the code and ensure
things won't go wrong.

Signed-off-by: Kairui Song
---
 mm/shmem.c | 60 +++++++++++++++++++++++++++++-------------------------
 1 file changed, 32 insertions(+), 28 deletions(-)
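A stand-alone sketch of the index-to-swap-value mapping this fix relies
on may help review (not part of the patch; plain userspace C with the
kernel's round_down() open-coded, and the example values made up):

#include <assert.h>
#include <stdio.h>

/* Stand-in for the kernel's round_down() on power-of-two sizes. */
static unsigned long round_down_pow2(unsigned long x, unsigned long size)
{
	return x & ~(size - 1);
}

int main(void)
{
	unsigned long index = 35;		/* faulting page index */
	unsigned long index_entry = 100;	/* swap value of the large entry's first sub entry */
	int order = 4;				/* the large entry covers 16 pages: index 32..47 */

	/* @index may sit in the middle of the large entry: compute its offset. */
	unsigned long offset = index - round_down_pow2(index, 1UL << order);
	/* The swap entry that actually backs @index, which the cache lookup must use. */
	unsigned long swap = index_entry + offset;

	assert(offset == 3 && swap == 103);
	printf("index %lu -> offset %lu -> swap value %lu\n", index, offset, swap);
	return 0;
}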
diff --git a/mm/shmem.c b/mm/shmem.c
index ea9a105ded5d..9341c51c3d10 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -1977,14 +1977,19 @@ static struct folio *shmem_alloc_and_add_folio(struct vm_fault *vmf,
 
 static struct folio *shmem_swapin_direct(struct inode *inode,
 					 struct vm_area_struct *vma, pgoff_t index,
-					 swp_entry_t entry, int order, gfp_t gfp)
+					 swp_entry_t index_entry, swp_entry_t swap,
+					 int order, gfp_t gfp)
 {
 	struct shmem_inode_info *info = SHMEM_I(inode);
-	int nr_pages = 1 << order;
 	struct folio *new;
-	pgoff_t offset;
+	swp_entry_t entry;
 	gfp_t swap_gfp;
 	void *shadow;
+	int nr_pages;
+
+	/* Prefer aligned THP swapin */
+	entry.val = index_entry.val;
+	nr_pages = 1 << order;
 
 	/*
 	 * We have arrived here because our zones are constrained, so don't
@@ -2011,6 +2016,7 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 			swap_gfp = limit_gfp_mask(vma_thp_gfp_mask(vma), gfp);
 		}
 	}
+
 retry:
 	new = shmem_alloc_folio(swap_gfp, order, info, index);
 	if (!new) {
@@ -2056,11 +2062,10 @@ static struct folio *shmem_swapin_direct(struct inode *inode,
 	if (!order)
 		return new;
 	/* High order swapin failed, fallback to order 0 and retry */
-	order = 0;
-	nr_pages = 1;
+	entry.val = swap.val;
 	swap_gfp = gfp;
-	offset = index - round_down(index, nr_pages);
-	entry = swp_entry(swp_type(entry), swp_offset(entry) + offset);
+	nr_pages = 1;
+	order = 0;
 	goto retry;
 }
 
@@ -2288,20 +2293,21 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 	struct mm_struct *fault_mm = vma ? vma->vm_mm : NULL;
 	struct shmem_inode_info *info = SHMEM_I(inode);
 	int error, nr_pages, order, swap_order;
+	swp_entry_t swap, index_entry;
 	struct swap_info_struct *si;
 	struct folio *folio = NULL;
 	bool skip_swapcache = false;
-	swp_entry_t swap;
+	pgoff_t offset;
 
 	VM_BUG_ON(!*foliop || !xa_is_value(*foliop));
-	swap = radix_to_swp_entry(*foliop);
+	index_entry = radix_to_swp_entry(*foliop);
 	*foliop = NULL;
 
-	if (is_poisoned_swp_entry(swap))
+	if (is_poisoned_swp_entry(index_entry))
 		return -EIO;
 
-	si = get_swap_device(swap);
-	order = shmem_confirm_swap(mapping, index, swap);
+	si = get_swap_device(index_entry);
+	order = shmem_confirm_swap(mapping, index, index_entry);
 	if (unlikely(!si)) {
 		if (order < 0)
 			return -EEXIST;
@@ -2313,13 +2319,15 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		return -EEXIST;
 	}
 
-	/* Look it up and read it in.. */
+	/* @index may point to the middle of a large entry, get the real swap value first */
+	offset = index - round_down(index, 1 << order);
+	swap.val = index_entry.val + offset;
 	folio = swap_cache_get_folio(swap, NULL, 0);
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO)) {
 			/* Direct mTHP swapin without swap cache or readahead */
 			folio = shmem_swapin_direct(inode, vma, index,
-						    swap, order, gfp);
+						    index_entry, swap, order, gfp);
 			if (IS_ERR(folio)) {
 				error = PTR_ERR(folio);
 				folio = NULL;
@@ -2341,28 +2349,25 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 			count_memcg_event_mm(fault_mm, PGMAJFAULT);
 		}
 	}
+
+	swap_order = folio_order(folio);
+	nr_pages = folio_nr_pages(folio);
+	/* The swap-in should cover both @swap and @index */
+	swap.val = round_down(swap.val, nr_pages);
+	VM_WARN_ON_ONCE(swap.val > index_entry.val + offset);
+	VM_WARN_ON_ONCE(swap.val + nr_pages <= index_entry.val + offset);
+
 	/*
 	 * We need to split an existing large entry if swapin brought in a
 	 * smaller folio due to various of reasons.
-	 *
-	 * And worth noting there is a special case: if there is a smaller
-	 * cached folio that covers @swap, but not @index (it only covers
-	 * first few sub entries of the large entry, but @index points to
-	 * later parts), the swap cache lookup will still see this folio,
-	 * And we need to split the large entry here. Later checks will fail,
-	 * as it can't satisfy the swap requirement, and we will retry
-	 * the swapin from beginning.
 	 */
-	swap_order = folio_order(folio);
+	index = round_down(index, nr_pages);
 	if (order > swap_order) {
-		error = shmem_split_swap_entry(inode, index, swap, gfp);
+		error = shmem_split_swap_entry(inode, index, index_entry, gfp);
 		if (error)
 			goto failed_nolock;
 	}
 
-	index = round_down(index, 1 << swap_order);
-	swap.val = round_down(swap.val, 1 << swap_order);
-
 	/* We have to do this with folio locked to prevent races */
 	folio_lock(folio);
 	if ((!skip_swapcache && !folio_test_swapcache(folio)) ||
@@ -2375,7 +2380,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		goto failed;
 	}
 	folio_wait_writeback(folio);
-	nr_pages = folio_nr_pages(folio);
 
 	/*
 	 * Some architectures may have to restore extra metadata to the
-- 
2.50.0