From mboxrd@z Thu Jan  1 00:00:00 1970
From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Hugh Dickins, Chris Li,
	David Hildenbrand, Yosry Ahmed, "Huang, Ying", Nhat Pham,
	Johannes Weiner, Baolin Wang, Baoquan He, Barry Song,
	Kalesh Singh, Kemeng Shi, Tim Chen, Ryan Roberts,
	linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH 05/28] mm, swap: sanitize swap cache lookup convention
Date: Thu, 15 May 2025 04:17:05 +0800
Message-ID: <20250514201729.48420-6-ryncsn@gmail.com>
X-Mailer: git-send-email 2.49.0
In-Reply-To: <20250514201729.48420-1-ryncsn@gmail.com>
References: <20250514201729.48420-1-ryncsn@gmail.com>
Reply-To: Kairui Song
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

From: Kairui Song

Swap cache lookup is lockless; the returned folio could be invalidated
at any time before it is locked, so the caller always has to lock and
check the folio before use.

Introduce a helper for checking a swap cache folio, document this
convention, and avoid touching the folio until it has been verified.
Update all current users to follow this convention.
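
For illustration, the convention for callers is roughly the following
(a minimal sketch mirroring the unuse_pte_range() hunk below; the
recovery path after a failed check is caller specific):

	folio = swap_cache_get_folio(entry);
	if (folio) {
		folio_lock(folio);
		if (!folio_swap_contains(folio, entry)) {
			/* Raced with invalidation: entry was freed or reused */
			folio_unlock(folio);
			folio_put(folio);
			/* caller specific: retry the lookup, or bail out */
		}
		/* Locked and verified: folio still backs this entry */
	}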
Signed-off-by: Kairui Song
---
 mm/memory.c      | 31 ++++++++++++++-----------------
 mm/shmem.c       |  4 ++--
 mm/swap.h        | 21 +++++++++++++++++++++
 mm/swap_state.c  |  8 ++++++--
 mm/swapfile.c    | 10 ++++++++--
 mm/userfaultfd.c |  4 ++++
 6 files changed, 55 insertions(+), 23 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 18b5a77a0a4b..254be0e88801 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4568,12 +4568,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		goto out;
 
 	folio = swap_cache_get_folio(entry);
-	if (folio) {
-		swap_update_readahead(folio, vma, vmf->address);
-		page = folio_file_page(folio, swp_offset(entry));
-	}
 	swapcache = folio;
-
 	if (!folio) {
 		if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
 		    __swap_count(entry) == 1) {
@@ -4642,20 +4637,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = VM_FAULT_MAJOR;
 			count_vm_event(PGMAJFAULT);
 			count_memcg_event_mm(vma->vm_mm, PGMAJFAULT);
-		page = folio_file_page(folio, swp_offset(entry));
-	} else if (PageHWPoison(page)) {
-		/*
-		 * hwpoisoned dirty swapcache pages are kept for killing
-		 * owner processes (which may be unknown at hwpoison time)
-		 */
-		ret = VM_FAULT_HWPOISON;
-		goto out_release;
 	}
 
 	ret |= folio_lock_or_retry(folio, vmf);
 	if (ret & VM_FAULT_RETRY)
 		goto out_release;
 
+	page = folio_file_page(folio, swp_offset(entry));
 	if (swapcache) {
 		/*
 		 * Make sure folio_free_swap() or swapoff did not release the
@@ -4664,10 +4652,20 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 		 * swapcache, we need to check that the page's swap has not
 		 * changed.
 		 */
-		if (unlikely(!folio_test_swapcache(folio) ||
-			     page_swap_entry(page).val != entry.val))
+		if (!folio_swap_contains(folio, entry))
 			goto out_page;
 
+		if (PageHWPoison(page)) {
+			/*
+			 * hwpoisoned dirty swapcache pages are kept for killing
+			 * owner processes (which may be unknown at hwpoison time)
+			 */
+			ret = VM_FAULT_HWPOISON;
+			goto out_page;
+		}
+
+		swap_update_readahead(folio, vma, vmf->address);
+
 		/*
 		 * KSM sometimes has to copy on read faults, for example, if
 		 * page->index of !PageKSM() pages would be nonlinear inside the
@@ -4682,8 +4680,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			ret = VM_FAULT_HWPOISON;
 			folio = swapcache;
 			goto out_page;
-		}
-		if (folio != swapcache)
+		} else if (folio != swapcache)
 			page = folio_page(folio, 0);
 
 		/*
diff --git a/mm/shmem.c b/mm/shmem.c
index 01f29cb31c7a..43d9e3bf16f4 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2260,8 +2260,6 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 
 	/* Look it up and read it in.. */
 	folio = swap_cache_get_folio(swap);
-	if (folio)
-		swap_update_readahead(folio, NULL, 0);
 	order = xa_get_order(&mapping->i_pages, index);
 	if (!folio) {
 		bool fallback_order0 = false;
@@ -2362,6 +2360,8 @@ static int shmem_swapin_folio(struct inode *inode, pgoff_t index,
 		error = -EEXIST;
 		goto unlock;
 	}
+	if (!skip_swapcache)
+		swap_update_readahead(folio, NULL, 0);
 	if (!folio_test_uptodate(folio)) {
 		error = -EIO;
 		goto failed;
diff --git a/mm/swap.h b/mm/swap.h
index e83109ad1456..34af06bf6fa4 100644
--- a/mm/swap.h
+++ b/mm/swap.h
@@ -50,6 +50,22 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 	return swp_offset(entry) & SWAP_ADDRESS_SPACE_MASK;
 }
 
+/*
+ * Check if a folio still contains a swap entry, must be called after a
+ * swap cache lookup as the folio might have been invalidated while
+ * it's unlocked.
+ */
+static inline bool folio_swap_contains(struct folio *folio, swp_entry_t entry)
+{
+	pgoff_t index = swp_offset(entry);
+	VM_WARN_ON_ONCE(!folio_test_locked(folio));
+	if (unlikely(!folio_test_swapcache(folio)))
+		return false;
+	if (unlikely(swp_type(entry) != swp_type(folio->swap)))
+		return false;
+	return (index - swp_offset(folio->swap)) < folio_nr_pages(folio);
+}
+
 void show_swap_cache_info(void);
 void *get_shadow_from_swap_cache(swp_entry_t entry);
 int add_to_swap_cache(struct folio *folio, swp_entry_t entry,
@@ -123,6 +139,11 @@ static inline pgoff_t swap_cache_index(swp_entry_t entry)
 	return 0;
 }
 
+static inline bool folio_swap_contains(struct folio *folio, swp_entry_t entry)
+{
+	return false;
+}
+
 static inline void show_swap_cache_info(void)
 {
 }
diff --git a/mm/swap_state.c b/mm/swap_state.c
index bca201100138..07c41676486a 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -170,7 +170,8 @@ void __delete_from_swap_cache(struct folio *folio,
  * Lookup a swap entry in the swap cache. A found folio will be returned
  * unlocked and with its refcount incremented.
  *
- * Caller must hold a reference on the swap device.
+ * Caller must hold a reference of the swap device, and check if the
+ * returned folio is still valid after locking it (e.g. folio_swap_contains).
  */
 struct folio *swap_cache_get_folio(swp_entry_t entry)
 {
@@ -339,7 +340,10 @@ struct folio *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 	for (;;) {
 		int err;
 
-		/* Check the swap cache in case the folio is already there */
+		/*
+		 * Check the swap cache first, if a cached folio is found,
+		 * return it unlocked. The caller will lock and check it.
+		 */
 		folio = swap_cache_get_folio(entry);
 		if (folio)
 			goto got_folio;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 29e918102355..aa031fd27847 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -240,12 +240,12 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 	 * Offset could point to the middle of a large folio, or folio
 	 * may no longer point to the expected offset before it's locked.
 	 */
-	entry = folio->swap;
-	if (offset < swp_offset(entry) || offset >= swp_offset(entry) + nr_pages) {
+	if (!folio_swap_contains(folio, entry)) {
 		folio_unlock(folio);
 		folio_put(folio);
 		goto again;
 	}
+	entry = folio->swap;
 	offset = swp_offset(entry);
 
 	need_reclaim = ((flags & TTRS_ANYWAY) ||
@@ -2117,6 +2117,12 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 		}
 
 		folio_lock(folio);
+		if (!folio_swap_contains(folio, entry)) {
+			folio_unlock(folio);
+			folio_put(folio);
+			continue;
+		}
+
 		folio_wait_writeback(folio);
 		ret = unuse_pte(vma, pmd, addr, entry, folio);
 		if (ret < 0) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index e5a0db7f3331..5b4f01aecf35 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -1409,6 +1409,10 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 			goto retry;
 		}
 	}
+	if (!folio_swap_contains(src_folio, entry)) {
+		err = -EBUSY;
+		goto out;
+	}
 	err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
 			orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
 			dst_ptl, src_ptl, src_folio);
-- 
2.49.0