From: Kairui Song <ryncsn@gmail.com>
To: linux-mm@kvack.org
Cc: Andrew Morton, Barry Song <21cnbao@gmail.com>, Peter Xu, Suren Baghdasaryan, Andrea Arcangeli, David Hildenbrand, Lokesh Gidra, stable@vger.kernel.org, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH] mm: userfaultfd: fix race of userfaultfd_move and swap cache
Date: Sat, 31 May 2025 04:17:10 +0800
Message-ID: <20250530201710.81365-1-ryncsn@gmail.com>
X-Mailer: git-send-email 2.49.0
Reply-To: Kairui Song <ryncsn@gmail.com>

From: Kairui Song

On seeing a swap entry PTE, userfaultfd_move does a lockless swap cache
lookup, and tries to move the found folio to the faulting vma. Currently,
it relies on the PTE value check to ensure the moved folio still belongs
to the src swap entry, which turns out to be unreliable.

While working on and reviewing the swap table series with Barry, the
following existing race was observed and reproduced [1]:

(move_pages_pte is moving src_pte to dst_pte, where src_pte is a swap
entry PTE holding swap entry S1, and S1 isn't in the swap cache.)

CPU1                               CPU2
userfaultfd_move
  move_pages_pte()
    entry = pte_to_swp_entry(orig_src_pte);
    // Here it got entry = S1
    ... < Somehow interrupted> ...
                                   <swapin src_pte, alloc and use folio A>
                                   // folio A is just a new allocated folio
                                   // and gets installed into src_pte
                                   <frees swap entry S1>
                                   // src_pte now points to folio A, S1
                                   // has swap count == 0, it can be freed
                                   // by folio_free_swap or the swap
                                   // allocator's reclaim.
                                   <try to swap out another folio B>
                                   // folio B is a folio in another VMA.
                                   <put folio B into swap cache using S1>
                                   // S1 is freed, folio B can use it
                                   // for swap out with no problem.
                                   ...
    folio = filemap_get_folio(S1)
    // Got folio B here !!!
    ... < Somehow interrupted again> ...
                                   <swapin folio B and free S1>
                                   // Now S1 is free to be used again.
                                   <swapout src_pte & folio A using S1>
                                   // Now src_pte is a swap entry PTE
                                   // holding S1 again.
    folio_trylock(folio)
    move_swap_pte
      double_pt_lock
      is_pte_pages_stable
      // Check passed because src_pte == S1
      folio_move_anon_rmap(...)
      // Moved invalid folio B here !!!

The race window is very short and requires multiple collisions of
multiple rare events, so it's very unlikely to happen, but with a
deliberately constructed reproducer and an increased time window, it can
be reproduced [1].

It's also possible that folio (A) is swapped in, and swapped out again
after the filemap_get_folio lookup; in such a case folio (A) may stay in
the swap cache, so it needs to be moved too. Here the kernel should also
retry, so it won't miss a folio move.

Fix this by checking that the folio is the valid swap cache folio after
acquiring the folio lock, and by checking the swap cache again after
acquiring the src_pte lock.

The SWP_SYNCHRONOUS_IO path does make the problem more complex, but so
far we don't need to worry about it, since folios might only get exposed
to the swap cache in the swap-out path, and that is covered by this
patch too, by checking the swap cache again after acquiring the src_pte
lock.
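In short, both fixes follow a lock-then-revalidate pattern. Roughly, the
two added checks do the following (a simplified sketch of the diff below,
with locking and cleanup elided):

	/*
	 * Check 1 (move_pages_pte): after folio_lock(), the folio may
	 * have left the swap cache or been reused for another entry
	 * while it was unlocked, so revalidate it against the entry.
	 */
	if (!folio_test_swapcache(folio) || entry.val != folio->swap.val)
		return -EAGAIN;	/* let the caller retry the move */

	/*
	 * Check 2 (move_swap_pte): after double_pt_lock(), a folio may
	 * have been added to the swap cache for this entry in the
	 * meantime; moving only the PTE would leave it behind.
	 */
	folio = filemap_get_folio(swap_address_space(entry),
				  swap_cache_index(entry));
	if (!IS_ERR_OR_NULL(folio)) {
		folio_put(folio);
		return -EAGAIN;	/* retry; the folio must be moved too */
	}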
Fixes: adef440691ba ("userfaultfd: UFFDIO_MOVE uABI")
Closes: https://lore.kernel.org/linux-mm/CAMgjq7B1K=6OOrK2OUZ0-tqCzi+EJt+2_K97TPGoSt=9+JwP7Q@mail.gmail.com/ [1]
Signed-off-by: Kairui Song <ryncsn@gmail.com>
---
 mm/userfaultfd.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index bc473ad21202..a1564d205dfb 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -15,6 +15,7 @@
 #include <linux/mmu_notifier.h>
 #include <linux/hugetlb.h>
 #include <linux/shmem_fs.h>
+#include "swap.h"
 #include <asm/tlbflush.h>
 #include <asm/tlb.h>
 #include "internal.h"
@@ -1086,6 +1087,8 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
 			 spinlock_t *dst_ptl, spinlock_t *src_ptl,
 			 struct folio *src_folio)
 {
+	swp_entry_t entry;
+
 	double_pt_lock(dst_ptl, src_ptl);
 
 	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
@@ -1102,6 +1105,19 @@ static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
 	if (src_folio) {
 		folio_move_anon_rmap(src_folio, dst_vma);
 		src_folio->index = linear_page_index(dst_vma, dst_addr);
+	} else {
+		/*
+		 * Check again after acquiring the src_pte lock. Or we might
+		 * miss a new loaded swap cache folio.
+		 */
+		entry = pte_to_swp_entry(orig_src_pte);
+		src_folio = filemap_get_folio(swap_address_space(entry),
+					      swap_cache_index(entry));
+		if (!IS_ERR_OR_NULL(src_folio)) {
+			double_pt_unlock(dst_ptl, src_ptl);
+			folio_put(src_folio);
+			return -EAGAIN;
+		}
 	}
 
 	orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
@@ -1409,6 +1425,16 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
 			folio_lock(src_folio);
 			goto retry;
 		}
+		/*
+		 * Check if the folio still belongs to the target swap entry after
+		 * acquiring the lock. Folio can be freed in the swap cache while
+		 * not locked.
+		 */
+		if (unlikely(!folio_test_swapcache(folio) ||
+			     entry.val != folio->swap.val)) {
+			err = -EAGAIN;
+			goto out;
+		}
 	}
 
 	err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
 			orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
-- 
2.49.0
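P.S. A usage-side footnote, not part of the patch: the UFFDIO_MOVE uABI
allows the ioctl to fail with EAGAIN (e.g. when the address space is
changing concurrently), and userspace is expected to retry. A minimal
sketch of such a retry loop; the helper name is hypothetical, 'uffd' is
assumed to be an already set-up userfaultfd, and the ranges page-aligned:

	#include <errno.h>
	#include <sys/ioctl.h>
	#include <linux/userfaultfd.h>

	/*
	 * Hypothetical helper: move 'len' bytes from 'src' to 'dst' via
	 * UFFDIO_MOVE, retrying on transient EAGAIN and resuming after
	 * partial progress (the kernel reports bytes moved in .move).
	 * A real caller would likely bound the number of retries.
	 */
	static int uffd_move_retry(int uffd, unsigned long dst,
				   unsigned long src, unsigned long len)
	{
		while (len) {
			struct uffdio_move move = {
				.dst = dst,
				.src = src,
				.len = len,
				.mode = 0,
			};

			if (ioctl(uffd, UFFDIO_MOVE, &move) == 0)
				return 0;	/* whole range moved */
			if (errno != EAGAIN)
				return -1;	/* hard error */
			if (move.move > 0) {	/* partial move before EAGAIN */
				dst += move.move;
				src += move.move;
				len -= move.move;
			}
		}
		return 0;
	}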