From mboxrd@z Thu Jan 1 00:00:00 1970
From: Suren Baghdasaryan <surenb@google.com>
Date: Tue, 25 Feb 2025 19:38:15 -0800
Subject: Re: [PATCH v2] mm: Fix kernel BUG when userfaultfd_move encounters swapcache
To: Peter Xu
Cc: Barry Song <21cnbao@gmail.com>, linux-mm@kvack.org,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org, Barry Song,
 Andrea Arcangeli, Al Viro, Axel Rasmussen, Brian Geffon,
 Christian Brauner, David Hildenbrand, Hugh Dickins, Jann Horn,
 Kalesh Singh, "Liam R. Howlett", Lokesh Gidra, Matthew Wilcox,
 Michal Hocko, Mike Rapoport, Nicolas Geoffray, Ryan Roberts,
 Shuah Khan, ZhangPeng, Tangquan Zheng, stable@vger.kernel.org
References: <20250226001400.9129-1-21cnbao@gmail.com>
Content-Type: text/plain; charset="UTF-8"
On Tue, Feb 25, 2025 at 5:00 PM Peter Xu wrote:
>
> On Wed, Feb 26, 2025 at 01:14:00PM +1300, Barry Song wrote:
> > From: Barry Song
> >
> > userfaultfd_move() checks whether the PTE entry is present or a
> > swap entry.
> >
> > - If the PTE entry is present, move_present_pte() handles folio
> >   migration by setting:
> >
> >     src_folio->index = linear_page_index(dst_vma, dst_addr);
> >
> > - If the PTE entry is a swap entry, move_swap_pte() simply copies
> >   the PTE to the new dst_addr.
> >
> > This approach is incorrect because, even if the PTE is a swap entry,
> > it can still reference a folio that remains in the swap cache.
> >
> > This creates a race window between steps 2 and 4 below:
> >
> >   1. add_to_swap: the folio is added to the swapcache.
> >   2. try_to_unmap: PTEs are converted to swap entries.
> >   3. pageout: the folio is written back.
> >   4. The swapcache is cleared.
> >
> > If userfaultfd_move() occurs in the window between steps 2 and 4,
> > after the swap PTE has been moved to the destination, accessing the
> > destination triggers do_swap_page(), which may locate the folio in
> > the swapcache.
> > However, since the folio's index has not been updated to match the
> > destination VMA, do_swap_page() will detect a mismatch.
> >
> > This can result in two critical issues depending on the system
> > configuration.
> >
> > If KSM is disabled, both small and large folios can trigger a BUG
> > during the add_rmap operation due to:
> >
> >   page_pgoff(folio, page) != linear_page_index(vma, address)
> >
> > [ 13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
> > [ 13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
> > [ 13.337716] memcg:ffff00000405f000
> > [ 13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
> > [ 13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
> > [ 13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
> > [ 13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
> > [ 13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
> > [ 13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
> > [ 13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
> > [ 13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
> > [ 13.340190] ------------[ cut here ]------------
> > [ 13.340316] kernel BUG at mm/rmap.c:1380!
> > [ 13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> > [ 13.340969] Modules linked in:
> > [ 13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
> > [ 13.341470] Hardware name: linux,dummy-virt (DT)
> > [ 13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> > [ 13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
> > [ 13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
> > [ 13.342018] sp : ffff80008752bb20
> > [ 13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
> > [ 13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
> > [ 13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
> > [ 13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
> > [ 13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
> > [ 13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
> > [ 13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
> > [ 13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
> > [ 13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
> > [ 13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
> > [ 13.343876] Call trace:
> > [ 13.344045]  __page_check_anon_rmap+0xa0/0xb0 (P)
> > [ 13.344234]  folio_add_anon_rmap_ptes+0x22c/0x320
> > [ 13.344333]  do_swap_page+0x1060/0x1400
> > [ 13.344417]  __handle_mm_fault+0x61c/0xbc8
> > [ 13.344504]  handle_mm_fault+0xd8/0x2e8
> > [ 13.344586]  do_page_fault+0x20c/0x770
> > [ 13.344673]  do_translation_fault+0xb4/0xf0
> > [ 13.344759]  do_mem_abort+0x48/0xa0
> > [ 13.344842]  el0_da+0x58/0x130
> > [ 13.344914]  el0t_64_sync_handler+0xc4/0x138
> > [ 13.345002]  el0t_64_sync+0x1ac/0x1b0
> > [ 13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
> > [ 13.345504] ---[ end trace 0000000000000000 ]---
> > [ 13.345715] note: a.out[107] exited with irqs disabled
> > [ 13.345954] note: a.out[107] exited with preempt_count 2
> >
> > If KSM is enabled, Peter Xu also discovered that do_swap_page() may
> > trigger an unexpected CoW operation for small folios, because
> > ksm_might_need_to_copy() allocates a new folio when the folio index
> > does not match linear_page_index(vma, addr).
> >
> > This patch also checks the swapcache when handling swap entries. If a
> > match is found in the swapcache, it is processed similarly to a present
> > PTE. There are some differences, however. For example, the folio is no
> > longer exclusive, because folio_try_share_anon_rmap_pte() is performed
> > during unmapping. Furthermore, in the swapcache case the folio has
> > already been unmapped, eliminating the risk of concurrent rmap walks
> > and removing the need to acquire src_folio's anon_vma or lock.
> >
> > Note that for large folios in the swapcache handling path, we directly
> > return -EBUSY, since split_folio() will return -EBUSY regardless of
> > whether the folio is under writeback or unmapped. This is not an urgent
> > issue, so a follow-up patch may address it separately.
> >
> > Fixes: adef440691bab ("userfaultfd: UFFDIO_MOVE uABI")
> > Cc: Andrea Arcangeli
> > Cc: Suren Baghdasaryan
> > Cc: Al Viro
> > Cc: Axel Rasmussen
> > Cc: Brian Geffon
> > Cc: Christian Brauner
> > Cc: David Hildenbrand
> > Cc: Hugh Dickins
> > Cc: Jann Horn
> > Cc: Kalesh Singh
> > Cc: Liam R. Howlett
> > Cc: Lokesh Gidra
> > Cc: Matthew Wilcox (Oracle)
> > Cc: Michal Hocko
> > Cc: Mike Rapoport (IBM)
> > Cc: Nicolas Geoffray
> > Cc: Peter Xu
> > Cc: Ryan Roberts
> > Cc: Shuah Khan
> > Cc: ZhangPeng
> > Cc: Tangquan Zheng
> > Cc:
> > Signed-off-by: Barry Song
> > Acked-by: Peter Xu
>
> Some nitpicks below, maybe not worth a repost..

With Peter's nits addressed,

Reviewed-by: Suren Baghdasaryan

Thanks!
>
> > ---
> >  mm/userfaultfd.c | 76 ++++++++++++++++++++++++++++++++++++++++++------
> >  1 file changed, 67 insertions(+), 9 deletions(-)
> >
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index 867898c4e30b..2df5d100e76d 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -18,6 +18,7 @@
> >  #include
> >  #include
> >  #include "internal.h"
> > +#include "swap.h"
> >
> >  static __always_inline
> >  bool validate_dst_vma(struct vm_area_struct *dst_vma, unsigned long dst_end)
> > @@ -1072,16 +1073,14 @@ static int move_present_pte(struct mm_struct *mm,
> >  	return err;
> >  }
> >
> > -static int move_swap_pte(struct mm_struct *mm,
> > +static int move_swap_pte(struct mm_struct *mm, struct vm_area_struct *dst_vma,
> >  			 unsigned long dst_addr, unsigned long src_addr,
> >  			 pte_t *dst_pte, pte_t *src_pte,
> >  			 pte_t orig_dst_pte, pte_t orig_src_pte,
> >  			 pmd_t *dst_pmd, pmd_t dst_pmdval,
> > -			 spinlock_t *dst_ptl, spinlock_t *src_ptl)
> > +			 spinlock_t *dst_ptl, spinlock_t *src_ptl,
> > +			 struct folio *src_folio)
> >  {
> > -	if (!pte_swp_exclusive(orig_src_pte))
> > -		return -EBUSY;
> > -
> >  	double_pt_lock(dst_ptl, src_ptl);
> >
> >  	if (!is_pte_pages_stable(dst_pte, src_pte, orig_dst_pte, orig_src_pte,
> > @@ -1090,10 +1089,20 @@ static int move_swap_pte(struct mm_struct *mm,
> >  		return -EAGAIN;
> >  	}
> >
> > +	/*
> > +	 * The src_folio resides in the swapcache, requiring an update to its
> > +	 * index and mapping to align with the dst_vma, where a swap-in may
> > +	 * occur and hit the swapcache after moving the PTE.
> > +	 */
> > +	if (src_folio) {
> > +		folio_move_anon_rmap(src_folio, dst_vma);
> > +		src_folio->index = linear_page_index(dst_vma, dst_addr);
> > +	}
> > +
> >  	orig_src_pte = ptep_get_and_clear(mm, src_addr, src_pte);
> >  	set_pte_at(mm, dst_addr, dst_pte, orig_src_pte);
> > -	double_pt_unlock(dst_ptl, src_ptl);
> >
> > +	double_pt_unlock(dst_ptl, src_ptl);
>
> Unnecessary line move.
> >  	return 0;
> >  }
> >
> > @@ -1137,6 +1146,7 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >  			  __u64 mode)
> >  {
> >  	swp_entry_t entry;
> > +	struct swap_info_struct *si = NULL;
> >  	pte_t orig_src_pte, orig_dst_pte;
> >  	pte_t src_folio_pte;
> >  	spinlock_t *src_ptl, *dst_ptl;
> > @@ -1318,6 +1328,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >  					orig_dst_pte, orig_src_pte, dst_pmd,
> >  					dst_pmdval, dst_ptl, src_ptl, src_folio);
> >  	} else {
> > +		struct folio *folio = NULL;
> > +
> >  		entry = pte_to_swp_entry(orig_src_pte);
> >  		if (non_swap_entry(entry)) {
> >  			if (is_migration_entry(entry)) {
> > @@ -1331,9 +1343,53 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >  			goto out;
> >  		}
> >
> > -		err = move_swap_pte(mm, dst_addr, src_addr, dst_pte, src_pte,
> > -				orig_dst_pte, orig_src_pte, dst_pmd,
> > -				dst_pmdval, dst_ptl, src_ptl);
> > +		if (!pte_swp_exclusive(orig_src_pte)) {
> > +			err = -EBUSY;
> > +			goto out;
> > +		}
> > +
> > +		si = get_swap_device(entry);
> > +		if (unlikely(!si)) {
> > +			err = -EAGAIN;
> > +			goto out;
> > +		}
> > +		/*
> > +		 * Verify the existence of the swapcache. If present, the folio's
> > +		 * index and mapping must be updated even when the PTE is a swap
> > +		 * entry. The anon_vma lock is not taken during this process since
> > +		 * the folio has already been unmapped, and the swap entry is
> > +		 * exclusive, preventing rmap walks.
> > +		 *
> > +		 * For large folios, return -EBUSY immediately, as split_folio()
> > +		 * also returns -EBUSY when attempting to split unmapped large
> > +		 * folios in the swapcache. This issue needs to be resolved
> > +		 * separately to allow proper handling.
> > +		 */
> > +		if (!src_folio)
> > +			folio = filemap_get_folio(swap_address_space(entry),
> > +					swap_cache_index(entry));
> > +		if (!IS_ERR_OR_NULL(folio)) {
> > +			if (folio && folio_test_large(folio)) {
>
> Can drop this folio check as it just did check "!IS_ERR_OR_NULL(folio)"..
>
> > +				err = -EBUSY;
> > +				folio_put(folio);
> > +				goto out;
> > +			}
> > +			src_folio = folio;
> > +			src_folio_pte = orig_src_pte;
> > +			if (!folio_trylock(src_folio)) {
> > +				pte_unmap(&orig_src_pte);
> > +				pte_unmap(&orig_dst_pte);
> > +				src_pte = dst_pte = NULL;
> > +				/* now we can block and wait */
> > +				folio_lock(src_folio);
> > +				put_swap_device(si);
> > +				si = NULL;
>
> Not sure if it can do any harm, but maybe still nicer to put the swap
> device before locking the folio.
>
> Thanks,
>
> > +				goto retry;
> > +			}
> > +		}
> > +		err = move_swap_pte(mm, dst_vma, dst_addr, src_addr, dst_pte, src_pte,
> > +				orig_dst_pte, orig_src_pte, dst_pmd, dst_pmdval,
> > +				dst_ptl, src_ptl, src_folio);
> >  	}
> >
> >  out:
> > @@ -1350,6 +1406,8 @@ static int move_pages_pte(struct mm_struct *mm, pmd_t *dst_pmd, pmd_t *src_pmd,
> >  	if (src_pte)
> >  		pte_unmap(src_pte);
> >  	mmu_notifier_invalidate_range_end(&range);
> > +	if (si)
> > +		put_swap_device(si);
> >
> >  	return err;
> >  }
> > --
> > 2.39.3 (Apple Git-146)
> >
>
> --
> Peter Xu
>