From: Barry Song <21cnbao@gmail.com>
Date: Fri, 21 Feb 2025 10:45:23 +1300
Subject: Re: [PATCH RFC] mm: Fix kernel BUG when userfaultfd_move encounters swapcache
To: David Hildenbrand
Cc: Suren Baghdasaryan, Lokesh Gidra, linux-mm@kvack.org,
 akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
 zhengtangquan@oppo.com, Barry Song, Andrea Arcangeli, Al Viro,
 Axel Rasmussen, Brian Geffon, Christian Brauner, Hugh Dickins, Jann Horn,
 Kalesh Singh, "Liam R . Howlett", Matthew Wilcox, Michal Hocko,
 Mike Rapoport, Nicolas Geoffray, Peter Xu, Ryan Roberts, Shuah Khan,
 ZhangPeng, Yu Zhao
References: <20250219112519.92853-1-21cnbao@gmail.com>
 <50566d42-7754-4017-b290-f29d92e69231@redhat.com>

On Thu, Feb 20, 2025 at 10:36 PM David Hildenbrand wrote:
>
> On 20.02.25 10:31, Barry Song wrote:
> > On Thu, Feb 20, 2025 at 9:51 PM David Hildenbrand wrote:
> >>
> >> On 19.02.25 21:37, Barry Song wrote:
> >>> On Thu, Feb 20, 2025 at 7:27 AM Suren Baghdasaryan wrote:
> >>>>
> >>>> On Wed, Feb 19, 2025 at 3:25 AM Barry Song <21cnbao@gmail.com> wrote:
> >>>>>
> >>>>> From: Barry Song
> >>>>>
> >>>>> userfaultfd_move() checks whether the PTE entry is present or a
> >>>>> swap entry.
> >>>>>
> >>>>> - If the PTE entry is present, move_present_pte() handles folio
> >>>>>   migration by setting:
> >>>>>
> >>>>>   src_folio->index = linear_page_index(dst_vma, dst_addr);
> >>>>>
> >>>>> - If the PTE entry is a swap entry, move_swap_pte() simply copies
> >>>>>   the PTE to the new dst_addr.
> >>>>>
> >>>>> This approach is incorrect because even if the PTE is a swap
> >>>>> entry, it can still reference a folio that remains in the swap
> >>>>> cache.
> >>>>>
> >>>>> If do_swap_page() is triggered, it may locate the folio in the
> >>>>> swap cache. However, during add_rmap operations, a kernel panic
> >>>>> can occur due to:
> >>>>>   page_pgoff(folio, page) != linear_page_index(vma, address)
> >>>>
> >>>> Thanks for the report and reproducer!
> >>>>
> >>>>>
> >>>>> $./a.out > /dev/null
> >>>>> [   13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
> >>>>> [   13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
> >>>>> [   13.337716] memcg:ffff00000405f000
> >>>>> [   13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
> >>>>> [   13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
> >>>>> [   13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
> >>>>> [   13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
> >>>>> [   13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
> >>>>> [   13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
> >>>>> [   13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
> >>>>> [   13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
> >>>>> [   13.340190] ------------[ cut here ]------------
> >>>>> [   13.340316] kernel BUG at mm/rmap.c:1380!
> >>>>> [   13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> >>>>> [   13.340969] Modules linked in:
> >>>>> [   13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
> >>>>> [   13.341470] Hardware name: linux,dummy-virt (DT)
> >>>>> [   13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >>>>> [   13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
> >>>>> [   13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
> >>>>> [   13.342018] sp : ffff80008752bb20
> >>>>> [   13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
> >>>>> [   13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
> >>>>> [   13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
> >>>>> [   13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
> >>>>> [   13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
> >>>>> [   13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
> >>>>> [   13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
> >>>>> [   13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
> >>>>> [   13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
> >>>>> [   13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
> >>>>> [   13.343876] Call trace:
> >>>>> [   13.344045] __page_check_anon_rmap+0xa0/0xb0 (P)
> >>>>> [   13.344234] folio_add_anon_rmap_ptes+0x22c/0x320
> >>>>> [   13.344333] do_swap_page+0x1060/0x1400
> >>>>> [   13.344417] __handle_mm_fault+0x61c/0xbc8
> >>>>> [   13.344504] handle_mm_fault+0xd8/0x2e8
> >>>>> [   13.344586] do_page_fault+0x20c/0x770
> >>>>> [   13.344673] do_translation_fault+0xb4/0xf0
> >>>>> [   13.344759] do_mem_abort+0x48/0xa0
> >>>>> [   13.344842] el0_da+0x58/0x130
> >>>>> [   13.344914] el0t_64_sync_handler+0xc4/0x138
> >>>>> [   13.345002] el0t_64_sync+0x1ac/0x1b0
> >>>>> [   13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
> >>>>> [   13.345504] ---[ end trace 0000000000000000 ]---
> >>>>> [   13.345715] note: a.out[107] exited with irqs disabled
> >>>>> [   13.345954] note: a.out[107] exited with preempt_count 2
> >>>>>
> >>>>> Fully fixing it would be quite complex, requiring similar handling
> >>>>> of folios as done in move_present_pte.
> >>>>
> >>>> How complex would that be? Is it a matter of adding
> >>>> folio_maybe_dma_pinned() checks, doing folio_move_anon_rmap() and
> >>>> folio->index = linear_page_index like in move_present_pte() or
> >>>> something more?
> >>>
> >>> My main concern is still with large folios that require a split_folio()
> >>> during move_pages(), as the entire folio shares the same index and
> >>> anon_vma. However, userfaultfd_move() moves pages individually,
> >>> making a split necessary.
> >>>
> >>> However, in split_huge_page_to_list_to_order(), there is a:
> >>>
> >>>         if (folio_test_writeback(folio))
> >>>                 return -EBUSY;
> >>>
> >>> This is likely true for swapcache, right? However, even for move_present_pte(),
> >>> it simply returns -EBUSY:
> >>>
> >>> move_pages_pte()
> >>> {
> >>>         /* at this point we have src_folio locked */
> >>>         if (folio_test_large(src_folio)) {
> >>>                 /* split_folio() can block */
> >>>                 pte_unmap(&orig_src_pte);
> >>>                 pte_unmap(&orig_dst_pte);
> >>>                 src_pte = dst_pte = NULL;
> >>>                 err = split_folio(src_folio);
> >>>                 if (err)
> >>>                         goto out;
> >>>
> >>>                 /* have to reacquire the folio after it got split */
> >>>                 folio_unlock(src_folio);
> >>>                 folio_put(src_folio);
> >>>                 src_folio = NULL;
> >>>                 goto retry;
> >>>         }
> >>> }
> >>>
> >>> Do we need a folio_wait_writeback() before calling split_folio()?
> >>>
> >>> By the way, I have also reported that userfaultfd_move() has a fundamental
> >>> conflict with TAO (Cc'ed Yu Zhao), which has been part of the Android common
> >>> kernel.
> >>> In this scenario, folios in the virtual zone won’t be split in
> >>> split_folio(). Instead, the large folio migrates into nr_pages small folios.
> >>>
> >>> Thus, the best-case scenario would be:
> >>>
> >>> mTHP -> migrate to small folios in split_folio() -> move small folios to
> >>> dst_addr
> >>>
> >>> While this works, it negates the performance benefits of
> >>> userfaultfd_move(), as it introduces two PTE operations (migration in
> >>> split_folio() and move in userfaultfd_move() while retry), nr_pages memory
> >>> allocations, and still requires one memcpy(). This could end up
> >>> performing even worse than userfaultfd_copy(), I guess.
> >>>
> >>> The worst-case scenario would be failing to allocate small folios in
> >>> split_folio(), then userfaultfd_move() might return -ENOMEM?
> >>
> >> Although that's an Android problem and not an upstream problem, I'll
> >> note that there are other reasons why the split / move might fail, and
> >> user space either must retry or fallback to a COPY.
> >>
> >> Regarding mTHP, we could move the whole folio if the user space-provided
> >> range allows for batching over multiple PTEs (nr_ptes), they are in a
> >> single VMA, and folio_mapcount() == nr_ptes.
> >>
> >> There are corner cases to handle, such as moving mTHPs such that they
> >> suddenly cross two page tables I assume, that are harder to handle when
> >> not moving individual PTEs where that cannot happen.
> >
> > This is a useful suggestion. I’ve heard that Lokesh is also interested in
> > modifying ART to perform moves at the mTHP granularity, which would require
> > kernel modifications as well. It’s likely the direction we’ll take after
> > fixing the current urgent bugs. The current split_folio() really isn’t ideal.
> >
> > The corner cases you mentioned are definitely worth considering.
> > However,
> > once we can perform batch UFFDIO_MOVE, I believe that in most cases,
> > the conflict between userfaultfd_move() and TAO will be resolved ?
>
> Well, as soon as you would have varying mTHP sizes, you'd still run into
> the split with TAO. Maybe that doesn't apply with Android today, but I
> can just guess that performing sub-mTHP moving would still be required
> for GC at some point.

With patch v2[1], as discussed in my previous email, I have observed that
small folios consistently succeed without crashing. Similarly, mTHP no
longer crashes; however, it still returns -EBUSY during the race window,
even after adding folio_wait_writeback().

I previously mentioned that a folio under writeback cannot be split, but
that is not the only factor. split_folio() still returns -EBUSY because
folio_get_anon_vma(folio) returns NULL when the folio is not mapped.

int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
                                     unsigned int new_order)
{
        anon_vma = folio_get_anon_vma(folio);
        if (!anon_vma) {
                ret = -EBUSY;
                goto out;
        }
        end = -1;
        mapping = NULL;
        anon_vma_lock_write(anon_vma);
}

Even if the mTHP is not from TAO's virtual zone, userfaultfd_move() will
still fail when performing sub-mTHP moving in the swap cache case due to:

struct anon_vma *folio_get_anon_vma(const struct folio *folio)
{
        ...
        if (!folio_mapped(folio))
                goto out;
        ...
}

We likely need to modify split_folio() to support splitting unmapped anon
folios within the swap cache, or introduce a new function like
split_unmapped_anon_folio()? Otherwise, userspace will have to fall back
to UFFDIO_COPY or retry.

As it stands, I see no way for sub-mTHP to survive moving with the current
code within the existing race window. For mTHP, this is essentially no
different from returning -EBUSY immediately upon detecting that the folio
is in the swap cache, as proposed in v1.
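
To make the two early exits above easier to follow, here is a minimal
userspace sketch. It is not kernel code: struct model_folio and every
model_* name are hypothetical stand-ins invented for illustration, and the
sketch only mirrors the two -EBUSY conditions quoted in this thread
(writeback, and folio_get_anon_vma() returning NULL for an unmapped folio).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct anon_vma. */
struct model_anon_vma {
	int unused;
};

/* Hypothetical stand-in for the few folio properties discussed here. */
struct model_folio {
	struct model_anon_vma *anon_vma; /* anon_vma the folio belongs to */
	int mapcount;                    /* number of page table mappings */
	bool under_writeback;            /* folio is being written to swap */
};

/* Models the quoted check: an unmapped folio yields no anon_vma. */
static struct model_anon_vma *
model_folio_get_anon_vma(const struct model_folio *folio)
{
	if (folio->mapcount == 0)
		return NULL;
	return folio->anon_vma;
}

/* Models only the two early exits of the split path named in the thread. */
static int model_split_folio(const struct model_folio *folio)
{
	if (folio->under_writeback)
		return -EBUSY;
	if (!model_folio_get_anon_vma(folio))
		return -EBUSY;
	return 0;
}
```

In this model, a swap cache folio with mapcount == 0 fails with -EBUSY even
when under_writeback is false, which is why waiting for writeback alone
does not let the split go through.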
[1] https://lore.kernel.org/linux-mm/20250220092101.71966-1-21cnbao@gmail.com/

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
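
For reference, the rmap check from the original report can be illustrated
with a small userspace sketch. It is not kernel code: struct model_vma, the
model_* helpers and the 4 KiB page size are assumptions made for
illustration. Only the arithmetic follows linear_page_index() for a regular
(non-hugetlb) VMA. It shows that a folio whose index was computed for the
source address no longer satisfies the check at the destination address
unless the index is updated, as move_present_pte() does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-in for the two VMA fields the check depends on. */
struct model_vma {
	uint64_t vm_start; /* first virtual address covered by the VMA */
	uint64_t vm_pgoff; /* page offset that corresponds to vm_start */
};

/* Same arithmetic as linear_page_index() for a non-hugetlb VMA. */
static uint64_t model_linear_page_index(const struct model_vma *vma,
					uint64_t address)
{
	return ((address - vma->vm_start) >> MODEL_PAGE_SHIFT) + vma->vm_pgoff;
}

/* True when the condition guarded by VM_BUG_ON_PAGE() would hold. */
static bool model_rmap_index_ok(uint64_t folio_index,
				const struct model_vma *vma, uint64_t address)
{
	return folio_index == model_linear_page_index(vma, address);
}
```

If the swap PTE is copied to the destination while the folio index still
describes the source address, model_rmap_index_ok() returns false for the
destination VMA. Recomputing the index for the destination makes it return
true again.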