From: Suren Baghdasaryan
Date: Wed, 19 Feb 2025 10:58:36 -0800
Subject: Re: [PATCH RFC] mm: Fix kernel BUG when userfaultfd_move encounters swapcache
To: David Hildenbrand
Cc: Barry Song <21cnbao@gmail.com>, linux-mm@kvack.org, akpm@linux-foundation.org,
 linux-kernel@vger.kernel.org, zhengtangquan@oppo.com, Barry Song, Andrea Arcangeli,
 Al Viro, Axel Rasmussen, Brian Geffon, Christian Brauner, Hugh Dickins, Jann Horn,
 Kalesh Singh, "Liam R. Howlett", Lokesh Gidra, Matthew Wilcox, Michal Hocko,
 Mike Rapoport, Nicolas Geoffray, Peter Xu, Ryan Roberts, Shuah Khan, ZhangPeng
References: <20250219112519.92853-1-21cnbao@gmail.com>

On Wed, Feb 19, 2025 at 10:30 AM David Hildenbrand wrote:
>
> On 19.02.25 19:26, Suren Baghdasaryan wrote:
> > On Wed, Feb 19, 2025 at 3:25 AM Barry Song <21cnbao@gmail.com> wrote:
> >>
> >> From: Barry Song
> >>
> >> userfaultfd_move() checks whether the PTE entry is present or a
> >> swap entry.
> >>
> >> - If the PTE entry is present, move_present_pte() handles folio
> >> migration by setting:
> >>
> >> src_folio->index = linear_page_index(dst_vma, dst_addr);
> >>
> >> - If the PTE entry is a swap entry, move_swap_pte() simply copies
> >> the PTE to the new dst_addr.
> >>
> >> This approach is incorrect because even if the PTE is a swap
> >> entry, it can still reference a folio that remains in the swap
> >> cache.
> >>
> >> If do_swap_page() is triggered, it may locate the folio in the
> >> swap cache. However, during add_rmap operations, a kernel panic
> >> can occur due to:
> >> page_pgoff(folio, page) != linear_page_index(vma, address)
> >
> > Thanks for the report and reproducer!
> >
> >>
> >> $./a.out > /dev/null
> >> [ 13.336953] page: refcount:6 mapcount:1 mapping:00000000f43db19c index:0xffffaf150 pfn:0x4667c
> >> [ 13.337520] head: order:2 mapcount:1 entire_mapcount:0 nr_pages_mapped:1 pincount:0
> >> [ 13.337716] memcg:ffff00000405f000
> >> [ 13.337849] anon flags: 0x3fffc0000020459(locked|uptodate|dirty|owner_priv_1|head|swapbacked|node=0|zone=0|lastcpupid=0xffff)
> >> [ 13.338630] raw: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
> >> [ 13.338831] raw: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
> >> [ 13.339031] head: 03fffc0000020459 ffff80008507b538 ffff80008507b538 ffff000006260361
> >> [ 13.339204] head: 0000000ffffaf150 0000000000004000 0000000600000000 ffff00000405f000
> >> [ 13.339375] head: 03fffc0000000202 fffffdffc0199f01 ffffffff00000000 0000000000000001
> >> [ 13.339546] head: 0000000000000004 0000000000000000 00000000ffffffff 0000000000000000
> >> [ 13.339736] page dumped because: VM_BUG_ON_PAGE(page_pgoff(folio, page) != linear_page_index(vma, address))
> >> [ 13.340190] ------------[ cut here ]------------
> >> [ 13.340316] kernel BUG at mm/rmap.c:1380!
> >> [ 13.340683] Internal error: Oops - BUG: 00000000f2000800 [#1] PREEMPT SMP
> >> [ 13.340969] Modules linked in:
> >> [ 13.341257] CPU: 1 UID: 0 PID: 107 Comm: a.out Not tainted 6.14.0-rc3-gcf42737e247a-dirty #299
> >> [ 13.341470] Hardware name: linux,dummy-virt (DT)
> >> [ 13.341671] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> >> [ 13.341815] pc : __page_check_anon_rmap+0xa0/0xb0
> >> [ 13.341920] lr : __page_check_anon_rmap+0xa0/0xb0
> >> [ 13.342018] sp : ffff80008752bb20
> >> [ 13.342093] x29: ffff80008752bb20 x28: fffffdffc0199f00 x27: 0000000000000001
> >> [ 13.342404] x26: 0000000000000000 x25: 0000000000000001 x24: 0000000000000001
> >> [ 13.342575] x23: 0000ffffaf0d0000 x22: 0000ffffaf0d0000 x21: fffffdffc0199f00
> >> [ 13.342731] x20: fffffdffc0199f00 x19: ffff000006210700 x18: 00000000ffffffff
> >> [ 13.342881] x17: 6c203d2120296567 x16: 6170202c6f696c6f x15: 662866666f67705f
> >> [ 13.343033] x14: 6567617028454741 x13: 2929737365726464 x12: ffff800083728ab0
> >> [ 13.343183] x11: ffff800082996bf8 x10: 0000000000000fd7 x9 : ffff80008011bc40
> >> [ 13.343351] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000829eebf8
> >> [ 13.343498] x5 : c0000000fffff000 x4 : 0000000000000000 x3 : 0000000000000000
> >> [ 13.343645] x2 : 0000000000000000 x1 : ffff0000062db980 x0 : 000000000000005f
> >> [ 13.343876] Call trace:
> >> [ 13.344045] __page_check_anon_rmap+0xa0/0xb0 (P)
> >> [ 13.344234] folio_add_anon_rmap_ptes+0x22c/0x320
> >> [ 13.344333] do_swap_page+0x1060/0x1400
> >> [ 13.344417] __handle_mm_fault+0x61c/0xbc8
> >> [ 13.344504] handle_mm_fault+0xd8/0x2e8
> >> [ 13.344586] do_page_fault+0x20c/0x770
> >> [ 13.344673] do_translation_fault+0xb4/0xf0
> >> [ 13.344759] do_mem_abort+0x48/0xa0
> >> [ 13.344842] el0_da+0x58/0x130
> >> [ 13.344914] el0t_64_sync_handler+0xc4/0x138
> >> [ 13.345002] el0t_64_sync+0x1ac/0x1b0
> >> [ 13.345208] Code: aa1503e0 f000f801 910f6021 97ff5779 (d4210000)
> >> [ 13.345504] ---[ end trace 0000000000000000 ]---
> >> [ 13.345715] note: a.out[107] exited with irqs disabled
> >> [ 13.345954] note: a.out[107] exited with preempt_count 2
> >>
> >> Fully fixing it would be quite complex, requiring similar handling
> >> of folios as done in move_present_pte.
> >
> > How complex would that be? Is it a matter of adding
> > folio_maybe_dma_pinned() checks, doing folio_move_anon_rmap() and
> > folio->index = linear_page_index like in move_present_pte() or
> > something more?
>
> If the entry is pte_swp_exclusive(), and the folio is order-0, it cannot
> be pinned and we may be able to move it I think.
>
> So all that's required is to check pte_swp_exclusive() and the folio size.
>
> ... in theory :) Not sure about the swap details.

Looking some more into it, I think we would have to perform all the
folio and anon_vma locking and pinning that we do for present pages in
move_pages_pte(). If that's correct then maybe treating swapcache pages
like a present page inside move_pages_pte() would be simpler?

>
> --
> Cheers,
>
> David / dhildenb
>
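The VM_BUG_ON_PAGE() in the report compares the page offset a folio recorded when it was first mapped with the offset implied by the VMA and faulting address. The user-space C sketch below models only that arithmetic. It is a sketch under stated assumptions: the `sim_*` types and functions are hypothetical stand-ins (not kernel APIs), pages are assumed to be 4 KiB, the folio is assumed to be order-0 so that `page_pgoff()` reduces to `folio->index`, and both anonymous VMAs are assumed to have `vm_pgoff == vm_start >> PAGE_SHIFT`. It shows why copying only the swap PTE, without the `folio->index` update that move_present_pte() performs, leaves a cached folio that fails the check at the destination.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages. */
#define SIM_PAGE_SHIFT 12

/* Hypothetical, pared-down stand-ins for the kernel structures. */
struct sim_vma {
	uint64_t vm_start; /* first address covered by the VMA */
	uint64_t vm_end;   /* one past the last covered address */
	uint64_t vm_pgoff; /* page offset corresponding to vm_start */
};

struct sim_folio {
	uint64_t index; /* page offset recorded when the folio was mapped */
};

/* Same arithmetic as linear_page_index() for a non-hugetlb VMA. */
uint64_t sim_linear_page_index(const struct sim_vma *vma, uint64_t addr)
{
	assert(addr >= vma->vm_start && addr < vma->vm_end);
	return ((addr - vma->vm_start) >> SIM_PAGE_SHIFT) + vma->vm_pgoff;
}

/*
 * The condition behind the VM_BUG_ON_PAGE(), simplified to an order-0
 * folio so that page_pgoff(folio, page) is just folio->index.
 */
bool sim_anon_rmap_ok(const struct sim_folio *folio,
		      const struct sim_vma *vma, uint64_t addr)
{
	return folio->index == sim_linear_page_index(vma, addr);
}

/* The index update that move_present_pte() does and move_swap_pte() skips. */
void sim_retarget_folio(struct sim_folio *folio,
			const struct sim_vma *dst_vma, uint64_t dst_addr)
{
	folio->index = sim_linear_page_index(dst_vma, dst_addr);
}
```

With source and destination VMAs at hypothetical addresses 0x10000000 and 0x20000000, a folio mapped at 0x10002000 records index 0x10002; looked up at 0x20002000 it fails `sim_anon_rmap_ok()` until `sim_retarget_folio()` sets the index to 0x20002.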
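David's suggested criterion can be written as a small decision function. The following is a hypothetical user-space sketch, not proposed kernel code. `struct swp_pte_info` and the `SWP_*` plan values are invented for illustration, and the booleans stand in for `pte_swp_exclusive()` and a swap-cache lookup. It assumes, as the report implies, that only a folio still sitting in the swap cache carries a stale index, while a folio read in afresh at swap-in takes its index from the faulting address. It deliberately ignores the folio and anon_vma locking Suren points out, and it ignores races with a concurrent swap-in.

```c
#include <stdbool.h>

/* Hypothetical summary of one source swap PTE; not a kernel structure. */
struct swp_pte_info {
	bool swp_exclusive;       /* models pte_swp_exclusive(orig_src_pte) */
	bool folio_in_swapcache;  /* a swap-cache lookup found a folio */
	unsigned int folio_order; /* order of that folio, if one was found */
};

enum swp_move_plan {
	SWP_COPY_PTE_ONLY, /* nothing cached: copying the swap PTE is enough */
	SWP_MOVE_FOLIO,    /* exclusive order-0 folio: also move the folio */
	SWP_REFUSE,        /* anything else: refuse rather than break rmap */
};

enum swp_move_plan swp_plan_move(const struct swp_pte_info *e)
{
	if (!e->folio_in_swapcache)
		return SWP_COPY_PTE_ONLY;
	/* Per David's reasoning, such a folio cannot be pinned. */
	if (e->swp_exclusive && e->folio_order == 0)
		return SWP_MOVE_FOLIO;
	return SWP_REFUSE;
}
```

Keeping "nothing cached" as a separate outcome preserves the cheap path that move_swap_pte() takes today. Only the cached case needs the extra folio work.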
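Suren's alternative routes a swap PTE whose folio is still in the swap cache through the same folio handling as a present PTE. It might look roughly like the sketch below. This is again a hypothetical user-space model. The `mv_*` names are invented, the flags stand in for `folio_maybe_dma_pinned()` and for `PageAnonExclusive()` or `pte_swp_exclusive()`, the pointer assignment stands in for `folio_move_anon_rmap()`, and the locking move_pages_pte() performs is reduced to a comment. Pages are assumed to be 4 KiB.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MV_PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Hypothetical stand-ins; not the kernel's definitions. */
struct mv_anon_vma {
	int id;
};

struct mv_vma {
	uint64_t vm_start;
	uint64_t vm_pgoff;
	struct mv_anon_vma *anon_vma;
};

struct mv_folio {
	uint64_t index;
	struct mv_anon_vma *anon_vma;
	bool maybe_dma_pinned; /* models folio_maybe_dma_pinned() */
	bool anon_exclusive;   /* models PageAnonExclusive() or pte_swp_exclusive() */
};

enum mv_pte_kind { MV_PTE_PRESENT, MV_PTE_SWAP };
enum mv_result { MV_MOVED_FOLIO, MV_COPIED_PTE, MV_BUSY };

/* Shared folio handling, modeled on what move_present_pte() does. */
static enum mv_result mv_move_folio(struct mv_folio *folio,
				    const struct mv_vma *dst_vma,
				    uint64_t dst_addr)
{
	if (folio->maybe_dma_pinned || !folio->anon_exclusive)
		return MV_BUSY;
	/* Real code holds the folio lock and the anon_vma lock here. */
	folio->anon_vma = dst_vma->anon_vma; /* folio_move_anon_rmap() */
	folio->index = ((dst_addr - dst_vma->vm_start) >> MV_PAGE_SHIFT) +
		       dst_vma->vm_pgoff;    /* linear_page_index() */
	return MV_MOVED_FOLIO;
}

/*
 * The proposal: a swap PTE whose folio is still in the swap cache takes the
 * present-PTE path. @folio is the mapped folio for a present PTE, the
 * swap-cache folio for a swap PTE, or NULL if the swap cache has none.
 */
enum mv_result mv_move_one_pte(enum mv_pte_kind kind, struct mv_folio *folio,
			       const struct mv_vma *dst_vma, uint64_t dst_addr)
{
	assert(kind == MV_PTE_SWAP || folio != NULL);
	if (folio != NULL)
		return mv_move_folio(folio, dst_vma, dst_addr);
	return MV_COPIED_PTE; /* swap entry with nothing cached */
}
```

The attraction of this shape is that the pinning and exclusivity checks, the anon_vma move and the index update exist in one place. A swap-cache hit then cannot silently skip any of them.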