From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 08D12D43379 for ; Thu, 7 Nov 2024 16:43:56 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 75B996B0088; Thu, 7 Nov 2024 11:43:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 70B6E6B0089; Thu, 7 Nov 2024 11:43:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5D34B6B008A; Thu, 7 Nov 2024 11:43:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 3DCF76B0088 for ; Thu, 7 Nov 2024 11:43:55 -0500 (EST) Received: from smtpin15.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 6909B16022D for ; Thu, 7 Nov 2024 16:43:54 +0000 (UTC) X-FDA: 82759869948.15.289ABC5 Received: from mail.marcansoft.com (marcansoft.com [212.63.210.85]) by imf24.hostedemail.com (Postfix) with ESMTP id 1795218002B for ; Thu, 7 Nov 2024 16:43:46 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=asahilina.net header.s=default header.b=rGXufeGC; dmarc=pass (policy=quarantine) header.from=asahilina.net; spf=pass (imf24.hostedemail.com: domain of lina@asahilina.net designates 212.63.210.85 as permitted sender) smtp.mailfrom=lina@asahilina.net ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1730997662; a=rsa-sha256; cv=none; b=QlQ3Vdsi+BD1PjPb5ZhnWttQsqiCyggoWSELdvUFVki0dEy6l5q7wL7lDT/XXQJRtWXghJ uJpjel6X/gW2eInmFjYOc/KfyVCmWuvx9TOUhvRkqmknTg/zC4oOI9iXOT0cqKtOAQ6B8a 495QnvOmrh6OWElNmauMKML3fDdKYCs= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=asahilina.net header.s=default header.b=rGXufeGC; dmarc=pass (policy=quarantine) header.from=asahilina.net; spf=pass (imf24.hostedemail.com: domain of lina@asahilina.net designates 212.63.210.85 as permitted sender) smtp.mailfrom=lina@asahilina.net ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1730997662; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=0wx6nMvxOMtD8zWjiCUYGMz0OIUE7ZZUYN9dv7RIrpA=; b=xKs6sXj43wXKgd0nMvoBatRMhygce4RJP724uLsQbTLvfCtxL9GzZxSHnMx3boH9NTBGHV ovtxxrh8pt9FPT4bN1I2ma3l5FD0FQJdVWaSF+50H35bsXuD5woxPSPl69L/9vsdcI1AN+ 9Pp/Wf0JAfVyM8AnyxpEJQnhlsblAE8= Received: from [127.0.0.1] (localhost [127.0.0.1]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange X25519 server-signature RSA-PSS (4096 bits)) (No client certificate requested) (Authenticated sender: lina@asahilina.net) by mail.marcansoft.com (Postfix) with ESMTPSA id B118241A48; Thu, 7 Nov 2024 16:43:48 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=asahilina.net; s=default; t=1730997829; bh=MZNJSWYVCpMEpguf8qtSqYK+wdgMTcBu1I1DBUNzm2g=; h=Date:Subject:To:Cc:References:From:In-Reply-To; b=rGXufeGCPMMvi1AkgS6SWE/OsSvNvAd3rkKnQOgza45SpQjKtj/3gy0G6DVt7d/Hr Y3HIUZ9VY4avkjRatiwP3Yme6Bjo+rzMevCvF8Cq1v5USjUH6DCFI+ZZHtZ15G1A9L pPKFRq4GOBwlYFcSSKmi7acSXUV/rC3Qmlu/v/Eqn5YXaHWFg7XrrLkbF3WOXw0/wq s2ocrae8zX0gpZQ3AEaVwJa6aYrK+aYfiPkLeCnfwPX4SFx+MPmRKDIfD1oEfk71kZ UpbCjQiB+QXyaoD73UUyVUgnwwpy11YcgdZxmJQa4GZOZ4Cgpp0h0MzBAOJlJhuphN XN3rjqI0XUjEw== Message-ID: <2d8380b9-3d03-4263-b5bf-7e0227c83ba9@asahilina.net> Date: Fri, 8 Nov 2024 01:43:48 +0900 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH] mm: Fix __wp_page_copy_user fallback path for remote mm To: David Hildenbrand , Andrew Morton Cc: Sergio Lopez Pascual , linux-mm@kvack.org, linux-kernel@vger.kernel.org, asahi@lists.linux.dev References: <20241101-mm-remote-pfn-v1-1-080b609270b7@asahilina.net> Content-Language: en-US From: Asahi Lina In-Reply-To: Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit X-Stat-Signature: h8gf4u9oabfmugb67at1u4m5ggpxt47g X-Rspamd-Queue-Id: 1795218002B X-Rspamd-Server: rspam08 X-Rspam-User: X-HE-Tag: 1730997826-384060 X-HE-Meta: U2FsdGVkX18I6zNKpWmFx2TbZJSpX5YDNsGmFV7uczDyf18Fx/wgfIH/wSBEq263ROXT4tvZ2UcEqDiiysWRkG+XvCyxj1OsPAo7Xja5+a3/CYp+3EwH52rwqf+K1dRgP61f/1SZbIixn4dNrZhM2DReBpiCq+43CFayaZjuCM9pB0ptUJ++99AoXmhKPonHO6dajf1UkH3uLMNxU9RSzSZKlZsuJIWN1nhzaJ9fZeUDeab7n/ePn4HFXQ8wGM0IkKSgdh7lXTx+XAZwXa/Q+wlyTBu/7IsVM/be+v9D2rMgKOZYl5rBZolmQ+bg+NzznOyPlX4NNxLriDhucrukDj0qP2H3HwKu3y3vxZx7a9lh6k/AO7eoICsxfXuBxKT7twaSHBKo9lIZui8vHA1Kyb/WOvof9W99mOh9d0hdOgpvFEigP15KhtNtr+XL48fmkJT+jdGrFWQNj31sv7Bixme3CKSwLnowSKWW4vnR3VOZ3rrwNYhbtrm1f8QWOdOsIHlKXAkLXLvQgMo/V3zcaby319nh4Lj8q0tMw0U+MdG08wibFzpaI/65Ax02R2n9eD8KEMFQ9UyvK/Gggm+0YvsHPFy/K5azNbU+SjZ+Y/HOT/ZW3OtHjAar54xvfxPga8tvi9xgCTskrufdn76SAeZLIDJXcUnIT2A6bUhKYKhheY9kvVA2dUop2LOFCSUghvolm8UEB2xt9jnKRlg0WSnlI28geX+ru0DSnN3gOPyF3SDw6DNGuf9WMt54bTxSBnipfbz8Pzpf+soVX/HlMnra8b9O6sxtQYNyQ4YRAwGvxYDPkA2dfCBeiEcaC661vORW2QQT1pUkmcrMaLKeS62GG4CaNrl0/tOyLXPMfbRsgSYkVHBr+3ygZXiPeUFkyhrRxY/s8+HxC+HyeSO4yGgpiCt117djmYajfq13pCDpRsyJQ/da0paYIydODUOoTzAKpJ7zLOWrHN511km +4o7IwVr LVymTRTeKcGsQbWVREOkZFJYNLu4hYpXfXHxt584gYr55G1lJbpUP+63ble3356b+eiF7+lyT8L94fWlEt/vOylTq4Y8MneEUmI34Vof0PCqYOeG3zg6RdqICXOkBziPGM7We3FM7EAhLwBYc8DjbAFKPhB6lF4NfYiNgAYW8mGP91+JhXcbp32yBKOgnAOi8W8yy6/qFVaMGgp7DVqZUzwaOGymqL/GvCIfH/1FSHcrsAgbLKhf4fsXkI9h3UdpLWj/FjGcP+DXnfnheMFj8K8x2gc4JsCBy3I3iCfpMNJ4OxYdzrbUyEIz6kF/84OqP/1KwcFwlxq4DPGJ8OvyFrmVfi4b3BGal0WZxeiilAyEw0lyNThmko7a/75Sf8zg8J56xY/j/+q/tfn+k6M7XS38D3CuALAO1t4gN X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 11/5/24 9:03 PM, David Hildenbrand wrote: > On 01.11.24 13:08, Asahi Lina wrote: >> If the source page is a PFN mapping, we copy back from userspace. >> However, if this fault is a remote access, we cannot use >> __copy_from_user_inatomic. Instead, use access_remote_vm() in this case. >> >> Fixes WARN and incorrect zero-filling when writing to CoW mappings in >> a remote process, such as when using gdb on a binary present on a DAX >> filesystem. >> >> [  143.683782] ------------[ cut here ]------------ >> [  143.683784] WARNING: CPU: 1 PID: 350 at mm/memory.c:2904 >> __wp_page_copy_user+0x120/0x2bc >> [  143.683793] CPU: 1 PID: 350 Comm: gdb Not tainted 6.6.52 #1 >> [  143.683794] Hardware name: linux,dummy-virt (DT) >> [  143.683795] pstate: 61400005 (nZCv daif +PAN -UAO -TCO +DIT -SSBS >> BTYPE=--) >> [  143.683796] pc : __wp_page_copy_user+0x120/0x2bc >> [  143.683798] lr : __wp_page_copy_user+0x254/0x2bc >> [  143.683799] sp : ffff80008272b8b0 >> [  143.683799] x29: ffff80008272b8b0 x28: 0000000000000000 x27: >> ffff000083bad580 >> [  143.683801] x26: 0000000000000000 x25: 0000fffff7fd5000 x24: >> ffff000081db04c0 >> [  143.683802] x23: ffff00014f24b000 x22: fffffc00053c92c0 x21: >> ffff000083502150 >> [  143.683803] x20: 0000fffff7fd5000 x19: ffff80008272b9d0 x18: >> 0000000000000000 >> [  143.683804] x17: ffff000081db0500 x16: ffff800080fe52a0 x15: >> 0000fffff7fd5000 >> [  143.683804] x14: 0000000000bb1845 x13: 0000000000000080 x12: >> ffff80008272b880 >> [  143.683805] x11: ffff000081d13600 x10: ffff000081d13608 x9 : >> ffff000081d1360c >> [  143.683806] x8 : ffff000083a16f00 x7 : 0000000000000010 x6 : >> ffff00014f24b000 >> [  143.683807] x5 : ffff00014f24c000 x4 : 0000000000000000 x3 : >> ffff000083582000 >> [  143.683807] x2 : 0000000000000f80 x1 : 0000fffff7fd5000 x0 : >> 0000000000001000 >> [  143.683808] Call trace: >> [  143.683809]  __wp_page_copy_user+0x120/0x2bc >> [  143.683810]  wp_page_copy+0x98/0x5c0 >> [  143.683813]  do_wp_page+0x250/0x530 >> [  143.683814]  __handle_mm_fault+0x278/0x284 >> [  143.683817]  handle_mm_fault+0x64/0x1e8 >> [  143.683819]  faultin_page+0x5c/0x110 >> [  143.683820]  __get_user_pages+0xc8/0x2f4 >> [  143.683821]  get_user_pages_remote+0xac/0x30c >> [  143.683823]  __access_remote_vm+0xb4/0x368 >> [  143.683824]  access_remote_vm+0x10/0x1c >> [  143.683826]  mem_rw.isra.0+0xc4/0x218 >> [  143.683831]  mem_write+0x18/0x24 >> [  143.683831]  vfs_write+0xa0/0x37c >> [  143.683834]  ksys_pwrite64+0x7c/0xc0 >> [  143.683834]  __arm64_sys_pwrite64+0x20/0x2c >> [  143.683835]  invoke_syscall+0x48/0x10c >> [  143.683837]  el0_svc_common.constprop.0+0x40/0xe0 >> [  143.683839]  do_el0_svc+0x1c/0x28 >> [  143.683841]  el0_svc+0x3c/0xdc >> [  143.683846]  el0t_64_sync_handler+0x120/0x12c >> [  143.683848]  el0t_64_sync+0x194/0x198 >> [  143.683849] ---[ end trace 0000000000000000 ]--- >> >> Signed-off-by: Asahi Lina >> --- >>   mm/memory.c | 7 ++++++- >>   1 file changed, 6 insertions(+), 1 deletion(-) >> >> diff --git a/mm/memory.c b/mm/memory.c >> index >> 3ccee51adfbbd007b24331fe6874265f231a877b..dba25d9734063ac02cdaeb0a5cd5432473f6372e 100644 >> --- a/mm/memory.c >> +++ b/mm/memory.c >> @@ -3081,13 +3081,18 @@ static inline int __wp_page_copy_user(struct >> page *dst, struct page *src, >>               update_mmu_cache_range(vmf, vma, addr, vmf->pte, 1); >>       } >>   +    /* If the mm is a remote mm, copy in the page using >> access_remote_vm() */ >> +    if (current->mm != mm) { >> +        if (access_remote_vm(mm, (unsigned long)uaddr, kaddr, >> PAGE_SIZE, 0) != PAGE_SIZE) > > access_remote_vm() will do a mmap_read_lock_killable() and then call > into get_user_page_vma_remote() -- fortunately read-access, otherwise > we'd be in trouble :) . > > So we should already be holding the mmap read lock from the previous > access_remote_vm() users (who we end up here) ... doesn't this complain > with lockdep about recursive locking? > > I keep forgetting locking rules, so I might just be wrong. You're right, this complains with lockdep: [ 23.154031] [ 23.154093] ============================================ [ 23.154193] WARNING: possible recursive locking detected [ 23.154229] 6.6.52 #2 Not tainted [ 23.154270] -------------------------------------------- [ 23.154306] gdb/349 is trying to acquire lock: [ 23.154343] ffff0000862e3450 (&mm->mmap_lock){++++}-{3:3}, at: __access_remote_vm+0x3c/0x3a8 [ 23.154431] [ 23.154431] but task is already holding lock: [ 23.154474] ffff0000862e3450 (&mm->mmap_lock){++++}-{3:3}, at: __access_remote_vm+0x3c/0x3a8 [ 23.154553] [ 23.154553] other info that might help us debug this: [ 23.154598] Possible unsafe locking scenario: [ 23.154598] [ 23.154641] CPU0 [ 23.154665] ---- [ 23.154685] lock(&mm->mmap_lock); [ 23.154712] lock(&mm->mmap_lock); [ 23.154741] [ 23.154741] *** DEADLOCK *** [ 23.154741] [ 23.154790] May be due to missing lock nesting notation [ 23.154790] [ 23.154838] 2 locks held by gdb/349: [ 23.154868] #0: ffff0000835b53f8 (sb_writers#4){.+.+}-{0:0}, at: vfs_write+0x84/0x2e0 [ 23.154945] #1: ffff0000862e3450 (&mm->mmap_lock){++++}-{3:3}, at: __access_remote_vm+0x3c/0x3a8 [ 23.155023] [ 23.155023] stack backtrace: [ 23.155060] CPU: 5 PID: 349 Comm: gdb Not tainted 6.6.52 #2 [ 23.155112] Hardware name: linux,dummy-virt (DT) [ 23.155148] Call trace: [ 23.155167] dump_backtrace+0x98/0x118 [ 23.155209] show_stack+0x18/0x24 [ 23.155240] dump_stack_lvl+0x60/0xac [ 23.155292] dump_stack+0x18/0x24 [ 23.155320] print_deadlock_bug+0x260/0x34c [ 23.155364] validate_chain+0x364/0x4c0 [ 23.155393] __lock_acquire+0x564/0xb64 [ 23.155420] lock_acquire.part.0+0x9c/0x1bc [ 23.155448] lock_acquire+0x9c/0x140 [ 23.155477] down_read_killable+0x44/0x158 [ 23.155521] __access_remote_vm+0x3c/0x3a8 [ 23.155562] __wp_page_copy_user+0x13c/0x3a8 [ 23.155611] wp_page_copy+0x98/0x4d8 [ 23.155640] do_wp_page+0x290/0x594 [ 23.155671] __handle_mm_fault+0x258/0x25c [ 23.155712] handle_mm_fault+0x64/0x1f0 [ 23.155755] faultin_page+0x64/0x138 [ 23.155798] __get_user_pages+0x11c/0x340 [ 23.155843] get_user_pages_remote+0xc4/0x404 [ 23.155895] __access_remote_vm+0xf4/0x3a8 [ 23.155922] access_remote_vm+0x10/0x1c [ 23.155952] mem_rw.isra.0+0xc4/0x218 [ 23.155996] mem_write+0x18/0x24 [ 23.156023] vfs_write+0xa4/0x2e0 [ 23.156066] ksys_pwrite64+0x7c/0xc0 [ 23.156109] __arm64_sys_pwrite64+0x20/0x2c [ 23.156152] invoke_syscall+0x48/0x10c [ 23.156196] el0_svc_common.constprop.0+0x40/0xe0 [ 23.156249] do_el0_svc+0x1c/0x28 [ 23.156293] el0_svc+0x54/0x140 [ 23.156334] el0t_64_sync_handler+0x120/0x12c [ 23.156384] el0t_64_sync+0x194/0x198 I guess the locking implementation is recursive so that's why this didn't actually deadlock... I'm not sure what the right way to do this is then. The underlying reason why the fallback code is being called is that do_wp_page() calls vm_normal_page(), which returns NULL for VM_PFNMAP pages. So vmf->page is NULL and __wp_page_copy_user has to use the fallback path. However, the reason GUP works is that follow_page_pte() and friends have a specific fallback path for the pte_devmap() case that grabs a struct page anyway. Maybe similar logic should be in do_wp_page() so it can grab a struct page for PFN mappings too? Or if the problem is just the lock, would just eliding the locking work? I guess that only works if all the calls into wp_page_copy() are guaranteed to hold the mmap lock already, but I don't know if that is true... ~~ Lina