From: Dmitry Vyukov
Date: Wed, 23 Oct 2024 10:56:33 +0200
Subject: Re: [PATCH v2 0/5] implement lightweight guard pages
In-Reply-To: <5a3d3bc8-60db-46d0-b689-9aeabcdb8eab@lucifer.local>
To: Lorenzo Stoakes
Cc: David Hildenbrand, fw@deneb.enyo.de, James.Bottomley@hansenpartnership.com,
	Liam.Howlett@oracle.com, akpm@linux-foundation.org, arnd@arndb.de,
	brauner@kernel.org, chris@zankel.net, deller@gmx.de, hch@infradead.org,
	ink@jurassic.park.msu.ru, jannh@google.com, jcmvbkbc@gmail.com,
	jeffxu@chromium.org, jhubbard@nvidia.com, linux-alpha@vger.kernel.org,
	linux-api@vger.kernel.org, linux-arch@vger.kernel.org,
	linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
	linux-mips@vger.kernel.org, linux-mm@kvack.org, linux-parisc@vger.kernel.org,
	mattst88@gmail.com, muchun.song@linux.dev, paulmck@kernel.org,
	richard.henderson@linaro.org, shuah@kernel.org, sidhartha.kumar@oracle.com,
	surenb@google.com, tsbogend@alpha.franken.de, vbabka@suse.cz,
	willy@infradead.org, elver@google.com, Linus Torvalds

On Wed, 23 Oct 2024 at 10:12, Lorenzo Stoakes wrote:
>
> +cc Linus as reference a commit of his below...
>
> On Wed, Oct 23, 2024 at 09:19:03AM +0200, David Hildenbrand wrote:
> > On 23.10.24 08:24, Dmitry Vyukov wrote:
> > > Hi Florian, Lorenzo,
> > >
> > > This looks great!
>
> Thanks!
>
> > >
> > > What I am VERY interested in is if poisoned pages cause SIGSEGV even when
> > > the access happens in the kernel. Namely, the syscall still returns EFAULT,
> > > but also SIGSEGV is queued on return to user-space.
>
> Yeah we don't in any way.
>
> I think adding something like this would be a bit of its own project.

I can totally understand this.

> The fault handler for this is in handle_pte_marker() in mm/memory.c, where
> we do the following:
>
> 	/* Hitting a guard page is always a fatal condition. */
> 	if (marker & PTE_MARKER_GUARD)
> 		return VM_FAULT_SIGSEGV;
>
> So basically we pass this back to whoever invoked the fault. For uaccess we
> end up in arch-specific code that eventually checks exception tables
> etc. and for x86-64 that's kernelmode_fixup_or_oops().
>
> There used to be a sig_on_uaccess_err in the x86-specific thread_struct
> that let you propagate it but Linus pulled it out in commit 02b670c1f88e
> ("x86/mm: Remove broken vsyscall emulation code from the page fault code")
> where it was presumably used for vsyscall.
>
> Of course we could just get something much higher up the stack to send the
> signal, but we'd need to be careful we weren't breaking anything doing
> it...

Could setting TIF_NOTIFY_RESUME, and then doing the rest when returning to
userspace, help here?

> I address GUP below.
>
> > >
> > > Catching bad accesses in system calls is currently the weak spot for
> > > all user-space bug detection tools (GWP-ASan, libefence, libefency, etc).
> > > It's almost possible with userfaultfd, but catching faults in the kernel
> > > requires admin capability, so not really an option for generic bug
> > > detection tools (+inconvenience of userfaultfd setup/handler).
> > > Intercepting all EFAULT from syscalls is not generally possible
> > > (w/o ptrace, usually not an option as well), and EFAULT does not always
> > > mean a bug.
> > >
> > > Triggering SIGSEGV even in syscalls would be not just a performance
> > > optimization, but a new useful capability that would allow it to catch
> > > more bugs.
> >
> > Right, we discussed that offline also as a possible extension to the
> > userfaultfd SIGBUS mode.
> >
> > I did not look into that yet, but I was wondering if there could be cases where
> > a different process could trigger that SIGSEGV, and how to (and if to)
> > handle that.
> >
> > For example, ptrace (access_remote_vm()) -> GUP likely can trigger that. I
> > think with userfaultfd() we will currently return -EFAULT, because we call
> > get_user_page_vma_remote() that is not prepared for dropping the mmap lock.
> > Possibly that is the right thing to do, but not sure :)

That's a good corner case. I guess process_vm_readv/writev are affected too.
Not triggering the signal in these cases looks like the right thing to do.

> > These "remote" faults set FOLL_REMOTE -> FAULT_FLAG_REMOTE, so we might be
> > able to distinguish them and perform different handling.
>
> So all GUP will return -EFAULT when hitting guard pages unless we change
> something.
>
> In GUP we handle this in faultin_page():
>
> 	if (ret & VM_FAULT_ERROR) {
> 		int err = vm_fault_to_errno(ret, flags);
>
> 		if (err)
> 			return err;
> 		BUG();
> 	}
>
> And vm_fault_to_errno() is:
>
> static inline int vm_fault_to_errno(vm_fault_t vm_fault, int foll_flags)
> {
> 	if (vm_fault & VM_FAULT_OOM)
> 		return -ENOMEM;
> 	if (vm_fault & (VM_FAULT_HWPOISON | VM_FAULT_HWPOISON_LARGE))
> 		return (foll_flags & FOLL_HWPOISON) ? -EHWPOISON : -EFAULT;
> 	if (vm_fault & (VM_FAULT_SIGBUS | VM_FAULT_SIGSEGV))
> 		return -EFAULT;
> 	return 0;
> }
>
> Again, I think if we wanted special handling here we'd need to probably
> propagate that fault from higher up, but yes we'd need to for one
> definitely not do so if it's remote but I worry about other cases.
>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >
>
> Overall while I sympathise with this, it feels dangerous and a pretty major
> change, because there'll be something somewhere that will break because it
> expects faults to be swallowed that we no longer do swallow.
>
> So I'd say it'd be something we should defer, but of course it's a highly
> user-facing change so how easy that would be I don't know.
>
> But I definitely don't think a 'introduce the ability to do cheap PROT_NONE
> guards' series is the place to also fundamentally change how user access
> page faults are handled within the kernel :)

Would delivering signals on kernel accesses be a backwards-compatible change,
or would we need a different API, e.g. MADV_GUARD_POISON_KERNEL? It would be
somewhat painful to detect and update all of userspace if we add this feature
in the future. Can we say that signal delivery on kernel accesses is
unspecified?
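The asymmetry discussed in this thread (a user-space load from a guard is fatal, while the same address touched by the kernel inside a syscall only yields EFAULT) can be observed from user space. Below is a minimal sketch, assuming Linux, that uses an ordinary mprotect(PROT_NONE) page as a stand-in for a guard page so it does not depend on the madvise() flags from this series. It rests on the assumption that a kernel uaccess to a PROT_NONE page fails the same way (exception-table fixup, EFAULT, no signal). The helper names map_with_guard(), user_access_segfaults() and kernel_access_errno() are hypothetical, made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Jump target used by the SIGSEGV handler below. */
static sigjmp_buf segv_env;

static void segv_handler(int sig)
{
    (void)sig;
    siglongjmp(segv_env, 1);
}

/*
 * Hypothetical helper: map two pages and make the second one PROT_NONE,
 * standing in for a guard page. Returns the base address or NULL.
 */
static char *map_with_guard(size_t page)
{
    assert(page > 0);
    char *base = mmap(NULL, 2 * page, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;
    if (mprotect(base + page, page, PROT_NONE) != 0) {
        munmap(base, 2 * page);
        return NULL;
    }
    return base;
}

/* A direct user-space load from `addr`; returns true if SIGSEGV was raised. */
static bool user_access_segfaults(const char *addr)
{
    struct sigaction sa = { .sa_handler = segv_handler }, old;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    bool faulted = false;
    if (sigsetjmp(segv_env, 1) == 0)
        (void)*(volatile const char *)addr;   /* faults on the guard */
    else
        faulted = true;                       /* reached via siglongjmp */

    sigaction(SIGSEGV, &old, NULL);
    return faulted;
}

/*
 * Make the kernel read one byte from `addr` via write(2) to a pipe.
 * Returns the errno seen (EFAULT for the guard) or 0 on success.
 * No signal is involved: the kernel's fixup turns the fault into EFAULT.
 */
static int kernel_access_errno(const char *addr)
{
    int fds[2];
    if (pipe(fds) != 0)
        return errno;
    ssize_t n = write(fds[1], addr, 1);
    int err = (n < 0) ? errno : 0;
    close(fds[0]);
    close(fds[1]);
    return err;
}
```

With page = sysconf(_SC_PAGESIZE) and base = map_with_guard(page), user_access_segfaults(base + page) should return true while kernel_access_errno(base + page) should return EFAULT without any SIGSEGV being queued, which is the gap that user-space bug detectors cannot see today.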
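On the backwards-compatibility question: one property of a separate advice value is that userspace can probe for it at runtime, because madvise(2) rejects advice values it does not know with EINVAL. A silent change in signal semantics for an existing value offers no such hook. The sketch below shows that probe pattern; madvise_supported() is a hypothetical helper, and since MADV_GUARD_POISON_KERNEL is only a name floated in this thread (it has no value), the probe is meant to be exercised with an existing advice value and a deliberately bogus one.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: return true if the running kernel accepts `advice`.
 * A scratch page is used so that advice with side effects (e.g. discarding
 * contents) cannot affect real data. Note that EINVAL can also mean the
 * advice is known but not applicable to this kind of mapping, so a false
 * result is conservative rather than definitive.
 */
static bool madvise_supported(int advice)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    void *p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;

    int rc = madvise(p, (size_t)page, advice);
    int err = errno;            /* save before munmap() can clobber it */
    munmap(p, (size_t)page);
    return rc == 0 || err != EINVAL;
}
```

madvise_supported(MADV_DONTNEED) should return true on any Linux kernel, while an unassigned value such as 12345 should return false.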