From: Andy Lutomirski <luto@kernel.org>
Date: Mon, 7 Feb 2022 17:31:45 -0800
Subject: Re: [PATCH 00/35] Shadow stacks for userspace
Message-ID: <6ba06196-0756-37a4-d6c4-2e47e6601dcd@kernel.org>
In-Reply-To: <9f948745435c4c9273131146d50fe6f328b91a78.camel@intel.com>
References: <87fsozek0j.ffs@tglx>
 <3421da7fc8474b6db0e265b20ffd28d0@AcuMS.aculab.com>
 <9f948745435c4c9273131146d50fe6f328b91a78.camel@intel.com>
To: "Edgecombe, Rick P", "hjl.tools@gmail.com",
 "David.Laight@aculab.com", Adrian Reber, Cyrill Gorcunov,
 Eugene Syromiatnikov, Dmitry Safonov <0x7f454c46@gmail.com>
Cc: "bsingharora@gmail.com", "hpa@zytor.com", "Syromiatnikov, Eugene",
 "peterz@infradead.org", "rdunlap@infradead.org",
 "keescook@chromium.org", "Eranian, Stephane",
 "kirill.shutemov@linux.intel.com", "dave.hansen@linux.intel.com",
 "linux-mm@kvack.org", "fweimer@redhat.com", "nadav.amit@gmail.com",
 "jannh@google.com", "kcc@google.com", "linux-arch@vger.kernel.org",
 "pavel@ucw.cz", "oleg@redhat.com", "Yang, Weijiang", "bp@alien8.de",
 "arnd@arndb.de", "Moreira, Joao", "tglx@linutronix.de",
 "mike.kravetz@oracle.com", "x86@kernel.org",
 "linux-doc@vger.kernel.org", "Dave.Martin@arm.com",
 "john.allen@amd.com", "mingo@redhat.com", "Shankar, Ravi V",
 "corbet@lwn.net", "linux-kernel@vger.kernel.org",
 "linux-api@vger.kernel.org", "gorcunov@gmail.com"

On 2/5/22 12:15, Edgecombe, Rick P wrote:
> On Sat, 2022-02-05 at 05:29 -0800, H.J. Lu wrote:
>> On Sat, Feb 5, 2022 at 5:27 AM David Laight wrote:
>>>
>>> From: Edgecombe, Rick P
>>>> Sent: 04 February 2022 01:08
>>>> Hi Thomas,
>>>>
>>>> Thanks for feedback on the plan.
>>>>
>>>> On Thu, 2022-02-03 at 22:07 +0100, Thomas Gleixner wrote:
>>>>>> Until now, the enabling effort was trying to support both
>>>>>> Shadow Stack and IBT.
>>>>>> This history will focus on a few areas of the shadow stack
>>>>>> development history that I thought stood out.
>>>>>>
>>>>>> Signals
>>>>>> -------
>>>>>> Originally signals placed the location of the shadow stack
>>>>>> restore token inside the saved state on the stack. This was
>>>>>> problematic from a past ABI promises perspective. So the
>>>>>> restore location was instead just assumed from the shadow
>>>>>> stack pointer. This works because in normal allowed cases of
>>>>>> calling sigreturn, the shadow stack pointer should be right
>>>>>> at the restore token at that time. There is no alternate
>>>>>> shadow stack support. If an alt shadow stack is added later
>>>>>> we would need to
>>>>>
>>>>> So how is that going to work? altstack is not an esoteric
>>>>> corner case.
>>>>
>>>> My understanding is that the main usages for the signal stack
>>>> were handling stack overflows and corruption.
>>>> Since the shadow stack only
>>>> contains return addresses rather than large stack allocations,
>>>> and is not generally writable or pivotable, I thought there was
>>>> a good possibility an alt shadow stack would not end up being
>>>> especially useful. Does it seem like reasonable guesswork?
>>>
>>> The other 'problem' is that it is valid to longjump out of a signal
>>> handler.
>>> These days you have to use siglongjmp() not longjmp() but it is
>>> still used.
>>>
>>> It is probably also valid to use siglongjmp() to jump from a nested
>>> signal handler into the outer handler.
>>> Given both signal handlers can have their own stack, there can be
>>> three stacks involved.
>
> So the scenario is?
>
> 1. Handle signal 1
> 2. sigsetjmp()
> 3. signalstack()
> 4. Handle signal 2 on alt stack
> 5. siglongjmp()
>
> I'll check that it is covered by the tests, but I think it should work
> in this series that has no alt shadow stack. I have only done a high
> level overview of how the shadow stack stuff, that doesn't involve the
> kernel, works in glibc. Sounds like I'll need to do a deeper dive.
>
>>>
>>> I think the shadow stack pointer has to be in ucontext - which also
>>> means the application can change it before returning from a signal.
>
> Yes we might need to change it to support alt shadow stacks. Can you
> elaborate why you think it has to be in ucontext? I was thinking of
> looking at three options for storing the ssp:
>  - Stored in the shadow stack like a token using WRUSS from the kernel.
>  - Stored on the kernel side using a hashmap that maps ucontext or
>    sigframe userspace address to ssp (this is of course similar to
>    storing in ucontext, except that the user can’t change the ssp).
>  - Stored writable in userspace in ucontext.
>
> But in this version, without alt shadow stacks, the shadow stack
> pointer is not stored in ucontext.
> This causes the limitation that
> userspace can only call sigreturn when it has returned back to a point
> where there is a restore token on the shadow stack (which was placed
> there by the kernel).

I'll reply here and maybe cover multiple things.

User code already needs to rewind the regular stack to call sigreturn --
sigreturn finds the signal frame based on ESP/RSP.  So if you call it
from the wrong place, you go boom.  I think that the Linux SHSTK ABI
should have the property that no amount of tampering with just the
ucontext and associated structures can cause sigreturn to redirect to
the wrong IP -- there should be something on the shadow stack that also
gets verified in sigreturn.  IIRC the series does this, but it's been a
while.  The post-sigreturn SSP should be entirely implied by
pre-sigreturn SSP (or perhaps something on the shadow stack), so, in the
absence of an altshadowstack feature, no ucontext changes should be
needed.

We can also return from a signal or from more than one signal at once,
as above, using siglongjmp.  It seems like this should Just Work (tm),
at least in the absence of altshadowstack.

So this leaves altshadowstack.  If we want to allow userspace to handle
a shstk overflow, I think we need altshadowstack.  And I can easily
imagine signal handling in a coroutine or user-threading environment
(Go?  UMCG or whatever it's called?) wanting this.  As noted, this
obnoxious Andy person didn't like putting any shstk-related extensions
in the FPU state.

For better or for worse, altshadowstack is (I think) fundamentally a new
API.  No amount of ucontext magic is going to materialize an entire
shadow stack out of nowhere when someone calls sigaltstack().  So the
questions are: should we support altshadowstack from day one and, if so,
what should it look like?

If we want to be clever, we could attempt to make altshadowstack
compatible with RSTORSSP.
Signal delivery pushes a restore token to the
old stack (hah! what if the old stack is full?) and pushes the RSTORSSP
busy magic to the new stack, and sigreturn inverts it.  Code that wants
to return without sigreturn does it manually with RSTORSSP.  (Assuming
that I've understood the arcane RSTORSSP sequence right.  Intel wins
major points for documentation quality here.)  Or we could invent our
own scheme.  In either case, I don't immediately see any reason that the
ucontext needs to contain a shadow stack pointer.

There's a delightful wart to consider, though.  siglongjmp, at least as
currently envisioned, can't return off an altshadowstack: the whole
point of the INCSSP distance restrictions is to avoid incrementing right
off the top of the current stack, but siglongjmp off an altshadowstack
fundamentally switches stacks.  So either siglongjmp off an
altshadowstack needs to be illegal or it needs to work differently.  (By
incssp-ing to the top of the altshadowstack, then switching, then
incssp-ing some more?  How does it even find the top of the current
altshadowstack?)  And the plot thickens if one tries to siglongjmp off
two nested altshadowstack-using signals in a single call.  Fortunately,
since altshadowstack is a new API, it's not entirely crazy to have
different rules.

So I don't have a complete or even almost complete design in mind, but I
think we do need to make a conscious decision either to design this
right or to skip it for v1.

As for CRIU, I don't think anyone really expects a new kernel, running
new userspace that takes advantage of features in the new kernel, to
work with old CRIU.  Upgrading to a SHSTK kernel should still allow
using CRIU with non-SHSTK userspace, but I don't see how it's possible
for CRIU to handle SHSTK without updates.  We should certainly do our
best to make CRIU's life easy, though.
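
To make the "something on the shadow stack gets verified in sigreturn"
property concrete, here is a toy userspace model.  It is only a sketch:
the array-based stack, the TOKEN_TAG bit, the token encoding and every
toy_* name are invented for illustration and are not the series' or the
hardware's restore-token format.  The model assumes return addresses are
even, so bit 0 can mark a token.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model: a shadow stack as an array of slots that grows downward.
 * TOKEN_TAG and the encoding are hypothetical, not the real format. */
#define SHSTK_SLOTS 16
#define TOKEN_TAG   0x1ull      /* assumed marker bit for a token slot */

struct toy_shstk {
    uint64_t slot[SHSTK_SLOTS];
    size_t   ssp;               /* index of the most recently pushed slot */
};

static void toy_init(struct toy_shstk *s) { s->ssp = SHSTK_SLOTS; }

static bool toy_push(struct toy_shstk *s, uint64_t v)
{
    if (s->ssp == 0)
        return false;           /* shadow stack overflow */
    s->slot[--s->ssp] = v;
    return true;
}

/* Signal delivery: record the interrupted SSP in a token on the stack. */
static bool toy_deliver_signal(struct toy_shstk *s)
{
    uint64_t token = ((uint64_t)s->ssp << 1) | TOKEN_TAG;
    return toy_push(s, token);
}

/* sigreturn: succeeds only if SSP points at a token.  The new SSP comes
 * from the token alone; nothing like a ucontext is consulted. */
static bool toy_sigreturn(struct toy_shstk *s)
{
    if (s->ssp >= SHSTK_SLOTS)
        return false;           /* empty stack: nothing to verify */
    uint64_t t = s->slot[s->ssp];
    if (!(t & TOKEN_TAG))
        return false;           /* not at a token: refuse */
    size_t saved = (size_t)(t >> 1);
    if (saved <= s->ssp || saved > SHSTK_SLOTS)
        return false;           /* a token must point above itself */
    s->ssp = saved;
    return true;
}

/* Returns 0 if the model behaves as described, nonzero otherwise. */
static int toy_demo(void)
{
    struct toy_shstk s;

    toy_init(&s);
    toy_push(&s, 0x401000);              /* interrupted code's return address */
    size_t before = s.ssp;
    if (!toy_deliver_signal(&s))
        return 1;
    toy_push(&s, 0x402000);              /* handler calls a function */
    if (toy_sigreturn(&s))
        return 2;                        /* wrong place: must be refused */
    s.ssp++;                             /* that function returns (pop) */
    if (!toy_sigreturn(&s))
        return 3;                        /* at the token: must succeed */
    return s.ssp == before ? 0 : 4;
}
```

Calling toy_demo() from a main() should return 0: sigreturn is refused
while SSP is not at the token, and once it is, the restored SSP is
implied entirely by what is on the toy shadow stack.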
> This doesn’t mean it can’t switch to a different
> shadow stack or handle a nested signal, but it limits the possibility
> for calling sigreturn with a totally different sigframe (like CRIU and
> SROP attacks do). It should hopefully be a helpful, protective
> limitation for most apps and I'm hoping CRIU can be fixed without
> removing it.
>
> I am not aware of other limitations to signals (besides normal shadow
> stack enforcement), but I could be missing it. And people's skepticism
> is making me want to go back over it with more scrutiny.
>
>>> In much the same way as all the segment registers can be changed
>>> leading to all the nasty bugs when the final 'return to user' code
>>> traps in kernel when loading invalid segment registers or executing
>>> iret.
>
> I don't think this is as difficult to avoid because userspace ssp has
> its own register that should not be accessed at that point, but I have
> not given this aspect enough analysis. Thanks for bringing it up.
>
>>>
>>> Hmmm... do shadow stacks mean that longjmp() has to be a system
>>> call?
>>
>> No.  setjmp/longjmp save and restore shadow stack pointer.
>>
>
> It sounds like it would help to write up in a lot more detail exactly
> how all the signal and specialer stack manipulation scenarios work in
> glibc.
>
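
As a starting point for such a write-up, the five-step nested-signal
scenario listed earlier in the thread can be exercised with ordinary
(non-shadow) stacks using the sketch below.  It is not taken from the
series or its selftests.  The choice of SIGUSR1/SIGUSR2, the 64 KiB
alternate stack, the step bitmask and the helper names are assumptions
made for illustration, and step 3 is read as a call to sigaltstack().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdint.h>
#include <string.h>

static char alt_stack[64 * 1024];        /* memory for the alternate stack */
static sigjmp_buf outer_env;             /* filled in by the outer handler */
static volatile sig_atomic_t steps;      /* bitmask of the steps reached */
static volatile sig_atomic_t on_alt;     /* 1 if signal 2 ran on alt_stack */

static void inner_handler(int sig)
{
    char probe;                          /* lives on the current stack */
    uintptr_t p = (uintptr_t)&probe, lo = (uintptr_t)alt_stack;

    (void)sig;
    on_alt = (p >= lo && p < lo + sizeof(alt_stack));
    steps |= 4;                          /* 4. handling signal 2 */
    siglongjmp(outer_env, 1);            /* 5. jump into the outer handler */
}

static void outer_handler(int sig)
{
    (void)sig;
    steps |= 1;                          /* 1. handling signal 1 */
    if (sigsetjmp(outer_env, 1) == 0) {  /* 2. save context and signal mask */
        stack_t ss;

        memset(&ss, 0, sizeof(ss));
        ss.ss_sp = alt_stack;
        ss.ss_size = sizeof(alt_stack);
        if (sigaltstack(&ss, NULL) != 0) /* 3. register the alternate stack */
            return;
        steps |= 2;
        raise(SIGUSR2);                  /* handler runs before raise() returns */
        steps |= 16;                     /* not reached if step 5 worked */
    } else {
        steps |= 8;                      /* resumed here after siglongjmp() */
    }
}

/* Runs the scenario once; returns the step bitmask, or -1 on failure. */
static int run_scenario(void)
{
    struct sigaction sa;

    steps = 0;
    on_alt = 0;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_handler = outer_handler;       /* signal 1: normal stack */
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    sa.sa_handler = inner_handler;       /* signal 2: alternate stack */
    sa.sa_flags = SA_ONSTACK;
    if (sigaction(SIGUSR2, &sa, NULL) != 0)
        return -1;
    if (raise(SIGUSR1) != 0)
        return -1;
    return on_alt ? (int)steps : -1;
}
```

run_scenario() should return 15 (steps 1, 2, 4 and 8 reached, 16 not)
when the inner handler really ran on the alternate stack and siglongjmp
landed back in the outer handler.  Whether the same sequence behaves
with shadow stacks enabled is exactly the open question in this thread;
the sketch only pins down the control flow being discussed.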