From: Dmitry Vyukov
Date: Tue, 11 Feb 2025 07:47:10 +0100
Subject: Re: [PATCH v8 3/5] x86/pkeys: Update PKRU to enable all pkeys before XSAVE
To: Jeff Xu
Cc: aruna.ramakrishna@oracle.com, mathieu.desnoyers@efficios.com,
 peterz@infradead.org, paulmck@kernel.org, boqun.feng@gmail.com,
 dave.hansen@linux.intel.com, jannh@google.com, jorgelo@chromium.org,
 keescook@chromium.org, keith.lucas@oracle.com,
 linux-kernel@vger.kernel.org, linux-mm@kvack.org, mingo@kernel.org,
 rick.p.edgecombe@intel.com, sroettger@google.com, tglx@linutronix.de,
 x86@kernel.org
References: <20240802061318.2140081-4-aruna.ramakrishna@oracle.com>
 <20250204100134.1843654-1-dvyukov@google.com>

On Mon, 10 Feb 2025 at 23:46, Jeff Xu wrote:
>
> Hi Dmitry
>
> On Thu, Feb 6, 2025 at 10:06 AM Dmitry Vyukov wrote:
> >
> > On Tue, 4 Feb 2025 at 11:02, Dmitry Vyukov wrote:
> > >
> > > Re commit 70044df250d022572e26cd301bddf75eac1fe50e:
> > > https://lore.kernel.org/all/20240802061318.2140081-4-aruna.ramakrishna@oracle.com/
> > >
> > > > If the alternate signal stack is protected by a different pkey than the
> > > > current execution stack, copying xsave data to the sigaltstack will fail
> > > > if its pkey is not enabled in the PKRU register.
> > > >
> > > > We do not know which pkey was used by the application for the altstack,
> > > > so enable all pkeys before xsave.
> > > >
> > > > But this updated PKRU value is also pushed onto the sigframe, which
> > > > means the register value restored from sigcontext will be different from
> > > > the user-defined one, which is unexpected. Fix that by overwriting the
> > > > PKRU value on the sigframe with the original, user-defined PKRU.
> > >
> > > Hi,
> > >
> > > This unfortunately seems to be broken for rseq user-space writes.
> > > If the signal is caused by the rseq struct being inaccessible due to PKEYs,
> > > we try to write to rseq again at setup_rt_frame->rseq_signal_deliver,
> > > which happens _before_ sig_prepare_pkru and won't succeed
> > > (the PKEY is still inaccessible, so the process is hard-killed).
> > > Any PKEY sandbox would want to restrict untrusted access to rseq
> > > as well (otherwise it allows easy sandbox escapes).
> > >
> > > If we do sig_prepare_pkru before rseq_signal_deliver (and generally
> > > before any copy_to_userpace), then the user-space handler gets SIGSEGV
> > > and could unregister rseq and retry.
> > >
> > > However, I am not sure if it's the best solution performance-
> > > and complexity-wise (for user-space). A better solution may be to
> > > change __rseq_handle_notify_resume to temporarily switch to the default
> > > PKEY if user accesses fail.
> > > Rseq is similar to signals in this respect. Since rseq updates
> > > happen asynchronously with respect to user-space control flow,
> > > if a program uses rseq and ever makes rseq inaccessible with PKEYs,
> > > it's in trouble and will be randomly killed.
> > > Since rseq updates are asynchronous, like signals, they shouldn't
> > > assume PKEY is set to the default value that allows access
> > > to the rseq descriptor.
> > >
> > > Thoughts?
> >
> > Another question about switching to pkey 0 and not switching back on all errors.
> > Can it create security problems by allowing sandboxed code to escape?
> >
> Sandbox escape would be bad; we wouldn't want the calling thread to
> get PKRU = 0 in any error path.
>
> > Namely, here:
> >
> > +       /* Update PKRU to enable access to the alternate signal stack. */
> > +       pkru = sig_prepare_pkru();
> >         /* save i387 and extended state */
> > -       if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru))
> > +       if (!copy_fpstate_to_sigframe(*fpstate, (void __user *)buf_fx, math_size, pkru)) {
> > +               /*
> > +                * Restore PKRU to the original, user-defined value; disable
> > +                * extra pkeys enabled for the alternate signal stack, if any.
> > +                */
> > +               write_pkru(pkru);
> >                 return (void __user *)-1L;
> > +       }
> >
> > we restore to the original pkru on this error, but there are other
> > failure paths later, e.g.:
> > https://elixir.bootlin.com/linux/v6.13.1/source/arch/x86/kernel/signal_64.c#L199
> >
> > on these error paths we will eventually get here to force_sig(SIGSEGV):
> > https://elixir.bootlin.com/linux/v6.13.1/source/kernel/signal.c#L1685
> > which just sends SIGSEGV and is not fatal.
> >
> > So hypothetically, if there is a SIGUSR1 handler without SA_ONSTACK,
> > which fails, but SIGSEGV handler has SA_ONSTACK and doesn't fail, this
> > will result in resetting PKRU to 0 without restoring it back.
> > Or sandboxed code somehow arranges for the first signal setup to fail for other reasons.
> >
> Can you walk me through the setup and steps that led to this situation?

I don't have any more concrete steps, just the observation that PKRU is
restored on only 1 out of N error paths.

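To make the observation concrete, here is a minimal user-space sketch of
the shape I have in mind. It is only a model, not kernel code: model_pkru
is a plain variable standing in for the PKRU register, and
model_prepare_pkru(), model_setup_frame() and MODEL_NSTEPS are
hypothetical names, not existing kernel functions. The idea is that every
failing step leaves through a single exit that writes the saved value
back, instead of only the first one doing so.

```c
#include <assert.h>
#include <stdint.h>

/* Model only: a plain variable stands in for the PKRU register. */
static uint32_t model_pkru;

static uint32_t model_read_pkru(void) { return model_pkru; }
static void model_write_pkru(uint32_t val) { model_pkru = val; }

/*
 * Hypothetical stand-in for sig_prepare_pkru(): enable all pkeys
 * (PKRU = 0) and return the original value so it can be restored.
 */
static uint32_t model_prepare_pkru(void)
{
	uint32_t orig = model_read_pkru();

	model_write_pkru(0);
	return orig;
}

/* Number of fallible steps in the modelled frame setup (assumption). */
#define MODEL_NSTEPS 3

/*
 * Model of signal frame setup with several fallible steps.
 * fail_step selects which step (0-based) fails; a negative value
 * means no step fails. Returns 0 on success, -1 on failure.
 */
static int model_setup_frame(int fail_step)
{
	uint32_t orig;
	int step;

	assert(fail_step < MODEL_NSTEPS);

	orig = model_prepare_pkru();
	for (step = 0; step < MODEL_NSTEPS; step++) {
		if (step == fail_step)
			goto err;	/* any failure takes the same exit */
	}
	/* What PKRU the handler sees on success is outside this sketch. */
	return 0;

err:
	/* Single exit: restore the original PKRU on every error path. */
	model_write_pkru(orig);
	return -1;
}
```

With this shape, calling model_setup_frame() with any failing step leaves
model_pkru at the value it had on entry, so a later force_sig(SIGSEGV)
would not run with extra pkeys enabled.
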
> > This is, of course, a tricky attack vector, and the program must
> > resume after SIGSEGV somehow (there are some such cases, e.g. mmapping
> > something lazily and retrying), but with security you never know how
> > creative an attacker can get and what you are missing that they are
> > not missing. So it looks safer to restore to the original PKRU on all
> > errors.
>
> Thanks
> -Jeff