From: Jeff Xu
Date: Wed, 17 May 2023 16:48:00 -0700
Subject: Re: [PATCH 0/6] Memory Mapping (VMA) protection using PKU - set 1
To: Dave Hansen
Cc: Stephen Röttger, jeffxu@chromium.org, luto@kernel.org,
 jorgelo@chromium.org, keescook@chromium.org, groeck@chromium.org,
 jannh@google.com, akpm@linux-foundation.org,
 linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org,
 linux-mm@kvack.org, linux-hardening@vger.kernel.org
References: <20230515130553.2311248-1-jeffxu@chromium.org>
 <2bcffc9f-9244-0362-2da9-ece230055320@intel.com>

On Wed, May 17, 2023 at 8:29 AM Dave Hansen wrote:
>
> On 5/17/23 08:21, Jeff Xu wrote:
> >>> I’m not sure I follow the details, can you give an example of an asynchronous
> >>> mechanism to do this? E.g. would this be the kernel writing to the memory in a
> >>> syscall for example?
> >> I was thinking of all of the IORING_OP_*'s that can write to memory or
> >> aio(7).
> > IORING is challenging from security perspectives, for now, it is
> > disabled in ChromeOS. Though I'm not sure how aio is related ?
>
> Let's say you're the attacking thread and you're the *only* attacking
> thread. You have three things at your disposal:
>
>  1. A benign thread doing aio_read()
>  2. An arbitrary write primitive
>  3. You can send signals to yourself
>  4. You can calculate where your signal stack will be
>
> You calculate the address of PKRU on the future signal stack. You then
> leverage the otherwise benign aio_write() to write a 0 to that PKRU
> location. Then, send a signal to yourself. The attacker's PKRU value
> will be written to the stack. If you can time it right, the AIO will
> complete while the signal handler is in progress and PKRU is on the
> stack. On sigreturn, the kernel restores the aio_read()-placed,
> attacker-provided PKRU value. Now the attacker has PKRU==0. It
> effectively build a WRPKRU primitive out of those other pieces.
>
>
Ah, I understand the question now, thanks for the explanation.

Signal handling is the next project I will be working on. I'm leaning
towards saving the PKRU register in the thread struct, similar to how
context switch works. This would address the attack scenario you
described.

However, there are a few challenges I have not yet worked through.
First, the code needs to track when the first signal entry occurs
(saving the PKRU register to the thread struct) and when the last one
returns (restoring the PKRU register from the thread struct). One way
to do this would be to add another member to the thread struct to track
the level of signal re-entry. Second, signals are used in error
handling, including in the kernel's own signal handling code path. I
haven't worked through that part of the code logic completely.

If the first approach is too complicated or considered intrusive, I
could take a different one. In this second approach, I would not track
signal re-entry. Instead, I would modify the PKRU value saved on the
AltStack while handling the signal. The steps are:

a> save PKRU to a tmp variable.
b> modify PKRU to allow writing to the PKEY-protected AltStack.
c> XSAVE.
d> write tmp to the memory address of PKRU in the AltStack, at the
   correct offset.

Since the thread's PKRU is saved to the stack, XRSTOR will restore the
thread's original PKRU during sigreturn in normal situations. This
approach might be a little hacky because it overwrites the XSAVE
results. If we go this route, I will need someone's help with the
overwriting function, since it is CPU specific.

However, this approach will not work if an attacker can install its own
signal handler (and thereby gain the ability to overwrite the PKRU
stored on the stack, as you described). The application will want to
install all of its signal handlers with a PKEY-protected AltStack at
startup time and disallow additional signal handlers after that. This
is programmatically achievable in V8, as Stephen mentioned.

I would appreciate more comments on these two approaches to signal
handling. Or are there better ways to do what we want? Do you think we
could continue the signal handling discussion in the original thread
that Kees started [1]? There was already a lot of discussion there
about signal handling, so it would be easier for future readers to
understand the context. I can repost there. Or I can start a new thread
for signal handling; I'm worried that these discussions will get
lengthy and that context will get lost with patch version updates.

Although the signal handling project is related, I think VMA protection
using PKRU can stand on its own. We could solve this for V8 first and
then move on to signal handling. The work here could also pave the way
to adding mseal() in the future; I expect a lot of the code logic will
be similar.

[1] https://lore.kernel.org/all/202208221331.71C50A6F@keescook/

Thanks!
Best regards,
-Jeff Xu
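
For illustration only, here is a minimal user-space model of the first
approach discussed above: keep PKRU in the thread struct, guarded by a
re-entry counter. It is a sketch under stated assumptions, not kernel
code. Every name in it (model_thread, model_sigframe,
model_signal_enter, model_sigreturn) is hypothetical, and the choice to
leave PKRU untouched on inner returns is an assumption of the model
rather than something settled in this thread.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy user-space model of the "save PKRU in the thread struct" idea.
 * Every name here is hypothetical; none of this is kernel API.
 */
struct model_sigframe {
	uint32_t pkru;          /* copy on the user-writable signal stack */
};

struct model_thread {
	uint32_t pkru;          /* stands in for the live PKRU register */
	uint32_t saved_pkru;    /* copy kept where user space cannot write */
	unsigned int sig_depth; /* level of signal re-entry */
};

/* Signal delivery: only the outermost entry records PKRU in the thread. */
void model_signal_enter(struct model_thread *t, struct model_sigframe *f)
{
	if (t->sig_depth++ == 0)
		t->saved_pkru = t->pkru;
	f->pkru = t->pkru;      /* what XSAVE would leave on the stack */
}

/*
 * sigreturn: the outermost return restores PKRU from the thread struct.
 * The stack copy is never trusted. Leaving PKRU untouched on inner
 * returns is an assumption of this model, not a settled design.
 */
void model_sigreturn(struct model_thread *t, const struct model_sigframe *f)
{
	(void)f;                /* deliberately ignored */
	assert(t->sig_depth > 0);
	if (--t->sig_depth == 0)
		t->pkru = t->saved_pkru;
}
```

In this model, if a thread enters a handler with some PKRU value and the
frame's pkru field is then overwritten with 0 (the aio-style write from
the scenario quoted above), model_sigreturn() still leaves the thread
with its original value, because the stack copy is never read back. The
model says nothing about the second challenge mentioned above, the
kernel's own error-handling signal paths.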