From: Jeff Xu
Date: Thu, 17 Oct 2024 13:57:21 -0700
Subject: Re: [PATCH v1 1/2] mseal: Two fixes for madvise(MADV_DONTNEED) when sealed
To: Pedro Falcato
Cc: akpm@linux-foundation.org, keescook@chromium.org,
 torvalds@linux-foundation.org, usama.anjum@collabora.com, corbet@lwn.net,
 Liam.Howlett@oracle.com, lorenzo.stoakes@oracle.com, jeffxu@google.com,
 jorgelo@chromium.org, groeck@chromium.org, linux-kernel@vger.kernel.org,
 linux-kselftest@vger.kernel.org, linux-mm@kvack.org, jannh@google.com,
 sroettger@google.com, linux-hardening@vger.kernel.org, willy@infradead.org,
 gregkh@linuxfoundation.org, deraadt@openbsd.org, surenb@google.com,
 merimus@google.com, rdunlap@infradead.org, stable@vger.kernel.org
References: <20241017005105.3047458-1-jeffxu@chromium.org>
 <20241017005105.3047458-2-jeffxu@chromium.org>
 <5svaztlptf4gs4sp6zyzycwjm2fnpd2xw3oirsls67sq7gq7wv@pwcktbixrzdo>

On Thu, Oct 17, 2024 at 1:49 PM Pedro Falcato wrote:
>
> On Thu, Oct 17, 2024 at 01:34:53PM -0700, Jeff Xu wrote:
> > Hi Pedro
> >
> > On Thu, Oct 17, 2024 at 12:37 PM Pedro Falcato wrote:
> > >
> > > > For PROT_NONE mappings, the previous blocking of
> > > > madvise(MADV_DONTNEED) is unnecessary. As PROT_NONE already prohibits
> > > > memory access, madvise(MADV_DONTNEED) should be allowed to proceed in
> > > > order to free the page.
> > >
> > > I don't get it. Is there an actual use case for this?
> > >
> > Sealing should not over-block an API call that it can allow to pass without
> > security concern; this is a case of that principle.
>
> Well, making the interface simple is also important. OpenBSD's mimmutable()
> doesn't do any of this and it Just Works(tm)...
>
> >
> > There is a use case for this as well: to seal the NX stack on Android,
> > Android uses PROT_NONE/madvise to set up a guard page to prevent the stack
> > from running over its boundary. So we need to let madvise pass.
>
> And you need to MADV_DONTNEED this guard page?
>
Yes.

> >
> > > > For file-backed, private, read-only memory mappings, we previously did
> > > > not block the madvise(MADV_DONTNEED). This was based on
> > > > the assumption that the memory's content, being file-backed, could be
> > > > retrieved from the file if accessed again. However, this assumption
> > > > failed to consider scenarios where a mapping is initially created as
> > > > read-write, modified, and subsequently changed to read-only. The newly
> > > > introduced VM_WASWRITE flag addresses this oversight.
> > >
> > > We *do not* need this. It's sufficient to just block discard operations on read-only
> > > private mappings.
> > I think you meant blocking madvise(MADV_DONTNEED) on all read-only
> > private file-backed mappings.
> >
> > I considered that option, but there is a use case for madvise on those
> > mappings that never get modified.
> >
> > Apps can use that to free up RAM. E.g., consider a read-only .text
> > section, which never gets modified: madvise(MADV_DONTNEED) can free
> > up RAM when memory is under pressure, and the memory will be reloaded
> > from the backing file on the next read access. Therefore we can't just
> > block all read-only private file-backed mappings, only those that really
> > need it, such as mappings changed from rw=>r (what you described).
>
> Does anyone actually do this? If so, why? WHYYYY?
>
This is a legit use case, I can't argue that it isn't.

> The kernel's page reclaim logic should be perfectly cromulent. Please don't do this.
> MADV_DONTNEED will also not free any pages if those are shared (rather they'll just be unmapped).
>
> If we really need to do this, I'd maybe suggest walking through page tables, looking for
> anon ptes or swap ptes (maybe inside the actual zap code?). But I would really prefer if we
> didn't need to do this.
>
I also considered this route, but it is too complicated. Copy-on-write
pages can be put into a swap file, there are also huge pages to
consider, etc. The complexity makes it really difficult to code it
right; also, scanning those pages at the per-VMA level would require a
lock and would impact performance.

> --
> Pedro
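
For readers following the thread, the data-loss case discussed above (a
private file-backed mapping that was modified and then hit with
madvise(MADV_DONTNEED)) can be reproduced from userspace. The following is
only a minimal sketch under stated assumptions: it runs on Linux, it does not
call mseal() at all, and it skips the rw=>r mprotect() step because Python's
mmap module has no wrapper for it. The helper name discard_private_changes is
made up for this illustration and is not part of the patch.

```python
import mmap
import tempfile
from typing import Tuple


def discard_private_changes(original: bytes, patched: bytes) -> Tuple[bytes, bytes]:
    """Hypothetical helper: show MADV_DONTNEED dropping private (COW) writes.

    Returns (bytes seen after the write, bytes seen after the discard).
    Assumes Linux semantics for MADV_DONTNEED.
    """
    size = mmap.PAGESIZE
    with tempfile.TemporaryFile() as f:
        # Back the mapping with one page of file data.
        f.write(original.ljust(size, b"\0"))
        f.flush()

        # ACCESS_COPY gives a MAP_PRIVATE, read-write mapping: writes go to
        # copy-on-write pages and never reach the file.
        m = mmap.mmap(f.fileno(), size, access=mmap.ACCESS_COPY)
        try:
            m[: len(patched)] = patched       # modify the private copy
            before = m[: len(patched)]        # the modification is visible

            # Discard the pages. The next access faults the page back in
            # from the file, so the private modification is gone.
            m.madvise(mmap.MADV_DONTNEED)
            after = m[: len(original)]
        finally:
            m.close()
    return before, after


if __name__ == "__main__":
    before, after = discard_private_changes(b"file data", b"PATCHED!!")
    print("after write:  ", before)
    print("after discard:", after)
```

On a mapping that was never written to (the .text example above), the same
madvise() call only drops clean pages that can be read back unchanged, which
is why the two cases are treated differently in this discussion.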