From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4E723C3DA4A for ; Fri, 16 Aug 2024 17:30:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8540B8D00A1; Fri, 16 Aug 2024 13:30:21 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8040F8D00A0; Fri, 16 Aug 2024 13:30:21 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6A67F8D00A1; Fri, 16 Aug 2024 13:30:21 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 454F38D00A0 for ; Fri, 16 Aug 2024 13:30:21 -0400 (EDT) Received: from smtpin30.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay06.hostedemail.com (Postfix) with ESMTP id C866DA3E43 for ; Fri, 16 Aug 2024 17:30:20 +0000 (UTC) X-FDA: 82458797400.30.EC3524E Received: from mail-oa1-f43.google.com (mail-oa1-f43.google.com [209.85.160.43]) by imf26.hostedemail.com (Postfix) with ESMTP id E657B140012 for ; Fri, 16 Aug 2024 17:30:18 +0000 (UTC) Authentication-Results: imf26.hostedemail.com; dkim=pass header.d=chromium.org header.s=google header.b=h+GGeCQY; spf=pass (imf26.hostedemail.com: domain of jeffxu@chromium.org designates 209.85.160.43 as permitted sender) smtp.mailfrom=jeffxu@chromium.org; dmarc=pass (policy=none) header.from=chromium.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1723829360; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=8tBYSxfHD+S5eb/sSL8twdBp0696qHHZqkJKClMzb6c=; b=UWPvhg4oyds7UQJhL3Z0KeLZfy8nTpJSyReGhStmRSGW82uQtBuFDEEoGt23+L8LSWOkRB j1Ik9QkzZ0VlIjiDKuXYE/zamzQIFINnO9fd4SBSkAHPjqIIwolwtYCRmCBbC12sQXWgUu amN5C4ciSmz5NuLYrH05t1FeCSKSwfI= ARC-Authentication-Results: i=1; imf26.hostedemail.com; dkim=pass header.d=chromium.org header.s=google header.b=h+GGeCQY; spf=pass (imf26.hostedemail.com: domain of jeffxu@chromium.org designates 209.85.160.43 as permitted sender) smtp.mailfrom=jeffxu@chromium.org; dmarc=pass (policy=none) header.from=chromium.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1723829360; a=rsa-sha256; cv=none; b=2FxHOKmeE5UicankVxX2rH4haNLYQoY9qxe/vOj60Bb6I0Rnrc9kqftTy66hFDSpgpvBuN v1Q9FFtU7UpJR1DwXJHSoLAJEVWMU3vnDCoXUa8BDXSeoCJgGYP/CxM+gweNjPwdMSH21G sCmrQe1+8/cohhTPGtyhYGTCWY7MHCQ= Received: by mail-oa1-f43.google.com with SMTP id 586e51a60fabf-27032e6dbf2so85195fac.1 for ; Fri, 16 Aug 2024 10:30:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=chromium.org; s=google; t=1723829418; x=1724434218; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=8tBYSxfHD+S5eb/sSL8twdBp0696qHHZqkJKClMzb6c=; b=h+GGeCQY9afU20+mqPLl3ASk6YG088IA648OVI1O4u7XDDiN0N0+1LdSu4B5XT4L2m Whb1mrNevaH2AxhRYX1rqz4qmvSgusut22RSrMwAmhGZ3w4DoCcRIziK8mnCMCjFnnGm g6lVI86dO0SBlvJlxtCbuwBODfhhLTw4JigrU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1723829418; x=1724434218; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=8tBYSxfHD+S5eb/sSL8twdBp0696qHHZqkJKClMzb6c=; b=rNRYs6fwYCT3swrdmE2Z+WPb14oz2UUSc0hM61bJBuuf/Y5FG3RGIg/ofooGbo+JXt PklnDejH/W7QhT9W6DOOuCEpNQGwyONzqDbCNYznK4sOFOK6zeTqN5htOYcFuJyHdCMk BOh6j3bFYvQODe40IDwAKiCjxwRT2CVuEEEEIvU6R/VBN8aBglFvZ4t/K4aMn6dIdIC9 4X5K0B49snhgkIe9Gqly5stiaKSpdeH6ExfL0BsPlq9lgXjv0EyqrqUDfwja02jFcYTC /MByXPdH4nhaRr6cKYzL8DTQHWE29BIE6voCMTSL5C4UdouWL9EsjcVoalYgEBFnixJx 0PRw== X-Forwarded-Encrypted: i=1; AJvYcCVIefR8FwhqlJNXk1vjdVq7WTieBnaCHWYyAM6otYFPR7CMa3KZqc3A48cc2tEO+wz7LcQKAVnhHQB/FSIAdBgZ8es= X-Gm-Message-State: AOJu0YxvmJN6zo1a9tmkMrMMGlC48hgfQNWDC1DLdCEthxvMjYgWAOvV 8X5mmWhxi1Ct00/C97sv3jDXn6g89VO+3DzECYBMcrAtyD9JZvJk5jYXnKAzed4dl9QR6lI8dmI hVJr6gA0eBuHtRtWbcEmBr2NRvON41Sl/Mzc7 X-Google-Smtp-Source: AGHT+IF0/4D35MLAl7Lt1Htt30ECMQlvh/GhrvPH0xL3O7nAZsnfu2nsRYWw+OkQzmkX71BMt7chI5Bzp4LNPdxH6/w= X-Received: by 2002:a05:6870:524f:b0:250:756b:b1ed with SMTP id 586e51a60fabf-2701c394beamr4276459fac.19.1723829417706; Fri, 16 Aug 2024 10:30:17 -0700 (PDT) MIME-Version: 1.0 References: <20240807211309.2729719-1-pedro.falcato@gmail.com> <20240808161226.b853642c0ecf530b5cef2ecc@linux-foundation.org> In-Reply-To: From: Jeff Xu Date: Fri, 16 Aug 2024 10:30:05 -0700 Message-ID: Subject: Re: [PATCH v2 0/6] mm: Optimize mseal checks To: Pedro Falcato Cc: Jeff Xu , Andrew Morton , "Liam R. Howlett" , Vlastimil Babka , Lorenzo Stoakes , linux-mm@kvack.org, linux-kernel@vger.kernel.org, oliver.sang@intel.com, torvalds@linux-foundation.org, Michael Ellerman Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Stat-Signature: 4dayjjfbd6tdu45zk6qszrdisxppro8c X-Rspam-User: X-Rspamd-Queue-Id: E657B140012 X-Rspamd-Server: rspam02 X-HE-Tag: 1723829418-95311 X-HE-Meta: U2FsdGVkX1+aJBGW8crpfgAxT6oYowjcJWa/L0qwKld0Ysngjwl2T8xXUBlNi4ZBPjnP8mKnCJthH58TDzxQ6EsKPQmIbKIthRVIE5Cj0/K9jLNMv5HCPhFIHlXFX6qR3VcmZ1uXtUs5LY4dhsgIl+I/L6I/KWpSlyZww3X2Ft/nA/PWXSvXbqJ5FzqU1k+b1XVP1qsn5B3iCx8yosVu57lASM9FpNvuVFU/8Am4yqx3960oxMjpDTLjQ5MagEGgSUw+N7S4xpOYU6gDYkVecDoKe0snVUUtXwty6onVQ/oNq5TUXP7w13F8qQk1NqQJS85vHdIzhQ2dyv3+PyEsKADJc4N8QmIqtjzV2eQohE7XthKt0dwjEhMl0xEOTo1nKpbi3DRQTYqtOcBLW2Zp7kjkP6JC/q2uMohu2FvJ3TJi8txIfet2/7RFAn+ltGBRA5iSPzS5na9t5XVZfMfC3MtnisbBQd0CQEC5fXZwUvLi/vJtUeunZRsinpUjMjcrPAfpSfFwo13I9CnEnLcITFu8FPL8JmqEC2vSBs11JszP4NNVLiA6Yb6jppwGXnospzID8t2qnHewlIRIHbJv/e0EISFN/pXx7QlasxeQRnc9sAF6RY4Q+IAwwO3brS6ssiPwVdqU9oe/HyBz7K5OLD6kSbqkeGorsuQ1DVsUmD2gBwhJFhhd8hbU4bMWrn1fu2g4pExW/Jz3tlHPn7tnBNneMrWZFUEdfQ0uqI62eYmHDuFdZwu6lPv4hZbW6PZuiKIx7jpp6iuOBXoe7caJEC+kSR8Ey0hKApbmQrLFgKvnYXqkRVJ1y7J2liHbfCZrnYepNj3j64IxAxwIS5hlBDRJPjUPiIy0RYZyCm0bMtDlKLWqcjMefCRmHF5US8ssSHZagkS62ToJYBY5Od+htsMhO/L5nZPjSOdjxD+65vfgXxx1TgkBgJg3Oga6L9h3gLfubnoH8zDcrsLZl8D eL4Xw7s+ GZBOmweSZ2Tx2mJcr8kXqgHoPzYMQygtak8JrqgAzmyxEvdgaGvowb3/t/fR15W32/FaPqMKZO1+1J1zOAlrXMs3CwRvpr8mCAjMvvC5kP+OzPQKKcwxsdUe95O8TbYo7ayjJ4oHVxVSM2bH87WIVi9rp+ZZukXjdq4mdrjEDKb5YsJ6UT+9Tdnyt0vhweJb9NKmWIpfjkcKnAnnaHV0EvLygprbjdIbmDt26LBcDKJzdPzfQ2+fajRv5BFozQIEOa/vOzbSwVN2o050mV+9WxI36C8bKQaGvMbGhxsbzuamlzv2CoOKGsi13rgVzvq8nhFXwkLkXzTD2Nev3ukh6Th+T+PGdGQgCY1KbTS8yKJ8SJwLU1R3UC5P95hGReqENYfbJlJpToIZ0SYjkBZ49nIAO2NrSasuJ8feKfk7x5zfEFwAtlqYWIOrlJAOZawj7Xo8q4Hlj4qLd6zJGJ/F+JQFaEuD/zTUKoiv9jeBT3QE2OpiNL2nOQh7lfyYP/jF9RnlJ0ZCbUPVRl6M= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: Hi Pedro, (resent, previous email has html link) On Thu, Aug 15, 2024 at 6:58=E2=80=AFPM Pedro Falcato wrote: > > On Thu, Aug 15, 2024 at 11:10=E2=80=AFPM Jeff Xu wr= ote: > > > > Hi Andrew, Pedro > > > > On Thu, Aug 8, 2024 at 6:03=E2=80=AFPM Jeff Xu wrot= e: > > > > > > On Thu, Aug 8, 2024 at 5:34=E2=80=AFPM Pedro Falcato wrote: > > > > > > > > On Fri, Aug 9, 2024 at 12:12=E2=80=AFAM Andrew Morton wrote: > > > > > > > > > > On Wed, 7 Aug 2024 22:13:03 +0100 Pedro Falcato wrote: > > > > > > > > > > > This series also depends on (and will eventually very slightly = conflict with) > > > > > > the powerpc series that removes arch_unmap[2]. > > > > > > > > > > That's awkward. Please describe the dependency? > > > > > > > > One of the transformations done in this patch series (patch 2) assu= mes > > > > that arch_unmap either doesn't exist or does nothing. > > > > PPC is the only architecture with an arch_unmap implementation, and > > > > through the series I linked they're going to make it work via > > > > ->close(). > > > > > > > > What's the easiest way to deal with this? Can the PPC series go > > > > through the mm tree? > > > > > > > This patch can't be merged until arch_unmap() is all removed (ppc cha= nge) > > > > > > Also I'm still doing a test/reviewing for this patch, perhaps it is > > > better to wait till my test is done. > > > > > Sorry that I'm late for updating this thread. > > > > With removing arch_unmap() change landed , there is no dependency for > > the patch. However: I have other comments: > > > > 1. Testing > > Testing is 90% of work when I modify kernel code and send a patch. So > > I'm a little disappointed that this patch doesn't have any test > > updates or add new tests. Which makes me less confident about the > > regression risk on mseal itself, i.e. a sealed mapping being > > overwritten by mprotect/mmap/mremap/munmap. I have posted the comment > > in [1], and I would like to repeat it to stress my point. > > > > The V2 series doesn't have selftest change which indicates lack of > > testing. The out-of-loop check is positioned nearer to the API entry > > point and separated from internal business logic, thereby minimizing > > the testing requirements. However, as we move the sealing check > > further inward and intertwine it with business logic, greater test > > coverage becomes necessary to ensure the correctness of sealing > > is preserved. > > Sorry, cut the crap. Your backhanded concerns are very funny. > mseal was _your_ feature. You wrote the tests. You either didn't write > or didn't understand what kinds of tests need/should be done, > otherwise you would've done them. I wrote some code to fix the awful > performance hit. It is a _fix_, not a feature. Your previous mseal > tests should've covered this. If they didn't, that's okay (we all make > mistakes), but pushing your problems onto me is seriously laughable. > I posted the comments about the lack of a test update in V2 last monday, there is no response from you until Thursday night (which is OK). If you were expecting me to update the test cases and to support your patch series, you should explicitly call it out. So your point here is that you wrote the code, passed the existing selftest, fixed the perf, and the job is done. If the test doesn't cover the new cases you added, it is not your problem, you only need to care about perf part. This is how regression happened in past, e.g. commit 77795f900e2a07c1cbedc375789aefb43843b6c2 Author: Liam R. Howlett Date: Tue Jun 6 14:29:12 2023 -0400 mm/mprotect: fix do_mprotect_pkey() limit check The return of do_mprotect_pkey() can still be incorrectly returned as success if there is a gap that spans to or beyond the end address passe= d in. Update the check to ensure that the end address has indeed been se= en. Link: https://lore.kernel.org/all/CABi2SkXjN+5iFoBhxk71t3cmunTk-s=3DrB4= T7qo0UQRh17s49PQ@mail.gmail.com/ Link: https://lkml.kernel.org/r/20230606182912.586576-1-Liam.Howlett@or= acle.com Fixes: 82f951340f25 ("mm/mprotect: fix do_mprotect_pkey() return on err= or") Signed-off-by: Liam R. Howlett Reported-by: Jeff Xu Reviewed-by: Lorenzo Stoakes Acked-by: David Hildenbrand Acked-by: Vlastimil Babka Cc: Signed-off-by: Andrew Morton Had not I wrote selftest to discover this mprotect regression, it would go unnoticed and stay that way. My point is: the existing selftest for mseal is written for out-loop, and will not fully test in-loop. Your patch has made a big change to mseal, more tests are needed. To move forward from this situation for your patch series, either you or me will need to update the tests. How about sharing the load ? > I want to stress this bit: There's no mseal feature, there's no > business logic. There's mmap, munmap, mprotect, madvise, mremap (among > others). These are the things people actually care about, and need to > stay fast. Memory management is a core part of the kernel. All of the > very pretty abstractions you're talking about aren't applicable to > kernel code (for any kernel, really) in general. No one wants to pay > the cost of walking through a data structure 2 or 3 times just to > "separate the business logic" for a random, minimally simple feature. > The testing is about making sure that sealing is preserved after mmap/mremap/munmap/mprotect call, there is no real software that will do that kind of testing, even in the future, selftest is the only way. Security features never come quickly, adding syscall to the kernel is the first step to allow apps to use it. > > > > Yes. I promised to run some tests, which I did, with the existing self > > test (that passed), also I added more tests in the mremap selftest. > > However I'm bound by the time that I can spend on this (my other > > duties and deliverables), I can't test it as much as I like to for > > in-loop change (in a time frame demanded by a dev in this ml). Because > > this patch is not getting tested as it should be, my confidence for > > the V2 patch is low . > > Ok so you: tried to explain to me how to run selftests in v1 (when I > actively did _before sending_, and found a bug in your tests, and > wrote about it in-depth), pledge to "run some tests", never get back > to us, push the "the testsuite I wrote is lacking" concern onto me, > send a whole separate parallel patch set that tries to address _one_ > of the regressions with complete disregard for the work done here, > complain about a lack of time, and now say a backhanded "your > confidence for the V2 patch is low". > > I seriously have no words. > I want to stress I have no way to test real software that uses mseal > because APPARENTLY THERE IS NONE. THE FEATURE ADDED EXPLICITLY FOR > CHROMIUM IS NOT USED BY UPSTREAM CHROMIUM. > > > > > 2 perf testing > > stress-ng is not stable in my test with Chromebook, and I'm requesting > > Oliver to take more samples [2] . This due diligence assures that > > this patch accurately fulfills its purpose. The in-loop approach adds > > complexity to the code, i.e. future dev is harder to understand the > > sealing logic. Additionally, it sacrifices a security feature that > > makes it harder for an attacker to modify mapping (currently if an > > attacker uses munmap with a large address range, if one of the > > addresses is sealed, the entire range is not modified. In the in-loop > > approach, memory will be unmapped till it hits the sealed memory). > > Wrong. munmap is atomic and always has been. It's required by POSIX. > Please run this test on the latest kernel branch to verify: static void test_munmap_free_multiple_ranges(bool seal) { void *ptr; unsigned long page_size =3D getpagesize(); unsigned long size =3D 8 * page_size; int ret; int prot; setup_single_address(size, &ptr); FAIL_TEST_IF_FALSE(ptr !=3D (void *)-1); /* unmap one page from beginning. */ ret =3D sys_munmap(ptr, page_size); FAIL_TEST_IF_FALSE(!ret); /* unmap one page from middle. */ ret =3D sys_munmap(ptr + 4 * page_size, page_size); FAIL_TEST_IF_FALSE(!ret); /* seal the last page */ if (seal) { ret =3D sys_mseal(ptr + 7 * page_size, page_size); FAIL_TEST_IF_FALSE(!ret); } /* munmap all 8 pages from beginning */ ret =3D sys_munmap(ptr, 8 * page_size); if (seal) { FAIL_TEST_IF_FALSE(ret < 0); /* verify none of existing pages in the range are unmapped= */ size =3D get_vma_size(ptr + page_size, &prot); FAIL_TEST_IF_FALSE(size =3D=3D 3 * page_size); FAIL_TEST_IF_FALSE(prot =3D=3D 4); size =3D get_vma_size(ptr + 5 * page_size, &prot); FAIL_TEST_IF_FALSE(size =3D=3D 2 * page_size); FAIL_TEST_IF_FALSE(prot =3D=3D 4); size =3D get_vma_size(ptr + 7 * page_size, &prot); FAIL_TEST_IF_FALSE(size =3D=3D 1 * page_size); FAIL_TEST_IF_FALSE(prot =3D=3D 4); } else { FAIL_TEST_IF_FALSE(!ret); /* verify all pages are unmapped */ for (int i =3D 0; i < 8; i++) { size =3D get_vma_size(ptr, &prot); FAIL_TEST_IF_FALSE(size =3D=3D 0); } } REPORT_TEST_PASS(); } test_munmap_free_multiple_ranges(true) test_munmap_free_multiple_ranges(false) > I would also ask you to back these partial mprotect (assuming that's > what you meant) concerns with a real attack that takes this into > account, instead of hand waving "security". > While noting that all of these operations can already partially fail, > this is not new (and is explicitly permitted by POSIX for > syscalls-not-named-munmap). > As defence gets stronger, with seccomp,selinux,landlock, attackers now have to find an easier route. > > Therefore, I would like to ascertain the gain. > > The gain is real. I've proven it with will-it-scale, Oliver ran his > tests and found the regression to be gone (and they do seem to do > proper statistical analysis). > I would wager you to find a workload or hardware where *doing > substantially less work* makes for slower code. > > > > > 3 mremap refactor work. > > mremap refactoring is not related to these mmap regressions. In the v3 > I'm cleaning up and sending out tomorrow, I minimally change mremap > (as Liam wanted). > If the test issue is not resolved, V3 will be in the same situation as V2. Best Regards, -Jeff > -- > Pedro