From: Jann Horn
Date: Tue, 3 Jun 2025 06:29:24 +0200
Subject: Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
To: Andrew Morton
Cc: Muchun Song, Oscar Salvador, linux-mm@kvack.org, Lorenzo Stoakes
In-Reply-To: <20250602204107.177e2fdf2209b0926b5ce28e@linux-foundation.org>
References: <20250528-hugetlb-nerf-v1-1-a404ca33e819@google.com> <20250602204107.177e2fdf2209b0926b5ce28e@linux-foundation.org>

On Tue, Jun 3, 2025 at 5:41 AM Andrew Morton wrote:
> On Wed, 28 May 2025 19:51:29 +0200 Jann Horn wrote:
> > Many distro kernels enable hugetlb support, but most systems running
> > those kernels never actually allocate hugepages or enable hugetlb
> > overcommit.
> >
> > On such systems, hugetlb is unusable for any legitimate usecase, but it
> > is still possible to exercise a lot of hugetlb-specific code by creating
> > MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
> > create page tables shared across processes.
> >
> > This is exposed through the mmap() syscall, with no privileges required,
> > so from a security perspective, this is interesting attack surface.
> >
> > Lock it down by completely denying creation of hugetlb files if no huge
> > pages for the hstate could be allocated without administratively
> > changing huge page limits.
>
> So this is a non-backward-compatible change?

Yes, this changes userspace-visible kernel behavior: syscalls that
previously succeeded will now return errors.

> If any userspace is affected it's probably either stupid or evil, but I
> do wonder if there are legit cases for doing this, such as "I don't
> know if there are any hugepages configured, but I'll try this anyway
> and figure out what to do later on". And maybe there are other legit
> cases!

Right. I think one affected case would be userspace that tries to
detect whether the kernel supports hugepages by creating a
MAP_NORESERVE mapping or a hugetlb memfd and, if that works, twiddles
sysfs knobs to actually allocate hugepages or shows a specific error
message. Such a program might end up wrongly concluding that the kernel
does not support hugepages. My understanding is that hugepages are
normally configured administratively so that they can be allocated
early during boot without having to worry about RAM fragmentation, in
which case this probably wouldn't happen; but I don't have a good
understanding of how typical hugetlb users work.

Another affected case would be userspace that first confirms through
sysfs or similar that the kernel supports hugetlb, then creates a
MAP_NORESERVE hugetlb mapping and asserts that this must work (because
MAP_NORESERVE more or less can't fail), and then crashes with an
assertion failure or similar.

My understanding is that the combination of MAP_HUGETLB and
MAP_NORESERVE is somewhat rare in the first place; searching Debian
Code Search for both flags on the same line, I basically only get one
hit, in the "gridtools" package, though there might well be other cases
where the flags are set on separate lines. memfd_create(MFD_HUGETLB)
seems to be more common.
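To make the two affected patterns concrete, here is a rough userspace sketch. It is illustrative only: the helper names are hypothetical, and the 2 MiB huge page size is an assumption (typical on x86-64). The first helper is the fragile "probe by mapping" pattern described above; presumably it would start returning false under this patch on kernels that support hugetlb but have no huge pages set up. The second asks for a hugetlb mapping without MAP_NORESERVE, so that a missing or empty pool surfaces as an mmap() error at mapping time rather than as a SIGBUS on first touch, and falls back to normal pages instead of asserting.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/mman.h>

/* Assumption: 2 MiB default huge page size (typical on x86-64). */
#define ASSUMED_HUGE_PAGE_SIZE (2UL * 1024 * 1024)

/*
 * Fragile detection pattern (hypothetical helper): treats "a
 * MAP_HUGETLB|MAP_NORESERVE mapping could be created" as "the kernel
 * supports hugetlb". The mapping is never touched, because with
 * MAP_NORESERVE and an empty pool a page fault would raise SIGBUS.
 */
static bool probe_hugetlb_fragile(void)
{
	void *p = mmap(NULL, ASSUMED_HUGE_PAGE_SIZE, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_NORESERVE,
		       -1, 0);

	if (p == MAP_FAILED)
		return false;
	munmap(p, ASSUMED_HUGE_PAGE_SIZE);
	return true;
}

/*
 * More tolerant pattern (hypothetical helper): request a hugetlb mapping
 * WITH a reservation (no MAP_NORESERVE), so an unavailable pool fails at
 * mmap() time, then fall back to ordinary anonymous memory.
 * len should be a multiple of the huge page size.
 */
static void *alloc_maybe_huge(size_t len, bool *used_huge)
{
	void *p;

	assert(used_huge != NULL);
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (p != MAP_FAILED) {
		*used_huge = true;
		return p;
	}

	/* hugetlb unsupported, disabled, or not set up: use normal pages. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	*used_huge = false;
	return p == MAP_FAILED ? NULL : p;
}
```

A caller would use alloc_maybe_huge() where it previously asserted that a MAP_NORESERVE hugetlb mapping succeeded, and check *used_huge if it needs to report whether huge pages are actually in use.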
But yeah, I can't rule out that this would break something, and I sort
of hope that the hugetlb maintainers have some idea of how likely such a
scenario is. If we think there's a realistic chance of breaking
something with this, we shouldn't do it, and I could try to cook up a
more limited patch that only gates more specific parts of hugetlb on
this check in a less user-visible way (perhaps bailing out earlier on
hugetlb page faults); but I think that would also reduce the utility of
the patch somewhat.

I did think about whether this is the kind of borderline-breaking change
that should include a pr_warn_once() to inform the user that their
system encountered a specific behavioral difference due to a kernel
change, in case it does unexpectedly break something. I decided against
it, but if someone thinks this is close enough to a breaking change to
warrant that, I'll add it.
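For userspace that would rather check up front than probe, the gating condition can be approximated from procfs, roughly as in the sketch below. The helper names are hypothetical, and treating "nonzero nr_hugepages or nonzero nr_overcommit_hugepages for the default hstate" as "set up" is an assumption about the patch's condition, not a copy of its code. /proc/sys/vm/nr_hugepages and /proc/sys/vm/nr_overcommit_hugepages only exist on kernels built with hugetlb support, so a missing file is reported as "unknown".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Reads one integer from a procfs/sysfs file; returns -1 on any error. */
static long read_long_file(const char *path)
{
	FILE *f;
	long v;

	assert(path != NULL);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (fscanf(f, "%ld", &v) != 1)
		v = -1;
	fclose(f);
	return v;
}

/*
 * Assumption: an hstate counts as "set up" if its persistent pool or its
 * overcommit limit is nonzero. This approximates, but is not, the check
 * done in the kernel.
 */
static bool hstate_seems_set_up(long nr_hugepages, long nr_overcommit_hugepages)
{
	return nr_hugepages > 0 || nr_overcommit_hugepages > 0;
}

/* Returns 1 if the default hstate looks set up, 0 if not, -1 if unknown. */
static int default_hstate_seems_set_up(void)
{
	long nr = read_long_file("/proc/sys/vm/nr_hugepages");
	long over = read_long_file("/proc/sys/vm/nr_overcommit_hugepages");

	if (nr < 0 || over < 0)
		return -1;
	return hstate_seems_set_up(nr, over) ? 1 : 0;
}
```

Note that a result of 1 still doesn't guarantee that a hugetlb mapping will succeed (the pool may be exhausted), so callers would still want a fallback path.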