From: Jann Horn
Date: Tue, 17 Jun 2025 17:35:29 +0200
Subject: Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
To: David Hildenbrand, Mark Brown, Andrew Morton
Cc: Muchun Song, Oscar Salvador, linux-mm@kvack.org, Lorenzo Stoakes
References: <20250528-hugetlb-nerf-v1-1-a404ca33e819@google.com> <20250602204107.177e2fdf2209b0926b5ce28e@linux-foundation.org>

On Tue, Jun 17, 2025 at 11:13 AM David Hildenbrand wrote:
> On 03.06.25 06:29, Jann Horn wrote:
> > On Tue, Jun 3, 2025 at 5:41 AM Andrew Morton wrote:
> >> On Wed, 28 May 2025 19:51:29 +0200 Jann Horn wrote:
> >>> Many distro kernels enable hugetlb support, but most systems running
> >>> those kernels never actually allocate hugepages or enable hugetlb
> >>> overcommit.
> >>>
> >>> On such systems, hugetlb is unusable for any legitimate usecase, but it
> >>> is still possible to exercise a lot of hugetlb-specific code by creating
> >>> MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
> >>> create page tables shared across processes.
> >>>
> >>> This is exposed through the mmap() syscall, with no privileges required,
> >>> so from a security perspective, this is interesting attack surface.
> >>>
> >>> Lock it down by completely denying creation of hugetlb files if no huge
> >>> pages for the hstate could be allocated without administratively
> >>> changing huge page limits.
> >>
> >> So this is a non-backward-compatible change?
> >
> > Yes, this change changes kernel behavior that is userspace-visible,
> > and causes syscalls to return errors where they worked before.
> >
> >> If any userspace is affected it's probably either stupid or evil, but I
> >> do wonder if there are legit cases for doing this, such as "I don't
> >> know if there are any hugepages configured, but I'll try this anyway
> >> and figure out what to do later on". And maybe there are other legit
> >> cases!
> >
> > Right. I think an affected case would be if userspace tries to detect
> > whether the kernel supports hugepages by creating a MAP_NORESERVE
> > mapping or huge memfd, and if that works, twiddles sysfs knobs to
> > actually allocate hugepages or shows a specific error message. Such a
> > program might end up wrongly assuming that the kernel does not support
> > hugepages. My understanding is that hugepages are normally
> > administratively configured so that they can be allocated early during
> > boot without having to worry about RAM fragmentation, in which case
> > this probably wouldn't happen, but it's not like I actually have a
> > good understanding of how typical hugetlb users work.
> >
> > Another affected case would be if userspace confirms that the kernel
> > supports hugetlb through sysfs or such, then creates a MAP_NORESERVE
> > hugetlb and asserts that this must work because MAP_NORESERVE more or
> > less can't fail, and crashes with an assertion failure or such.
> >
> > My understanding is that the combination of MAP_HUGETLB and
> > MAP_NORESERVE is somewhat rare in the first place; searching debian
> > codesearch for both flags on the same line, I basically only get one
> > hit in the "gridtools" package, though there might well be other cases
> > where the flags are set on separate lines. memfd_create(MFD_HUGETLB)
> > seems to be more common.
>
> QEMU can trigger this, and there might be corner-case use cases where
> you set up a virtio-mem device (to hotplug memory later) when starting the
> VM, but actually allocate the huge pages only when wanting to provide
> them to the VM.
>
> It's not that common, because usually you back all your VM through huge
> pages, not just hotplugged memory.
>
> But it's definitely possible ...

Okay, yeah, sounds like we should drop this patch for now, and I might
come back with a more targeted mitigation later.