From: Jann Horn <jannh@google.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Muchun Song <muchun.song@linux.dev>,
Oscar Salvador <osalvador@suse.de>,
linux-mm@kvack.org, Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Subject: Re: [PATCH] hugetlb: block hugetlb file creation if hugetlb is not set up
Date: Tue, 3 Jun 2025 06:29:24 +0200
Message-ID: <CAG48ez0fQ5Ukg=0Scsv=N2k2jS=YRQpxLFeQRwMREABUr6yrgQ@mail.gmail.com>
In-Reply-To: <20250602204107.177e2fdf2209b0926b5ce28e@linux-foundation.org>
On Tue, Jun 3, 2025 at 5:41 AM Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 28 May 2025 19:51:29 +0200 Jann Horn <jannh@google.com> wrote:
> > Many distro kernels enable hugetlb support, but most systems running
> > those kernels never actually allocate hugepages or enable hugetlb
> > overcommit.
> >
> > On such systems, hugetlb is unusable for any legitimate usecase, but it
> > is still possible to exercise a lot of hugetlb-specific code by creating
> > MAP_HUGETLB|MAP_NORESERVE VMAs - for example, it is still possible to
> > create page tables shared across processes.
> >
> > This is exposed through the mmap() syscall, with no privileges required,
> > so from a security perspective, this is interesting attack surface.
> >
> > Lock it down by completely denying creation of hugetlb files if no huge
> > pages for the hstate could be allocated without administratively
> > changing huge page limits.
>
> So this is a non-backward-compatible change?
Yes. This patch changes userspace-visible kernel behavior: syscalls
that used to succeed will now return errors.
> If any userspace is affected it's probably either stupid or evil, but I
> do wonder if there are legit cases for doing this, such as "I don't
> know if there are any hugepages configured, but I'll try this anyway
> and figure out what to do later on". And maybe there are other legit
> cases!
Right. I think an affected case would be if userspace tries to detect
whether the kernel supports hugepages by creating a MAP_NORESERVE
mapping or huge memfd, and if that works, twiddles sysfs knobs to
actually allocate hugepages or shows a specific error message. Such a
program might end up wrongly assuming that the kernel does not support
hugepages. My understanding is that hugepages are normally
administratively configured so that they can be allocated early during
boot without having to worry about RAM fragmentation, in which case
this probably wouldn't happen, but it's not like I actually have a
good understanding of how typical hugetlb users work.
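A minimal sketch of the kind of userspace probe described above, to make the affected pattern concrete. The helper name probe_hugetlb_noreserve() is hypothetical, and the caller is assumed to pass a length that is a multiple of the default huge page size (2 MiB on typical x86-64 configurations; other architectures differ). The sketch only creates and removes the mapping and never touches the memory.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: try to create an anonymous
 * MAP_HUGETLB|MAP_NORESERVE mapping of `len` bytes.
 * Returns 1 and sets *err to 0 if mmap() succeeded,
 * returns 0 and stores errno in *err if it failed.
 */
static int probe_hugetlb_noreserve(size_t len, int *err)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_NORESERVE,
                   -1, 0);
    if (p == MAP_FAILED) {
        *err = errno;
        return 0;
    }
    /*
     * Deliberately do not touch the memory: with MAP_NORESERVE and an
     * empty pool, the first access can fail at fault time.
     */
    munmap(p, len);
    *err = 0;
    return 1;
}
```

A program that treats a return value of 0 here as "this kernel has no hugetlb support" is the case that would be misled by the change, since the mapping would be refused on a kernel that does support hugetlb but has no pages configured.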
Another affected case would be if userspace confirms that the kernel
supports hugetlb through sysfs or similar, then creates a MAP_NORESERVE
hugetlb mapping and asserts that this must succeed because MAP_NORESERVE
more or less can't fail, and so crashes with an assertion failure.
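For illustration, here is a rough userspace approximation of "is this hstate usable at all", which a program could consult before deciding how to react to a failed mapping instead of asserting. The helper names are hypothetical. The sketch assumes the per-size sysfs directory layout (for example /sys/kernel/mm/hugepages/hugepages-2048kB) with its nr_hugepages and nr_overcommit_hugepages files, and it is not the check the patch itself performs in the kernel.

```c
#include <stdio.h>

/* Hypothetical helper: read one unsigned integer from a file. */
static int read_ulong_file(const char *path, unsigned long *out)
{
    FILE *f = fopen(path, "r");
    if (!f)
        return -1;
    int ok = (fscanf(f, "%lu", out) == 1);
    fclose(f);
    return ok ? 0 : -1;
}

/*
 * Hypothetical helper: given a per-hstate sysfs directory, return
 *   1  if the pool has pages or overcommit is enabled,
 *   0  if both counters are zero,
 *  -1  if the counters could not be read.
 */
static int hugetlb_pool_usable(const char *hstate_dir)
{
    char path[512];
    unsigned long nr = 0, overcommit = 0;
    int n;

    n = snprintf(path, sizeof(path), "%s/nr_hugepages", hstate_dir);
    if (n < 0 || (size_t)n >= sizeof(path) || read_ulong_file(path, &nr) != 0)
        return -1;

    n = snprintf(path, sizeof(path), "%s/nr_overcommit_hugepages", hstate_dir);
    if (n < 0 || (size_t)n >= sizeof(path) ||
        read_ulong_file(path, &overcommit) != 0)
        return -1;

    return (nr > 0 || overcommit > 0) ? 1 : 0;
}
```

A caller that gets 0 from hugetlb_pool_usable() could report "no huge pages configured" and fall back to normal pages rather than asserting that the mapping must succeed.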
My understanding is that the combination of MAP_HUGETLB and
MAP_NORESERVE is somewhat rare in the first place; searching Debian
Code Search for both flags on the same line, I basically only get one
hit in the "gridtools" package, though there might well be other cases
where the flags are set on separate lines. memfd_create(MFD_HUGETLB)
seems to be more common.
But yeah, I can't rule out that this would break something, and I sort
of hope that the hugetlb maintainers might have some idea how likely
such a scenario would be. If we think that there's a realistic chance
of breaking something with this, we shouldn't do this and I could try
to cook up a more limited patch that maybe only gates more specific
parts of hugetlb on this check in a less user-visible way (perhaps
bailing out earlier on hugetlb page faults); but I think that would
also reduce the utility of the patch somewhat.
I did think about whether this is the kind of borderline-breaking
change that should include a pr_warn_once() to tell the user that
their system hit a specific behavioral difference caused by a kernel
change, in case it does unexpectedly break something. I decided
against it, but if someone thinks this is close enough to a breaking
change to warrant one, I'll add it.