From: Andrei Vagin <avagin@gmail.com>
To: Mark Rutland <mark.rutland@arm.com>
Cc: Andrei Vagin <avagin@google.com>, Will Deacon <will@kernel.org>,
Kees Cook <kees@kernel.org>,
Andrew Morton <akpm@linux-foundation.org>,
Marek Szyprowski <m.szyprowski@samsung.com>,
Cyrill Gorcunov <gorcunov@gmail.com>,
Mike Rapoport <rppt@kernel.org>,
Alexander Mikhalitsyn <alexander@mihalicyn.com>,
linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
linux-mm@kvack.org, criu@lists.linux.dev,
Catalin Marinas <catalin.marinas@arm.com>,
linux-arm-kernel@lists.infradead.org,
Chen Ridong <chenridong@huawei.com>,
Christian Brauner <brauner@kernel.org>,
David Hildenbrand <david@kernel.org>,
Eric Biederman <ebiederm@xmission.com>,
Lorenzo Stoakes <lorenzo.stoakes@oracle.com>,
Michal Koutny <mkoutny@suse.com>,
Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>,
Linux API <linux-api@vger.kernel.org>
Subject: Re: [PATCH 1/4] exec: inherit HWCAPs from the parent process
Date: Wed, 15 Apr 2026 12:27:03 -0700
Message-ID: <CANaxB-xitpwsU9SmQ1D7q0Jn0Ng1b4WYxzAWF3L-pE+EYzbObg@mail.gmail.com>
In-Reply-To: <adUhbk0sKT0ucWhJ@J2N7QTR9R3>
Hi Mark,
Thanks for the feedback, and sorry for the delay; I was on vacation.
Please see my comments inline.
On Tue, Apr 7, 2026 at 8:29 AM Mark Rutland <mark.rutland@arm.com> wrote:
>
> On Fri, Mar 27, 2026 at 05:21:26PM -0700, Andrei Vagin wrote:
> > Hi Mark,
> >
> > I understand all these points and they are valid. However, as I
> > mentioned, we are not trying to introduce a mechanism that will strictly
> > enforce feature sets for every container. While we would like to have
> > that functionality, as you and Will mentioned, it would require
> > substantially more complexity to address, and maintainers would be
> > unlikely to pick up that complexity.
>
> The crux of my complaint here is that unless you do that (to some
> degree), this is not going to work reliably, even with the constraints
> you outline.
>
> Further, I disagree with your proposed solution of pushing more
> constraints onto userspace (to also consider HWCAPs as overriding other
> mechanisms, etc).
>
> I think that as-is, the approach is flawed.
I would really appreciate it if we could move this conversation toward
how we can make it work.
>
> > Even masking ID registers on a per-container basis would introduce
> > extra complexity that could make architecture maintainers unhappy.
> > There were a few attempts to introduce container CPUID masking on
> > x86_64 in the past.
>
> > In CRIU, we are not aiming to handle every possible workload. Our goal
> > is to target workloads where developers are ready to cooperate and
> > willing to make adjustments to be C/R compatible. The goal here is to
> > provide developers with clear instructions on what they can do to ensure
> > their applications are C/R compatible. When I say "workloads", I mean
> > this in a broad sense. A container might pack a set of tools with
> > different runtimes (Go, Java, libc-based). All these runtimes should
> > detect only allowed features.
>
> I do not think that arbitrary applications (and libraries!) should have
> to pick up additional constraints that are unnecessary without CRIU,
> especially where that goes against deliberate design decisions (e.g.
> features in arm64's HINT instruction space, which are designed to be
> usable in fast paths WITHOUT needing explicit checks of things like
> HWCAPs). Note that those typically *do* have kernel controls.
>
> I think there's a much larger problem space than you anticipate, and
> adding an incomplete solution now is just going to introduce a
> maintenance burden.
I am not adding arbitrary constraints for standard non-CRIU use cases.
Previously, I suggested that standard libraries would need to call prctl
to determine if hwcaps should be used for feature detection. However,
we can avoid this extra syscall by adding the new HWCAP2_CR bit. Then
libraries will simply check this bit in auxv[AT_HWCAP2], meaning the
overhead for "non-criu" cases is just a single bit check.
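A minimal sketch of what that check could look like in a runtime. Note
that HWCAP2_CR does not exist in any kernel today; the bit value below is
a placeholder, and the helper names are made up for illustration only.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/*
 * HYPOTHETICAL: HWCAP2_CR is only proposed in this thread. No kernel
 * defines it, and bit 31 is an arbitrary placeholder that may already
 * mean something else on a real system.
 */
#ifndef HWCAP2_CR
#define HWCAP2_CR (1UL << 31)
#endif

/* True when the (proposed) bit says hwcaps are the source of truth. */
static bool hwcaps_are_authoritative(unsigned long hwcap2)
{
	return (hwcap2 & HWCAP2_CR) != 0;
}

/*
 * Decide whether a feature may be used.
 * - With HWCAP2_CR set, only the hwcap bit counts.
 * - Otherwise, keep whatever the runtime's existing probe concluded.
 */
static bool feature_allowed(unsigned long hwcap, unsigned long hwcap2,
			    unsigned long feature_bit, bool probed)
{
	if (hwcaps_are_authoritative(hwcap2))
		return (hwcap & feature_bit) != 0;
	return probed;
}

/* Same decision, using the auxiliary vector of the current process. */
static bool current_feature_allowed(unsigned long feature_bit, bool probed)
{
	return feature_allowed(getauxval(AT_HWCAP), getauxval(AT_HWCAP2),
			       feature_bit, probed);
}
```

Without the bit, behaviour is unchanged, so existing detection paths
pay only for the one extra test.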
As for HINT instructions, there are two classes of instructions.
The first class doesn't change process state, so these instructions don't
require any special handling for checkpoint/restore. If a process is
checkpointed on a newer CPU and restored on an older CPU, the older
hardware will simply skip over those instructions. The architectural
state (registers, memory) should remain consistent.
The second class, such as PAC, consists of instructions that actually
change process state. These instructions require kernel/userspace
coordination. For example, the use of PAC keys can be controlled from
userspace via prctl. My point is that when kernel support for new
instructions is implemented, we will need to make sure that userspace is
able to control them.
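To illustrate the existing PAC control: a minimal sketch that only
queries which keys are enabled for the calling thread via
PR_PAC_GET_ENABLED_KEYS. The fallback defines mirror linux/prctl.h for
older headers; the helper names are made up for this example. On
non-arm64 systems, or on kernels without this prctl, the call is expected
to fail with EINVAL.

```c
#include <errno.h>
#include <sys/prctl.h>

/* Fallbacks for older userspace headers (values as in linux/prctl.h). */
#ifndef PR_PAC_APIAKEY
#define PR_PAC_APIAKEY (1UL << 0)
#define PR_PAC_APIBKEY (1UL << 1)
#define PR_PAC_APDAKEY (1UL << 2)
#define PR_PAC_APDBKEY (1UL << 3)
#define PR_PAC_APGAKEY (1UL << 4)
#endif
#ifndef PR_PAC_SET_ENABLED_KEYS
#define PR_PAC_SET_ENABLED_KEYS 60
#define PR_PAC_GET_ENABLED_KEYS 61
#endif

/*
 * Returns a bitmask of enabled PAC keys for the calling thread,
 * or -1 with errno set if the query is not supported.
 */
static int pac_get_enabled_keys(void)
{
	return prctl(PR_PAC_GET_ENABLED_KEYS, 0, 0, 0, 0);
}

/* Interpret the result above; a failed query counts as "not enabled". */
static int pac_key_is_enabled(int mask, unsigned long key)
{
	return mask >= 0 && ((unsigned long)mask & key) != 0;
}
```

A C/R tool could record this mask at checkpoint time. Changing it with
PR_PAC_SET_ENABLED_KEYS in a running process needs care, since code that
already signed pointers with a key will misbehave once that key is
disabled.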
>
> > Returning to the subject of this patchset: this series extends the role
> > of hwcaps. With this change, we would establish that hwcaps is the
> > "source of truth" for which features an application can safely use. Any
> > other features available on the current CPU would not be guaranteed to
> > remain available after migration to another machine.
> >
> > After this discussion, I found that the current version missed one major
> > thing: there should be a signal indicating that hwcaps must be used for
> > feature detection. Since we will need to integrate this interface into
> > libc, Go, and other runtimes, they definitely should not rely just on
> > hwcaps by default, especially in the early stages. This can be solved
> > via the prctl command. Libraries like libc would call
> > prctl(PR_USER_HWCAP_ENABLED). If this returns true, the runtime knows
> > that only the features explicitly listed in hwcaps should be used.
>
> I do not think we should be pushing that shape of constraint onto
> userspace.
Please see my previous comment.
>
> > You are right, the controlled feature set will be limited to features
> > the kernel knows about. And yes, we would need to report CPU features in
> > hwcaps even if the kernel isn't directly involved in handling them.
>
> To be clear, that is not what I am arguing.
>
> As I mentioned before, the way this works on arm64 is that the kernel
> only exposes what it is aware of, even in the ID regs accessible to
> userspace. We usually *can* hide features, and do that for cases of
> mismatched big.LITTLE, virtual machines, etc.
I understand that. My point was that the kernel would need to report
features in hwcaps even if they don't require specific kernel-side
handling.
>
> > Honestly, I am not certain if this is the "right" interface for that,
> > and I would be happy to consider other ideas. I understand that these
> > hwcaps will not work right out of the box, but we need a way to solve
> > this problem. Having a centralized API for CPU/kernel feature detection
> > seems like the right direction.
>
> I think that for better or worse the approach you are taking here simply
> does not solve enough of the problem to actually be worthwhile.
This approach mimics solutions that some CRIU users are already
implementing in userspace, but those only work when the user
controls/recompiles all of their libraries. I am open to other ideas,
but we need a path forward.
>
> > As for signal frame size and extended states like SVE/SME, we are
> > aware of this problem. However, it is partly mitigated by the fact that if
> > an application does not use some features, those states are not placed
> > in the signal frame.
>
> That is not true. The kernel can and will create signal frames for
> architectural state that a task might never have touched.
>
> Generally arm64 creates signal frames for features when the feature
> *exists*, regardless of whether the task has actively manipulated the
> relevant state. For example, on systems with SVE a trivial SVE signal
> frame gets created even if a task only uses the FPSIMD registers, and on
> systems with SME a TPIDR2 signal frame gets created even if the task has
> never read/written TPIDR2.
>
> When restoring, an unrecognised signal frame is treated as invalid, and
> we can require that certain signal frames are present.
You are right; that was my mistake. My only explanation for why we don't
see this failure often is that C/R is rarely triggered while a process
is actually inside a signal handler. This is definitely a problem that
still needs to be solved.
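To make the failure mode concrete, here is a simplified userspace sketch
of the check (not the kernel's restore code). It walks the
{magic, size} records of an arm64-style signal frame and rejects any
record the "destination" does not know. The magic values are, to my
knowledge, the ones from arm64's uapi sigcontext.h; the struct and
function names are made up for this example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Record header layout used in the arm64 sigcontext reserved area. */
struct ctx_hdr {
	uint32_t magic;
	uint32_t size;	/* size of the whole record, header included */
};

#define FPSIMD_MAGIC 0x46508001u
#define SVE_MAGIC    0x53564501u
#define TPIDR2_MAGIC 0x54504902u

/* Test helper: append a record header at off, return the next offset. */
static size_t put_record(unsigned char *buf, size_t off,
			 uint32_t magic, uint32_t size)
{
	struct ctx_hdr h = { magic, size };

	memcpy(buf + off, &h, sizeof(h));
	return off + size;
}

/*
 * Returns 0 if every record up to the {0, 0} terminator is listed in
 * known[], and -1 on an unknown or malformed record.
 */
static int frame_records_known(const unsigned char *buf, size_t len,
			       const uint32_t *known, size_t nknown)
{
	size_t off = 0;

	while (off + sizeof(struct ctx_hdr) <= len) {
		struct ctx_hdr h;
		size_t i;

		memcpy(&h, buf + off, sizeof(h));
		if (h.magic == 0 && h.size == 0)
			return 0;	/* terminator record */
		if (h.size < sizeof(h) || h.size > len - off)
			return -1;	/* malformed size */
		for (i = 0; i < nknown; i++)
			if (known[i] == h.magic)
				break;
		if (i == nknown)
			return -1;	/* record this side does not know */
		off += h.size;
	}
	return -1;			/* ran off the end, no terminator */
}
```

A frame carrying FPSIMD and TPIDR2 records passes when TPIDR2_MAGIC is in
the known list and fails when it is not, which is the situation after
restoring on a machine without SME.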
>
> > In the future, when we construct/reload a signal frame, we could look
> > at a process feature set for a process and generate a frame according
> > to those features...
>
> When you say 'we' here, are you talking about within the kernel, or
> within the userspace C/R mechanism?
... within the kernel.
Thanks,
Andrei