[LSF/MM/BPF TOPIC] eBPF isolation with pkeys

* [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
@ 2026-02-12 16:22 Yeoreum Yun
  2026-02-12 16:36 ` Dave Hansen
  2026-02-12 17:44 ` Alexei Starovoitov
  0 siblings, 2 replies; 12+ messages in thread
From: Yeoreum Yun @ 2026-02-12 16:22 UTC (permalink / raw)
  To: lsf-pc, linux-mm, bpf
  Cc: catalin.marinas, david, ryan.roberts, kevin.brodsky,
	sebastian.osterlund, dave.hansen, rick.p.edgecombe, yeoreum.yun

Hi all,

I would like to propose the topic of eBPF isolation with pkeys at the
upcoming LSF/MM/BPF summit.

Background
==========

Today, eBPF programs provide powerful capabilities to extend kernel
functionality without requiring modifications to the kernel itself.
These capabilities are largely enabled by the eBPF verifier, which
enforces memory safety and other constraints to protect the kernel.

However, vulnerabilities in the verifier have repeatedly demonstrated that
eBPF programs can also become a serious attack surface.  In several cases,
flaws in verifier logic have allowed malicious eBPF programs to bypass
safety guarantees and compromise kernel security.

Representative CVEs include:

     - CVE-2020-8835   [1]
     - CVE-2021-3490   [2]
     - CVE-2022-23222  [3]
     - CVE-2023-2163   [4]

To mitigate security risks arising from verifier vulnerabilities, this
proposal introduces an approach to isolate eBPF programs using pkeys
implemented on top of the ARMv8.9 Permission Overlay Extension (POE) in
arm64 [5].

eBPF isolation with pkeys
=========================

The goals of eBPF isolation are as follows:

  1. Prohibit eBPF programs from writing to memory that they
     are not permitted to access.

  2. Prohibit eBPF programs from executing memory that they are
     not permitted to access.

  3. Allow kernel memory writes and code execution only through controlled
     interfaces such as KFUNCS and BPF helpers.

Conceptually, the model can be illustrated as follows:
┌──────────────────────────────────────────────────────────────────┐
│                                     executable memory (RO)       │
│            ┌──────────┐          ┌────────┬─────────┬────────┐   │
│            │          │          │        │         │        │   │
│            │  KFUNCS  │  allowed │  BPF   │         │  BPF   │   │
│            │          │◄─────────┤  PROG  ◄─────────►  PROG  │   │
│            └──────────┘          │        │         │        │   │
│                                  ├────────┘         └────────┤   │
│                                  │                           │   │
│                                  │              ┌────────┐   │   │
│               ┌───────────┐      │              │        │   │   │
│               │ arbitrary │◄─────┼────X-────────┤  BPF   │   │   │
│               │    w/x    │      │              │  PROG  │   │   │
│               └───────────┘      │              │        │   │   │
│                                  │              └────────┘   │   │
│                                  │                           │   │
│   ┌─────────┐◄───┐               │    ┌────────┐             │   │
│   └─────────┘    ├───────────┐   │    │        │             │   │
│                  │    BPF    │◄──┼────┤  BPF   │             │   │
│   ┌─────────┐◄───┤  HELPERS  │   │    │  PROG  │             │   │
│   └─────────┘    └───────────┘   │    │        │             │   │
│                                  └────┴────────┴─────────────┘   │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘

eBPF isolation is implemented with a small number of pkeys, for example:

  - PKEY_DEFAULT: used for general kernel memory
  - PKEY_EBPF   : used for memory owned by eBPF programs

Additionally, an extra pkey can be used for specific data or code that is
shared between the core kernel and eBPF programs.

During execution, if an eBPF program attempts to access memory marked with
PKEY_DEFAULT without going through KFUNCS or BPF helpers, a permission
overlay fault is raised by the hardware.

As a result, even if a malicious eBPF program is loaded due to a verifier
vulnerability, any attempt to arbitrarily modify or execute kernel memory
will cause the program to fault and terminate.

This prevents further exploitation and significantly reduces the impact of
verifier bugs.
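
As a rough illustration of the model described above, here is a minimal
user-space sketch in C.  It is a toy, not the experimental implementation:
the key numbers, the 4-bit permission field, the R/W/X bit values and all
function names (overlay_set, overlay_allows, call_helper, ...) are
assumptions made for this example and do not reflect the actual POE
register layout.  It only shows the intended policy: under the eBPF overlay,
PKEY_DEFAULT memory is readable but neither writable nor executable, and a
helper call acts as a gate that temporarily switches to the kernel overlay.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical key numbers; the proposal does not specify values. */
enum { PKEY_DEFAULT = 0, PKEY_EBPF = 1, NR_PKEYS = 2 };

/* Permission bits for this toy model only. */
enum { PERM_R = 1u << 0, PERM_W = 1u << 1, PERM_X = 1u << 2 };
#define PERM_RWX (PERM_R | PERM_W | PERM_X)

/* One 4-bit permission field per pkey, packed into a single word. */
typedef uint32_t overlay_t;

static overlay_t overlay_set(overlay_t ov, unsigned pkey, unsigned perms)
{
	assert(pkey < NR_PKEYS);
	ov &= ~((overlay_t)0xf << (pkey * 4));
	return ov | ((overlay_t)(perms & 0xf) << (pkey * 4));
}

/* True if every permission in 'want' is granted for 'pkey'. */
static bool overlay_allows(overlay_t ov, unsigned pkey, unsigned want)
{
	assert(pkey < NR_PKEYS);
	return (((ov >> (pkey * 4)) & 0xf) & want) == want;
}

/* Core kernel context: no overlay restrictions. */
static overlay_t kernel_overlay(void)
{
	overlay_t ov = overlay_set(0, PKEY_DEFAULT, PERM_RWX);
	return overlay_set(ov, PKEY_EBPF, PERM_RWX);
}

/* eBPF context: general kernel memory is read-only and non-executable. */
static overlay_t ebpf_overlay(void)
{
	overlay_t ov = overlay_set(0, PKEY_DEFAULT, PERM_R);
	return overlay_set(ov, PKEY_EBPF, PERM_RWX);
}

/* Overlay currently in effect on the single simulated CPU. */
static overlay_t cur_overlay;

/* Gate: run a helper under the kernel overlay, then restore the caller's. */
static int call_helper(int (*helper)(void))
{
	overlay_t saved = cur_overlay;
	int ret;

	cur_overlay = kernel_overlay();
	ret = helper();
	cur_overlay = saved;
	return ret;
}

/* Toy helper: returns 0 if a write to PKEY_DEFAULT memory would be
 * allowed right now, -1 if it would raise an overlay fault. */
static int toy_helper_write_kernel(void)
{
	return overlay_allows(cur_overlay, PKEY_DEFAULT, PERM_W) ? 0 : -1;
}
```

With cur_overlay set to ebpf_overlay(), calling toy_helper_write_kernel()
directly reports a fault, while call_helper(toy_helper_write_kernel)
succeeds and leaves the eBPF overlay in place afterwards.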

Goals of the discussion
=======================

The goal of this session is to share concrete ideas on how eBPF isolation
can be implemented using pkeys.

To achieve effective isolation, all memory directly accessible by eBPF
programs must be marked with a non-default pkey.  This requires kernel
memory allocators to become pkey-aware.

To that end, this discussion introduces a set of new allocator APIs and
explores more extensible API designs:

  - kmalloc_pkey series
  - vmalloc_pkey series
  - alloc_percpu_pkey series

We also aim to discuss how existing kernel allocators can be internally
extended to propagate and enforce pkey information.
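
To show why the allocators themselves have to know about pkeys, here is a
small user-space sketch in C.  It assumes that a pkey is attached to a whole
page (as with page-table based protection keys), so objects with different
keys must never share a page.  toy_alloc_pkey() is a hypothetical stand-in
for a kmalloc_pkey()-style call; its name, its signature and the per-key
arena layout are assumptions for illustration, not the proposed API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_PAGE_SIZE 4096u

/* Hypothetical key numbers; the proposal does not specify values. */
enum { PKEY_DEFAULT = 0, PKEY_EBPF = 1, NR_PKEYS = 2 };

/* One arena per pkey, so that every page carries exactly one key. */
struct pkey_arena {
	unsigned char *page;	/* current page, or NULL before first use */
	size_t used;		/* bytes already handed out from that page */
};

static struct pkey_arena arenas[NR_PKEYS];

/* Hypothetical kmalloc_pkey()-like bump allocator (never frees). */
static void *toy_alloc_pkey(size_t size, int pkey)
{
	struct pkey_arena *a;
	void *p;

	assert(pkey >= 0 && pkey < NR_PKEYS);
	if (size == 0 || size > TOY_PAGE_SIZE)
		return NULL;
	size = (size + 15u) & ~(size_t)15u;	/* keep objects 16-byte aligned */

	a = &arenas[pkey];
	if (!a->page || a->used + size > TOY_PAGE_SIZE) {
		/* Start a fresh page; the old one is leaked in this toy. */
		a->page = aligned_alloc(TOY_PAGE_SIZE, TOY_PAGE_SIZE);
		if (!a->page)
			return NULL;
		a->used = 0;
	}
	p = a->page + a->used;
	a->used += size;
	return p;
}

/* Base address of the page an object lives in, i.e. the unit that a
 * page-granular pkey tag would apply to. */
static uintptr_t toy_page_of(const void *p)
{
	return (uintptr_t)p & ~(uintptr_t)(TOY_PAGE_SIZE - 1);
}
```

Two small PKEY_EBPF allocations end up on the same page, while a
PKEY_DEFAULT allocation always lands on a different one.  A pkey-unaware
allocator gives no such guarantee, which is the gap the new APIs are meant
to close.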

Current status
==============

An RFC series is planned for around Q2 2026, and the experimental
implementations for eBPF isolation with pkey and pkey-aware memory
allocators have already been completed internally.  Using these
implementations, we verified that eBPF programs running under isolation
successfully execute several sched_ext applications provided by
tools/sched_ext, as well as some bpf kselftest cases.

References
==========

[1] https://www.zerodayinitiative.com/blog/2020/4/8/cve-2020-8835-linux-kernel-privilege-escalation-via-improper-ebpf-program-verification
[2] https://github.com/chompie1337/Linux_LPE_eBPF_CVE-2021-3490
[3] https://github.com/tr3ee/CVE-2022-23222
[4] https://bughunters.google.com/blog/a-deep-dive-into-cve-2023-2163-how-we-found-and-fixed-an-ebpf-linux-kernel-vulnerability
[5] https://developer.arm.com/community/arm-community-blogs/b/architectures-and-processors-blog/posts/arm-a-profile-architecture-2022

--
Sincerely,
Yeoreum Yun
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.


* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 16:22 [LSF/MM/BPF TOPIC] eBPF isolation with pkeys Yeoreum Yun
@ 2026-02-12 16:36 ` Dave Hansen
  2026-02-12 17:14   ` Yeoreum Yun
  2026-02-12 17:44 ` Alexei Starovoitov
  1 sibling, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2026-02-12 16:36 UTC (permalink / raw)
  To: Yeoreum Yun, lsf-pc, linux-mm, bpf
  Cc: catalin.marinas, david, ryan.roberts, kevin.brodsky,
	sebastian.osterlund, dave.hansen, rick.p.edgecombe

On 2/12/26 08:22, Yeoreum Yun wrote:
> Current status
> ==============
> 
> An RFC series is planned for around Q2 2026, and the experimental
> implementations for eBPF isolation with pkey and pkey-aware memory
> allocators have already been completed internally.  Using these
> implementations, we verified that eBPF programs running under isolation
> successfully execute several sched_ext applications provided by
> tools/sched_ext, as well as some bpf kselftest cases.

If you have code, post the code, please. It doesn't matter how ugly it is.

> To that end, this discussion introduces a set of new allocator APIs and
> explores more extensible API designs:
> 
>   - kmalloc_pkey series
>   - vmalloc_pkey series
>   - alloc_percpu_pkey series

It all sounds fun, but this doesn't exactly seem very generic. The memory
that sched_ext needs to access is super different from, say, what a
socket-filtering eBPF program would need.

So this doesn't seem to be likely to be true "eBPF isolation" as much as
sched_ext+eBPF isolation.



* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 16:36 ` Dave Hansen
@ 2026-02-12 17:14   ` Yeoreum Yun
  2026-02-12 18:14     ` Dave Hansen
  0 siblings, 1 reply; 12+ messages in thread
From: Yeoreum Yun @ 2026-02-12 17:14 UTC (permalink / raw)
  To: Dave Hansen
  Cc: lsf-pc, linux-mm, bpf, catalin.marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, dave.hansen,
	rick.p.edgecombe

Hi Dave,

[...]
>
> > To that end, this discussion introduces a set of new allocator APIs and
> > explores more extensible API designs:
> >
> >   - kmalloc_pkey series
> >   - vmalloc_pkey series
> >   - alloc_percpu_pkey series
>
> It all sounds fun, but this doesn't exactly seem very generic. The memory
> that sched_ext needs to access is super different from, say, what a
> socket-filtering eBPF program would need.
>
> So this doesn't seem to be likely to be true "eBPF isolation" as much as
> sched_ext+eBPF isolation.

Our current isolation model focuses on restricting writes and execution.
Therefore, if we allocate only the memory that eBPF programs must write
directly with a separate pkey (e.g., packet data or sock),
it seems to me that socket-filtering programs could also benefit from
the same isolation.

Is there anything I might be overlooking?

--
Sincerely,
Yeoreum Yun

* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 17:14   ` Yeoreum Yun
@ 2026-02-12 18:14     ` Dave Hansen
  2026-02-16  9:57       ` Yeoreum Yun
  0 siblings, 1 reply; 12+ messages in thread
From: Dave Hansen @ 2026-02-12 18:14 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: lsf-pc, linux-mm, bpf, catalin.marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, dave.hansen,
	rick.p.edgecombe

BTW, one high-level thing: what you're talking about here really is a
kind of hardening or kernel self-protection. It's in the spirit of Kees
Cook's work: how can kernel security be resilient even in the face of
kernel bugs and attackers exploiting those bugs?

Those who buy into the idea of Kees's work will likely agree with the
premise of this patch set. Those that don't, won't. :)

On 2/12/26 09:14, Yeoreum Yun wrote:
>>> To that end, this discussion introduces a set of new allocator APIs and
>>> explores more extensible API designs:
>>>
>>>   - kmalloc_pkey series
>>>   - vmalloc_pkey series
>>>   - alloc_percpu_pkey series
>>
>> It all sounds fun, but this doesn't exactly seem very generic. The memory
>> that sched_ext needs to access is super different from, say, what a
>> socket-filtering eBPF program would need.
>>
>> So this doesn't seem to be likely to be true "eBPF isolation" as much as
>> sched_ext+eBPF isolation.
> 
> Our current isolation model focuses on restricting writes and execution.
> Therefore, if we allocate only the memory that eBPF programs must write
> directly with a separate pkey (e.g., packet data or sock),
> it seems to me that socket-filtering programs could also benefit from
> the same isolation.

This means that subsystems using eBPF need to allocate their data
structures separately, or at least in a pkey-aware manner. They either
need to declare the memory at allocation time, or need to be able to pay
the cost (and the collateral damage) of changing its pkey after allocation.
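
One possible reading of "collateral damage" can be sketched in C, under the
assumption that a pkey applies to a whole page: retagging one object after
allocation also retags everything else that shares its page, and every page
whose key changes would need its mapping rewritten.  The page size, the
array-based model and the names toy_pkey_at() and toy_retag() are
hypothetical and only serve this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGE_SIZE 4096u
#define TOY_NR_PAGES  4u

/* Hypothetical key numbers; the thread does not specify values. */
enum { PKEY_DEFAULT = 0, PKEY_EBPF = 1 };

/* One key per page; all pages start out as PKEY_DEFAULT (0). */
static int page_pkey[TOY_NR_PAGES];

/* Key that covers a given byte offset into the toy memory. */
static int toy_pkey_at(size_t off)
{
	assert(off < (size_t)TOY_PAGE_SIZE * TOY_NR_PAGES);
	return page_pkey[off / TOY_PAGE_SIZE];
}

/*
 * Retag the range [off, off + len) after allocation.  Returns the number
 * of pages whose key actually changed, i.e. how many mappings a real
 * implementation would have to update in this model.
 */
static size_t toy_retag(size_t off, size_t len, int pkey)
{
	size_t first, last, i, changed = 0;

	assert(len > 0);
	assert(off + len <= (size_t)TOY_PAGE_SIZE * TOY_NR_PAGES);

	first = off / TOY_PAGE_SIZE;
	last = (off + len - 1) / TOY_PAGE_SIZE;
	for (i = first; i <= last; i++) {
		if (page_pkey[i] != pkey) {
			page_pkey[i] = pkey;
			changed++;
		}
	}
	return changed;
}
```

Retagging a 128-byte object in the middle of a page changes the key for all
4096 bytes of that page, so unrelated neighbours become writable by eBPF
programs too, and an object that straddles a page boundary costs two page
updates.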

This _might_ be doable for the scheduler. It probably has a limited set
of things that get written to. Most of it is statically allocated.

Networking isn't my strong suit, but packet memory seems rather
dynamically allocated and also needs to be written to by eBPF programs.
I suspect anything that slows packet allocation down by even a few
cycles is a non-starter.

IMNHO, _any_ approach to solving this problem that starts with "we just
need a new allocator, or a modification to existing kernel allocators, to
track a new memory type" is a dead end. Or, best case, a very
surgical, targeted solution.


* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 18:14     ` Dave Hansen
@ 2026-02-16  9:57       ` Yeoreum Yun
  0 siblings, 0 replies; 12+ messages in thread
From: Yeoreum Yun @ 2026-02-16  9:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: lsf-pc, linux-mm, bpf, catalin.marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, dave.hansen,
	rick.p.edgecombe

Hi Dave,

> BTW, one high-level thing: what you're talking about here really is a
> kind of hardening or kernel self-protection. It's in the spirit of Kees
> Cook's work: how can kernel security be resilient even in the face of
> kernel bugs and attackers exploiting those bugs?
>
> Those who buy into the idea of Kees's work will likely agree with the
> premise of this patch set. Those that don't, won't. :)

Yes, exactly. My one small wish is that most people will view this positively...

>
> On 2/12/26 09:14, Yeoreum Yun wrote:
> >>> To that end, this discussion introduces a set of new allocator APIs and
> >>> explores more extensible API designs:
> >>>
> >>>   - kmalloc_pkey series
> >>>   - vmalloc_pkey series
> >>>   - alloc_percpu_pkey series
> >>
> >> It all sounds fun, but this doesn't exactly seem very generic. The memory
> >> that sched_ext needs to access is super different from, say, what a
> >> socket-filtering eBPF program would need.
> >>
> >> So this doesn't seem to be likely to be true "eBPF isolation" as much as
> >> sched_ext+eBPF isolation.
> >
> > Our current isolation model focuses on restricting writes and execution.
> > Therefore, if we allocate only the memory that eBPF programs must write
> > directly with a separate pkey (e.g., packet data or sock),
> > it seems to me that socket-filtering programs could also benefit from
> > the same isolation.
>
> This means that subsystems using eBPF need to allocate their data
> structures separately, or at least in a pkey-aware manner. They either
> need to declare the memory at allocation time, or need to be able to pay
> the cost (and the collateral damage) of changing its pkey after allocation.
>
> This _might_ be doable for the scheduler. It probably has a limited set
> of things that get written to. Most of it is statically allocated.
>
> Networking isn't my strong suit, but packet memory seems rather
> dynamically allocated and also needs to be written to by eBPF programs.
> I suspect anything that slows packet allocation down by even a few
> cycles is a non-starter.
>
> IMNHO, _any_ approach to solving this problem that starts with "we just
> need a new allocator, or a modification to existing kernel allocators, to
> track a new memory type" is a dead end. Or, best case, a very
> surgical, targeted solution.

TBH, I think there is no difference in _memory_ usage between network
packets and the scheduler, since most BPF programs use maps, which
the programs need to write to directly.  Maps are always allocated
dynamically, not statically, and they rely on the existing allocators.

IMHO, it would be better to make the existing memory allocators
pkey-aware than to write a new allocator, since the only difference from
the existing allocators would be pkey awareness (and I think a new
allocator would mean more code duplication and more complexity).

Thus, I would like to first discuss how the existing allocators could
be extended.


--
Sincerely,
Yeoreum Yun

* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 16:22 [LSF/MM/BPF TOPIC] eBPF isolation with pkeys Yeoreum Yun
  2026-02-12 16:36 ` Dave Hansen
@ 2026-02-12 17:44 ` Alexei Starovoitov
  2026-02-12 18:01   ` Yeoreum Yun
  1 sibling, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2026-02-12 17:44 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: lsf-pc, linux-mm, bpf, Catalin Marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, Dave Hansen, Rick Edgecombe

On Thu, Feb 12, 2026 at 8:24 AM Yeoreum Yun <yeoreum.yun@arm.com> wrote:
>
> Hi all,
>
> I would like to propose the topic of eBPF isolation with pkeys at the
> upcoming LSF/MM/BPF summit.
>
>
> Background
> ==========
>
> Today, eBPF programs provide powerful capabilities to extend kernel
> functionality without requiring modifications to the kernel itself.
> These capabilities are largely enabled by the eBPF verifier, which
> enforces memory safety and other constraints to protect the kernel.
>
> However, vulnerabilities in the verifier have repeatedly demonstrated that
> eBPF programs can also become a serious attack surface.  In several cases,
> flaws in verifier logic have allowed malicious eBPF programs to bypass
> safety guarantees and compromise kernel security.

eBPF was restricted to root for many years, so the above is simply not true.

> Representative CVEs include:
>
>      - CVE-2020-8835   [1]
>      - CVE-2021-3490   [2]
>      - CVE-2022-23222  [3]
>      - CVE-2023-2163   [4]

None of them are security issues. They're just bugs.
Like all those found by syzbot.

> An RFC series is planned for around Q2 2026, and the experimental
> implementations for eBPF isolation with pkey and pkey-aware memory
> allocators have already been completed internally.  Using these
> implementations, we verified that eBPF programs running under isolation
> successfully execute several sched_ext applications provided by
> tools/sched_ext, as well as some bpf kselftest cases.

The stated goal is wrong, hence not interested in patches
or discussion at lsfmm.

arm has a nice hw feature. Sure, but this is not a place to apply it.



* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 17:44 ` Alexei Starovoitov
@ 2026-02-12 18:01   ` Yeoreum Yun
  2026-02-12 18:37     ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Yeoreum Yun @ 2026-02-12 18:01 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: lsf-pc, linux-mm, bpf, Catalin Marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, Dave Hansen, Rick Edgecombe

Hi Alexei,

> On Thu, Feb 12, 2026 at 8:24 AM Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> >
> > Hi all,
> >
> > I would like to propose the topic of eBPF isolation with pkeys at the
> > upcoming LSF/MM/BPF summit.
> >
> >
> > Background
> > ==========
> >
> > Today, eBPF programs provide powerful capabilities to extend kernel
> > functionality without requiring modifications to the kernel itself.
> > These capabilities are largely enabled by the eBPF verifier, which
> > enforces memory safety and other constraints to protect the kernel.
> >
> > However, vulnerabilities in the verifier have repeatedly demonstrated that
> > eBPF programs can also become a serious attack surface.  In several cases,
> > flaws in verifier logic have allowed malicious eBPF programs to bypass
> > safety guarantees and compromise kernel security.
>
> eBPF was restricted to root for many years, so the above is simply not true.
>
> > Representative CVEs include:
> >
> >      - CVE-2020-8835   [1]
> >      - CVE-2021-3490   [2]
> >      - CVE-2022-23222  [3]
> >      - CVE-2023-2163   [4]
>
> None of them are security issues. They're just bugs.
> Like all those found by syzbot.
>
> > An RFC series is planned for around Q2 2026, and the experimental
> > implementations for eBPF isolation with pkey and pkey-aware memory
> > allocators have already been completed internally.  Using these
> > implementations, we verified that eBPF programs running under isolation
> > successfully execute several sched_ext applications provided by
> > tools/sched_ext, as well as some bpf kselftest cases.
>
> The stated goal is wrong, hence not interested in patches
> or discussion at lsfmm.
>
> arm has a nice hw feature. Sure, but this is not a place to apply it.

That is correct — this is a verifier bug.
However, the concern is that such a bug can lead to a security incident.
Not only root, but also users with CAP_BPF who are allowed to
load eBPF programs could potentially trigger additional security issues
through such bugs.

The proposed isolation mechanism is intended as a safeguard to minimize
the impact of security incidents that could arise from
bugs in the eBPF verifier. For that reason, I believe LSF/MM would be
an appropriate venue to discuss this approach and how pkeys could be
applied here.

Am I overlooking anything?

--
Sincerely,
Yeoreum Yun

* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 18:01   ` Yeoreum Yun
@ 2026-02-12 18:37     ` Alexei Starovoitov
  2026-02-13 10:08       ` Yeoreum Yun
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2026-02-12 18:37 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: lsf-pc, linux-mm, bpf, Catalin Marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, Dave Hansen, Rick Edgecombe

On Thu, Feb 12, 2026 at 10:03 AM Yeoreum Yun <yeoreum.yun@arm.com> wrote:
>
> Hi Alexei,
>
> [...]
>
> That is correct — this is a verifier bug.
> However, the concern is that such a bug can lead to a security incident.
> Not only root, but also users with CAP_BPF who are allowed to
> load eBPF programs could potentially trigger additional security issues
> through such bugs.

Again. They are not security issues. cap_bpf is effectively root.
Just like cap_perfmon in tracing space is root.



* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-12 18:37     ` Alexei Starovoitov
@ 2026-02-13 10:08       ` Yeoreum Yun
  2026-02-13 21:37         ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: Yeoreum Yun @ 2026-02-13 10:08 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: lsf-pc, linux-mm, bpf, Catalin Marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, Dave Hansen, Rick Edgecombe

Hi Alexei,

> On Thu, Feb 12, 2026 at 10:03 AM Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> >
> > Hi Alexei,
> >
> > [...]
> >
> > That is correct — this is a verifier bug.
> > However, the concern is that such a bug can lead to a security incident.
> > Not only root, but also users with CAP_BPF who are allowed to
> > load eBPF programs could potentially trigger additional security issues
> > through such bugs.
>
> Again. They are not security issues. cap_bpf is effectively root.
> Just like cap_perfmon in tracing space is a root.

The argument is not about whether the verifier bug is a security issue
per se.  The point is that relying solely on privilege boundaries
(e.g., root-only loading) does not eliminate the impact of a verifier bug.
Therefore, leveraging hardware isolation to further constrain
the blast radius is a defense-in-depth measure.

That said, I may be approaching this from the wrong angle.
It’s possible that I’m somewhat fixated on my own line of reasoning and
failing to see where it might be flawed.

If so, I would really appreciate it if you could point out
what I might be missing or where my reasoning falls short.

Thanks!

--
Sincerely,
Yeoreum Yun

* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-13 10:08       ` Yeoreum Yun
@ 2026-02-13 21:37         ` Alexei Starovoitov
  2026-02-16 14:27           ` James Bottomley
  0 siblings, 1 reply; 12+ messages in thread
From: Alexei Starovoitov @ 2026-02-13 21:37 UTC (permalink / raw)
  To: Yeoreum Yun
  Cc: lsf-pc, linux-mm, bpf, Catalin Marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, Dave Hansen, Rick Edgecombe

On Fri, Feb 13, 2026 at 2:10 AM Yeoreum Yun <yeoreum.yun@arm.com> wrote:
>
> Hi Alexei,
>
> > On Thu, Feb 12, 2026 at 10:03 AM Yeoreum Yun <yeoreum.yun@arm.com> wrote:
> > >
> > > Hi Alexei,
> > >
> > > [...]
> > >
> > > That is correct — this is a verifier bug.
> > > However, the concern is that such a bug can lead to a security incident.
> > > Not only root, but also users with CAP_BPF who are allowed to
> > > load eBPF programs could potentially trigger additional security issues
> > > through such bugs.
> >
> > Again. They are not security issues. cap_bpf is effectively root.
> > Just like cap_perfmon in tracing space is a root.
>
> The argument is not about whether the verifier bug is a security issue
> per se.  The point is that relying solely on privilege boundaries
> (e.g., root-only loading) does not eliminate the impact of a verifier bug.
> Therefore, leveraging hardware isolation to further constrain
> the blast radius is a defense-in-depth measure.

I hate the reasoning that bpf somehow needs this hw feature.
It doesn't. Look for other use cases for pkey.



* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-13 21:37         ` Alexei Starovoitov
@ 2026-02-16 14:27           ` James Bottomley
  2026-02-20  2:50             ` Alexei Starovoitov
  0 siblings, 1 reply; 12+ messages in thread
From: James Bottomley @ 2026-02-16 14:27 UTC (permalink / raw)
  To: Alexei Starovoitov, Yeoreum Yun
  Cc: lsf-pc, linux-mm, bpf, Catalin Marinas, david, ryan.roberts,
	kevin.brodsky, sebastian.osterlund, Dave Hansen, Rick Edgecombe

On Fri, 2026-02-13 at 13:37 -0800, Alexei Starovoitov wrote:
> On Fri, Feb 13, 2026 at 2:10 AM Yeoreum Yun <yeoreum.yun@arm.com>
> wrote:
> > 
> > Hi Alexei,
> > 
> > > On Thu, Feb 12, 2026 at 10:03 AM Yeoreum Yun
> > > <yeoreum.yun@arm.com> wrote:
[...]
> > > > That is correct — this is a verifier bug.
> > > > However, the concern is that such a bug can lead to a security
> > > > incident. Not only root, but also users with CAP_BPF who are
> > > > allowed to load eBPF programs could potentially trigger
> > > > additional security issues through such bugs.
> > > 
> > > Again. They are not security issues. cap_bpf is effectively root.
> > > Just like cap_perfmon in tracing space is a root.
> > 
> > The argument is not about whether the verifier bug is a security
> > issue per se.  The point is that relying solely on privilege
> > boundaries (e.g., root-only loading) does not eliminate the impact
> > of a verifier bug. Therefore, leveraging hardware isolation to
> > further constrain the blast radius is a defense-in-depth measure.
> 
> I hate the reasoning that bpf somehow needs this hw feature.
> It doesn't. Look for other use cases for pkey.

That's a bit of a short-sighted attitude and also you're looking at it
in the wrong way: hardware, correctly designed, should always be
looking at ways to help software.  eBPF may not "need" this in the same
way qemu doesn't need the VMX accelerations ... it's just more secure
and efficient when they're in use.  After all, if the kernel had said
"no" to VMX in 2006, KVM would never have existed, we'd have been stuck
with Xen paravirt and VMware would be laughing all the way to the bank.
So why not at least discuss whether this could prove useful?  I have my
own doubts about the complexity vs security tradeoffs of protection
keys but if it can actually prove useful, sticking your head in the
sand and ignoring it now would be a disservice to your users (and a
possible gift to Windows or macOS).

Regards,

James

* Re: [LSF/MM/BPF TOPIC] eBPF isolation with pkeys
  2026-02-16 14:27           ` James Bottomley
@ 2026-02-20  2:50             ` Alexei Starovoitov
  0 siblings, 0 replies; 12+ messages in thread
From: Alexei Starovoitov @ 2026-02-20  2:50 UTC (permalink / raw)
  To: James Bottomley
  Cc: Yeoreum Yun, lsf-pc, linux-mm, bpf, Catalin Marinas, david,
	ryan.roberts, kevin.brodsky, sebastian.osterlund, Dave Hansen,
	Rick Edgecombe

On Mon, Feb 16, 2026 at 6:27 AM James Bottomley
<James.Bottomley@hansenpartnership.com> wrote:
>
> On Fri, 2026-02-13 at 13:37 -0800, Alexei Starovoitov wrote:
> > On Fri, Feb 13, 2026 at 2:10 AM Yeoreum Yun <yeoreum.yun@arm.com>
> > wrote:
> > >
> > > Hi Alexei,
> > >
> > > > On Thu, Feb 12, 2026 at 10:03 AM Yeoreum Yun
> > > > <yeoreum.yun@arm.com> wrote:
> [...]
> > > > > That is correct — this is a verifier bug.
> > > > > However, the concern is that such a bug can lead to a security
> > > > > incident. Not only root, but also users with CAP_BPF who are
> > > > > allowed to load eBPF programs could potentially trigger
> > > > > additional security issues through such bugs.
> > > >
> > > > Again. They are not security issues. cap_bpf is effectively root.
> > > > Just like cap_perfmon in tracing space is a root.
> > >
> > > The argument is not about whether the verifier bug is a security
> > > issue per se.  The point is that relying solely on privilege
> > > boundaries (e.g., root-only loading) does not eliminate the impact
> > > of a verifier bug. Therefore, leveraging hardware isolation to
> > > further constrain the blast radius is a defense-in-depth measure.
> >
> > I hate the reasoning that bpf somehow needs this hw feature.
> > It doesn't. Look for other use cases for pkey.
>
> That's a bit of a short-sighted attitude and also you're looking at it
> in the wrong way: hardware, correctly designed, should always be
> looking at ways to help software.  eBPF may not "need" this in the same
> way qemu doesn't need the VMX accelerations ... it's just more secure
> and efficient when they're in use.  After all, if the kernel had said
> "no" to VMX in 2006, KVM would never have existed, we'd have been stuck

This is a false analogy. Virtualization extensions were added
because virtualization software already existed and had shortcomings
that CPU designers wanted to address.
Here pkey was added to differentiate one ISA from another and
now CPU folks are desperately looking for a use case.
Maybe it exists, but it's definitely not bpf.


end of thread, other threads:[~2026-02-20  2:50 UTC | newest]

Thread overview: 12+ messages
2026-02-12 16:22 [LSF/MM/BPF TOPIC] eBPF isolation with pkeys Yeoreum Yun
2026-02-12 16:36 ` Dave Hansen
2026-02-12 17:14   ` Yeoreum Yun
2026-02-12 18:14     ` Dave Hansen
2026-02-16  9:57       ` Yeoreum Yun
2026-02-12 17:44 ` Alexei Starovoitov
2026-02-12 18:01   ` Yeoreum Yun
2026-02-12 18:37     ` Alexei Starovoitov
2026-02-13 10:08       ` Yeoreum Yun
2026-02-13 21:37         ` Alexei Starovoitov
2026-02-16 14:27           ` James Bottomley
2026-02-20  2:50             ` Alexei Starovoitov
