KASAN vs realloc

linux-mm.kvack.org archive mirror

* KASAN vs realloc
@ 2026-01-06 12:42 Maciej Żenczykowski
  2026-01-07 20:28 ` Kees Cook
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej Żenczykowski @ 2026-01-06 12:42 UTC
  To: Maciej Wieczor-Retman
  Cc: joonki.min, Kees Cook, Andrew Morton, Andrey Ryabinin,
	Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Andrew Morton, Uladzislau Rezki,
	Danilo Krummrich, Kees Cook, jiayuan.chen,
	syzbot+997752115a851cb0cf36, Maciej Wieczor-Retman, kasan-dev,
	Kernel hackers, linux-mm

We've got internal reports (b/467571011 - from CC'ed Samsung
developer) that kasan realloc is broken for sizes that are not a
multiple of the granule.  This appears to be triggered during Android
bootup by some ebpf program loading operations (a struct is 88 bytes
in size, which is a multiple of 8, but not 16, which is the granule
size).

(this is on 6.18 with
https://lore.kernel.org/all/38dece0a4074c43e48150d1e242f8242c73bf1a5.1764874575.git.m.wieczorretman@pm.me/
already included)

joonki.min@samsung-slsi.corp-partner.google.com summarized it as
"When newly requested size is not bigger than allocated size and old
size was not 16 byte aligned, it failed to unpoison extended area."

and added a *very* rough comment:

Right. "size - old_size" is not guaranteed to be 16-byte aligned in this case.

I think we could unpoison a size rounded up to 16 bytes, but that allows
more than was requested :(

I'm not sure that's the right approach.

 if (size <= alloced_size) {
-        kasan_unpoison_vmalloc(p + old_size, size - old_size,
+        kasan_unpoison_vmalloc(p + old_size,
+                               round_up(size - old_size, KASAN_GRANULE_SIZE),
                                KASAN_VMALLOC_PROT_NORMAL |
                                KASAN_VMALLOC_VM_ALLOC |
                                KASAN_VMALLOC_KEEP_TAG);
         /*
          * No need to zero memory here, as unused memory will have
          * already been zeroed at initial allocation time or during
          * realloc shrink time.
          */
-        vm->requested_size = size;
+        vm->requested_size = round_up(size, KASAN_GRANULE_SIZE);
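
The "allowed more than requested" concern can be put into numbers with a
minimal userspace sketch. It assumes a 16-byte granule and only models the
arithmetic of the proposed round_up(); the helper names (granule_round_up,
unpoison_end, overshoot, check_overshoot) are hypothetical and not kernel
functions. The sizes 0x71c8 and 0x7220 are the old_size/size pair from the
debug log posted later in this thread.

```c
#include <assert.h>

/* Assumed granule size for tag-based KASAN; stands in for KASAN_GRANULE_SIZE. */
#define GRANULE 16UL

/* Round x up to the next multiple of GRANULE (GRANULE is a power of two). */
static unsigned long granule_round_up(unsigned long x)
{
	return (x + GRANULE - 1) & ~(GRANULE - 1);
}

/* Exclusive end offset of the range unpoisoned by the proposed change. */
static unsigned long unpoison_end(unsigned long old_size, unsigned long size)
{
	return old_size + granule_round_up(size - old_size);
}

/* Bytes unpoisoned past the requested size (0 if none). */
static unsigned long overshoot(unsigned long old_size, unsigned long size)
{
	unsigned long end = unpoison_end(old_size, size);

	return end > size ? end - size : 0;
}

/* Hypothetical self-check: grow from 0x71c8 to 0x7220 bytes. */
void check_overshoot(void)
{
	/* 0x58 bytes are requested, 0x60 are unpoisoned. */
	assert(unpoison_end(0x71c8, 0x7220) == 0x7228);
	assert(overshoot(0x71c8, 0x7220) == 8);
}
```

In this model the rounded-up range still starts at the unaligned offset
0x71c8 and now also ends at an unaligned offset (0x7228), so rounding the
length alone does not make the range granule-aligned.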

My personal guess is that the code just above the hunk quoted from
mm/vmalloc.c:

        if (size <= old_size) {
...
                kasan_poison_vmalloc(p + size, old_size - size);

is also likely wrong, considering this from mm/kasan/shadow.c:

void __kasan_poison_vmalloc(const void *start, unsigned long size)
{
        if (!is_vmalloc_or_module_addr(start))
                return;

        size = round_up(size, KASAN_GRANULE_SIZE);
        kasan_poison(start, size, KASAN_VMALLOC_INVALID, false);
}

This doesn't look right - if start isn't a multiple of the granule.
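
To make the concern concrete, here is a small userspace sketch of the
granule arithmetic, assuming a 16-byte granule. It does not model the real
shadow or tag memory; granules_touched, granules_from_len and
bytes_before_start are hypothetical helpers that only count which 16-byte
granules a byte range overlaps.

```c
#include <assert.h>

/* Assumed granule size; stands in for KASAN_GRANULE_SIZE. */
#define GRANULE 16UL

/* Number of granules that the byte range [start, start + len) overlaps. */
static unsigned long granules_touched(unsigned long start, unsigned long len)
{
	unsigned long first = start / GRANULE;
	unsigned long last = (start + len + GRANULE - 1) / GRANULE; /* exclusive */

	return len ? last - first : 0;
}

/* Number of granules implied by rounding up the length alone. */
static unsigned long granules_from_len(unsigned long len)
{
	return (len + GRANULE - 1) / GRANULE;
}

/* Bytes of the first overlapped granule that lie before start. */
static unsigned long bytes_before_start(unsigned long start)
{
	return start & (GRANULE - 1);
}

/* Hypothetical self-check with an unaligned start offset of 8. */
void check_unaligned_start(void)
{
	/* 16 bytes starting at offset 8 straddle two granules ... */
	assert(granules_touched(8, 16) == 2);
	/* ... but rounding up only the length accounts for one. */
	assert(granules_from_len(16) == 1);
}
```

With an unaligned start, the first granule also contains bytes that belong
to the part of the allocation that is supposed to stay accessible
(bytes_before_start() of them), which a per-granule tag cannot express.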

--
Maciej Żenczykowski, Kernel Networking Developer @ Google



* Re: KASAN vs realloc
  2026-01-06 12:42 KASAN vs realloc Maciej Żenczykowski
@ 2026-01-07 20:28 ` Kees Cook
  2026-01-07 20:47   ` Maciej Wieczor-Retman
  0 siblings, 1 reply; 6+ messages in thread
From: Kees Cook @ 2026-01-07 20:28 UTC
  To: Maciej Żenczykowski
  Cc: Maciej Wieczor-Retman, joonki.min, Andrew Morton,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Marco Elver, Andrew Morton,
	Uladzislau Rezki, Danilo Krummrich, jiayuan.chen,
	syzbot+997752115a851cb0cf36, Maciej Wieczor-Retman, kasan-dev,
	Kernel hackers, linux-mm

On Tue, Jan 06, 2026 at 01:42:45PM +0100, Maciej Żenczykowski wrote:
> [...]
> mm/kasan/shadow.c
> 
> void __kasan_poison_vmalloc(const void *start, unsigned long size)
> {
>         if (!is_vmalloc_or_module_addr(start))
>                 return;
> 
>         size = round_up(size, KASAN_GRANULE_SIZE);
>         kasan_poison(start, size, KASAN_VMALLOC_INVALID, false);
> }
> 
> This doesn't look right - if start isn't a multiple of the granule.

I don't think we can ever have the start not be a granule multiple, can
we?

I'm not sure how any of this is supposed to be handled by KASAN, though.
It does seem like a round_up() is missing, though?

-Kees

-- 
Kees Cook



* Re: KASAN vs realloc
  2026-01-07 20:28 ` Kees Cook
@ 2026-01-07 20:47   ` Maciej Wieczor-Retman
  2026-01-07 21:47     ` Maciej Żenczykowski
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej Wieczor-Retman @ 2026-01-07 20:47 UTC
  To: Kees Cook
  Cc: Maciej Żenczykowski, joonki.min, Andrew Morton,
	Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov,
	Dmitry Vyukov, Vincenzo Frascino, Marco Elver, Andrew Morton,
	Uladzislau Rezki, Danilo Krummrich, jiayuan.chen,
	syzbot+997752115a851cb0cf36, Maciej Wieczor-Retman, kasan-dev,
	Kernel hackers, linux-mm

On 2026-01-07 at 12:28:27 -0800, Kees Cook wrote:
>[...]
>I don't think we can ever have the start not be a granule multiple, can
>we?
>
>I'm not sure how any of this is supposed to be handled by KASAN, though.
>It does seem like a round_up() is missing, though?
>
>-Kees
>
>--
>Kees Cook

I assume the error happens in hw-tags mode? And this used to work because
KASAN_VMALLOC_VM_ALLOC was missing and kasan_unpoison_vmalloc() used to do an
early return, while now it's actually doing the unpoisoning here?

If that's the case, I agree, the round up seems to be missing; I can add it and
send a patch later.

-- 
Kind regards
Maciej Wieczór-Retman




* Re: KASAN vs realloc
  2026-01-07 20:47   ` Maciej Wieczor-Retman
@ 2026-01-07 21:47     ` Maciej Żenczykowski
  2026-01-07 21:50       ` Maciej Żenczykowski
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej Żenczykowski @ 2026-01-07 21:47 UTC
  To: Maciej Wieczor-Retman
  Cc: Kees Cook, joonki.min, Andrew Morton, Andrey Ryabinin,
	Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Marco Elver, Andrew Morton, Uladzislau Rezki,
	Danilo Krummrich, jiayuan.chen, syzbot+997752115a851cb0cf36,
	Maciej Wieczor-Retman, kasan-dev, Kernel hackers, linux-mm

On Wed, Jan 7, 2026 at 9:47 PM Maciej Wieczor-Retman
<m.wieczorretman@pm.me> wrote:
>
> On 2026-01-07 at 12:28:27 -0800, Kees Cook wrote:
> >On Tue, Jan 06, 2026 at 01:42:45PM +0100, Maciej Żenczykowski wrote:
> >> [...]
> >> But just above the code you quoted in mm/vmalloc.c I see:
> >>         if (size <= old_size) {
> >> ...
> >>                 kasan_poison_vmalloc(p + size, old_size - size);

I assume p is 16-byte aligned, but size (i.e. the new size) and
old_size can presumably be arbitrary, even odd.

This means the first argument passed to kasan_poison_vmalloc() is
potentially unaligned.

> >> is also likely wrong?? Considering:
> >>
> >> mm/kasan/shadow.c
> >>
> >> void __kasan_poison_vmalloc(const void *start, unsigned long size)
> >> {
> >>         if (!is_vmalloc_or_module_addr(start))
> >>                 return;
> >>
> >>         size = round_up(size, KASAN_GRANULE_SIZE);
> >>         kasan_poison(start, size, KASAN_VMALLOC_INVALID, false);
> >> }
> >>
> >> This doesn't look right - if start isn't a multiple of the granule.
> >
> >I don't think we can ever have the start not be a granule multiple, can
> >we?

See above for why I think we can...
I fully admit though I have no idea how this works, KASAN is not
something I really work with.

> >I'm not sure how any of this is supposed to be handled by KASAN, though.
> >It does seem like a round_up() is missing, though?

Perhaps add a:
 BUG_ON((unsigned long)start & 15)
or, more generally:
 BUG_ON((unsigned long)start & (KASAN_GRANULE_SIZE - 1))

if you think it shouldn't trigger?

and/or comments/documentation about the expected alignment of the
pointers and sizes if it cannot be arbitrary?
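
A userspace analogue of such a check might look like the sketch below. It
assumes a 16-byte granule; GRANULE_SIZE, granule_aligned and poison_args_ok
are hypothetical names used for illustration and are not part of the kernel
API. In the kernel the equivalent test would sit behind BUG_ON() or
WARN_ON() rather than return a bool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed granule size; stands in for KASAN_GRANULE_SIZE. */
#define GRANULE_SIZE 16u
#define GRANULE_MASK (GRANULE_SIZE - 1)

/* True if the address is a multiple of the granule size. */
static bool granule_aligned(const void *start)
{
	return ((uintptr_t)start & GRANULE_MASK) == 0;
}

/* True if both the start address and the size are granule multiples. */
static bool poison_args_ok(const void *start, size_t size)
{
	return granule_aligned(start) && (size & GRANULE_MASK) == 0;
}

/* Hypothetical self-check using plain integers as stand-in addresses. */
void check_poison_args(void)
{
	assert(poison_args_ok((const void *)(uintptr_t)0x1000, 96));
	/* An offset such as p + 0x71c8 fails the check even with a rounded size. */
	assert(!poison_args_ok((const void *)(uintptr_t)0x71c8, 96));
}
```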

> I assume the error happens in hw-tags mode? And this used to work because
> KASAN_VMALLOC_VM_ALLOC was missing and kasan_unpoison_vmalloc() used to do an
> early return, while now it's actually doing the unpoisoning here?

I was under the impression this was triggering with software tags.
However, an attempt by another Google engineer to reproduce it on a
Pixel 6 did indeed fail.
It is failing on some Samsung device, but I'm not sure what that is using...
Maybe a Pixel 8+ would use MTE???
So perhaps it is only hw tags???  Sorry, no idea.
I'm not sure; this is way lower-level than anything I've worked on in
the past years, lately I mostly write userspace & ebpf code...

Would a stack trace help?

[   22.280856][  T762]
==================================================================
[   22.280866][  T762] BUG: KASAN: invalid-access in
bpf_patch_insn_data+0x25c/0x378
[   22.280880][  T762] Write of size 27896 at addr 43ffffc08baf14d0 by
task netbpfload/762
[   22.280888][  T762] Pointer tag: [43], memory tag: [54]
[   22.280893][  T762]
[   22.280900][  T762] CPU: 9 UID: 0 PID: 762 Comm: netbpfload
Tainted: G           OE       6.18.0-android17-0-gef2f661f7812-4k #1
PREEMPT  5f8baed9473d1315a96dec60171cddf4b0b35487
[   22.280907][  T762] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[   22.280909][  T762] Hardware name: Samsung xxxxxxxxx
[   22.280912][  T762] Call trace:
[   22.280914][  T762]  show_stack+0x18/0x28 (C)
[   22.280922][  T762]  __dump_stack+0x28/0x3c
[   22.280930][  T762]  dump_stack_lvl+0x7c/0xa8
[   22.280934][  T762]  print_address_description+0x7c/0x20c
[   22.280941][  T762]  print_report+0x70/0x8c
[   22.280945][  T762]  kasan_report+0xb4/0x114
[   22.280952][  T762]  kasan_check_range+0x94/0xa0
[   22.280956][  T762]  __asan_memmove+0x54/0x88
[   22.280960][  T762]  bpf_patch_insn_data+0x25c/0x378
[   22.280965][  T762]  bpf_check+0x25a4/0x8ef0
[   22.280971][  T762]  bpf_prog_load+0x8dc/0x990
[   22.280976][  T762]  __sys_bpf+0x340/0x524
[   22.280980][  T762]  __arm64_sys_bpf+0x48/0x64
[   22.280984][  T762]  invoke_syscall+0x6c/0x13c
[   22.280990][  T762]  el0_svc_common+0xf8/0x138
[   22.280994][  T762]  do_el0_svc+0x30/0x40
[   22.280999][  T762]  el0_svc+0x38/0x90
[   22.281007][  T762]  el0t_64_sync_handler+0x68/0xdc
[   22.281012][  T762]  el0t_64_sync+0x1b8/0x1bc
[   22.281015][  T762]
[   22.281063][  T762] The buggy address belongs to a 8-page vmalloc
region starting at 0x43ffffc08baf1000 allocated at
bpf_patch_insn_data+0xb0/0x378
[   22.281088][  T762] The buggy address belongs to the physical page:
[   22.281093][  T762] page: refcount:1 mapcount:0
mapping:0000000000000000 index:0x0 pfn:0x8ce792
[   22.281099][  T762] memcg:f0ffff88354e7e42
[   22.281104][  T762] flags: 0x4300000000000000(zone=1|kasantag=0xc)
[   22.281113][  T762] raw: 4300000000000000 0000000000000000
dead000000000122 0000000000000000
[   22.281119][  T762] raw: 0000000000000000 0000000000000000
00000001ffffffff f0ffff88354e7e42
[   22.281125][  T762] page dumped because: kasan: bad access detected
[   22.281129][  T762]
[   22.281134][  T762] Memory state around the buggy address:
[   22.281139][  T762]  ffffffc08baf7f00: 43 43 43 43 43 43 43 43 43
43 43 43 43 43 43 43
[   22.281144][  T762]  ffffffc08baf8000: 43 43 43 43 43 43 43 43 43
43 43 43 43 43 43 43
[   22.281150][  T762] >ffffffc08baf8100: 43 43 43 43 43 43 43 54 54
54 54 54 54 fe fe fe
[   22.281155][  T762]                                         ^
[   22.281160][  T762]  ffffffc08baf8200: fe fe fe fe fe fe fe fe fe
fe fe fe fe fe fe fe
[   22.281165][  T762]  ffffffc08baf8300: fe fe fe fe fe fe fe fe fe
fe fe fe fe fe fe fe
[   22.281170][  T762]
==================================================================
[   22.281199][  T762] Kernel panic - not syncing: KASAN: panic_on_warn set ...
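
For readers decoding the report above: with tag-based KASAN on arm64 the
tag lives in the top byte of the pointer. The following is a minimal
sketch of that decoding, using the addresses and the write size printed in
the trace; pointer_tag, reset_tag and end_offset are hypothetical helper
names, and treating 0xff as the untagged top byte is an assumption that
matches the addresses shown in the "Memory state" rows.

```c
#include <assert.h>
#include <stdint.h>

/* Tag carried in the top byte of a tagged 64-bit pointer. */
static uint8_t pointer_tag(uint64_t addr)
{
	return (uint8_t)(addr >> 56);
}

/* Same address with the top byte set to 0xff (assumed untagged form). */
static uint64_t reset_tag(uint64_t addr)
{
	return addr | (0xffULL << 56);
}

/* Offset, relative to the region base, just past a write of len bytes. */
static uint64_t end_offset(uint64_t base, uint64_t addr, uint64_t len)
{
	return (addr - base) + len;
}

/* Hypothetical self-check using the values printed in the report. */
void check_report_values(void)
{
	assert(pointer_tag(0x43ffffc08baf14d0ULL) == 0x43);
	assert(reset_tag(0x43ffffc08baf14d0ULL) == 0xffffffc08baf14d0ULL);
	/* The 27896-byte write ends 0x71c8 bytes into the vmalloc region. */
	assert(end_offset(0x43ffffc08baf1000ULL, 0x43ffffc08baf14d0ULL, 27896)
	       == 0x71c8);
}
```

The end offset 0x71c8 is not a multiple of 16, which is consistent with
the 88-byte element size mentioned at the start of the thread.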

> If that's the case, I agree, the round up seems to be missing; I can add it and
> send a patch later.



* Re: KASAN vs realloc
  2026-01-07 21:47     ` Maciej Żenczykowski
@ 2026-01-07 21:50       ` Maciej Żenczykowski
  2026-01-07 21:55         ` Maciej Żenczykowski
  0 siblings, 1 reply; 6+ messages in thread
From: Maciej Żenczykowski @ 2026-01-07 21:50 UTC
  To: Maciej Wieczor-Retman
  Cc: Kees Cook, joonki.min, Andrew Morton, Andrey Ryabinin,
	Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Marco Elver, Andrew Morton, Uladzislau Rezki,
	Danilo Krummrich, jiayuan.chen, syzbot+997752115a851cb0cf36,
	Maciej Wieczor-Retman, kasan-dev, Kernel hackers, linux-mm

On Wed, Jan 7, 2026 at 10:47 PM Maciej Żenczykowski <maze@google.com> wrote:
> [...]
> Would a stack trace help?
> [...]

WARNING: Actually I'm not sure if this is the *right* stack trace.
This might be on a bare 6.18 without the latest extra 4 patches.
I'm not finding a more recent stack trace.



* Re: KASAN vs realloc
  2026-01-07 21:50       ` Maciej Żenczykowski
@ 2026-01-07 21:55         ` Maciej Żenczykowski
  0 siblings, 0 replies; 6+ messages in thread
From: Maciej Żenczykowski @ 2026-01-07 21:55 UTC
  To: Maciej Wieczor-Retman
  Cc: Kees Cook, joonki.min, Andrew Morton, Andrey Ryabinin,
	Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov,
	Vincenzo Frascino, Marco Elver, Andrew Morton, Uladzislau Rezki,
	Danilo Krummrich, jiayuan.chen, syzbot+997752115a851cb0cf36,
	Maciej Wieczor-Retman, kasan-dev, Kernel hackers, linux-mm

> WARNING: Actually I'm not sure if this is the *right* stack trace.
> This might be on a bare 6.18 without the latest extra 4 patches.
> I'm not finding a more recent stack trace.

Found comments from the Samsung dev:

But another panic occurred after those fixes [i.e. the 4 patches] were applied.
struct bpf_insn_aux_data is 88 bytes, so the panic_on_warn panic triggers
when old_size ends with 0x8.
It seems like vrealloc cannot handle that case.

[   84.536021] [4:     netbpfload:  771] ------------[ cut here ]------------
[   84.536196] [4:     netbpfload:  771] WARNING: CPU: 4 PID: 771 at
mm/kasan/shadow.c:174 __kasan_unpoison_vmalloc+0x94/0xa0
....
[   84.773445] [4:     netbpfload:  771] CPU: 4 UID: 0 PID: 771 Comm:
netbpfload Tainted: G           OE
6.18.1-android17-0-g41be44edb8d5-4k #1 PREEMPT
70442b615e7d1d560808f482eb5d71810120225e
[   84.789323] [4:     netbpfload:  771] Tainted: [O]=OOT_MODULE,
[E]=UNSIGNED_MODULE
[   84.795311] [4:     netbpfload:  771] Hardware name: Samsung xxxx
[   84.802519] [4:     netbpfload:  771] pstate: 03402005 (nzcv daif
+PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[   84.810152] [4:     netbpfload:  771] pc : __kasan_unpoison_vmalloc+0x94/0xa0
[   84.815708] [4:     netbpfload:  771] lr : __kasan_unpoison_vmalloc+0x24/0xa0
[   84.821264] [4:     netbpfload:  771] sp : ffffffc0a97e77a0
[   84.825256] [4:     netbpfload:  771] x29: ffffffc0a97e77a0 x28:
3bffff8837198670 x27: 0000000000008000
[   84.833069] [4:     netbpfload:  771] x26: 41ffff8837ef8e00 x25:
ffffffffffffffa8 x24: 00000000000071c8
[   84.840880] [4:     netbpfload:  771] x23: 0000000000000001 x22:
00000000ffffffff x21: 000000000000000e
[   84.848694] [4:     netbpfload:  771] x20: 0000000000000058 x19:
c3ffffc0a8f271c8 x18: ffffffc082f1c100
[   84.856504] [4:     netbpfload:  771] x17: 000000003688d116 x16:
000000003688d116 x15: ffffff8837efff80
[   84.864317] [4:     netbpfload:  771] x14: 0000000000000180 x13:
0000000000000000 x12: e6ffff8837eff700
[   84.872129] [4:     netbpfload:  771] x11: 0000000000000041 x10:
0000000000000000 x9 : fffffffebf800000
[   84.879941] [4:     netbpfload:  771] x8 : ffffffc0a8f271c8 x7 :
0000000000000000 x6 : ffffffc0805bef3c
[   84.887754] [4:     netbpfload:  771] x5 : 0000000000000000 x4 :
0000000000000000 x3 : ffffffc080234b6c
[   84.895566] [4:     netbpfload:  771] x2 : 000000000000000e x1 :
0000000000000058 x0 : 0000000000000001
[   84.903377] [4:     netbpfload:  771] Call trace:
[   84.906502] [4:     netbpfload:  771]  __kasan_unpoison_vmalloc+0x94/0xa0 (P)
[   84.912058] [4:     netbpfload:  771]  vrealloc_node_align_noprof+0xdc/0x2e4
[   84.917525] [4:     netbpfload:  771]  bpf_patch_insn_data+0xb0/0x378
[   84.922384] [4:     netbpfload:  771]  bpf_check+0x25a4/0x8ef0
[   84.926638] [4:     netbpfload:  771]  bpf_prog_load+0x8dc/0x990
[   84.931065] [4:     netbpfload:  771]  __sys_bpf+0x340/0x524

[   79.334574][  T827] bpf_patch_insn_data: insn_aux_data size realloc
at abffffc08ef41000 to 330
[   79.334919][  T827] bpf_patch_insn_data: insn_aux_data at 55ffffc0a9c00000

[   79.335151][  T827] bpf_patch_insn_data: insn_aux_data size realloc
at 55ffffc0a9c00000 to 331
[   79.336331][  T827] vrealloc_node_align_noprof: p=55ffffc0a9c00000
old_size=7170
[   79.343898][  T827] vrealloc_node_align_noprof: size=71c8 alloced_size=8000
[   79.350782][  T827] bpf_patch_insn_data: insn_aux_data at 55ffffc0a9c00000

[   79.357591][  T827] bpf_patch_insn_data: insn_aux_data size realloc
at 55ffffc0a9c00000 to 332
[   79.366174][  T827] vrealloc_node_align_noprof: p=55ffffc0a9c00000
old_size=71c8
[   79.373588][  T827] vrealloc_node_align_noprof: size=7220 alloced_size=8000
[   79.380485][  T827] kasan_unpoison: after kasan_reset_tag
addr=ffffffc0a9c071c8(granule mask=f)

I added 8 bytes of dummy data so that "p + old_size" no longer ends
with 8, and it booted fine.
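
The sizes in the log above follow directly from the 88-byte element size.
Here is a minimal sketch of that arithmetic, assuming the "to 330/331/332"
values in the log are decimal element counts and that the granule is 16
bytes; array_bytes and ends_on_granule are hypothetical helper names.

```c
#include <assert.h>

/* Element size reported for struct bpf_insn_aux_data in this thread. */
#define ELEM_SIZE 88UL
/* Assumed granule size; stands in for KASAN_GRANULE_SIZE. */
#define GRANULE 16UL

/* Total bytes needed for n elements. */
static unsigned long array_bytes(unsigned long n)
{
	return n * ELEM_SIZE;
}

/* Nonzero if an array of n elements ends exactly on a granule boundary. */
static int ends_on_granule(unsigned long n)
{
	return (array_bytes(n) & (GRANULE - 1)) == 0;
}

/* Hypothetical self-check against the sizes printed in the log. */
void check_log_sizes(void)
{
	assert(array_bytes(330) == 0x7170);
	assert(array_bytes(331) == 0x71c8);
	assert(array_bytes(332) == 0x7220);
}
```

Because 88 is 8 * 11, every odd element count gives a size that ends 8
bytes past a granule boundary, so the realloc from 331 to 332 elements
starts its unpoison at an unaligned "p + old_size".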

diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h
index 4c497e839526..f9d3448321e8 100644
--- a/include/linux/bpf_verifier.h
+++ b/include/linux/bpf_verifier.h
@@ -581,6 +581,7 @@ struct bpf_insn_aux_data {
        u32 scc;
        /* registers alive before this instruction. */
        u16 live_regs_before;
+       u16 buf[4];     // TEST
 };

maze: Likely if 8 bytes worked then 'u8 buf[7]' would too?

It would be 88 bytes + 7 bytes = 95 bytes (= 0x5f), which is within the
range of the granule mask (= 0xf).

I didn't think it would work, but it works.
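
One possible explanation for the 'u8 buf[7]' result is struct padding. The
sketch below uses hypothetical stand-in structs (aux88, aux_u16x4, aux_u8x7)
rather than the real struct bpf_insn_aux_data, and assumes the real struct
has 8-byte alignment, as a struct containing 64-bit members would on arm64.
Under that assumption the compiler pads 95 bytes up to 96, a multiple of 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 88-byte stand-in with 8-byte alignment. */
struct aux88 {
	uint64_t words[11];
};

/* Stand-in plus the 8-byte test member from the diff above. */
struct aux_u16x4 {
	uint64_t words[11];
	uint16_t buf[4];
};

/* Stand-in plus a 7-byte member; tail padding rounds the size up. */
struct aux_u8x7 {
	uint64_t words[11];
	uint8_t buf[7];
};

/* Remainder of a size modulo the assumed 16-byte granule. */
static size_t granule_remainder(size_t size)
{
	return size % 16;
}

/* Hypothetical self-check: only the unmodified stand-in is misaligned. */
void check_struct_sizes(void)
{
	assert(granule_remainder(sizeof(struct aux88)) == 8);
	assert(granule_remainder(sizeof(struct aux_u16x4)) == 0);
	assert(granule_remainder(sizeof(struct aux_u8x7)) == 0);
}
```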


