We observed a crash "KFENCE: use-after-free in __skb_clone" during fuzzing. It's a frequent occurrance in aarch64 and the codepath is always the same,but cannot be reproduced in x86_64. The config and reproducer are in the attachement. Detailed crash information is as follows. ----------------------------------------- BUG: KFENCE: use-after-free read in __skb_clone+0x214/0x280 Use-after-free read at 0xffff00022250306f (in kfence-#250): __skb_clone+0x214/0x280 skb_clone+0xb4/0x180 bpf_clone_redirect+0x60/0x190 bpf_prog_207b739f41707f89+0x88/0xb8 bpf_test_run+0x2dc/0x4fc bpf_prog_test_run_skb+0x4ac/0x7d0 __sys_bpf+0x700/0x1020 __arm64_sys_bpf+0x4c/0x60 invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x88/0x200 do_el0_svc+0x3c/0x50 el0_svc+0x68/0xd0 el0t_64_sync_handler+0xb4/0x130 el0t_64_sync+0x16c/0x170 kfence-#250: 0xffff000222503000-0xffff00022250318e, size=399, cache=kmalloc-512 allocated by task 2970 on cpu 0 at 65.981345s: bpf_test_init.isra.0+0x68/0x100 bpf_prog_test_run_skb+0x114/0x7d0 __sys_bpf+0x700/0x1020 __arm64_sys_bpf+0x4c/0x60 invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x88/0x200 do_el0_svc+0x3c/0x50 el0_svc+0x68/0xd0 el0t_64_sync_handler+0xb4/0x130 el0t_64_sync+0x16c/0x170 CPU: 0 PID: 2970 Comm: syz Tainted: G B W 6.1.0-rc2-next-20221025 #140 Hardware name: linux,dummy-virt (DT) pstate: 60400005 (nZCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : __skb_clone+0x214/0x280 lr : __skb_clone+0x208/0x280 sp : ffff80000fc37630 x29: ffff80000fc37630 x28: ffff80000fc37bd0 x27: ffff80000fc37720 x26: ffff000222503000 x25: 000000000000028f x24: ffff0000d0898d5c x23: ffff0000d08997c0 x22: ffff0000d089977e x21: ffff00022250304f x20: ffff0000d0899700 x19: ffff0000d0898c80 x18: 0000000000000000 x17: ffff800008379bbc x16: ffff800008378ee0 x15: ffff800008379bbc x14: ffff800008378ee0 x13: 0040004effff0008 x12: ffff6000444a060f x11: 1fffe000444a060e x10: ffff6000444a060e x9 : dfff800000000000 x8 : ffff000222503072 x7 : 00009fffbbb5f9f3 x6 : 0000000000000002 x5 : ffff00022250306f x4 : ffff6000444a060f x3 : ffff8000096fb2a8 x2 : 0000000000000001 x1 : ffff00022250306f x0 : 0000000000000001 Call trace: __skb_clone+0x214/0x280 skb_clone+0xb4/0x180 bpf_clone_redirect+0x60/0x190 bpf_prog_207b739f41707f89+0x88/0xb8 bpf_test_run+0x2dc/0x4fc bpf_prog_test_run_skb+0x4ac/0x7d0 __sys_bpf+0x700/0x1020 __arm64_sys_bpf+0x4c/0x60 invoke_syscall+0x64/0x190 el0_svc_common.constprop.0+0x88/0x200 do_el0_svc+0x3c/0x50 el0_svc+0x68/0xd0 el0t_64_sync_handler+0xb4/0x130 el0t_64_sync+0x16c/0x170 From the crash info, I found the problem happend at atomic_inc(&(skb_shinfo(skb)->dataref)) in __skb_clone(). static struct sk_buff *__skb_clone(struct sk_buff *n, struct sk_buff *skb) { ... refcount_set(&n->users, 1); > atomic_inc(&(skb_shinfo(skb)->dataref)); skb->cloned = 1; return n; #undef C } when KENCE UAF happend, the address of skb_shinfo(skb) always end with 0xf£¬like 0xffff0002224f104f, 0xffff0002224f304f, etc. But when KFENCE is not working, the address of skb_shinfo(skb) always end with 0xc0, like 0xffff0000d7e908c0, 0xffff0000d682f4c0, ect. So, I guess the problem is related to kfence memory address alignment in aarch64. In bpf_prog_test_run_skb(), I try to let the 'size' align with SMP_CACHE_BYTES to fix that. After that, the KENCE user-after-free disappeared. Fixes: be3d72a2896c ("bpf: move user_size out of bpf_test_init") Signed-off-by: Baisong Zhong --- net/bpf/test_run.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c index 13d578ce2a09..3414aa2930d4 100644 --- a/net/bpf/test_run.c +++ b/net/bpf/test_run.c @@ -1096,6 +1096,8 @@ int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr, if (kattr->test.flags || kattr->test.cpu || kattr->test.batch_size) return -EINVAL; + size = SKB_DATA_ALIGN(size); + data = bpf_test_init(kattr, kattr->test.data_size_in, size, NET_SKB_PAD + NET_IP_ALIGN, SKB_DATA_ALIGN(sizeof(struct skb_shared_info))); -- 2.25.1 .