From: zhongbaisong
Organization: huawei
Date: Wed, 2 Nov 2022 15:19:52 +0800
Subject: Re: [PATCH -next] bpf, test_run: fix alignment problem in bpf_prog_test_run_skb()
To: Eric Dumazet, Kees Cook
CC: Jakub Kicinski, Daniel Borkmann, Alexander Potapenko, Marco Elver, Dmitry Vyukov, Linux MM
Message-ID: <666b976a-8873-25e2-66dd-1398682c6cb7@huawei.com>
References: <20221101040440.3637007-1-zhongbaisong@huawei.com> <20221101210542.724e3442@kernel.org> <202211012121.47D68D0@keescook>

On 2022/11/2 12:37, Eric Dumazet wrote:
> On Tue, Nov 1, 2022 at 9:27 PM Kees Cook wrote:
>>
>> On Tue, Nov 01, 2022 at 09:05:42PM -0700, Jakub Kicinski wrote:
>>> On Wed, 2 Nov 2022 10:59:44 +0800 zhongbaisong wrote:
>>>> On 2022/11/2 0:45, Daniel Borkmann wrote:
>>>>> [ +kfence folks ]
>>>>
>>>> + cc: Alexander Potapenko, Marco Elver, Dmitry Vyukov
>>>>
>>>> Do you have any suggestions about this problem?
>>>
>>> + Kees who has been sending similar patches for drivers
>>>
>>>>> On 11/1/22 5:04 AM, Baisong Zhong wrote:
>>>>>> Recently, we got a syzkaller report caused by an aarch64
>>>>>> alignment fault when KFENCE is enabled.
>>>>>>
>>>>>> When the size from the user bpf program is an odd number, like
>>>>>> 399, 407, etc., it causes an unaligned access to skb_shared_info,
>>>>>> as seen below:
>>>>>>
>>>>>> BUG: KFENCE: use-after-free read in __skb_clone+0x23c/0x2a0
>>>>>> net/core/skbuff.c:1032
>>>>>>
>>>>>> Use-after-free read at 0xffff6254fffac077 (in kfence-#213):
>>>>>> __lse_atomic_add arch/arm64/include/asm/atomic_lse.h:26 [inline]
>>>>>> arch_atomic_add arch/arm64/include/asm/atomic.h:28 [inline]
>>>>>> arch_atomic_inc include/linux/atomic-arch-fallback.h:270 [inline]
>>>>>> atomic_inc include/asm-generic/atomic-instrumented.h:241 [inline]
>>>>>> __skb_clone+0x23c/0x2a0 net/core/skbuff.c:1032
>>>>>> skb_clone+0xf4/0x214 net/core/skbuff.c:1481
>>>>>> ____bpf_clone_redirect net/core/filter.c:2433 [inline]
>>>>>> bpf_clone_redirect+0x78/0x1c0 net/core/filter.c:2420
>>>>>> bpf_prog_d3839dd9068ceb51+0x80/0x330
>>>>>> bpf_dispatcher_nop_func include/linux/bpf.h:728 [inline]
>>>>>> bpf_test_run+0x3c0/0x6c0 net/bpf/test_run.c:53
>>>>>> bpf_prog_test_run_skb+0x638/0xa7c net/bpf/test_run.c:594
>>>>>> bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>>>>> __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>>>>> __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>>>>>
>>>>>> kfence-#213: 0xffff6254fffac000-0xffff6254fffac196, size=407,
>>>>>> cache=kmalloc-512
>>>>>>
>>>>>> allocated by task 15074 on cpu 0 at 1342.585390s:
>>>>>> kmalloc include/linux/slab.h:568 [inline]
>>>>>> kzalloc include/linux/slab.h:675 [inline]
>>>>>> bpf_test_init.isra.0+0xac/0x290 net/bpf/test_run.c:191
>>>>>> bpf_prog_test_run_skb+0x11c/0xa7c net/bpf/test_run.c:512
>>>>>> bpf_prog_test_run kernel/bpf/syscall.c:3148 [inline]
>>>>>> __do_sys_bpf kernel/bpf/syscall.c:4441 [inline]
>>>>>> __se_sys_bpf+0xad0/0x1634 kernel/bpf/syscall.c:4381
>>>>>> __arm64_sys_bpf+0x50/0x60 kernel/bpf/syscall.c:4381
>>>>>>
>>>>>> To fix the problem, we round up allocations with kmalloc_size_roundup()
>>>>>> so that build_skb()'s use of ksize() is always aligned and no special
>>>>>> handling of the memory is needed by KFENCE.
>>>>>>
>>>>>> Fixes: 1cf1cae963c2 ("bpf: introduce BPF_PROG_TEST_RUN command")
>>>>>> Signed-off-by: Baisong Zhong
>>>>>> ---
>>>>>>  net/bpf/test_run.c | 1 +
>>>>>>  1 file changed, 1 insertion(+)
>>>>>>
>>>>>> diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
>>>>>> index 13d578ce2a09..058b67108873 100644
>>>>>> --- a/net/bpf/test_run.c
>>>>>> +++ b/net/bpf/test_run.c
>>>>>> @@ -774,6 +774,7 @@ static void *bpf_test_init(const union bpf_attr *kattr, u32 user_size,
>>>>>>  	if (user_size > size)
>>>>>>  		return ERR_PTR(-EMSGSIZE);
>>>>>>  
>>>>>> +	size = kmalloc_size_roundup(size);
>>>>>>  	data = kzalloc(size + headroom + tailroom, GFP_USER);
>>>>>
>>>>> The fact that you need to do this roundup on call sites feels broken, no?
>>>>> Was there some discussion / consensus that now all k*alloc() call sites
>>>>> would need to be fixed up? Couldn't this be done transparently in k*alloc()
>>>>> when KFENCE is enabled? I presume there may be lots of other such occasions
>>>>> in the kernel where a similar issue triggers; fixing up all call sites feels
>>>>> like a ton of churn compared to an API-internal, generic fix.
>>
>> I hope I answer this in more detail here:
>> https://lore.kernel.org/lkml/202211010937.4631CB1B0E@keescook/
>>
>> The problem is that ksize() should never have existed in the first
>> place. :P Every runtime bounds checker has tripped over it, and with
>> the addition of the __alloc_size attribute, I had to start ripping
>> ksize() out: it can't be used to pretend an allocation grew in size.
>> Things need to either preallocate more or go through *realloc() like
>> everything else. Luckily, ksize() is rare.
>>
>> FWIW, the above fix doesn't look correct to me -- I would expect this to
>> be:
>>
>> 	size_t alloc_size;
>> 	...
>> 	alloc_size = kmalloc_size_roundup(size + headroom + tailroom);
>> 	data = kzalloc(alloc_size, GFP_USER);
>
> Making sure the struct skb_shared_info is aligned to a cache line does
> not need kmalloc_size_roundup().
>
> What is needed is to adjust @size so that (@size + @headroom) is a
> multiple of SMP_CACHE_BYTES.

OK, I'll fix it and send v2. Thanks.