From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F660EB64DD for ; Wed, 12 Jul 2023 08:37:20 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CDF8F6B0071; Wed, 12 Jul 2023 04:37:19 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C8F576B0072; Wed, 12 Jul 2023 04:37:19 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B58316B0075; Wed, 12 Jul 2023 04:37:19 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0017.hostedemail.com [216.40.44.17]) by kanga.kvack.org (Postfix) with ESMTP id A5A8E6B0071 for ; Wed, 12 Jul 2023 04:37:19 -0400 (EDT) Received: from smtpin02.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id 689DE16016D for ; Wed, 12 Jul 2023 08:37:19 +0000 (UTC) X-FDA: 81002305398.02.C8A8AAB Received: from mail-pf1-f176.google.com (mail-pf1-f176.google.com [209.85.210.176]) by imf15.hostedemail.com (Postfix) with ESMTP id 89257A0007 for ; Wed, 12 Jul 2023 08:37:17 +0000 (UTC) Authentication-Results: imf15.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=VqzN9T6Q; spf=pass (imf15.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.176 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1689151037; a=rsa-sha256; cv=none; b=DPGKqxpy7LlnTAS9qgZlAQQrs/kiWLTmygknjssE5X/ncpe8+9xP989FbIxrGmNSwpojTh WwayKZFsYuFSW3A14/EB9q+Nd5jNnuNF3+lMV3Yig90jX141A4mSqxQ7XeKP9FfAuJAMPI IVMF3NHbUv6wJgJjjEsG8X9zKB8pqg4= ARC-Authentication-Results: i=1; imf15.hostedemail.com; dkim=pass header.d=bytedance.com header.s=google header.b=VqzN9T6Q; spf=pass (imf15.hostedemail.com: domain of zhangpeng.00@bytedance.com designates 209.85.210.176 as permitted sender) smtp.mailfrom=zhangpeng.00@bytedance.com; dmarc=pass (policy=quarantine) header.from=bytedance.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1689151037; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=rBUqtpQ/qPgW5u7ywMLTowE+PlXrM3g8tx27h51sMCE=; b=0Uv04pY7vhpBwchjzz2jbM1n1bby0VGLZIouEXCP552w2pCVsVzZDM6DF5YA6pn7WPIYUS RAbl0aR/Mj5pjfAbGm/20eI+v8UpPh+f8XjWwKiwL8qyEAC7k+6h/RXg1Q12LvbM+0FHmr OjVBLPaiSGsim/TOpUM4QJsIDH/Q4c8= Received: by mail-pf1-f176.google.com with SMTP id d2e1a72fcca58-6689430d803so3708570b3a.0 for ; Wed, 12 Jul 2023 01:37:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance.com; s=google; t=1689151036; x=1691743036; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:from:to:cc:subject:date :message-id:reply-to; bh=rBUqtpQ/qPgW5u7ywMLTowE+PlXrM3g8tx27h51sMCE=; b=VqzN9T6Qk0SmNiMem+kRtijR440z3E/ALzGAi5DDK6EoHaDzEG/nn3Ed2Qv6pH6Iyw VHphXyz81s1pWvpd9DYqoqyVUxrs9jGYhkmJrFtpjhHBEwogtxcBINhKqQHqVblVqUO8 /KeWzOyK6V4UDu0itEHFvgsvtfZKNRJXn+mbCx2r8H8oYSb/p0K+H5e7vx4a0/h3Zpm1 twR1Zo4NCjJTuFY8b1HAl+Q6DJYwJjOe4MC1k0S6I7CrK3mu0QXpEHzbyOFcxsuTgfNz JplpzbKV72uyfouPy9r3K5hfutrfi3qrrOPCi5u3iYE8E9zF8IVLRtOX9LffJXqcWJSw D5qA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1689151036; x=1691743036; h=content-transfer-encoding:in-reply-to:from:references:cc:to:subject :user-agent:mime-version:date:message-id:x-gm-message-state:from:to :cc:subject:date:message-id:reply-to; bh=rBUqtpQ/qPgW5u7ywMLTowE+PlXrM3g8tx27h51sMCE=; b=gFMXyZ9bx1EV7mdnJ8zVlogbSzVShMyLjCAShTJJ1yZvBpHaqiRyLq1PtH+hMe3UgN vUr/UEDbYscYd79aELzykPFqUH8On0tExlSFohSjsOj4DNMUkWOZPlSaOMnTXufC1cCO ST9PI1hPgOuDuB0ocl75kFOuY02KlYAjhM9WWq9EkY8XmFvkgGq/ozC0Vn4dzhXPk0df NYJ9dCOYWEqR3gyxl4Z1ku3oy1P/hfVOpwpW76b8qKdQk83WSfwYdIPVd3FzBgOPvVHp NU5bdYm4Ybojk/fmlG7PcR/4JZN8LGQaisCk/0Q1yNOo2yUqZum9OtCHbSXMX/rSEyBG edEQ== X-Gm-Message-State: ABy/qLZKqJ8jeFIJmD0hvknRHWrVxICqiVDDByoXatv88IVDaBKGT1q6 q13R1Wcdt4AC6Kiou/n9rTFF0A== X-Google-Smtp-Source: APBJJlH41zH8i6+u4OkMNyNGL+1U7FgwlqKFvL7f8wmCC510ncVN0Y1EGYRFfKG++mLfEqfgMFyv7w== X-Received: by 2002:a05:6a00:1409:b0:673:5d1e:6654 with SMTP id l9-20020a056a00140900b006735d1e6654mr16234608pfu.33.1689151036022; Wed, 12 Jul 2023 01:37:16 -0700 (PDT) Received: from [10.254.22.102] ([139.177.225.243]) by smtp.gmail.com with ESMTPSA id c19-20020aa78e13000000b00682b2fbd20fsm3078868pfr.31.2023.07.12.01.37.12 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Wed, 12 Jul 2023 01:37:15 -0700 (PDT) Message-ID: Date: Wed, 12 Jul 2023 16:37:10 +0800 MIME-Version: 1.0 User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0 Subject: Re: [PATCH] mm: kfence: allocate kfence_metadata at runtime To: Peng Zhang Cc: glider@google.com, dvyukov@google.com, akpm@linux-foundation.org, kasan-dev@googlegroups.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org, muchun.song@linux.dev, Marco Elver References: <20230710032714.26200-1-zhangpeng.00@bytedance.com> <2a16a76c-506c-f325-6792-4fb58e8da531@bytedance.com> From: Peng Zhang In-Reply-To: <2a16a76c-506c-f325-6792-4fb58e8da531@bytedance.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: 89257A0007 X-Stat-Signature: 3cj3fk5km5egaouoywrg193buzcbwbma X-Rspam-User: X-HE-Tag: 1689151037-556890 X-HE-Meta: U2FsdGVkX19zcAJ8hxQ0HxdxOzXreZvBJDZsBr20M/xH6eUL5huPlY8lIYgxuXM/CBbW+wVU6ylkW4uKvyW2BwNa/IqG5zJMHokoWnmCLQ+qG9NRtoXW5XBYs6uBuRiuBzzRs3TT09CkUr713U2Dt97MJtI+Drs/UpAG46TLJyEv1BkNeNbJ20ZvKGitlevIsPIb3f5FlfRZBRf2RSN1yAIXt8GHUc3xRx40S4PUB7kx3CYQ7N4BWiab2SAYLMqToMmhQ89L2MB6AO/W18OiPlWfOnaZHGunTN8A8ldCwfMPFb/JeRV/MUoddgUCifGQZsSdnP1OoKcgn/KKXwy+TsGvAR1z2CEGkT1JUp8yKoa3X9FYdJXrKyFbyDhDvEGJI1AwEsdz4B+t8mbHPBZIMo1NxOfGchL6eRaFVbfPj7kZWHjWN1dry5/jyh4cSOBbdOMLWUoQwEr1KJ3F3uadhSZW2aiU02hp7LIj1s87N+FxDOwasaLoGMZDnl5v1sX7iPMIlIMUI/F4uMZazGG9+YvA3r2wggSRnw3VKa0/LUMHBnIC0GqZsw5AbbMWrBcgWRTH7UZkGpAnMjGJbofFWR63nsHgw9JQhlvkPAttNSoOJPy26q84Ll5MWVkpbcnO/eILvmXYAL6p0YrAP92XNqm2lm3ePXJXgR4kn72T+rW4ZJYtcRJpQo1bip/Ia9gyuRiLs4GsMA3hGUmeaEok1Liw4NBY/4PIZZ5H8O2wefiFNkER9QzvGMvvr32Q+FQhzftBysCxFuvB1wcKS+IPPH1TOVxHR2bzAMgYoEUgiRgPaJcAKVVhtnsJWlVbfGrq3wD53Q95WXy8szlarusZ7CNiPnNIRhx56KzZWC5uq07MXb4ww2VU2ClEHnFOI7ZWrDUmoZGVhLkapC7YWYJ2BX8QQzKHtipLBxwo/E+XU1TqL5FoEQMyHdKxijlwqR/MVCC6TDOFP+Og0mnDhH7 ISfmZ9j4 1OT7LGXHVMgmDXB3oz8kiubYmBMUKqJxoy7u40/gnBvd0o6ueg7JmzhocSUyX9M1eQSqc/KLOMFgcH0iOYblbwtqmTgZa1Jk1FTLb7zGb00u97B6A2VmblCYPIir8tPnvQtQL8DSk2pv3GzC9UGML5lBGzl4FcnLvDvPb5Fz2FSu5cbFZDx6sS/JCzrlntEYte6Bi4XM+xkjm20SfgsnUuOvlAODEZgMnR5kuedxe4qo9ezoqpu+vn0SXePLrSz/voc3Em7z7Ohnqmw/CS3wlhH7+h9dwmB4rrBkzq+ZhPzpM6lotpncgCoXeXUUmXyPoWNQb16lNDQcCTtGGJD3fD0DDnGSsVRLwq1DMtVmRtdq8+o/Wp0NtGUIVBPg+bic1XZtETyLrySxHfepL3sOc1SMcGc8DRp6zYBor+Uy6BY1X4pO6AoaIu2UHtmNdqhAPOpxpIpSsnQdtC/eMgmO68TRX6RHJAfT8sCyV9LkPcHN5kx7V45UCE31DWr3XGzM+u/jRz0GYtCw5LuIQF6I6SX5gdS/haX9PUUcyOJ/ViYDBPnqNPGg/ab38QXb8tgHgUg17f3AVTnRMCEI7dPSVCLXdeA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: 在 2023/7/12 16:28, Peng Zhang 写道: > > > 在 2023/7/10 18:19, Marco Elver 写道: >> On Mon, 10 Jul 2023 at 05:27, 'Peng Zhang' via kasan-dev >> wrote: >>> >>> kfence_metadata is currently a static array. For the purpose of >>> allocating scalable __kfence_pool, we first change it to runtime >>> allocation of metadata. Since the size of an object of kfence_metadata >>> is 1160 bytes, we can save at least 72 pages (with default 256 objects) >>> without enabling kfence. >>> >>> Below is the numbers obtained in qemu (with default 256 objects). >>> before: Memory: 8134692K/8388080K available (3668K bss) >>> after: Memory: 8136740K/8388080K available (1620K bss) >>> More than expected, it saves 2MB memory. >>> >>> Signed-off-by: Peng Zhang >> >> Seems like a reasonable optimization, but see comments below. >> >> Also with this patch applied on top of v6.5-rc1, KFENCE just doesn't >> init at all anymore (early init). Please fix. > I'm very sorry because I made a slight modification before sending the > patch but it has not been tested, which caused it to not work properly. > I fixed some of the issues you mentioned in v2[1]. > > [1] > https://lore.kernel.org/lkml/20230712081616.45177-1-zhangpeng.00@bytedance.com/ > >> >>> --- >>>   mm/kfence/core.c   | 102 ++++++++++++++++++++++++++++++++------------- >>>   mm/kfence/kfence.h |   5 ++- >>>   2 files changed, 78 insertions(+), 29 deletions(-) >>> >>> diff --git a/mm/kfence/core.c b/mm/kfence/core.c >>> index dad3c0eb70a0..b9fec1c46e3d 100644 >>> --- a/mm/kfence/core.c >>> +++ b/mm/kfence/core.c >>> @@ -116,7 +116,7 @@ EXPORT_SYMBOL(__kfence_pool); /* Export for test >>> modules. */ >>>    * backing pages (in __kfence_pool). >>>    */ >>>   static_assert(CONFIG_KFENCE_NUM_OBJECTS > 0); >>> -struct kfence_metadata kfence_metadata[CONFIG_KFENCE_NUM_OBJECTS]; >>> +struct kfence_metadata *kfence_metadata; >>> >>>   /* Freelist with available objects. */ >>>   static struct list_head kfence_freelist = >>> LIST_HEAD_INIT(kfence_freelist); >>> @@ -643,13 +643,56 @@ static unsigned long kfence_init_pool(void) >>>          return addr; >>>   } >>> >>> +static int kfence_alloc_metadata(void) >>> +{ >>> +       unsigned long nr_pages = KFENCE_METADATA_SIZE / PAGE_SIZE; >>> + >>> +#ifdef CONFIG_CONTIG_ALLOC >>> +       struct page *pages; >>> + >>> +       pages = alloc_contig_pages(nr_pages, GFP_KERNEL, >>> first_online_node, >>> +                                  NULL); >>> +       if (pages) >>> +               kfence_metadata = page_to_virt(pages); >>> +#else >>> +       if (nr_pages > MAX_ORDER_NR_PAGES) { >>> +               pr_warn("KFENCE_NUM_OBJECTS too large for buddy >>> allocator\n"); >> >> Does this mean that KFENCE won't work at all if we can't allocate the >> metadata? I.e. it won't work either in early nor late init modes? >> >> I know we already have this limitation for _late init_ of the KFENCE >> pool. >> >> So I have one major question: when doing _early init_, what is the >> maximum size of the KFENCE pool (#objects) with this change? > It will be limited to 2^10/sizeof(struct kfence_metadata) by buddy Sorry, 2^10*PAGE_SIZE/sizeof(struct kfence_metadata) > system, so I used memblock to allocate kfence_metadata in v2. >> >>> +               return -EINVAL; >>> +       } >>> +       kfence_metadata = alloc_pages_exact(KFENCE_METADATA_SIZE, >>> +                                           GFP_KERNEL); >>> +#endif >>> + >>> +       if (!kfence_metadata) >>> +               return -ENOMEM; >>> + >>> +       memset(kfence_metadata, 0, KFENCE_METADATA_SIZE); >> >> memzero_explicit, or pass __GFP_ZERO to alloc_pages? > Unfortunately, __GFP_ZERO does not work successfully in > alloc_contig_pages(), so I used memzero_explicit() in v2. > Even though I don't know if memzero_explicit() is necessary > (it just uses the barrier). >> >>> +       return 0; >>> +} >>> + >>> +static void kfence_free_metadata(void) >>> +{ >>> +       if (WARN_ON(!kfence_metadata)) >>> +               return; >>> +#ifdef CONFIG_CONTIG_ALLOC >>> +       free_contig_range(page_to_pfn(virt_to_page((void >>> *)kfence_metadata)), >>> +                         KFENCE_METADATA_SIZE / PAGE_SIZE); >>> +#else >>> +       free_pages_exact((void *)kfence_metadata, KFENCE_METADATA_SIZE); >>> +#endif >>> +       kfence_metadata = NULL; >>> +} >>> + >>>   static bool __init kfence_init_pool_early(void) >>>   { >>> -       unsigned long addr; >>> +       unsigned long addr = (unsigned long)__kfence_pool; >>> >>>          if (!__kfence_pool) >>>                  return false; >>> >>> +       if (!kfence_alloc_metadata()) >>> +               goto free_pool; >>> + >>>          addr = kfence_init_pool(); >>> >>>          if (!addr) { >>> @@ -663,6 +706,7 @@ static bool __init kfence_init_pool_early(void) >>>                  return true; >>>          } >>> >>> +       kfence_free_metadata(); >>>          /* >>>           * Only release unprotected pages, and do not try to go back >>> and change >>>           * page attributes due to risk of failing to do so as well. >>> If changing >>> @@ -670,31 +714,12 @@ static bool __init kfence_init_pool_early(void) >>>           * fails for the first page, and therefore expect >>> addr==__kfence_pool in >>>           * most failure cases. >>>           */ >>> +free_pool: >>>          memblock_free_late(__pa(addr), KFENCE_POOL_SIZE - (addr - >>> (unsigned long)__kfence_pool)); >>>          __kfence_pool = NULL; >>>          return false; >>>   } >>> >>> -static bool kfence_init_pool_late(void) >>> -{ >>> -       unsigned long addr, free_size; >>> - >>> -       addr = kfence_init_pool(); >>> - >>> -       if (!addr) >>> -               return true; >>> - >>> -       /* Same as above. */ >>> -       free_size = KFENCE_POOL_SIZE - (addr - (unsigned >>> long)__kfence_pool); >>> -#ifdef CONFIG_CONTIG_ALLOC >>> -       free_contig_range(page_to_pfn(virt_to_page((void *)addr)), >>> free_size / PAGE_SIZE); >>> -#else >>> -       free_pages_exact((void *)addr, free_size); >>> -#endif >>> -       __kfence_pool = NULL; >>> -       return false; >>> -} >>> - >>>   /* === DebugFS Interface >>> ==================================================== */ >>> >>>   static int stats_show(struct seq_file *seq, void *v) >>> @@ -896,6 +921,10 @@ void __init kfence_init(void) >>>   static int kfence_init_late(void) >>>   { >>>          const unsigned long nr_pages = KFENCE_POOL_SIZE / PAGE_SIZE; >>> +       unsigned long addr = (unsigned long)__kfence_pool; >>> +       unsigned long free_size = KFENCE_POOL_SIZE; >>> +       int ret; >>> + >>>   #ifdef CONFIG_CONTIG_ALLOC >>>          struct page *pages; >>> >>> @@ -913,15 +942,29 @@ static int kfence_init_late(void) >>>                  return -ENOMEM; >>>   #endif >>> >>> -       if (!kfence_init_pool_late()) { >>> -               pr_err("%s failed\n", __func__); >>> -               return -EBUSY; >>> +       ret = kfence_alloc_metadata(); >>> +       if (!ret) >>> +               goto free_pool; >>> + >>> +       addr = kfence_init_pool(); >>> +       if (!addr) { >>> +               kfence_init_enable(); >>> +               kfence_debugfs_init(); >>> +               return 0; >>>          } >>> >>> -       kfence_init_enable(); >>> -       kfence_debugfs_init(); >>> +       pr_err("%s failed\n", __func__); >>> +       kfence_free_metadata(); >>> +       free_size = KFENCE_POOL_SIZE - (addr - (unsigned >>> long)__kfence_pool); >>> +       ret = -EBUSY; >>> >>> -       return 0; >>> +free_pool: >>> +#ifdef CONFIG_CONTIG_ALLOC >>> +       free_contig_range(page_to_pfn(virt_to_page((void *)addr)), >>> free_size / PAGE_SIZE); >>> +#else >>> +       free_pages_exact((void *)addr, free_size); >>> +#endif >> >> You moved this from kfence_init_pool_late - that did "__kfence_pool = >> NULL" which is missing now. > Thanks for spotting this, I added it in v2. >