Subject: Patch "mm: slab: fix kmem_cache_create failed when sysfs node not destroyed" has been added to the 4.19-stable tree
To: akpm@linux-foundation.org,cl@linux.com,gregkh@linuxfoundation.org,iamjoonsoo.kim@lge.com,linux-mm@kvack.org,penberg@kernel.org,rientjes@google.com,songmuchun@bytedance.com,sunnanyong@huawei.com
Date: Thu, 22 Jul 2021 16:51:42 +0200
In-Reply-To: <20210720082048.2797315-1-sunnanyong@huawei.com>
Message-ID: <1626965502161250@kroah.com>


This is a note to let you know that I've just added the patch titled

    mm: slab: fix kmem_cache_create failed when sysfs node not destroyed

to the 4.19-stable tree which can be found at:
    http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary

The
filename of the patch is:
     mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch
and it can be found in the queue-4.19 subdirectory.

If you, or anyone else, feels it should not be added to the stable tree,
please let know about it.


From sunnanyong@huawei.com Thu Jul 22 16:42:41 2021
From: Nanyong Sun
Date: Tue, 20 Jul 2021 16:20:48 +0800
Subject: mm: slab: fix kmem_cache_create failed when sysfs node not destroyed
Message-ID: <20210720082048.2797315-1-sunnanyong@huawei.com>

From: Nanyong Sun

The commit d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root
kmem_cache destroy") introduced a problem: if one thread destroys a
kmem_cache A while another thread concurrently creates a kmem_cache B
that is mergeable with A and has the same size as A, the creation of B
may fail because of a duplicate sysfs node.

The scenario in detail:
1) Thread 1 uses kmem_cache_destroy() to destroy kmem_cache A, which is
mergeable. It decreases A's refcount and, if the refcount is 0, calls
memcg_set_kmem_cache_dying(), which sets A->memcg_params.dying = true.
It then unlocks the slab_mutex and calls flush_memcg_workqueue(), which
may take a while.
Note: at this point the sysfs node of A (like '/kernel/slab/:0000248')
is still present. It will be deleted in shutdown_cache(), which is
called after flush_memcg_workqueue() is done and the slab_mutex is
locked again.
2) Now thread 2 arrives and uses kmem_cache_create() to create B, which
is mergeable with A (their sizes are the same). It takes the slab_mutex,
then calls __kmem_cache_alias() to find a mergeable cache. Because of
the code below, added in commit d38a2b7a9c93 ("mm: memcg/slab: fix
memory leak at non-root kmem_cache destroy"), B is not merged with A,
whose memcg_params.dying is true.

int slab_unmergeable(struct kmem_cache *s)
{
	if (s->refcount < 0)
		return 1;
	/*
	 * Skip the dying kmem_cache.
	 */
	if (s->memcg_params.dying)
		return 1;
	return 0;
}

So B has to create its own sysfs node by calling:
 create_cache->
	__kmem_cache_create->
		sysfs_slab_add->
			kobject_init_and_add
Because B is itself mergeable, the filename of its sysfs node is based
on its size, like '/kernel/slab/:0000248', which duplicates A's. The
sysfs node of A is still present at this point, so
kobject_init_and_add() fails and kmem_cache_create() fails as a result.

Concurrently running modprobe and rmmod on the two modules below
reproduces the issue quickly: nf_conntrack_expect, se_sess_cache.
See the call trace at the end.

The LTS versions v4.19.y and v5.4.y have this problem, whereas Linux
versions after v5.9 do not, because the patchset ("The new cgroup slab
memory controller") refactored almost all of the memcg slab code.

A potential solution (the one this patch implements): let the dying
kmem_cache be mergeable. The slab_mutex prevents the race between the
thread creating the alias kmem_cache and the thread destroying the root
kmem_cache. In the destroying thread, after flush_memcg_workqueue() is
done, check the refcount again: if someone referenced the cache again
while the lock was dropped, we don't need to destroy the kmem_cache
completely and can reuse it.

Another potential solution: revert the commit d38a2b7a9c93 ("mm:
memcg/slab: fix memory leak at non-root kmem_cache destroy"). Compared
to the failure of kmem_cache_create(), the memory leak in that special
scenario seems less harmful.
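
The failure scenario described above can be modelled in a few lines of
userspace C. This is only a sketch, not kernel code: struct model_cache,
struct model_slab, model_cache_create() and model_race() are hypothetical
names, the ":%07u" name format is a simplification of how the sysfs name
of a mergeable cache is derived from its size, and the two threads are
replayed sequentially instead of running concurrently.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MODEL_MAX_CACHES 8
#define MODEL_NAME_LEN   16

/* Hypothetical stand-in for struct kmem_cache: only what the scenario needs. */
struct model_cache {
	unsigned int size;
	int refcount;
	bool dying;                       /* models memcg_params.dying */
	char sysfs_name[MODEL_NAME_LEN];  /* node under /kernel/slab */
};

/* Every cache in the table still has its sysfs node registered. */
struct model_slab {
	struct model_cache caches[MODEL_MAX_CACHES];
	int nr;
};

static bool model_sysfs_exists(const struct model_slab *s, const char *name)
{
	for (int i = 0; i < s->nr; i++)
		if (strcmp(s->caches[i].sysfs_name, name) == 0)
			return true;
	return false;
}

/*
 * Returns 0 if the request was merged into an existing cache or a new
 * cache was registered, -EEXIST on a sysfs name clash, -ENOMEM if the
 * table is full. skip_dying selects the pre-patch merge rule.
 */
static int model_cache_create(struct model_slab *s, unsigned int size,
			      bool skip_dying)
{
	char name[MODEL_NAME_LEN];
	struct model_cache *c;

	assert(s->nr <= MODEL_MAX_CACHES);
	for (int i = 0; i < s->nr; i++) {
		c = &s->caches[i];
		if (c->size != size || c->refcount < 0)
			continue;
		if (skip_dying && c->dying)
			continue;	/* pre-patch: dying caches are unmergeable */
		c->refcount++;		/* alias the existing cache */
		return 0;
	}

	/* No merge candidate: register a node named after the size. */
	snprintf(name, sizeof(name), ":%07u", size);
	if (model_sysfs_exists(s, name))
		return -EEXIST;
	if (s->nr == MODEL_MAX_CACHES)
		return -ENOMEM;

	c = &s->caches[s->nr++];
	c->size = size;
	c->refcount = 1;
	c->dying = false;
	strcpy(c->sysfs_name, name);
	return 0;
}

/* Replays steps 1) and 2) in order and returns the result of creating B. */
int model_race(bool skip_dying)
{
	struct model_slab s = { .nr = 0 };
	int err = model_cache_create(&s, 248, skip_dying);	/* cache A */

	if (err)
		return err;

	/* Thread 1: first half of destroying A, then the lock is dropped. */
	s.caches[0].refcount--;
	s.caches[0].dying = true;

	/* Thread 2: create B with the same size while A's node still exists. */
	return model_cache_create(&s, 248, skip_dying);
}
```

In this model, model_race(true) replays the behaviour with the dying
check in place and returns -EEXIST, while model_race(false) replays it
with the check removed and returns 0 because B is merged into A.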
Call trace:
 sysfs: cannot create duplicate filename '/kernel/slab/:0000248'
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  sysfs_warn_dup+0x6c/0x88
  sysfs_create_dir_ns+0x104/0x120
  kobject_add_internal+0xd0/0x378
  kobject_init_and_add+0x90/0xd8
  sysfs_slab_add+0x16c/0x2d0
  __kmem_cache_create+0x16c/0x1d8
  create_cache+0xbc/0x1f8
  kmem_cache_create_usercopy+0x1a0/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260
 kobject_add_internal failed for :0000248 with -EEXIST, don't try to register things with the same name in the same directory.
 kmem_cache_create(se_sess_cache) failed with error -17
 Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
 Call trace:
  dump_backtrace+0x0/0x198
  show_stack+0x24/0x30
  dump_stack+0xb0/0x100
  kmem_cache_create_usercopy+0xa8/0x230
  kmem_cache_create+0x50/0x68
  init_se_kmem_caches+0x38/0x258 [target_core_mod]
  target_core_init_configfs+0x8c/0x390 [target_core_mod]
  do_one_initcall+0x54/0x230
  do_init_module+0x64/0x1ec
  load_module+0x150c/0x16f0
  __se_sys_finit_module+0xf0/0x108
  __arm64_sys_finit_module+0x24/0x30
  el0_svc_common+0x80/0x1c0
  el0_svc_handler+0x78/0xe0
  el0_svc+0x10/0x260

Fixes: d38a2b7a9c93 ("mm: memcg/slab: fix memory leak at non-root kmem_cache destroy")
Signed-off-by: Nanyong Sun
Cc: stable@vger.kernel.org
Signed-off-by: Greg Kroah-Hartman
---
 mm/slab_common.c |   18 ++++++++++--------
 1 file changed, 10 insertions(+), 8 deletions(-)

--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -311,14 +311,6 @@ int slab_unmergeable(struct kmem_cache *
 	if (s->refcount < 0)
 		return 1;
 
-#ifdef CONFIG_MEMCG_KMEM
-	/*
-	 * Skip the dying kmem_cache.
-	 */
-	if (s->memcg_params.dying)
-		return 1;
-#endif
-
 	return 0;
 }
 
@@ -918,6 +910,16 @@ void kmem_cache_destroy(struct kmem_cach
 	get_online_mems();
 
 	mutex_lock(&slab_mutex);
+
+	/*
+	 * Another thread referenced it again
+	 */
+	if (READ_ONCE(s->refcount)) {
+		spin_lock_irq(&memcg_kmem_wq_lock);
+		s->memcg_params.dying = false;
+		spin_unlock_irq(&memcg_kmem_wq_lock);
+		goto out_unlock;
+	}
 #endif
 
 	err = shutdown_memcg_caches(s);


Patches currently in stable-queue which might be from sunnanyong@huawei.com are

queue-4.19/mm-slab-fix-kmem_cache_create-failed-when-sysfs-node-not-destroyed.patch
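
The re-check that the kmem_cache_destroy() hunk adds follows a common
pattern: after a lock has been dropped and re-taken, state that was
tested before the drop has to be tested again. The userspace C sketch
below illustrates that pattern under stated assumptions. It is not the
kernel implementation: struct model_root_cache, model_alias() and
model_destroy() are hypothetical names, a boolean stands in for
slab_mutex, and the concurrent thread is simulated by a callback that
runs inside the window where the lock is not held.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum destroy_result {
	DESTROY_STILL_REFERENCED,	/* other users remain, nothing to do */
	DESTROY_REUSED,			/* re-referenced during the flush window */
	DESTROY_FREED			/* no users left, cache shut down */
};

/* Hypothetical model of a root cache; "locked" stands in for slab_mutex. */
struct model_root_cache {
	int refcount;
	bool dying;
	bool locked;
};

typedef void (*unlocked_hook)(struct model_root_cache *c);

static void model_lock(struct model_root_cache *c)
{
	assert(!c->locked);
	c->locked = true;
}

static void model_unlock(struct model_root_cache *c)
{
	assert(c->locked);
	c->locked = false;
}

/* What a concurrent create would do when it merges into this cache. */
void model_alias(struct model_root_cache *c)
{
	model_lock(c);
	c->refcount++;
	model_unlock(c);
}

enum destroy_result model_destroy(struct model_root_cache *c,
				  unlocked_hook during_flush)
{
	model_lock(c);
	if (--c->refcount != 0) {
		model_unlock(c);
		return DESTROY_STILL_REFERENCED;
	}
	c->dying = true;
	model_unlock(c);

	/* Window in which the lock is not held (the workqueue flush). */
	if (during_flush)
		during_flush(c);

	model_lock(c);
	if (c->refcount) {
		/* Another thread referenced the cache again: keep it. */
		c->dying = false;
		model_unlock(c);
		return DESTROY_REUSED;
	}
	/* A real implementation would shut the cache down here. */
	model_unlock(c);
	return DESTROY_FREED;
}
```

Calling model_destroy() with model_alias as the hook on a cache whose
refcount is 1 models the race and yields DESTROY_REUSED with the dying
flag cleared; passing NULL models an uncontended destroy and yields
DESTROY_FREED.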