From: "chenjun (AM)" <chenjun102@huawei.com>
To: Hyeonggon Yoo <42.hyeyoo@gmail.com>
CC: linux-kernel@vger.kernel.org, linux-mm@kvack.org, cl@linux.com,
    penberg@kernel.org, rientjes@google.com, iamjoonsoo.kim@lge.com,
    akpm@linux-foundation.org, vbabka@suse.cz, "xuqiang (M)"
Subject: Re: [RFC] mm/slub: Reduce memory consumption in extreme scenarios
Date: Wed, 8 Mar 2023 07:16:49 +0000
Message-ID: <4ad448c565134d76bea0ac8afffe4f37@huawei.com>
References: <20230307082811.120774-1-chenjun102@huawei.com>
Hi,

Thanks for the reply.

On 2023/3/7 22:20, Hyeonggon Yoo wrote:
> On Tue, Mar 07, 2023 at 08:28:11AM +0000, Chen Jun wrote:
>> If kmalloc_node() is called without __GFP_THISNODE and node[A] has no
>> memory, SLUB will allocate a slab page that does not belong to A and
>> put the page on kmem_cache_node[page_to_nid(page)]. The page cannot be
>> reused by the next call, because get_partial() will return NULL. That
>> makes kmalloc_node() consume more memory.
>
> Hello,
>
> elaborating a little bit:
>
> "When kmalloc_node() is called without __GFP_THISNODE and the target node
> lacks sufficient memory, SLUB allocates a folio from a different node other
> than the requested node, instead of taking a partial slab from it.
>
> However, since the allocated folio does not belong to the requested node,
> it is deactivated and added to the partial slab list of the node it
> belongs to.
>
> This behavior can result in excessive memory usage when the requested
> node has insufficient memory, as SLUB will repeatedly allocate folios from
> other nodes without reusing the previously allocated ones.
>
> To prevent memory wastage, take a partial slab from a different node when
> the requested node has no partial slab and __GFP_THISNODE is not explicitly
> specified."
>

Thanks, this is clearer than what I described.

>> On qemu with 4 NUMA nodes, each with 1G of memory, write a test ko
>> to call kmalloc_node(196, 0xd20c0, 3) for 5 * 1024 * 1024 times.
>>
>> cat /proc/slabinfo shows:
>> kmalloc-256 4302317 15151808 256 32 2 : tunables..
>>
>> the total number of objects is much larger than the number of active
>> objects.
>>
>> After this patch, cat /proc/slabinfo shows:
>> kmalloc-256 5244950 5245088 256 32 2 : tunables..
>>
>> Signed-off-by: Chen Jun <chenjun102@huawei.com>
>> ---
>>  mm/slub.c | 17 ++++++++++++++---
>>  1 file changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 39327e98fce3..c0090a5de54e 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -2384,7 +2384,7 @@ static void *get_partial(struct kmem_cache *s, int node, struct partial_context
>>  		searchnode = numa_mem_id();
>>
>>  	object = get_partial_node(s, get_node(s, searchnode), pc);
>> -	if (object || node != NUMA_NO_NODE)
>> +	if (object || (node != NUMA_NO_NODE && (pc->flags & __GFP_THISNODE)))
>>  		return object;
>
> I think the problem here is to avoid taking a partial slab from a
> different node than the requested node even if __GFP_THISNODE is not set
> (and then allocating a new slab instead).
>
> Thus this hunk makes sense to me,
> even if SLUB currently does not implement __GFP_THISNODE semantics.
>
>>  	return get_any_partial(s, pc);
>> @@ -3069,6 +3069,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	struct slab *slab;
>>  	unsigned long flags;
>>  	struct partial_context pc;
>> +	int try_thisndoe = 0;
>>
>>  	stat(s, ALLOC_SLOWPATH);
>>
>> @@ -3181,8 +3182,12 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	}
>>
>>  new_objects:
>> -
>>  	pc.flags = gfpflags;
>> +
>> +	/* Try to get page from specific node even if __GFP_THISNODE is not set */
>> +	if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode)
>> +		pc.flags |= __GFP_THISNODE;
>> +
>>  	pc.slab = &slab;
>>  	pc.orig_size = orig_size;
>>  	freelist = get_partial(s, node, &pc);
>> @@ -3190,10 +3195,16 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  		goto check_new_slab;
>>
>>  	slub_put_cpu_ptr(s->cpu_slab);
>> -	slab = new_slab(s, gfpflags, node);
>> +	slab = new_slab(s, pc.flags, node);
>>  	c = slub_get_cpu_ptr(s->cpu_slab);
>>
>>  	if (unlikely(!slab)) {
>> +		/* Try to get page from any other node */
>> +		if (node != NUMA_NO_NODE && !(gfpflags & __GFP_THISNODE) && try_thisnode) {
>> +			try_thisnode = 0;
>> +			goto new_objects;
>> +		}
>> +
>>  		slab_out_of_memory(s, gfpflags, node);
>>  		return NULL;
>
> But these hunks do not make sense to me.
> Why force __GFP_THISNODE even when the caller did not specify it?
>
> (Apart from the fact that try_thisnode is defined as try_thisndoe,
> and try_thisnode is never set to a nonzero value.)

My mistake, it should be:
	int try_thisnode = 0;

> IMHO the first hunk is enough to solve the problem.

I think we should try to allocate a page on the target node before
taking one from another node's partial list.

If the caller does not specify __GFP_THISNODE, we add __GFP_THISNODE to
try to get the slab only on the target node. If that fails, we use the
original GFP flags to allow fallback.

> Thanks,
> Hyeonggon
>
>> }
>> --
>> 2.17.1
>>
>>

Thanks,
Chen Jun