From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Tue, 2 Apr 2024 11:45:19 +0800
MIME-Version: 1.0
Subject: Re: [PATCH] slub: fix slub segmentation
To: Ming Yang, cl@linux.com, penberg@kernel.org, rientjes@google.com,
 iamjoonsoo.kim@lge.com,
 akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev,
 42.hyeyoo@gmail.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: zhangliang5@huawei.com, wangzhigang17@huawei.com, liushixin2@huawei.com,
 alex.chen@huawei.com, pengyi.pengyi@huawei.com, xiqi2@huawei.com
References: <20240402031025.1097-1-yangming73@huawei.com>
Content-Language: en-US
From: Chengming Zhou
In-Reply-To: <20240402031025.1097-1-yangming73@huawei.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

On 2024/4/2 11:10, Ming Yang wrote:
> When one of the NUMA nodes runs out of memory while lots of processes
> are still starting, slabinfo shows that a lot of slub segmentation
> exists. The following shows some of the affected caches:
>
> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> kmalloc-512            84309 380800   1024   32    8 : tunables    0    0    0 : slabdata  11900  11900      0
> kmalloc-256            65869 365408    512   32    4 : tunables    0    0    0 : slabdata  11419  11419      0
>
> 365408 "kmalloc-256" objects are allocated but only 65869 of them are
> used, while 380800 "kmalloc-512" objects are allocated but only 84309
> of them are used.
>
> This problem exists in the following scenario:
> 1. Multiple NUMA nodes, e.g. four nodes.
> 2. One of the nodes runs out of memory.
> 3. Functions that allocate a lot of slub memory on a specific NUMA
>    node, like alloc_fair_sched_group.
>
> The slub segmentation is generated for the following reason:
> In ___slab_alloc() a new slab is first requested via get_partial(). If
> the 'node' argument is assigned but that node has neither partial
> slabs nor free buddy memory, no slab can be obtained from it. The
> allocator then falls back to allocating a new slab from the buddy
> system and, since (as mentioned before) no buddy memory is left on the
> assigned node, the new slab may be allocated directly from the buddy
> system of another node, no matter whether that other node still has
> free partial slabs. As a result, slub segmentation is generated.
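
Just to restate the flow being described here: below is a simplified
sketch of the slow path around the new_objects label, based only on the
description above and the context lines in the diff below, not the
exact upstream code.

	slab = get_partial(s, node, &pc);
	if (!slab) {
		slub_put_cpu_ptr(s->cpu_slab);
		/*
		 * With 'node' assigned, get_partial() above only looked
		 * at that node's partial list.  gfpflags carries no
		 * __GFP_THISNODE, so when the assigned node is out of
		 * buddy memory this call can return pages from another
		 * node's buddy system, even though that node may still
		 * have free partial slabs -- which is where the slub
		 * segmentation described above comes from.
		 */
		slab = new_slab(s, gfpflags, node);
		c = slub_get_cpu_ptr(s->cpu_slab);
	}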
>
> The key point of the above allocation flow is that the slab should be
> allocated from the partial lists of other nodes first, instead of
> directly from the buddy system of other nodes.
>
> In this commit a new slub allocation flow is proposed:
> 1. Attempt to get a slab via get_partial() (the first step under the
>    new_objects label).
> 2. If no slab is obtained and 'node' is assigned, try to allocate a
>    new slab from the assigned node only, instead of from all nodes.
> 3. If no slab can be allocated from the assigned node, try to get a
>    slab from the partial lists of the other nodes.
> 4. If the allocation in step 3 fails, allocate a new slab from the
>    buddy system of any node.

FYI, there is another patch for the very same problem:
https://lore.kernel.org/all/20240330082335.29710-1-chenjun102@huawei.com/

>
> Signed-off-by: Ming Yang
> Signed-off-by: Liang Zhang
> Signed-off-by: Zhigang Wang
> Reviewed-by: Shixin Liu
> ---
> This patch can be tested and verified with the following steps:
> 1. First, exhaust the memory on node0:
>    echo 1000 (depending on your memory) >
>    /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages
> 2. Second, start 10000 (depending on your memory) processes that use
>    the setsid system call, as setsid is likely to call
>    alloc_fair_sched_group.
> 3. Last, check slabinfo: cat /proc/slabinfo.
>
> Hardware info:
> Memory : 8GiB
> CPU (total #): 120
> NUMA nodes: 4
>
> Test C code example (built with clang):
>
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(void)
> {
> 	void *p = malloc(1024);
>
> 	setsid();
> 	while (1)
> 		;
> }
>
> mm/slub.c | 11 +++++++++++
> 1 file changed, 11 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 1bb2a93cf7..3eb2e7d386 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3522,7 +3522,18 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>  	}
>
>  	slub_put_cpu_ptr(s->cpu_slab);
> +	if (node != NUMA_NO_NODE) {
> +		slab = new_slab(s, gfpflags | __GFP_THISNODE, node);
> +		if (slab)
> +			goto slab_alloced;
> +
> +		slab = get_any_partial(s, &pc);
> +		if (slab)
> +			goto slab_alloced;
> +	}
>  	slab = new_slab(s, gfpflags, node);
> +
> +slab_alloced:
>  	c = slub_get_cpu_ptr(s->cpu_slab);
>
>  	if (unlikely(!slab)) {
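
Regarding the reproduction steps in the changelog: below is a slightly
expanded, self-contained sketch of the reproducer, in case it helps
others testing this. The fork()-based driver and the NR_PROCS value are
my additions for illustration, not part of the patch; it only assumes
what the changelog already states, namely that setsid() may end up in
alloc_fair_sched_group().

	#include <stdio.h>
	#include <stdlib.h>
	#include <unistd.h>

	#define NR_PROCS 10000	/* "depending on your memory" */

	int main(void)
	{
		for (int i = 0; i < NR_PROCS; i++) {
			pid_t pid = fork();

			if (pid < 0) {
				perror("fork");
				break;
			}
			if (pid == 0) {
				/* child: same as the test program above */
				void *p = malloc(1024);

				(void)p;
				setsid();	/* may call alloc_fair_sched_group() */
				pause();	/* keep the allocations alive */
				_exit(0);
			}
		}

		/* keep the children alive while /proc/slabinfo is inspected */
		pause();
		return 0;
	}

Build with e.g. "clang -O2 repro.c -o repro" (the file name is
arbitrary), run it after exhausting node0 as in step 1, and then compare
cat /proc/slabinfo with and without this patch.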