From: Vlastimil Babka <vbabka@suse.cz>
To: Chengming Zhou <chengming.zhou@linux.dev>,
	Ming Yang <yangming73@huawei.com>,
	cl@linux.com, penberg@kernel.org, rientjes@google.com,
	iamjoonsoo.kim@lge.com, akpm@linux-foundation.org,
	roman.gushchin@linux.dev, 42.hyeyoo@gmail.com,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: zhangliang5@huawei.com, wangzhigang17@huawei.com,
	liushixin2@huawei.com, alex.chen@huawei.com,
	pengyi.pengyi@huawei.com, xiqi2@huawei.com
Subject: Re: [PATCH] slub: fix slub segmentation
Date: Tue, 2 Apr 2024 18:13:22 +0200	[thread overview]
Message-ID: <cd9d9890-0259-4039-917b-0e517a388433@suse.cz> (raw)
In-Reply-To: <cd42083e-ea53-48bd-aa32-a16fc9f73ffa@linux.dev>

On 4/2/24 5:45 AM, Chengming Zhou wrote:
> On 2024/4/2 11:10, Ming Yang wrote:
>> When one of the NUMA nodes runs out of memory while lots of processes are
>> still booting, slabinfo shows that a lot of slub segmentation exists. The following

You mean fragmentation not segmentation, right?

>> shows some of them:
>> 
>> # name            <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
>> kmalloc-512        84309 380800   1024   32    8 : tunables    0    0    0 : slabdata  11900  11900      0
>> kmalloc-256        65869 365408    512   32    4 : tunables    0    0    0 : slabdata  11419  11419      0
>> 
>> 365408 "kmalloc-256" objects are allocated but only 65869 of them are
>> used, while 380800 "kmalloc-512" objects are allocated but only 84309
>> of them are used.
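>> That is, only about 18% of the allocated kmalloc-256 objects and about
>> 22% of the allocated kmalloc-512 objects are actually in use.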
>> 
>> This problem exists in the following scenario:
>> 1. Multiple NUMA nodes, e.g. four nodes.
>> 2. Lack of memory on any one node.
>> 3. Functions that allocate a lot of slab memory on certain NUMA nodes,
>> such as alloc_fair_sched_group().
>> 
>> The slub segmentation is generated for the following reason:
>> In ___slab_alloc() a new slab is first requested via get_partial(). If the
>> argument 'node' is assigned but that node has neither partial slabs nor
>> buddy memory left, no slab can be obtained there. The code then falls back
>> to allocating a new slab from the buddy system and, as mentioned before,
>> since the assigned node has no buddy memory left, the new slab may be
>> allocated directly from the buddy system of another node, regardless of
>> whether that other node still has free partial slabs. As a result, slub
>> segmentation is generated.
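>> 
>> A minimal sketch of that order (illustrative only, not the exact
>> mm/slub.c code):
>> 
>> 	/* simplified view of the current order inside ___slab_alloc() */
>> 	slab = get_partial(s, node, &pc);	/* with 'node' assigned, only that
>> 						   node's partial list is searched */
>> 	if (!slab)
>> 		slab = new_slab(s, gfpflags, node);	/* buddy system; without
>> 							   __GFP_THISNODE the page may come
>> 							   from another node even if that
>> 							   node still has partial slabs */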
>> 
>> The key point of the above allocation flow is: the slab should be allocated
>> from the partial lists of the other nodes first, instead of directly from
>> the buddy system of another node.
>> 
>> In this commit a new slub allocation flow is proposed:
>> 1. Attempt to get a slab via get_partial() (the first step under the
>> new_objects label).
>> 2. If no slab is obtained and 'node' is assigned, try to allocate a new
>> slab from the assigned node only, instead of from all nodes.
>> 3. If no slab can be allocated from the assigned node, try to get a slab
>> from the partial lists of the other nodes.
>> 4. If the allocation in step 3 also fails, allocate a new slab from the
>> buddy system of any node.
> 
> FYI, there is another patch to the very same problem:
> 
> https://lore.kernel.org/all/20240330082335.29710-1-chenjun102@huawei.com/

Yeah and I have just taken that one to slab/for-6.10

>> 
>> Signed-off-by: Ming Yang <yangming73@huawei.com>
>> Signed-off-by: Liang Zhang <zhangliang5@huawei.com>
>> Signed-off-by: Zhigang Wang <wangzhigang17@huawei.com>
>> Reviewed-by: Shixin Liu <liushixin2@huawei.com>
>> ---
>> This patch can be tested and verified by the following steps:
>> 1. First, run node0 out of memory: echo 1000 (depending on your memory) >
>> /sys/devices/system/node/node0/hugepages/hugepages-2048kB/nr_hugepages.
>> 2. Second, start 10000 (depending on your memory) processes that use the
>> setsid system call, as setsid is likely to call alloc_fair_sched_group().
>> 3. Last, check slabinfo: cat /proc/slabinfo.
>> 
>> Hardware info:
>> Memory : 8GiB
>> CPU (total #): 120
>> NUMA nodes: 4
>> 
>> Test C code example:
>> 
>> #include <stdlib.h>
>> #include <unistd.h>
>> 
>> int main(void)
>> {
>> 	void *p = malloc(1024);
>> 
>> 	setsid();	/* with autogroup scheduling, a new session allocates a
>> 			   new task group via alloc_fair_sched_group() */
>> 	while (1)
>> 		;
>> 	return 0;
>> }
>> 
>>  mm/slub.c | 11 +++++++++++
>>  1 file changed, 11 insertions(+)
>> 
>> diff --git a/mm/slub.c b/mm/slub.c
>> index 1bb2a93cf7..3eb2e7d386 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -3522,7 +3522,18 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>>  	}
>>  
>>  	slub_put_cpu_ptr(s->cpu_slab);
>> +	if (node != NUMA_NO_NODE) {
>> +		slab = new_slab(s, gfpflags | __GFP_THISNODE, node);
>> +		if (slab)
>> +			goto slab_alloced;
>> +
>> +		slab = get_any_partial(s, &pc);
>> +		if (slab)
>> +			goto slab_alloced;
>> +	}
>>  	slab = new_slab(s, gfpflags, node);
>> +
>> +slab_alloced:
>>  	c = slub_get_cpu_ptr(s->cpu_slab);
>>  
>>  	if (unlikely(!slab)) {



Thread overview: 5+ messages
2024-04-02  3:10 Ming Yang
2024-04-02  3:45 ` Chengming Zhou
2024-04-02 16:13   ` Vlastimil Babka [this message]
2024-04-04 19:12 ` Christoph Lameter (Ampere)
2024-04-05  9:05   ` Vlastimil Babka
