Message-ID: <20a62094-8ec4-f9b9-c7ac-20c0dfc2d283@linux.alibaba.com>
Date: Mon, 27 Jun 2022 00:48:26 +0800
Subject: Re: [PATCH 1/3] mm/slub: fix the race between validate_slab and slab_free
From: Rongwei Wang <rongwei.wang@linux.alibaba.com>
To: Christoph Lameter
Cc: David Rientjes, songmuchun@bytedance.com, Hyeonggon Yoo <42.hyeyoo@gmail.com>,
 akpm@linux-foundation.org, vbabka@suse.cz, roman.gushchin@linux.dev,
 iamjoonsoo.kim@lge.com, penberg@kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <20220529081535.69275-1-rongwei.wang@linux.alibaba.com>
 <9794df4f-3ffe-4e99-0810-a1346b139ce8@linux.alibaba.com>
 <29723aaa-5e28-51d3-7f87-9edf0f7b9c33@linux.alibaba.com>
 <02298c0e-3293-9deb-f1ed-6d8862f7c349@linux.alibaba.com>
 <5085437c-adc9-b6a3-dbd8-91dc0856cf19@linux.alibaba.com>
 <1b434d4c-2a19-9ac1-b2b9-b767b642ec0c@linux.alibaba.com>

On 6/20/22 7:57 PM, Christoph Lameter wrote:
> On Sat, 18 Jun 2022, Rongwei Wang wrote:
>
>>> Well, the cycle reduction is strange. Tests are not done in the same
>>> environment? Maybe good to not use NUMA or bind to the same cpu.
>>
>> It's the same environment, I can confirm. And there are four nodes (32G
>> per node and 8 cores per node) in my test environment. Do I need to test
>> in one node? If so, I can try.
>
> Ok, in a NUMA environment the memory allocation is randomized on bootup.
> You may get different numbers after you reboot the system. Try to switch
> NUMA off. Use a single node to get consistent numbers.

Sorry for the late reply.

First, let me share my test environment: an arm64 VM (32 cores and 128G
memory), with only one node configured for the VM. In addition, I used
'numactl -N 0 -m 0 qemu-kvm ...' to start the VM, mainly to avoid data
jitter. I can also confirm that the physical machine hosting this VM had
the same configuration when I tested both the original kernel and the
patched kernel. If the above test environment has any problems, please let
me know.

The following is the latest data:

Single thread testing

1. Kmalloc: Repeatedly allocate then free test

                           before fix                with fix
                       kmalloc      kfree        kmalloc      kfree
10000 times 8         4 cycles    5 cycles       4 cycles    5 cycles
10000 times 16        3 cycles    5 cycles       3 cycles    5 cycles
10000 times 32        3 cycles    5 cycles       3 cycles    5 cycles
10000 times 64        3 cycles    5 cycles       3 cycles    5 cycles
10000 times 128       3 cycles    5 cycles       3 cycles    5 cycles
10000 times 256      14 cycles    9 cycles       6 cycles    8 cycles
10000 times 512       9 cycles    8 cycles       9 cycles   10 cycles
10000 times 1024     48 cycles   10 cycles       6 cycles   10 cycles
10000 times 2048     31 cycles   12 cycles      35 cycles   13 cycles
10000 times 4096     96 cycles   17 cycles      96 cycles   18 cycles
10000 times 8192    188 cycles   27 cycles     190 cycles   27 cycles
10000 times 16384   117 cycles   38 cycles     115 cycles   38 cycles
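For context, a single-thread "repeatedly allocate then free" pass boils
down to a loop like the sketch below. This is only an illustrative
kernel-module sketch (the module name, layout, and helpers here are
assumptions), not the actual test code that produced the numbers above:

/*
 * Illustrative sketch of a repeated kmalloc/kfree cycle-count loop.
 * Sweeps the same object sizes and iteration count as the table above.
 */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/timex.h>	/* get_cycles() */
#include <linux/math64.h>	/* div_u64() */

#define ITERATIONS 10000

static void *objs[ITERATIONS];

static int __init kmalloc_timing_init(void)
{
	unsigned long size;

	for (size = 8; size <= 16384; size <<= 1) {
		cycles_t t0, t1, t2;
		int i;

		/* Allocate ITERATIONS objects, then free them all afterwards. */
		t0 = get_cycles();
		for (i = 0; i < ITERATIONS; i++)
			objs[i] = kmalloc(size, GFP_KERNEL);
		t1 = get_cycles();
		for (i = 0; i < ITERATIONS; i++)
			kfree(objs[i]);	/* kfree(NULL) is harmless if an alloc failed */
		t2 = get_cycles();

		pr_info("size %5lu: %llu cycles/kmalloc, %llu cycles/kfree\n",
			size,
			div_u64(t1 - t0, ITERATIONS),
			div_u64(t2 - t1, ITERATIONS));
	}
	return 0;
}

static void __exit kmalloc_timing_exit(void)
{
}

module_init(kmalloc_timing_init);
module_exit(kmalloc_timing_exit);
MODULE_LICENSE("GPL");
MODULE_DESCRIPTION("Illustrative kmalloc/kfree cycle-count sketch");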
2. Kmalloc: alloc/free test

                                     before fix    with fix
10000 times kmalloc(8)/kfree          3 cycles     3 cycles
10000 times kmalloc(16)/kfree         3 cycles     3 cycles
10000 times kmalloc(32)/kfree         3 cycles     3 cycles
10000 times kmalloc(64)/kfree         3 cycles     3 cycles
10000 times kmalloc(128)/kfree        3 cycles     3 cycles
10000 times kmalloc(256)/kfree        3 cycles     3 cycles
10000 times kmalloc(512)/kfree        3 cycles     3 cycles
10000 times kmalloc(1024)/kfree       3 cycles     3 cycles
10000 times kmalloc(2048)/kfree       3 cycles     3 cycles
10000 times kmalloc(4096)/kfree       3 cycles     3 cycles
10000 times kmalloc(8192)/kfree       3 cycles     3 cycles
10000 times kmalloc(16384)/kfree     33 cycles    33 cycles

Concurrent allocs

                                  before fix           with fix
Kmalloc N*alloc N*free(8)         Average=13/14        Average=14/15
Kmalloc N*alloc N*free(16)        Average=13/15        Average=13/15
Kmalloc N*alloc N*free(32)        Average=13/15        Average=13/15
Kmalloc N*alloc N*free(64)        Average=13/15        Average=13/15
Kmalloc N*alloc N*free(128)       Average=13/15        Average=13/15
Kmalloc N*alloc N*free(256)       Average=137/29       Average=134/39
Kmalloc N*alloc N*free(512)       Average=61/29        Average=64/28
Kmalloc N*alloc N*free(1024)      Average=465/50       Average=656/55
Kmalloc N*alloc N*free(2048)      Average=503/97       Average=422/97
Kmalloc N*alloc N*free(4096)      Average=1592/206     Average=1624/207
Kmalloc N*(alloc free)(8)         Average=3            Average=3
Kmalloc N*(alloc free)(16)        Average=3            Average=3
Kmalloc N*(alloc free)(32)        Average=3            Average=3
Kmalloc N*(alloc free)(64)        Average=3            Average=3
Kmalloc N*(alloc free)(128)       Average=3            Average=3
Kmalloc N*(alloc free)(256)       Average=3            Average=3
Kmalloc N*(alloc free)(512)       Average=3            Average=3
Kmalloc N*(alloc free)(1024)      Average=3            Average=3
Kmalloc N*(alloc free)(2048)      Average=3            Average=3
Kmalloc N*(alloc free)(4096)      Average=3            Average=3

Is the above data enough to show that this modification (which only takes
effect when kmem_cache_debug(s) is true) does not introduce a significant
performance impact?

Thanks for your time.

>
> It may be useful to figure out what memory structure causes the increase
> in latency in a NUMA environment. If you can figure that out and properly
> allocate the memory structure that causes the increase in latency, then
> you may be able to increase the performance of the allocator.
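PS: since the question above hinges on the kmem_cache_debug(s) check, a
minimal sketch of that gating pattern is below. kmem_cache_debug() is SLUB's
internal helper in mm/slub.c; the two branch helpers are hypothetical
stand-ins for illustration, not code from this patch:

/*
 * Sketch: confining a free-path change to debug caches only.
 */
static void example_slab_free(struct kmem_cache *s, void *object)
{
	if (kmem_cache_debug(s)) {
		/*
		 * Debug caches (e.g. booted with slub_debug): take the
		 * slower, serialized path so validate_slab() cannot race
		 * with the free.
		 */
		debug_path_free(s, object);	/* hypothetical helper */
	} else {
		/*
		 * Production caches: keep the existing lockless fast path,
		 * so the common case is unaffected.
		 */
		fast_path_free(s, object);	/* hypothetical helper */
	}
}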