From mboxrd@z Thu Jan 1 00:00:00 1970
Message-ID: <09057869-eb32-45dd-a7a1-9b7e1850eb11@126.com>
Date: Tue, 18 Mar 2025 15:21:42 +0800
MIME-Version: 1.0
User-Agent: Mozilla Thunderbird
Subject: Re: [PATCH V2] mm/cma: using per-CMA locks to improve concurrent allocation performance
To: Andrew Morton
Cc: linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, 21cnbao@gmail.com, david@redhat.com,
 baolin.wang@linux.alibaba.com, aisheng.dong@nxp.com, liuzixing@hygon.cn
References: <1739152566-744-1-git-send-email-yangge1116@126.com>
 <20250317204325.99b45373023ad2f901c1152e@linux-foundation.org>
From: Ge Yang
In-Reply-To: <20250317204325.99b45373023ad2f901c1152e@linux-foundation.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit

On 2025/3/18 11:43, Andrew Morton wrote:
> On Mon, 10 Feb 2025 09:56:06 +0800 yangge1116@126.com wrote:
> 
>> From: yangge
>>
>> For different CMAs, concurrent allocation of CMA memory ideally should not
>> require synchronization using locks. Currently, a global cma_mutex lock is
>> employed to synchronize all CMA allocations, which can impact the
>> performance of concurrent allocations across different CMAs.
>>
>> To test the performance impact, follow these steps:
>> 1. Boot the kernel with the command line argument hugetlb_cma=30G to
>> allocate a 30GB CMA area specifically for huge page allocations. (note:
>> on my machine, which has 3 nodes, each node is initialized with 10G of
>> CMA)
>> 2. Use the dd command with parameters if=/dev/zero of=/dev/shm/file bs=1G
>> count=30 to fully utilize the CMA area by writing zeroes to a file in
>> /dev/shm.
>> 3. Open three terminals and execute the following commands simultaneously:
>> (Note: Each of these commands attempts to allocate 10GB [2621440 * 4KB
>> pages] of CMA memory.)
>> On Terminal 1: time echo 2621440 > /sys/kernel/debug/cma/hugetlb1/alloc
>> On Terminal 2: time echo 2621440 > /sys/kernel/debug/cma/hugetlb2/alloc
>> On Terminal 3: time echo 2621440 > /sys/kernel/debug/cma/hugetlb3/alloc
>>
>> We attempt to allocate pages through the CMA debug interface and use the
>> time command to measure the duration of each allocation.
>> Performance comparison:
>>              Without this patch   With this patch
>> Terminal1    ~7s                  ~7s
>> Terminal2    ~14s                 ~8s
>> Terminal3    ~21s                 ~7s
>>
>> To solve the problem above, we could use per-CMA locks to improve concurrent
>> allocation performance. This would allow each CMA to be managed
>> independently, reducing the need for a global lock and thus improving
>> scalability and performance.
> 
> This patch was in and out of mm-unstable for a while, as Frank's series
> "hugetlb/CMA improvements for large systems" was being added and
> dropped.
> 
> Consequently it hasn't received any testing for a while.
> 
> Below is the version which I've now re-added to mm-unstable. Can
> you please check this and retest it?

Based on the latest mm-unstable code, I applied the patch and retested;
it works normally. Thanks.

> 
> Thanks.
> 
> From: Ge Yang
> Subject: mm/cma: using per-CMA locks to improve concurrent allocation performance
> Date: Mon, 10 Feb 2025 09:56:06 +0800
> 
> For different CMAs, concurrent allocation of CMA memory ideally should not
> require synchronization using locks. Currently, a global cma_mutex lock
> is employed to synchronize all CMA allocations, which can impact the
> performance of concurrent allocations across different CMAs.
> 
> To test the performance impact, follow these steps:
> 1. Boot the kernel with the command line argument hugetlb_cma=30G to
> allocate a 30GB CMA area specifically for huge page allocations. (note:
> on my machine, which has 3 nodes, each node is initialized with 10G of
> CMA)
> 2. Use the dd command with parameters if=/dev/zero of=/dev/shm/file bs=1G
> count=30 to fully utilize the CMA area by writing zeroes to a file in
> /dev/shm.
> 3. Open three terminals and execute the following commands simultaneously:
> (Note: Each of these commands attempts to allocate 10GB [2621440 * 4KB
> pages] of CMA memory.)
> On Terminal 1: time echo 2621440 > /sys/kernel/debug/cma/hugetlb1/alloc
> On Terminal 2: time echo 2621440 > /sys/kernel/debug/cma/hugetlb2/alloc
> On Terminal 3: time echo 2621440 > /sys/kernel/debug/cma/hugetlb3/alloc
> 
> We attempt to allocate pages through the CMA debug interface and use the
> time command to measure the duration of each allocation.
> Performance comparison:
>              Without this patch   With this patch
> Terminal1    ~7s                  ~7s
> Terminal2    ~14s                 ~8s
> Terminal3    ~21s                 ~7s
> 
> To solve the problem above, we could use per-CMA locks to improve concurrent
> allocation performance. This would allow each CMA to be managed
> independently, reducing the need for a global lock and thus improving
> scalability and performance.
> 
> Link: https://lkml.kernel.org/r/1739152566-744-1-git-send-email-yangge1116@126.com
> Signed-off-by: Ge Yang
> Reviewed-by: Barry Song
> Acked-by: David Hildenbrand
> Reviewed-by: Oscar Salvador
> Cc: Aisheng Dong
> Cc: Baolin Wang
> Signed-off-by: Andrew Morton
> ---
> 
>  mm/cma.c | 7 ++++---
>  mm/cma.h | 1 +
>  2 files changed, 5 insertions(+), 3 deletions(-)
> 
> --- a/mm/cma.c~mm-cma-using-per-cma-locks-to-improve-concurrent-allocation-performance
> +++ a/mm/cma.c
> @@ -34,7 +34,6 @@
>  
>  struct cma cma_areas[MAX_CMA_AREAS];
>  unsigned int cma_area_count;
> -static DEFINE_MUTEX(cma_mutex);
>  
>  static int __init __cma_declare_contiguous_nid(phys_addr_t base,
>  			phys_addr_t size, phys_addr_t limit,
> @@ -175,6 +174,8 @@ static void __init cma_activate_area(str
>  
>  	spin_lock_init(&cma->lock);
>  
> +	mutex_init(&cma->alloc_mutex);
> +
>  #ifdef CONFIG_CMA_DEBUGFS
>  	INIT_HLIST_HEAD(&cma->mem_head);
>  	spin_lock_init(&cma->mem_head_lock);
> @@ -813,9 +814,9 @@ static int cma_range_alloc(struct cma *c
>  		spin_unlock_irq(&cma->lock);
>  
>  		pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
> -		mutex_lock(&cma_mutex);
> +		mutex_lock(&cma->alloc_mutex);
>  		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
> -		mutex_unlock(&cma_mutex);
> +		mutex_unlock(&cma->alloc_mutex);
>  		if (ret == 0) {
>  			page = pfn_to_page(pfn);
>  			break;
> --- a/mm/cma.h~mm-cma-using-per-cma-locks-to-improve-concurrent-allocation-performance
> +++ a/mm/cma.h
> @@ -39,6 +39,7 @@ struct cma {
>  	unsigned long available_count;
>  	unsigned int order_per_bit;	/* Order of pages represented by one bit */
>  	spinlock_t lock;
> +	struct mutex alloc_mutex;
>  #ifdef CONFIG_CMA_DEBUGFS
>  	struct hlist_head mem_head;
>  	spinlock_t mem_head_lock;
> _
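
For anyone skimming the diff, here is a minimal user-space sketch of the
locking pattern the patch adopts (hypothetical names, pthreads instead of
kernel mutexes; an illustration only, not the kernel code itself): each
"area" carries its own mutex, so allocations in different areas no longer
serialize on one shared lock, while allocations within the same area still
do.

/* per_area_lock_demo.c - illustrative only; build with: cc -pthread */
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>

#define NUM_AREAS 3

struct area {
	pthread_mutex_t alloc_mutex;	/* analogous to cma->alloc_mutex */
	long allocated;			/* "pages" handed out from this area */
};

static struct area areas[NUM_AREAS];

/* Simulate one expensive allocation from a given area. */
static void area_alloc(struct area *a, long pages)
{
	/* Contends only with allocations from the same area. */
	pthread_mutex_lock(&a->alloc_mutex);
	a->allocated += pages;
	usleep(1000);	/* stand-in for the long alloc_contig_range() call */
	pthread_mutex_unlock(&a->alloc_mutex);
}

static void *worker(void *arg)
{
	struct area *a = arg;

	for (int i = 0; i < 100; i++)
		area_alloc(a, 1);
	return NULL;
}

int main(void)
{
	pthread_t threads[NUM_AREAS];

	for (int i = 0; i < NUM_AREAS; i++) {
		pthread_mutex_init(&areas[i].alloc_mutex, NULL);
		pthread_create(&threads[i], NULL, worker, &areas[i]);
	}
	for (int i = 0; i < NUM_AREAS; i++) {
		pthread_join(threads[i], NULL);
		printf("area %d: %ld pages\n", i, areas[i].allocated);
	}
	return 0;
}

With a single shared mutex instead, the three workers would run back to
back, which is roughly the ~7s/~14s/~21s pattern measured before the patch.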