linux-mm.kvack.org archive mirror
From: David Hildenbrand <david@redhat.com>
To: yangge1116@126.com, akpm@linux-foundation.org
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	21cnbao@gmail.com, baolin.wang@linux.alibaba.com,
	aisheng.dong@nxp.com, liuzixing@hygon.cn
Subject: Re: [PATCH V2] mm/cma: using per-CMA locks to improve concurrent allocation performance
Date: Mon, 10 Feb 2025 09:34:46 +0100	[thread overview]
Message-ID: <51514ffd-a1ca-4eb4-90ab-c6871ab92c95@redhat.com> (raw)
In-Reply-To: <1739152566-744-1-git-send-email-yangge1116@126.com>

On 10.02.25 02:56, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
> 
> Ideally, concurrent allocations from different CMA areas should not
> need to synchronize with each other. Currently, however, a single
> global cma_mutex serializes all CMA allocations, which hurts the
> performance of concurrent allocations across different CMA areas.
> 
> To test the performance impact, follow these steps:
> 1. Boot the kernel with the command line argument hugetlb_cma=30G to
>     allocate a 30GB CMA area specifically for huge page allocations. (note:
>     on my machine, which has 3 nodes, each node is initialized with 10G of
>     CMA)
> 2. Use the dd command with parameters if=/dev/zero of=/dev/shm/file bs=1G
>     count=30 to fully utilize the CMA area by writing zeroes to a file in
>     /dev/shm.
> 3. Open three terminals and execute the following commands simultaneously:
>     (Note: Each of these commands attempts to allocate 10GB [2621440 * 4KB
>     pages] of CMA memory.)
>     On Terminal 1: time echo 2621440 > /sys/kernel/debug/cma/hugetlb1/alloc
>     On Terminal 2: time echo 2621440 > /sys/kernel/debug/cma/hugetlb2/alloc
>     On Terminal 3: time echo 2621440 > /sys/kernel/debug/cma/hugetlb3/alloc
> 

Hi,

I'm curious, what is the real workload you are trying to optimize for? I 
assume this example here is just to have some measurement.

Is concurrency within a single CMA area also a problem for your use case?


> We attempt to allocate pages through the CMA debug interface and use the
> time command to measure the duration of each allocation.
> Performance comparison:
>               Without this patch      With this patch
> Terminal1        ~7s                     ~7s
> Terminal2       ~14s                     ~8s
> Terminal3       ~21s                     ~7s
> 
> To solve the problem above, we can use per-CMA locks to improve
> concurrent allocation performance. Each CMA area is then managed
> independently, eliminating the global lock and thus improving
> scalability and performance.
> 
> Signed-off-by: yangge <yangge1116@126.com>
> ---
> 
> V2:
> - update code and message suggested by Barry.
> 
>   mm/cma.c | 7 ++++---
>   mm/cma.h | 1 +
>   2 files changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/mm/cma.c b/mm/cma.c
> index 34a4df2..a0d4d2f 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -34,7 +34,6 @@
>   
>   struct cma cma_areas[MAX_CMA_AREAS];
>   unsigned int cma_area_count;
> -static DEFINE_MUTEX(cma_mutex);
>   
>   static int __init __cma_declare_contiguous_nid(phys_addr_t base,
>   			phys_addr_t size, phys_addr_t limit,
> @@ -175,6 +174,8 @@ static void __init cma_activate_area(struct cma *cma)
>   
>   	spin_lock_init(&cma->lock);
>   
> +	mutex_init(&cma->alloc_mutex);
> +
>   #ifdef CONFIG_CMA_DEBUGFS
>   	INIT_HLIST_HEAD(&cma->mem_head);
>   	spin_lock_init(&cma->mem_head_lock);
> @@ -813,9 +814,9 @@ static int cma_range_alloc(struct cma *cma, struct cma_memrange *cmr,
>   		spin_unlock_irq(&cma->lock);
>   
>   		pfn = cmr->base_pfn + (bitmap_no << cma->order_per_bit);
> -		mutex_lock(&cma_mutex);
> +		mutex_lock(&cma->alloc_mutex);
>   		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
> -		mutex_unlock(&cma_mutex);
> +		mutex_unlock(&cma->alloc_mutex);
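
The change above can be illustrated with a userspace analogue (a
hypothetical sketch using pthreads, not kernel code; the struct and
field names mirror the patch, and a simple counter stands in for
alloc_contig_range()):

```c
#include <pthread.h>

/* Userspace analogue of the patch: each CMA-like area carries its own
 * alloc_mutex, so allocations in different areas no longer serialize
 * on one global lock. */
struct area {
	pthread_mutex_t alloc_mutex;	/* per-area, replacing the global cma_mutex */
	long allocated;			/* stand-in for the real allocation state */
};

static void area_init(struct area *a)
{
	pthread_mutex_init(&a->alloc_mutex, NULL);
	a->allocated = 0;
}

static void area_alloc(struct area *a, long pages)
{
	pthread_mutex_lock(&a->alloc_mutex);	/* only this area is held */
	a->allocated += pages;			/* models alloc_contig_range() */
	pthread_mutex_unlock(&a->alloc_mutex);
}
```

With one mutex per area, three threads allocating from three different
areas contend only on their own locks, which matches the roughly equal
per-terminal timings reported above.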


As raised, a better approach might be to return -EAGAIN in case we hit
an isolated pageblock and deal with that more gracefully here (e.g., try
another block, or retry this one if there are no others left, ...).
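
A rough userspace sketch of that idea (hypothetical; simulate_alloc()
stands in for alloc_contig_range(), which is assumed here to report a
transient isolation conflict as -EAGAIN):

```c
#include <errno.h>

/* Hypothetical fallback: on a transient -EAGAIN from the allocator
 * (another allocation holds the pageblock isolated), try the next
 * candidate range instead of serializing on a lock. */
static int simulate_alloc(int range)
{
	return range == 0 ? -EAGAIN : 0;	/* pretend range 0 is isolated */
}

static int alloc_with_fallback(int nranges)
{
	for (int r = 0; r < nranges; r++) {
		int ret = simulate_alloc(r);

		if (ret == 0)
			return r;	/* success: allocated from range r */
		if (ret != -EAGAIN)
			return ret;	/* hard failure, give up */
		/* -EAGAIN: transient conflict, move on to the next range */
	}
	return -EBUSY;	/* every range was busy; caller may retry */
}
```

The point of the sketch is only the control flow: a transient conflict
becomes "try elsewhere first" rather than "block on a mutex".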

In any case, this change here looks like an improvement.

Acked-by: David Hildenbrand <david@redhat.com>

-- 
Cheers,

David / dhildenb



Thread overview: 8+ messages
2025-02-10  1:56 yangge1116
2025-02-10  3:44 ` Barry Song
2025-02-10  8:34 ` David Hildenbrand [this message]
2025-02-10  8:56   ` Ge Yang
2025-02-10  9:15 ` Oscar Salvador
2025-03-18  3:43 ` Andrew Morton
2025-03-18  7:21   ` Ge Yang
2025-03-18 13:02   ` David Hildenbrand
