[PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
@ 2025-01-24 11:21 yangge1116
  2025-01-27 23:04 ` Andrew Morton
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: yangge1116 @ 2025-01-24 11:21 UTC (permalink / raw)
  To: akpm
  Cc: linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	aisheng.dong, liuzixing, yangge

From: yangge <yangge1116@126.com>

Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
simply reverts to the original method of using the cma_mutex to ensure
that alloc_contig_range() runs sequentially. This change was made to avoid
concurrency allocation failures. However, it can negatively impact
performance when concurrent allocation of CMA memory is required.

To address this issue, we could introduce an API for concurrency settings,
allowing users to decide whether their CMA can perform concurrent memory
allocations or not.

Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
Signed-off-by: yangge <yangge1116@126.com>
Cc: <stable@vger.kernel.org>
---
 include/linux/cma.h |  2 ++
 mm/cma.c            | 22 ++++++++++++++++++++--
 mm/cma.h            |  1 +
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index d15b64f..2384624 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
 
 extern void cma_reserve_pages_on_error(struct cma *cma);
 
+extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
+
 #ifdef CONFIG_CMA
 struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
 bool cma_free_folio(struct cma *cma, const struct folio *folio);
diff --git a/mm/cma.c b/mm/cma.c
index de5bc0c..49a7186 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
 		spin_unlock_irq(&cma->lock);
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
-		mutex_lock(&cma_mutex);
+
+		/*
+		 * If the user sets the concurr_alloc of CMA to true, concurrent
+		 * memory allocation is allowed. If the user sets it to false or
+		 * does not set it, concurrent memory allocation is not allowed.
+		 */
+		if (!cma->concurr_alloc)
+			mutex_lock(&cma_mutex);
 		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
-		mutex_unlock(&cma_mutex);
+		if (!cma->concurr_alloc)
+			mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
 			break;
@@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
 
 	return 0;
 }
+
+bool cma_set_concurrency(struct cma *cma, bool concurrency)
+{
+	if (!cma)
+		return false;
+
+	cma->concurr_alloc = concurrency;
+
+	return true;
+}
diff --git a/mm/cma.h b/mm/cma.h
index 8485ef8..30f489d 100644
--- a/mm/cma.h
+++ b/mm/cma.h
@@ -16,6 +16,7 @@ struct cma {
 	unsigned long   *bitmap;
 	unsigned int order_per_bit; /* Order of pages represented by one bit */
 	spinlock_t	lock;
+	bool concurr_alloc;
 #ifdef CONFIG_CMA_DEBUGFS
 	struct hlist_head mem_head;
 	spinlock_t mem_head_lock;
-- 
2.7.4



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-01-24 11:21 [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA yangge1116
@ 2025-01-27 23:04 ` Andrew Morton
  2025-02-08  8:19   ` Ge Yang
  2025-01-28  6:11 ` Christoph Hellwig
  2025-01-28  9:58 ` Barry Song
  2 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2025-01-27 23:04 UTC (permalink / raw)
  To: yangge1116
  Cc: linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	aisheng.dong, liuzixing

On Fri, 24 Jan 2025 19:21:27 +0800 yangge1116@126.com wrote:

> From: yangge <yangge1116@126.com>
> 
> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> simply reverts to the original method of using the cma_mutex to ensure
> that alloc_contig_range() runs sequentially. This change was made to avoid
> concurrency allocation failures. However, it can negatively impact
> performance when concurrent allocation of CMA memory is required.
> 
> To address this issue, we could introduce an API for concurrency settings,
> allowing users to decide whether their CMA can perform concurrent memory
> allocations or not.

The term "users" tends to refer to userspace code.  Here I'm thinking
you mean in-kernel code, so a better term to use is "callers".

This new interface has no callers.  We prefer not to merge unused code!
Please send along the patch which calls cma_set_concurrency() so we
can better understand this proposal and so that the new code is
testable.  In fact the patch has cc:stable, which makes things
stranger.  Why should the -stable maintainers merge a patch which
doesn't do anything?

And please quantify the benefit.  "negatively impact" is too vague. 
How much benefit can we expect our users to see from this?  Some
runtime testing results would be good.

And please describe in more detail why this particular caller doesn't
require concurrency protection.  And help other developers understand
when it is safe for them to use concurr_alloc==false.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-01-24 11:21 [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA yangge1116
  2025-01-27 23:04 ` Andrew Morton
@ 2025-01-28  6:11 ` Christoph Hellwig
  2025-01-28  9:58 ` Barry Song
  2 siblings, 0 replies; 9+ messages in thread
From: Christoph Hellwig @ 2025-01-28  6:11 UTC (permalink / raw)
  To: yangge1116
  Cc: akpm, linux-mm, linux-kernel, stable, 21cnbao, david,
	baolin.wang, aisheng.dong, liuzixing

On Fri, Jan 24, 2025 at 07:21:27PM +0800, yangge1116@126.com wrote:
> From: yangge <yangge1116@126.com>
> 
> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> simply reverts to the original method of using the cma_mutex to ensure
> that alloc_contig_range() runs sequentially. This change was made to avoid
> concurrency allocation failures. However, it can negatively impact
> performance when concurrent allocation of CMA memory is required.
> 
> To address this issue, we could introduce an API for concurrency settings,
> allowing users to decide whether their CMA can perform concurrent memory
> allocations or not.
> 
> Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> Signed-off-by: yangge <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>

Umm, you're adding new unused functions while not even reporting what
the problem is.  This looks sketchy as hell and surely is not a stable
candidate.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-01-24 11:21 [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA yangge1116
  2025-01-27 23:04 ` Andrew Morton
  2025-01-28  6:11 ` Christoph Hellwig
@ 2025-01-28  9:58 ` Barry Song
  2025-02-08  8:50   ` Ge Yang
  2 siblings, 1 reply; 9+ messages in thread
From: Barry Song @ 2025-01-28  9:58 UTC (permalink / raw)
  To: yangge1116
  Cc: akpm, linux-mm, linux-kernel, stable, david, baolin.wang,
	aisheng.dong, liuzixing

On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@126.com> wrote:
>
> From: yangge <yangge1116@126.com>
>
> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> simply reverts to the original method of using the cma_mutex to ensure
> that alloc_contig_range() runs sequentially. This change was made to avoid
> concurrency allocation failures. However, it can negatively impact
> performance when concurrent allocation of CMA memory is required.

Do we have some data?

>
> To address this issue, we could introduce an API for concurrency settings,
> allowing users to decide whether their CMA can perform concurrent memory
> allocations or not.

Who is the intended user of cma_set_concurrency? I also feel it is somewhat
unsafe since cma->concurr_alloc is not protected by any locks.

Will a user setting cma->concurr_alloc = 1 encounter the original issue that
commit 60a60e32cf91 was attempting to fix?

>
> Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> Signed-off-by: yangge <yangge1116@126.com>
> Cc: <stable@vger.kernel.org>
> ---
>  include/linux/cma.h |  2 ++
>  mm/cma.c            | 22 ++++++++++++++++++++--
>  mm/cma.h            |  1 +
>  3 files changed, 23 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/cma.h b/include/linux/cma.h
> index d15b64f..2384624 100644
> --- a/include/linux/cma.h
> +++ b/include/linux/cma.h
> @@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>
>  extern void cma_reserve_pages_on_error(struct cma *cma);
>
> +extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
> +
>  #ifdef CONFIG_CMA
>  struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
>  bool cma_free_folio(struct cma *cma, const struct folio *folio);
> diff --git a/mm/cma.c b/mm/cma.c
> index de5bc0c..49a7186 100644
> --- a/mm/cma.c
> +++ b/mm/cma.c
> @@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
>                 spin_unlock_irq(&cma->lock);
>
>                 pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> -               mutex_lock(&cma_mutex);
> +
> +               /*
> +                * If the user sets the concurr_alloc of CMA to true, concurrent
> +                * memory allocation is allowed. If the user sets it to false or
> +                * does not set it, concurrent memory allocation is not allowed.
> +                */
> +               if (!cma->concurr_alloc)
> +                       mutex_lock(&cma_mutex);
>                 ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
> -               mutex_unlock(&cma_mutex);
> +               if (!cma->concurr_alloc)
> +                       mutex_unlock(&cma_mutex);
>                 if (ret == 0) {
>                         page = pfn_to_page(pfn);
>                         break;
> @@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>
>         return 0;
>  }
> +
> +bool cma_set_concurrency(struct cma *cma, bool concurrency)
> +{
> +       if (!cma)
> +               return false;
> +
> +       cma->concurr_alloc = concurrency;
> +
> +       return true;
> +}
> diff --git a/mm/cma.h b/mm/cma.h
> index 8485ef8..30f489d 100644
> --- a/mm/cma.h
> +++ b/mm/cma.h
> @@ -16,6 +16,7 @@ struct cma {
>         unsigned long   *bitmap;
>         unsigned int order_per_bit; /* Order of pages represented by one bit */
>         spinlock_t      lock;
> +       bool concurr_alloc;
>  #ifdef CONFIG_CMA_DEBUGFS
>         struct hlist_head mem_head;
>         spinlock_t mem_head_lock;
> --
> 2.7.4
>
>

Thanks
Barry


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-01-27 23:04 ` Andrew Morton
@ 2025-02-08  8:19   ` Ge Yang
  0 siblings, 0 replies; 9+ messages in thread
From: Ge Yang @ 2025-02-08  8:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, stable, 21cnbao, david, baolin.wang,
	aisheng.dong, liuzixing



在 2025/1/28 7:04, Andrew Morton 写道:
> On Fri, 24 Jan 2025 19:21:27 +0800 yangge1116@126.com wrote:
> 
>> From: yangge <yangge1116@126.com>
>>
>> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>> simply reverts to the original method of using the cma_mutex to ensure
>> that alloc_contig_range() runs sequentially. This change was made to avoid
>> concurrency allocation failures. However, it can negatively impact
>> performance when concurrent allocation of CMA memory is required.
>>
>> To address this issue, we could introduce an API for concurrency settings,
>> allowing users to decide whether their CMA can perform concurrent memory
>> allocations or not.
> 
> The term "users" tends to refer to userspace code.  Here I'm thinking
> you mean in-kernel code, so a better term to use is "callers".
Ok, thank you. I will change it in the next version.

> 
> This new interface has no callers.  We prefer not to merge unused code!
> Please send along the patch which calls cma_set_concurrency() so we
> can better understand this proposal and so that the new code is
> testable.
Ok, thank you. I will add the caller in the next version.

   In fact the patch has cc:stable, which makes things
> stranger.  Why should the -stable maintainers merge a patch which
> doesn't do anything?
> 
> And please quantify the benefit.  "negatively impact" is too vague.
> How much benefit can we expect our users to see from this?  Some
> runtime testing results would be good.
> 
> And please describe in more detail why this particular caller doesn't
> require concurrency protection.  And help other developers understand
> when it is safe for them to use concurr_alloc==false.
Ok, thank you.



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-01-28  9:58 ` Barry Song
@ 2025-02-08  8:50   ` Ge Yang
  2025-02-08 21:34     ` Barry Song
  0 siblings, 1 reply; 9+ messages in thread
From: Ge Yang @ 2025-02-08  8:50 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, linux-kernel, stable, david, baolin.wang,
	aisheng.dong, liuzixing



在 2025/1/28 17:58, Barry Song 写道:
> On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@126.com> wrote:
>>
>> From: yangge <yangge1116@126.com>
>>
>> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>> simply reverts to the original method of using the cma_mutex to ensure
>> that alloc_contig_range() runs sequentially. This change was made to avoid
>> concurrency allocation failures. However, it can negatively impact
>> performance when concurrent allocation of CMA memory is required.
> 
> Do we have some data?
Yes, I will add it in the next version, thanks.
> 
>>
>> To address this issue, we could introduce an API for concurrency settings,
>> allowing users to decide whether their CMA can perform concurrent memory
>> allocations or not.
> 
> Who is the intended user of cma_set_concurrency? 
We have some drivers that use cma_set_concurrency(), but they have not 
yet been merged into the mainline. The cma_alloc_mem() function in the 
mainline also supports concurrent allocation of CMA memory. By applying 
this patch, we can also achieve significant performance improvements in 
certain scenarios. I will provide performance data in the next version.
I also feel it is somewhat
> unsafe since cma->concurr_alloc is not protected by any locks.
Ok, thanks.
> 
> Will a user setting cma->concurr_alloc = 1 encounter the original issue that
> commit 60a60e32cf91 was attempting to fix?
> 
Yes, if a user encounters the issue described in commit 60a60e32cf91, 
they will not be able to set cma->concurr_alloc to 1.
>>
>> Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>> Signed-off-by: yangge <yangge1116@126.com>
>> Cc: <stable@vger.kernel.org>
>> ---
>>   include/linux/cma.h |  2 ++
>>   mm/cma.c            | 22 ++++++++++++++++++++--
>>   mm/cma.h            |  1 +
>>   3 files changed, 23 insertions(+), 2 deletions(-)
>>
>> diff --git a/include/linux/cma.h b/include/linux/cma.h
>> index d15b64f..2384624 100644
>> --- a/include/linux/cma.h
>> +++ b/include/linux/cma.h
>> @@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>>
>>   extern void cma_reserve_pages_on_error(struct cma *cma);
>>
>> +extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
>> +
>>   #ifdef CONFIG_CMA
>>   struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
>>   bool cma_free_folio(struct cma *cma, const struct folio *folio);
>> diff --git a/mm/cma.c b/mm/cma.c
>> index de5bc0c..49a7186 100644
>> --- a/mm/cma.c
>> +++ b/mm/cma.c
>> @@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
>>                  spin_unlock_irq(&cma->lock);
>>
>>                  pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>> -               mutex_lock(&cma_mutex);
>> +
>> +               /*
>> +                * If the user sets the concurr_alloc of CMA to true, concurrent
>> +                * memory allocation is allowed. If the user sets it to false or
>> +                * does not set it, concurrent memory allocation is not allowed.
>> +                */
>> +               if (!cma->concurr_alloc)
>> +                       mutex_lock(&cma_mutex);
>>                  ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
>> -               mutex_unlock(&cma_mutex);
>> +               if (!cma->concurr_alloc)
>> +                       mutex_unlock(&cma_mutex);
>>                  if (ret == 0) {
>>                          page = pfn_to_page(pfn);
>>                          break;
>> @@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>>
>>          return 0;
>>   }
>> +
>> +bool cma_set_concurrency(struct cma *cma, bool concurrency)
>> +{
>> +       if (!cma)
>> +               return false;
>> +
>> +       cma->concurr_alloc = concurrency;
>> +
>> +       return true;
>> +}
>> diff --git a/mm/cma.h b/mm/cma.h
>> index 8485ef8..30f489d 100644
>> --- a/mm/cma.h
>> +++ b/mm/cma.h
>> @@ -16,6 +16,7 @@ struct cma {
>>          unsigned long   *bitmap;
>>          unsigned int order_per_bit; /* Order of pages represented by one bit */
>>          spinlock_t      lock;
>> +       bool concurr_alloc;
>>   #ifdef CONFIG_CMA_DEBUGFS
>>          struct hlist_head mem_head;
>>          spinlock_t mem_head_lock;
>> --
>> 2.7.4
>>
>>
> 
> Thanks
> Barry



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-02-08  8:50   ` Ge Yang
@ 2025-02-08 21:34     ` Barry Song
  2025-02-09 10:49       ` Ge Yang
  2025-02-10  8:28       ` David Hildenbrand
  0 siblings, 2 replies; 9+ messages in thread
From: Barry Song @ 2025-02-08 21:34 UTC (permalink / raw)
  To: Ge Yang
  Cc: akpm, linux-mm, linux-kernel, stable, david, baolin.wang,
	aisheng.dong, liuzixing

On Sat, Feb 8, 2025 at 9:50 PM Ge Yang <yangge1116@126.com> wrote:
>
>
>
> 在 2025/1/28 17:58, Barry Song 写道:
> > On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@126.com> wrote:
> >>
> >> From: yangge <yangge1116@126.com>
> >>
> >> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> >> simply reverts to the original method of using the cma_mutex to ensure
> >> that alloc_contig_range() runs sequentially. This change was made to avoid
> >> concurrency allocation failures. However, it can negatively impact
> >> performance when concurrent allocation of CMA memory is required.
> >
> > Do we have some data?
> Yes, I will add it in the next version, thanks.
> >
> >>
> >> To address this issue, we could introduce an API for concurrency settings,
> >> allowing users to decide whether their CMA can perform concurrent memory
> >> allocations or not.
> >
> > Who is the intended user of cma_set_concurrency?
> We have some drivers that use cma_set_concurrency(), but they have not
> yet been merged into the mainline. The cma_alloc_mem() function in the
> mainline also supports concurrent allocation of CMA memory. By applying
> this patch, we can also achieve significant performance improvements in
> certain scenarios. I will provide performance data in the next version.
> I also feel it is somewhat
> > unsafe since cma->concurr_alloc is not protected by any locks.
> Ok, thanks.
> >
> > Will a user setting cma->concurr_alloc = 1 encounter the original issue that
> > commit 60a60e32cf91 was attempting to fix?
> >
> Yes, if a user encounters the issue described in commit 60a60e32cf91,
> they will not be able to set cma->concurr_alloc to 1.

A user who hasn't encountered a problem yet doesn't mean they won't
encounter it; it most likely just means the testing time hasn't been long
enough.

Is it possible to implement a per-CMA lock or range lock that simultaneously
improves performance and prevents the original issue that commit
60a60e32cf91 aimed to fix?

I strongly believe that cma->concurr_alloc is not the right approach. Let's
not waste our time on this kind of hack or workaround.  Instead, we should
find a proper fix that remains transparent to users.

> >>
> >> Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
> >> Signed-off-by: yangge <yangge1116@126.com>
> >> Cc: <stable@vger.kernel.org>
> >> ---
> >>   include/linux/cma.h |  2 ++
> >>   mm/cma.c            | 22 ++++++++++++++++++++--
> >>   mm/cma.h            |  1 +
> >>   3 files changed, 23 insertions(+), 2 deletions(-)
> >>
> >> diff --git a/include/linux/cma.h b/include/linux/cma.h
> >> index d15b64f..2384624 100644
> >> --- a/include/linux/cma.h
> >> +++ b/include/linux/cma.h
> >> @@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
> >>
> >>   extern void cma_reserve_pages_on_error(struct cma *cma);
> >>
> >> +extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
> >> +
> >>   #ifdef CONFIG_CMA
> >>   struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
> >>   bool cma_free_folio(struct cma *cma, const struct folio *folio);
> >> diff --git a/mm/cma.c b/mm/cma.c
> >> index de5bc0c..49a7186 100644
> >> --- a/mm/cma.c
> >> +++ b/mm/cma.c
> >> @@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
> >>                  spin_unlock_irq(&cma->lock);
> >>
> >>                  pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
> >> -               mutex_lock(&cma_mutex);
> >> +
> >> +               /*
> >> +                * If the user sets the concurr_alloc of CMA to true, concurrent
> >> +                * memory allocation is allowed. If the user sets it to false or
> >> +                * does not set it, concurrent memory allocation is not allowed.
> >> +                */
> >> +               if (!cma->concurr_alloc)
> >> +                       mutex_lock(&cma_mutex);
> >>                  ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
> >> -               mutex_unlock(&cma_mutex);
> >> +               if (!cma->concurr_alloc)
> >> +                       mutex_unlock(&cma_mutex);
> >>                  if (ret == 0) {
> >>                          page = pfn_to_page(pfn);
> >>                          break;
> >> @@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
> >>
> >>          return 0;
> >>   }
> >> +
> >> +bool cma_set_concurrency(struct cma *cma, bool concurrency)
> >> +{
> >> +       if (!cma)
> >> +               return false;
> >> +
> >> +       cma->concurr_alloc = concurrency;
> >> +
> >> +       return true;
> >> +}
> >> diff --git a/mm/cma.h b/mm/cma.h
> >> index 8485ef8..30f489d 100644
> >> --- a/mm/cma.h
> >> +++ b/mm/cma.h
> >> @@ -16,6 +16,7 @@ struct cma {
> >>          unsigned long   *bitmap;
> >>          unsigned int order_per_bit; /* Order of pages represented by one bit */
> >>          spinlock_t      lock;
> >> +       bool concurr_alloc;
> >>   #ifdef CONFIG_CMA_DEBUGFS
> >>          struct hlist_head mem_head;
> >>          spinlock_t mem_head_lock;
> >> --
> >> 2.7.4
> >>
> >>
> >

Thanks
Barry


^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-02-08 21:34     ` Barry Song
@ 2025-02-09 10:49       ` Ge Yang
  2025-02-10  8:28       ` David Hildenbrand
  1 sibling, 0 replies; 9+ messages in thread
From: Ge Yang @ 2025-02-09 10:49 UTC (permalink / raw)
  To: Barry Song
  Cc: akpm, linux-mm, linux-kernel, stable, david, baolin.wang,
	aisheng.dong, liuzixing



在 2025/2/9 5:34, Barry Song 写道:
> On Sat, Feb 8, 2025 at 9:50 PM Ge Yang <yangge1116@126.com> wrote:
>>
>>
>>
>> 在 2025/1/28 17:58, Barry Song 写道:
>>> On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@126.com> wrote:
>>>>
>>>> From: yangge <yangge1116@126.com>
>>>>
>>>> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>>>> simply reverts to the original method of using the cma_mutex to ensure
>>>> that alloc_contig_range() runs sequentially. This change was made to avoid
>>>> concurrency allocation failures. However, it can negatively impact
>>>> performance when concurrent allocation of CMA memory is required.
>>>
>>> Do we have some data?
>> Yes, I will add it in the next version, thanks.
>>>
>>>>
>>>> To address this issue, we could introduce an API for concurrency settings,
>>>> allowing users to decide whether their CMA can perform concurrent memory
>>>> allocations or not.
>>>
>>> Who is the intended user of cma_set_concurrency?
>> We have some drivers that use cma_set_concurrency(), but they have not
>> yet been merged into the mainline. The cma_alloc_mem() function in the
>> mainline also supports concurrent allocation of CMA memory. By applying
>> this patch, we can also achieve significant performance improvements in
>> certain scenarios. I will provide performance data in the next version.
>> I also feel it is somewhat
>>> unsafe since cma->concurr_alloc is not protected by any locks.
>> Ok, thanks.
>>>
>>> Will a user setting cma->concurr_alloc = 1 encounter the original issue that
>>> commit 60a60e32cf91 was attempting to fix?
>>>
>> Yes, if a user encounters the issue described in commit 60a60e32cf91,
>> they will not be able to set cma->concurr_alloc to 1.
> 
> A user who hasn't encountered a problem yet doesn't mean they won't
> encounter it; it most likely just means the testing time hasn't been long
> enough.
> 
> Is it possible to implement a per-CMA lock or range lock that simultaneously
> improves performance and prevents the original issue that commit
> 60a60e32cf91 aimed to fix?
> 
Using per-CMA locks can improve performance and prevent the original 
issue. I am currently preparing the patch. Thanks.
> I strongly believe that cma->concurr_alloc is not the right approach. Let's
> not waste our time on this kind of hack or workaround.  Instead, we should
> find a proper fix that remains transparent to users.
> 
>>>>
>>>> Fixes: 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>>>> Signed-off-by: yangge <yangge1116@126.com>
>>>> Cc: <stable@vger.kernel.org>
>>>> ---
>>>>    include/linux/cma.h |  2 ++
>>>>    mm/cma.c            | 22 ++++++++++++++++++++--
>>>>    mm/cma.h            |  1 +
>>>>    3 files changed, 23 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/include/linux/cma.h b/include/linux/cma.h
>>>> index d15b64f..2384624 100644
>>>> --- a/include/linux/cma.h
>>>> +++ b/include/linux/cma.h
>>>> @@ -53,6 +53,8 @@ extern int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>>>>
>>>>    extern void cma_reserve_pages_on_error(struct cma *cma);
>>>>
>>>> +extern bool cma_set_concurrency(struct cma *cma, bool concurrency);
>>>> +
>>>>    #ifdef CONFIG_CMA
>>>>    struct folio *cma_alloc_folio(struct cma *cma, int order, gfp_t gfp);
>>>>    bool cma_free_folio(struct cma *cma, const struct folio *folio);
>>>> diff --git a/mm/cma.c b/mm/cma.c
>>>> index de5bc0c..49a7186 100644
>>>> --- a/mm/cma.c
>>>> +++ b/mm/cma.c
>>>> @@ -460,9 +460,17 @@ static struct page *__cma_alloc(struct cma *cma, unsigned long count,
>>>>                   spin_unlock_irq(&cma->lock);
>>>>
>>>>                   pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
>>>> -               mutex_lock(&cma_mutex);
>>>> +
>>>> +               /*
>>>> +                * If the user sets the concurr_alloc of CMA to true, concurrent
>>>> +                * memory allocation is allowed. If the user sets it to false or
>>>> +                * does not set it, concurrent memory allocation is not allowed.
>>>> +                */
>>>> +               if (!cma->concurr_alloc)
>>>> +                       mutex_lock(&cma_mutex);
>>>>                   ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, gfp);
>>>> -               mutex_unlock(&cma_mutex);
>>>> +               if (!cma->concurr_alloc)
>>>> +                       mutex_unlock(&cma_mutex);
>>>>                   if (ret == 0) {
>>>>                           page = pfn_to_page(pfn);
>>>>                           break;
>>>> @@ -610,3 +618,13 @@ int cma_for_each_area(int (*it)(struct cma *cma, void *data), void *data)
>>>>
>>>>           return 0;
>>>>    }
>>>> +
>>>> +bool cma_set_concurrency(struct cma *cma, bool concurrency)
>>>> +{
>>>> +       if (!cma)
>>>> +               return false;
>>>> +
>>>> +       cma->concurr_alloc = concurrency;
>>>> +
>>>> +       return true;
>>>> +}
>>>> diff --git a/mm/cma.h b/mm/cma.h
>>>> index 8485ef8..30f489d 100644
>>>> --- a/mm/cma.h
>>>> +++ b/mm/cma.h
>>>> @@ -16,6 +16,7 @@ struct cma {
>>>>           unsigned long   *bitmap;
>>>>           unsigned int order_per_bit; /* Order of pages represented by one bit */
>>>>           spinlock_t      lock;
>>>> +       bool concurr_alloc;
>>>>    #ifdef CONFIG_CMA_DEBUGFS
>>>>           struct hlist_head mem_head;
>>>>           spinlock_t mem_head_lock;
>>>> --
>>>> 2.7.4
>>>>
>>>>
>>>
> 
> Thanks
> Barry



^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA
  2025-02-08 21:34     ` Barry Song
  2025-02-09 10:49       ` Ge Yang
@ 2025-02-10  8:28       ` David Hildenbrand
  1 sibling, 0 replies; 9+ messages in thread
From: David Hildenbrand @ 2025-02-10  8:28 UTC (permalink / raw)
  To: Barry Song, Ge Yang
  Cc: akpm, linux-mm, linux-kernel, stable, baolin.wang, aisheng.dong,
	liuzixing

On 08.02.25 22:34, Barry Song wrote:
> On Sat, Feb 8, 2025 at 9:50 PM Ge Yang <yangge1116@126.com> wrote:
>>
>>
>>
>> 在 2025/1/28 17:58, Barry Song 写道:
>>> On Sat, Jan 25, 2025 at 12:21 AM <yangge1116@126.com> wrote:
>>>>
>>>> From: yangge <yangge1116@126.com>
>>>>
>>>> Commit 60a60e32cf91 ("Revert "mm/cma.c: remove redundant cma_mutex lock"")
>>>> simply reverts to the original method of using the cma_mutex to ensure
>>>> that alloc_contig_range() runs sequentially. This change was made to avoid
>>>> concurrency allocation failures. However, it can negatively impact
>>>> performance when concurrent allocation of CMA memory is required.
>>>
>>> Do we have some data?
>> Yes, I will add it in the next version, thanks.
>>>
>>>>
>>>> To address this issue, we could introduce an API for concurrency settings,
>>>> allowing users to decide whether their CMA can perform concurrent memory
>>>> allocations or not.
>>>
>>> Who is the intended user of cma_set_concurrency?
>> We have some drivers that use cma_set_concurrency(), but they have not
>> yet been merged into the mainline. The cma_alloc_mem() function in the
>> mainline also supports concurrent allocation of CMA memory. By applying
>> this patch, we can also achieve significant performance improvements in
>> certain scenarios. I will provide performance data in the next version.
>> I also feel it is somewhat
>>> unsafe since cma->concurr_alloc is not protected by any locks.
>> Ok, thanks.
>>>
>>> Will a user setting cma->concurr_alloc = 1 encounter the original issue that
>>> commit 60a60e32cf91 was attempting to fix?
>>>
>> Yes, if a user encounters the issue described in commit 60a60e32cf91,
>> they will not be able to set cma->concurr_alloc to 1.
> 
> A user who hasn't encountered a problem yet doesn't mean they won't
> encounter it; it most likely just means the testing time hasn't been long
> enough.
> 
> Is it possible to implement a per-CMA lock or range lock that simultaneously
> improves performance and prevents the original issue that commit
> 60a60e32cf91 aimed to fix?
> 
> I strongly believe that cma->concurr_alloc is not the right approach. Let's
> not waste our time on this kind of hack or workaround.  Instead, we should
> find a proper fix that remains transparent to users.

Fully agreed.

IIUC, the problem is that we find a pageblock is already isolated. It 
might be sufficient to return -EAGAIN in that case from 
alloc_contig_range_noprof()->start_isolate_page_range() and retry in 
CMA. ideally, we'd have a way to wait on some event (e.g., any pageblock 
transitioning from isolated -> !isolated).

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2025-02-10  8:28 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2025-01-24 11:21 [PATCH] mm/cma: add an API to enable/disable concurrent memory allocation for the CMA yangge1116
2025-01-27 23:04 ` Andrew Morton
2025-02-08  8:19   ` Ge Yang
2025-01-28  6:11 ` Christoph Hellwig
2025-01-28  9:58 ` Barry Song
2025-02-08  8:50   ` Ge Yang
2025-02-08 21:34     ` Barry Song
2025-02-09 10:49       ` Ge Yang
2025-02-10  8:28       ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox