[PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

* [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
@ 2024-07-22  2:10 Li Zhijian
  2024-07-22  6:44 ` Vlastimil Babka (SUSE)
  0 siblings, 1 reply; 6+ messages in thread
From: Li Zhijian @ 2024-07-22  2:10 UTC (permalink / raw)
  To: linux-mm
  Cc: akpm, linux-kernel, Yasunori Gotou, Li Zhijian,
	David Hildenbrand, Vlastimil Babka, Yao Xingtao

It's expected that no page should be left in pcp_list after calling
zone_pcp_disable() in offline_pages(). Previously, it's observed that
offline_pages() gets stuck [1] due to some pages remaining in pcp_list.

Cause:
There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
involving the pcp->count variable. See below scenario:

         CPU0                              CPU1
    ----------------                    ---------------
                                      spin_lock(&pcp->lock);
                                      __rmqueue_pcplist() {
zone_pcp_disable() {
                                        /* list is empty */
                                        if (list_empty(list)) {
                                          /* add pages to pcp_list */
                                          alloced = rmqueue_bulk()
  mutex_lock(&pcp_batch_high_lock)
  ...
  __drain_all_pages() {
    drain_pages_zone() {
      /* read pcp->count, it's 0 here */
      count = READ_ONCE(pcp->count)
      /* 0 means nothing to drain */
                                          /* update pcp->count */
                                          pcp->count += alloced << order;
      ...
                                      ...
                                      spin_unlock(&pcp->lock);

In this case, after calling zone_pcp_disable() though, there are still some
pages in pcp_list. And these pages in pcp_list are neither movable nor
isolated, offline_pages() gets stuck as a result.

Solution:
Expand the scope of the pcp->lock to also protect pcp->count in
drain_pages_zone(), to ensure no pages are left in the pcp list after
zone_pcp_disable()

[1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/

Cc: David Hildenbrand <david@redhat.com>
Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
Reported-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
---
V2:
    - Narrow down the scope of the spin_lock() to limit the draining latency. # Vlastimil and David
    - In above scenario, it's sufficient to read pcp->count once with lock held, and it fully fixed
      my issue[1] in thounds runs(It happened in more than 5% before).
RFC:
    https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fujitsu.com/
---
 mm/page_alloc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 9ecf99190ea2..5388a35c4e9c 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
 static void drain_pages_zone(unsigned int cpu, struct zone *zone)
 {
 	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
-	int count = READ_ONCE(pcp->count);
+	int count;
 
+	spin_lock(&pcp->lock);
+	count = pcp->count;
+	spin_unlock(&pcp->lock);
 	while (count) {
 		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
 		count -= to_drain;
-- 
2.29.2



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
  2024-07-22  2:10 [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() Li Zhijian
@ 2024-07-22  6:44 ` Vlastimil Babka (SUSE)
  2024-07-22  9:15   ` Zhijian Li (Fujitsu)
  0 siblings, 1 reply; 6+ messages in thread
From: Vlastimil Babka (SUSE) @ 2024-07-22  6:44 UTC (permalink / raw)
  To: Li Zhijian, linux-mm
  Cc: akpm, linux-kernel, Yasunori Gotou, David Hildenbrand, Yao Xingtao

On 7/22/24 4:10 AM, Li Zhijian wrote:
> It's expected that no page should be left in pcp_list after calling
> zone_pcp_disable() in offline_pages(). Previously, it's observed that
> offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
> 
> Cause:
> There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
> involving the pcp->count variable. See below scenario:
> 
>          CPU0                              CPU1
>     ----------------                    ---------------
>                                       spin_lock(&pcp->lock);
>                                       __rmqueue_pcplist() {
> zone_pcp_disable() {
>                                         /* list is empty */
>                                         if (list_empty(list)) {
>                                           /* add pages to pcp_list */
>                                           alloced = rmqueue_bulk()
>   mutex_lock(&pcp_batch_high_lock)
>   ...
>   __drain_all_pages() {
>     drain_pages_zone() {
>       /* read pcp->count, it's 0 here */
>       count = READ_ONCE(pcp->count)
>       /* 0 means nothing to drain */
>                                           /* update pcp->count */
>                                           pcp->count += alloced << order;
>       ...
>                                       ...
>                                       spin_unlock(&pcp->lock);
> 
> In this case, after calling zone_pcp_disable() though, there are still some
> pages in pcp_list. And these pages in pcp_list are neither movable nor
> isolated, offline_pages() gets stuck as a result.
> 
> Solution:
> Expand the scope of the pcp->lock to also protect pcp->count in
> drain_pages_zone(), to ensure no pages are left in the pcp list after
> zone_pcp_disable()
> 
> [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/
> 
> Cc: David Hildenbrand <david@redhat.com>
> Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
> Reported-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>

Can we find a breaking commit for Fixes: ?

> ---
> V2:
>     - Narrow down the scope of the spin_lock() to limit the draining latency. # Vlastimil and David
>     - In above scenario, it's sufficient to read pcp->count once with lock held, and it fully fixed
>       my issue[1] in thounds runs(It happened in more than 5% before).

That should be ok indeed, but...

> RFC:
>     https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fujitsu.com/
> ---
>  mm/page_alloc.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 9ecf99190ea2..5388a35c4e9c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
>  static void drain_pages_zone(unsigned int cpu, struct zone *zone)
>  {
>  	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
> -	int count = READ_ONCE(pcp->count);
> +	int count;
>  
> +	spin_lock(&pcp->lock);
> +	count = pcp->count;
> +	spin_unlock(&pcp->lock);
>  	while (count) {
>  		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>  		count -= to_drain;

It's wasteful to do a lock/unlock cycle just to read the count. It could
rather look something like this:

while (true)
    spin_lock(&pcp->lock);
    count = pcp->count;
    ...
    count -= to_drain;
    if (to_drain)
        drain_zone_pages(...)
    ...
    spin_unlock(&pcp->lock);
    if (count)
         break;


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
  2024-07-22  6:44 ` Vlastimil Babka (SUSE)
@ 2024-07-22  9:15   ` Zhijian Li (Fujitsu)
  2024-07-22  9:28     ` Vlastimil Babka (SUSE)
  2024-07-22  9:34     ` David Hildenbrand
  0 siblings, 2 replies; 6+ messages in thread
From: Zhijian Li (Fujitsu) @ 2024-07-22  9:15 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), linux-mm
  Cc: akpm, linux-kernel, Yasunori Gotou (Fujitsu),
	David Hildenbrand, Xingtao Yao (Fujitsu)

Hi David

Thanks for you quickly reply.


On 22/07/2024 14:44, Vlastimil Babka (SUSE) wrote:
> On 7/22/24 4:10 AM, Li Zhijian wrote:
>> It's expected that no page should be left in pcp_list after calling
>> zone_pcp_disable() in offline_pages(). Previously, it's observed that
>> offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
>>
>> Cause:
>> There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
>> involving the pcp->count variable. See below scenario:
>>
>>           CPU0                              CPU1
>>      ----------------                    ---------------
>>                                        spin_lock(&pcp->lock);
>>                                        __rmqueue_pcplist() {
>> zone_pcp_disable() {
>>                                          /* list is empty */
>>                                          if (list_empty(list)) {
>>                                            /* add pages to pcp_list */
>>                                            alloced = rmqueue_bulk()
>>    mutex_lock(&pcp_batch_high_lock)
>>    ...
>>    __drain_all_pages() {
>>      drain_pages_zone() {
>>        /* read pcp->count, it's 0 here */
>>        count = READ_ONCE(pcp->count)
>>        /* 0 means nothing to drain */
>>                                            /* update pcp->count */
>>                                            pcp->count += alloced << order;
>>        ...
>>                                        ...
>>                                        spin_unlock(&pcp->lock);
>>
>> In this case, after calling zone_pcp_disable() though, there are still some
>> pages in pcp_list. And these pages in pcp_list are neither movable nor
>> isolated, offline_pages() gets stuck as a result.
>>
>> Solution:
>> Expand the scope of the pcp->lock to also protect pcp->count in
>> drain_pages_zone(), to ensure no pages are left in the pcp list after
>> zone_pcp_disable()
>>
>> [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/
>>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>> Reported-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
> 
> Can we find a breaking commit for Fixes: ?

I haven't confirmed the FBC because my reproducer is not fit to run in the old kernel for some reasons.
but I noticed it didn't read the count without lock held since below commit

4b23a68f9536 mm/page_alloc: protect PCP lists with a spinlock
  


> 
>> ---
>> V2:
>>      - Narrow down the scope of the spin_lock() to limit the draining latency. # Vlastimil and David
>>      - In above scenario, it's sufficient to read pcp->count once with lock held, and it fully fixed
>>        my issue[1] in thounds runs(It happened in more than 5% before).
> 
> That should be ok indeed, but...
> 
>> RFC:
>>      https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fujitsu.com/
>> ---
>>   mm/page_alloc.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 9ecf99190ea2..5388a35c4e9c 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
>>   static void drain_pages_zone(unsigned int cpu, struct zone *zone)
>>   {
>>   	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>> -	int count = READ_ONCE(pcp->count);
>> +	int count;
>>   
>> +	spin_lock(&pcp->lock);
>> +	count = pcp->count;
>> +	spin_unlock(&pcp->lock);
>>   	while (count) {
>>   		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>>   		count -= to_drain;
> 
> It's wasteful to do a lock/unlock cycle just to read the count.

How about,

static void drain_pages_zone(unsigned int cpu, struct zone *zone)
{
         struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
         int count, to_drain;
                                                                                                    
         do {
                 spin_lock(&pcp->lock);
                 to_drain = min(pcp->count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
                 free_pcppages_bulk(zone, to_drain, pcp, 0);
                 spin_unlock(&pcp->lock);
         } while (to_drain);
}


> It could
> rather look something like this:
> 

Sorry, I don't follow your code...

> while (true)
>      spin_lock(&pcp->lock);
>      count = pcp->count;
>      ...
>      count -= to_drain;
>      if (to_drain)
>          drain_zone_pages(...)

Which subroutine does this code belong to, why it involves drain_zone_pages

>      ...
>      spin_unlock(&pcp->lock);
>      if (count)
>           break;

Thanks
Zhijian

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
  2024-07-22  9:15   ` Zhijian Li (Fujitsu)
@ 2024-07-22  9:28     ` Vlastimil Babka (SUSE)
  2024-07-23  6:50       ` Zhijian Li (Fujitsu)
  2024-07-22  9:34     ` David Hildenbrand
  1 sibling, 1 reply; 6+ messages in thread
From: Vlastimil Babka (SUSE) @ 2024-07-22  9:28 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu), linux-mm
  Cc: akpm, linux-kernel, Yasunori Gotou (Fujitsu),
	David Hildenbrand, Xingtao Yao (Fujitsu)

On 7/22/24 11:15 AM, Zhijian Li (Fujitsu) wrote:
> Hi David
> 
> Thanks for you quickly reply.
> 
> 
> On 22/07/2024 14:44, Vlastimil Babka (SUSE) wrote:
>> On 7/22/24 4:10 AM, Li Zhijian wrote:
>>> It's expected that no page should be left in pcp_list after calling
>>> zone_pcp_disable() in offline_pages(). Previously, it's observed that
>>> offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
>>>
>>> Cause:
>>> There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
>>> involving the pcp->count variable. See below scenario:
>>>
>>>           CPU0                              CPU1
>>>      ----------------                    ---------------
>>>                                        spin_lock(&pcp->lock);
>>>                                        __rmqueue_pcplist() {
>>> zone_pcp_disable() {
>>>                                          /* list is empty */
>>>                                          if (list_empty(list)) {
>>>                                            /* add pages to pcp_list */
>>>                                            alloced = rmqueue_bulk()
>>>    mutex_lock(&pcp_batch_high_lock)
>>>    ...
>>>    __drain_all_pages() {
>>>      drain_pages_zone() {
>>>        /* read pcp->count, it's 0 here */
>>>        count = READ_ONCE(pcp->count)
>>>        /* 0 means nothing to drain */
>>>                                            /* update pcp->count */
>>>                                            pcp->count += alloced << order;
>>>        ...
>>>                                        ...
>>>                                        spin_unlock(&pcp->lock);
>>>
>>> In this case, after calling zone_pcp_disable() though, there are still some
>>> pages in pcp_list. And these pages in pcp_list are neither movable nor
>>> isolated, offline_pages() gets stuck as a result.
>>>
>>> Solution:
>>> Expand the scope of the pcp->lock to also protect pcp->count in
>>> drain_pages_zone(), to ensure no pages are left in the pcp list after
>>> zone_pcp_disable()
>>>
>>> [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/
>>>
>>> Cc: David Hildenbrand <david@redhat.com>
>>> Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>>> Reported-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>> 
>> Can we find a breaking commit for Fixes: ?
> 
> I haven't confirmed the FBC because my reproducer is not fit to run in the old kernel for some reasons.
> but I noticed it didn't read the count without lock held since below commit
> 
> 4b23a68f9536 mm/page_alloc: protect PCP lists with a spinlock
>   
> 
> 
>> 
>>> ---
>>> V2:
>>>      - Narrow down the scope of the spin_lock() to limit the draining latency. # Vlastimil and David
>>>      - In above scenario, it's sufficient to read pcp->count once with lock held, and it fully fixed
>>>        my issue[1] in thounds runs(It happened in more than 5% before).
>> 
>> That should be ok indeed, but...
>> 
>>> RFC:
>>>      https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fujitsu.com/
>>> ---
>>>   mm/page_alloc.c | 5 ++++-
>>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 9ecf99190ea2..5388a35c4e9c 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
>>>   static void drain_pages_zone(unsigned int cpu, struct zone *zone)
>>>   {
>>>   	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>>> -	int count = READ_ONCE(pcp->count);
>>> +	int count;
>>>   
>>> +	spin_lock(&pcp->lock);
>>> +	count = pcp->count;
>>> +	spin_unlock(&pcp->lock);
>>>   	while (count) {
>>>   		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>>>   		count -= to_drain;
>> 
>> It's wasteful to do a lock/unlock cycle just to read the count.
> 
> How about,
> 
> static void drain_pages_zone(unsigned int cpu, struct zone *zone)
> {
>          struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>          int count, to_drain;
>                                                                                                     
>          do {
>                  spin_lock(&pcp->lock);
>                  to_drain = min(pcp->count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>                  free_pcppages_bulk(zone, to_drain, pcp, 0);
>                  spin_unlock(&pcp->lock);
>          } while (to_drain);

Yeah better than break. But I think you still should use
  count = pcp->count;
  ...
  count -= to_drain;
} while(count);

or you make one extra wasteful iteration to find to_drain is zero.
(assuming "it's sufficient to read pcp->count once with lock held", that I
agree with)

> }
>> It could
>> rather look something like this:
>> 
> 
> Sorry, I don't follow your code...
> 
>> while (true)
>>      spin_lock(&pcp->lock);
>>      count = pcp->count;
>>      ...
>>      count -= to_drain;
>>      if (to_drain)
>>          drain_zone_pages(...)
> 
> Which subroutine does this code belong to, why it involves drain_zone_pages

Yeah sorry I meant free_pcppages_bulk()

>>      ...
>>      spin_unlock(&pcp->lock);
>>      if (count)
>>           break;
> 
> Thanks
> Zhijian



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
  2024-07-22  9:15   ` Zhijian Li (Fujitsu)
  2024-07-22  9:28     ` Vlastimil Babka (SUSE)
@ 2024-07-22  9:34     ` David Hildenbrand
  1 sibling, 0 replies; 6+ messages in thread
From: David Hildenbrand @ 2024-07-22  9:34 UTC (permalink / raw)
  To: Zhijian Li (Fujitsu), Vlastimil Babka (SUSE), linux-mm
  Cc: akpm, linux-kernel, Yasunori Gotou (Fujitsu), Xingtao Yao (Fujitsu)

On 22.07.24 11:15, Zhijian Li (Fujitsu) wrote:
> Hi David
> 
> Thanks for you quickly reply.

Heh, Vlasimil replied but I agree with his feedback :)

-- 
Cheers,

David / dhildenb



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist()
  2024-07-22  9:28     ` Vlastimil Babka (SUSE)
@ 2024-07-23  6:50       ` Zhijian Li (Fujitsu)
  0 siblings, 0 replies; 6+ messages in thread
From: Zhijian Li (Fujitsu) @ 2024-07-23  6:50 UTC (permalink / raw)
  To: Vlastimil Babka (SUSE), linux-mm
  Cc: akpm, linux-kernel, Yasunori Gotou (Fujitsu),
	David Hildenbrand, Xingtao Yao (Fujitsu)



On 22/07/2024 17:28, Vlastimil Babka (SUSE) wrote:
> On 7/22/24 11:15 AM, Zhijian Li (Fujitsu) wrote:
>> Hi David
>>
>> Thanks for you quickly reply.
>>
>>
>> On 22/07/2024 14:44, Vlastimil Babka (SUSE) wrote:
>>> On 7/22/24 4:10 AM, Li Zhijian wrote:
>>>> It's expected that no page should be left in pcp_list after calling
>>>> zone_pcp_disable() in offline_pages(). Previously, it's observed that
>>>> offline_pages() gets stuck [1] due to some pages remaining in pcp_list.
>>>>
>>>> Cause:
>>>> There is a race condition between drain_pages_zone() and __rmqueue_pcplist()
>>>> involving the pcp->count variable. See below scenario:
>>>>
>>>>            CPU0                              CPU1
>>>>       ----------------                    ---------------
>>>>                                         spin_lock(&pcp->lock);
>>>>                                         __rmqueue_pcplist() {
>>>> zone_pcp_disable() {
>>>>                                           /* list is empty */
>>>>                                           if (list_empty(list)) {
>>>>                                             /* add pages to pcp_list */
>>>>                                             alloced = rmqueue_bulk()
>>>>     mutex_lock(&pcp_batch_high_lock)
>>>>     ...
>>>>     __drain_all_pages() {
>>>>       drain_pages_zone() {
>>>>         /* read pcp->count, it's 0 here */
>>>>         count = READ_ONCE(pcp->count)
>>>>         /* 0 means nothing to drain */
>>>>                                             /* update pcp->count */
>>>>                                             pcp->count += alloced << order;
>>>>         ...
>>>>                                         ...
>>>>                                         spin_unlock(&pcp->lock);
>>>>
>>>> In this case, after calling zone_pcp_disable() though, there are still some
>>>> pages in pcp_list. And these pages in pcp_list are neither movable nor
>>>> isolated, offline_pages() gets stuck as a result.
>>>>
>>>> Solution:
>>>> Expand the scope of the pcp->lock to also protect pcp->count in
>>>> drain_pages_zone(), to ensure no pages are left in the pcp list after
>>>> zone_pcp_disable()
>>>>
>>>> [1] https://lore.kernel.org/linux-mm/6a07125f-e720-404c-b2f9-e55f3f166e85@fujitsu.com/
>>>>
>>>> Cc: David Hildenbrand <david@redhat.com>
>>>> Cc: Vlastimil Babka (SUSE) <vbabka@kernel.org>
>>>> Reported-by: Yao Xingtao <yaoxt.fnst@fujitsu.com>
>>>> Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
>>>
>>> Can we find a breaking commit for Fixes: ?
>>
>> I haven't confirmed the FBC because my reproducer is not fit to run in the old kernel for some reasons.
>> but I noticed it didn't read the count without lock held since below commit
>>
>> 4b23a68f9536 mm/page_alloc: protect PCP lists with a spinlock
>>    
>>
>>
>>>
>>>> ---
>>>> V2:
>>>>       - Narrow down the scope of the spin_lock() to limit the draining latency. # Vlastimil and David
>>>>       - In above scenario, it's sufficient to read pcp->count once with lock held, and it fully fixed
>>>>         my issue[1] in thounds runs(It happened in more than 5% before).
>>>
>>> That should be ok indeed, but...
>>>
>>>> RFC:
>>>>       https://lore.kernel.org/linux-mm/20240716073929.843277-1-lizhijian@fujitsu.com/
>>>> ---
>>>>    mm/page_alloc.c | 5 ++++-
>>>>    1 file changed, 4 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>> index 9ecf99190ea2..5388a35c4e9c 100644
>>>> --- a/mm/page_alloc.c
>>>> +++ b/mm/page_alloc.c
>>>> @@ -2323,8 +2323,11 @@ void drain_zone_pages(struct zone *zone, struct per_cpu_pages *pcp)
>>>>    static void drain_pages_zone(unsigned int cpu, struct zone *zone)
>>>>    {
>>>>    	struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>>>> -	int count = READ_ONCE(pcp->count);
>>>> +	int count;
>>>>    
>>>> +	spin_lock(&pcp->lock);
>>>> +	count = pcp->count;
>>>> +	spin_unlock(&pcp->lock);
>>>>    	while (count) {
>>>>    		int to_drain = min(count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>>>>    		count -= to_drain;
>>>
>>> It's wasteful to do a lock/unlock cycle just to read the count.
>>
>> How about,
>>
>> static void drain_pages_zone(unsigned int cpu, struct zone *zone)
>> {
>>           struct per_cpu_pages *pcp = per_cpu_ptr(zone->per_cpu_pageset, cpu);
>>           int count, to_drain;
>>                                                                                                      
>>           do {
>>                   spin_lock(&pcp->lock);
>>                   to_drain = min(pcp->count, pcp->batch << CONFIG_PCP_BATCH_SCALE_MAX);
>>                   free_pcppages_bulk(zone, to_drain, pcp, 0);
>>                   spin_unlock(&pcp->lock);
>>           } while (to_drain);
> 
> Yeah better than break. But I think you still should use


Okay, I will update it in V3


Thanks
Zhijian


>    count = pcp->count;
>    ...
>    count -= to_drain;
> } while(count);
> 
> or you make one extra wasteful iteration to find to_drain is zero.
> (assuming "it's sufficient to read pcp->count once with lock held", that I
> agree with)> 
>> }
>>> It could
>>> rather look something like this:
>>>
>>
>> Sorry, I don't follow your code...
>>
>>> while (true)
>>>       spin_lock(&pcp->lock);
>>>       count = pcp->count;
>>>       ...
>>>       count -= to_drain;
>>>       if (to_drain)
>>>           drain_zone_pages(...)
>>
>> Which subroutine does this code belong to, why it involves drain_zone_pages
> 
> Yeah sorry I meant free_pcppages_bulk()
> 
>>>       ...
>>>       spin_unlock(&pcp->lock);
>>>       if (count)
>>>            break;
>>
>> Thanks
>> Zhijian
> 

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-07-23  6:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-07-22  2:10 [PATCH v2] mm/page_alloc: Fix pcp->count race between drain_pages_zone() vs __rmqueue_pcplist() Li Zhijian
2024-07-22  6:44 ` Vlastimil Babka (SUSE)
2024-07-22  9:15   ` Zhijian Li (Fujitsu)
2024-07-22  9:28     ` Vlastimil Babka (SUSE)
2024-07-23  6:50       ` Zhijian Li (Fujitsu)
2024-07-22  9:34     ` David Hildenbrand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox