linux-mm.kvack.org archive mirror
From: "Huang, Ying" <ying.huang@intel.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: <linux-mm@kvack.org>,  <linux-kernel@vger.kernel.org>,
	 Arjan Van De Ven <arjan@linux.intel.com>,
	 Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	 David Hildenbrand <david@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	 Michal Hocko <mhocko@suse.com>,
	 Pavel Tatashin <pasha.tatashin@soleen.com>,
	 Matthew Wilcox <willy@infradead.org>,
	 "Christoph Lameter" <cl@linux.com>
Subject: Re: [PATCH 09/10] mm, pcp: avoid to reduce PCP high unnecessarily
Date: Thu, 12 Oct 2023 15:48:04 +0800	[thread overview]
Message-ID: <87lec8ffij.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <20231011140949.rwsqfb57vyuub6va@techsingularity.net> (Mel Gorman's message of "Wed, 11 Oct 2023 15:09:49 +0100")

Mel Gorman <mgorman@techsingularity.net> writes:

> On Wed, Sep 20, 2023 at 02:18:55PM +0800, Huang Ying wrote:
>> In the PCP high auto-tuning algorithm, to minimize idle pages in the
>> PCP, the periodic vmstat updating kworker (via refresh_cpu_vm_stats())
>> decreases PCP high to try to free possible idle PCP pages.  One issue
>> is that even if the page allocating/freeing depth is larger than the
>> maximal PCP high, we may reduce PCP high unnecessarily.
>> 
>> To avoid the above issue, this patch tracks the minimal PCP page
>> count.  The periodic PCP high decrement is then capped at the recent
>> minimal PCP page count, so that only detected idle pages are freed.
>> 
>> On a 2-socket Intel server with 224 logical CPUs, we tested kbuild on
>> one socket with `make -j 112`.  With the patch, the number of pages
>> allocated from the zone (instead of from the PCP) decreases by 25.8%.
>> 
>> Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
>> Cc: Andrew Morton <akpm@linux-foundation.org>
>> Cc: Mel Gorman <mgorman@techsingularity.net>
>> Cc: Vlastimil Babka <vbabka@suse.cz>
>> Cc: David Hildenbrand <david@redhat.com>
>> Cc: Johannes Weiner <jweiner@redhat.com>
>> Cc: Dave Hansen <dave.hansen@linux.intel.com>
>> Cc: Michal Hocko <mhocko@suse.com>
>> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
>> Cc: Matthew Wilcox <willy@infradead.org>
>> Cc: Christoph Lameter <cl@linux.com>
>> ---
>>  include/linux/mmzone.h |  1 +
>>  mm/page_alloc.c        | 15 ++++++++++-----
>>  2 files changed, 11 insertions(+), 5 deletions(-)
>> 
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 8a19e2af89df..35b78c7522a7 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -682,6 +682,7 @@ enum zone_watermarks {
>>  struct per_cpu_pages {
>>  	spinlock_t lock;	/* Protects lists field */
>>  	int count;		/* number of pages in the list */
>> +	int count_min;		/* minimal number of pages in the list recently */
>>  	int high;		/* high watermark, emptying needed */
>>  	int high_min;		/* min high watermark */
>>  	int high_max;		/* max high watermark */
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3f8c7dfeed23..77e9b7b51688 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2166,19 +2166,20 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>>   */
>>  int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
>>  {
>> -	int high_min, to_drain, batch;
>> +	int high_min, decrease, to_drain, batch;
>>  	int todo = 0;
>>  
>>  	high_min = READ_ONCE(pcp->high_min);
>>  	batch = READ_ONCE(pcp->batch);
>>  	/*
>> -	 * Decrease pcp->high periodically to try to free possible
>> -	 * idle PCP pages.  And, avoid to free too many pages to
>> -	 * control latency.
>> +	 * Decrease pcp->high periodically to free idle PCP pages counted
>> +	 * via pcp->count_min.  And, avoid to free too many pages to
>> +	 * control latency.  This caps pcp->high decrement too.
>>  	 */
>>  	if (pcp->high > high_min) {
>> +		decrease = min(pcp->count_min, pcp->high / 5);
>
> Not directly related to this patch, but why 20%? It seems a bit
> arbitrary. While this is not a fast path, using a divide rather than a
> shift seems unnecessarily expensive.

Yes.  The number chosen is kind of arbitrary.  Will use ">> 3" (/ 8).
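For concreteness, the effect of the change can be modeled outside the
kernel like the following standalone sketch (the struct and helper names
here are mine, the batch-based term of the real max3() is omitted for
brevity, and the divide-by-5 is already replaced by ">> 3" as discussed
above, so this is illustrative, not the actual kernel code):

```c
/* Standalone model of the two pieces this patch adds: minimum tracking
 * in the allocation path, and a periodic decay capped by that minimum.
 */
struct pcp_model {
	int count;     /* pages currently in the PCP list */
	int count_min; /* minimal count seen since the last decay */
	int high;      /* current high watermark */
	int high_min;  /* lower bound for high */
};

/* __rmqueue_pcplist() side: track the minimum as pages are taken. */
static void alloc_from_pcp(struct pcp_model *pcp, int nr)
{
	pcp->count -= nr;
	if (pcp->count < pcp->count_min)
		pcp->count_min = pcp->count;
}

/* decay_pcp_high() side: only pages proven idle (count_min) may be
 * trimmed from high, further capped at high/8 for latency control. */
static void decay_high_model(struct pcp_model *pcp)
{
	if (pcp->high > pcp->high_min) {
		int cap = pcp->high >> 3;
		int decrease = pcp->count_min < cap ? pcp->count_min : cap;
		int new_high = pcp->high - decrease;

		pcp->high = new_high > pcp->high_min ? new_high : pcp->high_min;
	}
	/* Restart min tracking for the next period. */
	pcp->count_min = pcp->count;
}
```

The key point is the decrease path: if count_min stayed high over the
period (pages were never idle), high barely shrinks even when the decay
runs.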

>>  		pcp->high = max3(pcp->count - (batch << PCP_BATCH_SCALE_MAX),
>> -				 pcp->high * 4 / 5, high_min);
>> +				 pcp->high - decrease, high_min);
>>  		if (pcp->high > high_min)
>>  			todo++;
>>  	}
>> @@ -2191,6 +2192,8 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
>>  		todo++;
>>  	}
>>  
>> +	pcp->count_min = pcp->count;
>> +
>>  	return todo;
>>  }
>>  
>> @@ -2828,6 +2831,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
>>  		page = list_first_entry(list, struct page, pcp_list);
>>  		list_del(&page->pcp_list);
>>  		pcp->count -= 1 << order;
>> +		if (pcp->count < pcp->count_min)
>> +			pcp->count_min = pcp->count;
>
> The accounting for this, though, is in a relatively fast path.
>
> At the moment I don't have a better suggestion, but I'm not as keen on
> this patch. It seems like it would have been more appropriate to decay
> if there was no recent allocation activity tracked via pcp->flags.  The
> major caveat there is that tracking a bit and clearing it may very well
> be in a fast path unless it was tied to refills, but that is subject to
> timing issues and the allocation request stream :(
>
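For concreteness, the flag-gated alternative alluded to above might look
roughly like this userspace sketch (the flag bit, field layout, and all
names here are invented for illustration; they are not actual pcp->flags
bits, and this is not code from any kernel tree):

```c
/* Hypothetical alternative: gate the periodic decay on an activity
 * flag instead of a tracked minimum. */
struct pcp_flag_model {
	unsigned int flags;
	int high;
	int high_min;
};

#define PCPF_RECENT_ALLOC 0x1U  /* invented bit: set on every allocation */

static void note_alloc(struct pcp_flag_model *pcp)
{
	/* This write on every allocation is the fast-path cost at issue. */
	pcp->flags |= PCPF_RECENT_ALLOC;
}

static void periodic_decay(struct pcp_flag_model *pcp)
{
	if (pcp->flags & PCPF_RECENT_ALLOC) {
		/* Recent activity: skip the decay, just clear the bit. */
		pcp->flags &= ~PCPF_RECENT_ALLOC;
		return;
	}
	/* No activity for a whole period: decay toward high_min. */
	if (pcp->high > pcp->high_min) {
		int new_high = pcp->high - (pcp->high >> 3);

		pcp->high = new_high > pcp->high_min ? new_high : pcp->high_min;
	}
}
```

This avoids the per-allocation compare against count_min, but as noted,
it trades it for a per-allocation flag write and is sensitive to when
allocations land relative to the decay period.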
> While you noted the difference in buddy allocations, which may tie into
> lock contention issues, how much difference does it make to the actual
> performance of the workload?

Thanks to Andrew for his reminder about the test results.  I found that
I had used an uncommon configuration to test kbuild in V1 of the
patchset, so I sent out V2 of the patchset as follows, with only the
test results and documentation changed.

https://lore.kernel.org/linux-mm/20230926060911.266511-1-ying.huang@intel.com/

So, for performance data, please refer to V2 of the patchset.  For this
patch, the performance data are:

"
On a 2-socket Intel server with 224 logical CPUs, we run 8 kbuild
instances in parallel (each with `make -j 28`) in 8 cgroups.  This
simulates the kbuild server used by the 0-Day kbuild service.  With
the patch, the number of pages allocated from the zone (instead of
from the PCP) decreases by 21.4%.
"

I also showed the performance numbers for each step of the optimization
as follows (copied from the patchset V2 link above).

"
	build time   lock contend%	free_high	alloc_zone
	----------	----------	---------	----------
base	     100.0	      13.5          100.0            100.0
patch1	      99.2	      10.6	     19.2	      95.6
patch3	      99.2	      11.7	      7.1	      95.6
patch5	      98.4	      10.0	      8.2	      97.1
patch7	      94.9	       0.7	      3.0	      19.0
patch9	      94.9	       0.6	      2.7	      15.0  <--	this patch
patch10	      94.9	       0.9	      8.8	      18.6
"

Although I think the patch is helpful because it avoids unnecessary
pcp->high decay, and thus reduces zone lock contention, there is no
visible benchmark score change from the patch.

--
Best Regards,
Huang, Ying


