From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 11 Oct 2023 14:05:05 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Huang Ying, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Arjan Van De Ven, Vlastimil Babka, David Hildenbrand,
	Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
	Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH 00/10] mm: PCP high auto-tuning
Message-ID: <20231011130505.356soszayes3vy2n@techsingularity.net>
References: <20230920061856.257597-1-ying.huang@intel.com>
	<20230920094118.8b8f739125c6aede17c627e0@linux-foundation.org>
In-Reply-To: <20230920094118.8b8f739125c6aede17c627e0@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
On Wed, Sep 20, 2023 at 09:41:18AM -0700, Andrew Morton wrote:
> On Wed, 20 Sep 2023 14:18:46 +0800 Huang Ying wrote:
>
> > The page allocation performance requirements of different workloads
> > are often different. So, we need to tune the PCP (Per-CPU Pageset)
> > high on each CPU automatically to optimize the page allocation
> > performance.
>
> Some of the performance changes here are downright scary.
>
> I've never been very sure that percpu pages was very beneficial (and
> hey, I invented the thing back in the Mesozoic era). But these numbers
> make me think it's very important and we should have been paying more
> attention.
>

FWIW, it is because not only does it avoid lock contention issues, it
also avoids excessive splitting/merging of buddies as well as the
slower paths of the allocator. It is not very satisfactory and,
frankly, the whole page allocator needs a revisit to account for very
large zones, but that is far from a trivial project. PCP just masks the
worst of the issues, and replacing it is far harder than tweaking it.

> > The list of patches in series is as follows,
> >
> > 1 mm, pcp: avoid to drain PCP when process exit
> > 2 cacheinfo: calculate per-CPU data cache size
> > 3 mm, pcp: reduce lock contention for draining high-order pages
> > 4 mm: restrict the pcp batch scale factor to avoid too long latency
> > 5 mm, page_alloc: scale the number of pages that are batch allocated
> > 6 mm: add framework for PCP high auto-tuning
> > 7 mm: tune PCP high automatically
> > 8 mm, pcp: decrease PCP high if free pages < high watermark
> > 9 mm, pcp: avoid to reduce PCP high unnecessarily
> > 10 mm, pcp: reduce detecting time of consecutive high order page freeing
> >
> > Patch 1/2/3 optimize the PCP draining for consecutive high-order pages
> > freeing.
> >
> > Patch 4/5 optimize batch freeing and allocating.
> >
> > Patch 6/7/8/9 implement and optimize a PCP high auto-tuning method.
> >
> > Patch 10 optimize the PCP draining for consecutive high order page
> > freeing based on PCP high auto-tuning.
> >
> > The test results for patches with performance impact are as follows,
> >
> > kbuild
> > ======
> >
> > On a 2-socket Intel server with 224 logical CPU, we tested kbuild on
> > one socket with `make -j 112`.
> >
> >           build time   zone lock%   free_high   alloc_zone
> >           ----------   ----------   ---------   ----------
> > base           100.0         43.6       100.0        100.0
> > patch1          96.6         40.3        49.2         95.2
> > patch3          96.4         40.5        11.3         95.1
> > patch5          96.1         37.9        13.3         96.8
> > patch7          86.4          9.8         6.2         22.0
> > patch9          85.9          9.4         4.8         16.3
> > patch10         87.7         12.6        29.0         32.3
>
> You're seriously saying that kbuild got 12% faster?
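For anyone trying to relate figures like these to their own machine, the
per-CPU pageset high/batch values being tuned here are visible in
/proc/zoneinfo. A rough sketch of pulling them out follows; note that
parse_pcp is a made-up helper and the field layout is assumed from
recent kernels (it is not a stable ABI), so treat it as illustrative
only:

```python
import re

# Hedged sketch: the "pagesets"/"cpu:" layout below is assumed from
# recent kernels' /proc/zoneinfo output and may differ across versions.
def parse_pcp(zoneinfo_text):
    """Return (zone, cpu, field, value) tuples for per-CPU pageset data."""
    results = []
    zone = None
    cpu = None
    for line in zoneinfo_text.splitlines():
        m = re.match(r"Node \d+, zone\s+(\S+)", line)
        if m:
            zone, cpu = m.group(1), None
            continue
        m = re.match(r"\s*cpu:\s*(\d+)", line)
        if m:
            cpu = int(m.group(1))
            continue
        m = re.match(r"\s*(count|high|batch):\s*(\d+)", line)
        if m and zone is not None and cpu is not None:
            results.append((zone, cpu, m.group(1), int(m.group(2))))
    return results

# Sample text in the assumed layout; on a live system one would read
# the real thing with open("/proc/zoneinfo").read() instead.
sample = """\
Node 0, zone   Normal
  pagesets
    cpu: 0
              count: 153
              high:  579
              batch: 63
    cpu: 1
              count: 12
              high:  579
              batch: 63
"""

for zone, cpu, field, value in parse_pcp(sample):
    if field in ("high", "batch"):
        print(zone, cpu, field, value)
```

Watching how the high values move under load would probably be the
quickest way to observe the auto-tuning in action.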
>
> I see that [07/10] (autotuning) alone sped up kbuild by 10%?
>
> Other thoughts:
>
> - What if any facilities are provided to permit users/developers to
>   monitor the operation of the autotuning algorithm?
>

Not that I've seen yet, but I'm still partway through the series. It
could be monitored with tracepoints, and it can also be inferred from
lock contention issues. I think monitoring it closely would only be
meaningful to developers, at least that's what I think now. Honestly,
I'm more worried about potential changes in behaviour depending on the
exact CPU and cache implementation than I am about being able to
actively monitor it.

> - I'm not seeing any Documentation/ updates. Surely there are things
>   we can tell users?
>
> - This:
>
>   : It's possible that PCP high auto-tuning doesn't work well for some
>   : workloads. So, when PCP high is tuned by hand via the sysctl knob,
>   : the auto-tuning will be disabled. The PCP high set by hand will be
>   : used instead.
>
>   Is it a bit hacky to disable autotuning when the user alters
>   pcp-high? Would it be cleaner to have a separate on/off knob for
>   autotuning?
>

It might be, but tuning the allocator is very specific, and once we
introduce that tunable, we're probably stuck with it. I would prefer to
see it introduced if and only if we have to.

> And how is the user to determine that "PCP high auto-tuning doesn't
> work well" for their workload?

Not easily. It may manifest as variable lock contention issues when the
workload is at a steady state, but that would increase the pressure to
split the allocator away from being zone-based entirely instead of
tweaking PCP further.

-- 
Mel Gorman
SUSE Labs
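P.S. The sysctl knob referred to above is, I assume, the existing
percpu_pagelist_high_fraction interface rather than something new in
this series; a sketch for anyone wanting to experiment with it:

```shell
# Hedged sketch: assumes the vm.percpu_pagelist_high_fraction sysctl
# (Linux 5.14+). A value of 0 means the kernel sizes pcp->high itself.
cat /proc/sys/vm/percpu_pagelist_high_fraction

# Request that each zone's pcp high be derived from 1/8 of the zone's
# managed pages, spread across the local CPUs (requires root; the
# minimum non-zero value accepted is 8).
sysctl -w vm.percpu_pagelist_high_fraction=8

# Restore the default automatic sizing.
sysctl -w vm.percpu_pagelist_high_fraction=0
```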