From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Wed, 11 Oct 2023 14:05:05 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: Andrew Morton
Cc: Huang Ying, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Arjan Van De Ven, Vlastimil Babka, David Hildenbrand,
	Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
	Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH 00/10] mm: PCP high auto-tuning
Message-ID: <20231011130505.356soszayes3vy2n@techsingularity.net>
References: <20230920061856.257597-1-ying.huang@intel.com>
	<20230920094118.8b8f739125c6aede17c627e0@linux-foundation.org>
In-Reply-To: <20230920094118.8b8f739125c6aede17c627e0@linux-foundation.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
On Wed, Sep 20, 2023 at 09:41:18AM -0700, Andrew Morton wrote:
> On Wed, 20 Sep 2023 14:18:46 +0800 Huang Ying wrote:
>
> > The page allocation performance requirements of different workloads
> > are often different. So, we need to tune the PCP (Per-CPU Pageset)
> > high on each CPU automatically to optimize the page allocation
> > performance.
>
> Some of the performance changes here are downright scary.
>
> I've never been very sure that percpu pages was very beneficial (and
> hey, I invented the thing back in the Mesozoic era). But these numbers
> make me think it's very important and we should have been paying more
> attention.
>

FWIW, it is because not only does it avoid lock contention issues, it
also avoids excessive splitting/merging of buddies as well as the
slower paths of the allocator. It is not very satisfactory and,
frankly, the whole page allocator needs a revisit to account for very
large zones, but that is far from a trivial project. PCP just masks the
worst of the issues, and replacing it is far harder than tweaking it.

> > The list of patches in series is as follows,
> >
> > 1 mm, pcp: avoid to drain PCP when process exit
> > 2 cacheinfo: calculate per-CPU data cache size
> > 3 mm, pcp: reduce lock contention for draining high-order pages
> > 4 mm: restrict the pcp batch scale factor to avoid too long latency
> > 5 mm, page_alloc: scale the number of pages that are batch allocated
> > 6 mm: add framework for PCP high auto-tuning
> > 7 mm: tune PCP high automatically
> > 8 mm, pcp: decrease PCP high if free pages < high watermark
> > 9 mm, pcp: avoid to reduce PCP high unnecessarily
> > 10 mm, pcp: reduce detecting time of consecutive high order page freeing
> >
> > Patch 1/2/3 optimize the PCP draining for consecutive high-order pages
> > freeing.
> >
> > Patch 4/5 optimize batch freeing and allocating.
> >
> > Patch 6/7/8/9 implement and optimize a PCP high auto-tuning method.
> >
> > Patch 10 optimize the PCP draining for consecutive high order page
> > freeing based on PCP high auto-tuning.
> >
> > The test results for patches with performance impact are as follows,
> >
> > kbuild
> > ======
> >
> > On a 2-socket Intel server with 224 logical CPU, we tested kbuild on
> > one socket with `make -j 112`.
> >
> >           build time   zone lock%   free_high   alloc_zone
> >           ----------   ----------   ---------   ----------
> > base           100.0         43.6       100.0        100.0
> > patch1          96.6         40.3        49.2         95.2
> > patch3          96.4         40.5        11.3         95.1
> > patch5          96.1         37.9        13.3         96.8
> > patch7          86.4          9.8         6.2         22.0
> > patch9          85.9          9.4         4.8         16.3
> > patch10         87.7         12.6        29.0         32.3
>
> You're seriously saying that kbuild got 12% faster?
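For anyone trying to relate figures like these to their own machine, the
per-CPU pageset high/batch values being tuned here are visible in
/proc/zoneinfo. A rough sketch of pulling them out follows; note that
parse_pcp is a made-up helper and the field layout is assumed from
recent kernels (it is not a stable ABI), so treat it as illustrative
only:

```python
import re

# Hedged sketch: the "pagesets"/"cpu:" layout below is assumed from
# recent kernels' /proc/zoneinfo output and may differ across versions.
def parse_pcp(zoneinfo_text):
    """Return (zone, cpu, field, value) tuples for per-CPU pageset data."""
    results = []
    zone = None
    cpu = None
    for line in zoneinfo_text.splitlines():
        m = re.match(r"Node \d+, zone\s+(\S+)", line)
        if m:
            zone, cpu = m.group(1), None
            continue
        m = re.match(r"\s*cpu:\s*(\d+)", line)
        if m:
            cpu = int(m.group(1))
            continue
        m = re.match(r"\s*(count|high|batch):\s*(\d+)", line)
        if m and zone is not None and cpu is not None:
            results.append((zone, cpu, m.group(1), int(m.group(2))))
    return results

# Sample text in the assumed layout; on a live system one would read
# the real thing with open("/proc/zoneinfo").read() instead.
sample = """\
Node 0, zone   Normal
  pagesets
    cpu: 0
              count: 153
              high:  579
              batch: 63
    cpu: 1
              count: 12
              high:  579
              batch: 63
"""

for zone, cpu, field, value in parse_pcp(sample):
    if field in ("high", "batch"):
        print(zone, cpu, field, value)
```

Watching how the high values move under load would probably be the
quickest way to observe the auto-tuning in action.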
>
> I see that [07/10] (autotuning) alone sped up kbuild by 10%?
>
> Other thoughts:
>
> - What if any facilities are provided to permit users/developers to
>   monitor the operation of the autotuning algorithm?
>

Not that I've seen yet, but I'm still partway through the series. It
could be monitored with tracepoints, and it can also be inferred from
lock contention issues. I think monitoring it closely would only be
meaningful to developers, at least that's what I think now. Honestly,
I'm more worried about potential changes in behaviour depending on the
exact CPU and cache implementation than I am about being able to
actively monitor it.

> - I'm not seeing any Documentation/ updates. Surely there are things
>   we can tell users?
>
> - This:
>
>   : It's possible that PCP high auto-tuning doesn't work well for some
>   : workloads. So, when PCP high is tuned by hand via the sysctl knob,
>   : the auto-tuning will be disabled. The PCP high set by hand will be
>   : used instead.
>
>   Is it a bit hacky to disable autotuning when the user alters
>   pcp-high? Would it be cleaner to have a separate on/off knob for
>   autotuning?
>

It might be, but tuning the allocator is very specific, and once we
introduce that tunable, we're probably stuck with it. I would prefer to
see it introduced if and only if we have to.

> And how is the user to determine that "PCP high auto-tuning doesn't
> work well" for their workload?

Not easily. It may manifest as variable lock contention issues when the
workload is at a steady state, but that would increase the pressure to
split the allocator away from being zone-based entirely instead of
tweaking PCP further.

-- 
Mel Gorman
SUSE Labs
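P.S. The sysctl knob referred to above is, I assume, the existing
percpu_pagelist_high_fraction interface rather than something new in
this series; a sketch for anyone wanting to experiment with it:

```shell
# Hedged sketch: assumes the vm.percpu_pagelist_high_fraction sysctl
# (Linux 5.14+). A value of 0 means the kernel sizes pcp->high itself.
cat /proc/sys/vm/percpu_pagelist_high_fraction

# Request that each zone's pcp high be derived from 1/8 of the zone's
# managed pages, spread across the local CPUs (requires root; the
# minimum non-zero value accepted is 8).
sysctl -w vm.percpu_pagelist_high_fraction=8

# Restore the default automatic sizing.
sysctl -w vm.percpu_pagelist_high_fraction=0
```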