From: "Huang, Ying"
To: Mel Gorman
Cc: Arjan Van De Ven, Andrew Morton, Vlastimil Babka, David Hildenbrand,
    Johannes Weiner, Dave Hansen, Michal Hocko, Pavel Tatashin,
    Matthew Wilcox, Christoph Lameter
Subject: Re: [PATCH 09/10] mm, pcp: avoid to reduce PCP high unnecessarily
Date: Thu, 12 Oct 2023 15:48:04 +0800
Message-ID: <87lec8ffij.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <20231011140949.rwsqfb57vyuub6va@techsingularity.net>
    (Mel Gorman's message of "Wed, 11 Oct 2023 15:09:49 +0100")
References: <20230920061856.257597-1-ying.huang@intel.com>
    <20230920061856.257597-10-ying.huang@intel.com>
    <20231011140949.rwsqfb57vyuub6va@techsingularity.net>

Mel Gorman writes:

> On Wed, Sep 20, 2023 at 02:18:55PM +0800, Huang Ying wrote:
>> In the PCP high auto-tuning algorithm, to minimize idle pages in the
>> PCP, the periodic vmstat updating kworker (via
>> refresh_cpu_vm_stats()) decreases PCP high to try to free possible
>> idle PCP pages.  One issue is that even if the page
>> allocating/freeing depth is larger than the maximal PCP high, we may
>> reduce PCP high unnecessarily.
>>
>> To avoid the above issue, in this patch, we track the minimal PCP
>> page count.  The periodic PCP high decrement will not be more than
>> the recent minimal PCP page count, so only detected idle pages will
>> be freed.
>>
>> On a 2-socket Intel server with 224 logical CPUs, we tested kbuild
>> on one socket with `make -j 112`.  With the patch, the number of
>> pages allocated from the zone (instead of from the PCP) decreases
>> 25.8%.
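As a reading aid, the mechanism described in the quoted commit message can be modeled in a short standalone C sketch.  This is an illustration, not the kernel code: `struct pcp_model`, `model_alloc()` and `model_decay()` are hypothetical names, and the batch-based clamp (`max3()` with `batch << PCP_BATCH_SCALE_MAX`) of the real `decay_pcp_high()` is omitted for brevity.

```c
#include <assert.h>

/*
 * Simplified model of the PCP bookkeeping described above.
 * count_min tracks the low-water mark of the per-CPU page count
 * between two periodic decays; only that many pages were provably
 * idle for the whole period, so the decay of "high" is capped by it.
 */
struct pcp_model {
	int count;     /* pages currently in the per-CPU list */
	int count_min; /* minimum of count since the last decay */
	int high;      /* high watermark being auto-tuned */
	int high_min;  /* lower bound for high */
};

/* Allocation fast path: take 1 << order pages and track the minimum. */
static void model_alloc(struct pcp_model *pcp, unsigned int order)
{
	pcp->count -= 1 << order;
	if (pcp->count < pcp->count_min)
		pcp->count_min = pcp->count;
}

/* Periodic decay: shrink high by at most the observed idle count,
 * and by at most 1/5 of high (mirroring the quoted hunk), never
 * going below high_min.  Returns the applied decrement. */
static int model_decay(struct pcp_model *pcp)
{
	int decrease = pcp->count_min < pcp->high / 5
			? pcp->count_min : pcp->high / 5;

	if (pcp->high - decrease > pcp->high_min)
		pcp->high -= decrease;
	else
		pcp->high = pcp->high_min;

	pcp->count_min = pcp->count;	/* restart min tracking */
	return decrease;
}
```

Note that if the list was fully drained at any point between decays (`count_min == 0`), `high` is not reduced at all, which is the "avoid reducing PCP high unnecessarily" behavior the patch aims for.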
>>
>> Signed-off-by: "Huang, Ying"
>> Cc: Andrew Morton
>> Cc: Mel Gorman
>> Cc: Vlastimil Babka
>> Cc: David Hildenbrand
>> Cc: Johannes Weiner
>> Cc: Dave Hansen
>> Cc: Michal Hocko
>> Cc: Pavel Tatashin
>> Cc: Matthew Wilcox
>> Cc: Christoph Lameter
>> ---
>>  include/linux/mmzone.h |  1 +
>>  mm/page_alloc.c        | 15 ++++++++++-----
>>  2 files changed, 11 insertions(+), 5 deletions(-)
>>
>> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
>> index 8a19e2af89df..35b78c7522a7 100644
>> --- a/include/linux/mmzone.h
>> +++ b/include/linux/mmzone.h
>> @@ -682,6 +682,7 @@ enum zone_watermarks {
>>  struct per_cpu_pages {
>>  	spinlock_t lock;	/* Protects lists field */
>>  	int count;		/* number of pages in the list */
>> +	int count_min;		/* minimal number of pages in the list recently */
>>  	int high;		/* high watermark, emptying needed */
>>  	int high_min;		/* min high watermark */
>>  	int high_max;		/* max high watermark */
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 3f8c7dfeed23..77e9b7b51688 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -2166,19 +2166,20 @@ static int rmqueue_bulk(struct zone *zone, unsigned int order,
>>   */
>>  int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
>>  {
>> -	int high_min, to_drain, batch;
>> +	int high_min, decrease, to_drain, batch;
>>  	int todo = 0;
>>
>>  	high_min = READ_ONCE(pcp->high_min);
>>  	batch = READ_ONCE(pcp->batch);
>>  	/*
>> -	 * Decrease pcp->high periodically to try to free possible
>> -	 * idle PCP pages.  And, avoid to free too many pages to
>> -	 * control latency.
>> +	 * Decrease pcp->high periodically to free idle PCP pages counted
>> +	 * via pcp->count_min.  And, avoid to free too many pages to
>> +	 * control latency.  This caps pcp->high decrement too.
>>  	 */
>>  	if (pcp->high > high_min) {
>> +		decrease = min(pcp->count_min, pcp->high / 5);
>
> Not directly related to this patch but why 20%, it seems a bit
> arbitrary.
> While this is not a fast path, using a divide rather than a
> shift seems unnecessarily expensive.

Yes, the number chosen is somewhat arbitrary.  I will use ">> 3"
(i.e., / 8) instead.

>>  		pcp->high = max3(pcp->count - (batch << PCP_BATCH_SCALE_MAX),
>> -				 pcp->high * 4 / 5, high_min);
>> +				 pcp->high - decrease, high_min);
>>  		if (pcp->high > high_min)
>>  			todo++;
>>  	}
>> @@ -2191,6 +2192,8 @@ int decay_pcp_high(struct zone *zone, struct per_cpu_pages *pcp)
>>  		todo++;
>>  	}
>>
>> +	pcp->count_min = pcp->count;
>> +
>>  	return todo;
>>  }
>>
>> @@ -2828,6 +2831,8 @@ struct page *__rmqueue_pcplist(struct zone *zone, unsigned int order,
>>  	page = list_first_entry(list, struct page, pcp_list);
>>  	list_del(&page->pcp_list);
>>  	pcp->count -= 1 << order;
>> +	if (pcp->count < pcp->count_min)
>> +		pcp->count_min = pcp->count;
>
> While the accounting for this is in a relatively fast path.
>
> At the moment I don't have a better suggestion but I'm not as keen on
> this patch.  It seems like it would have been more appropriate to
> decay if there was no recent allocation activity tracked via
> pcp->flags.  The major caveat there is tracking a bit and clearing it
> may very well be in a fast path unless it was tied to refills, but
> that is subject to timing issues and the allocation request stream :(
>
> While you noted the difference in buddy allocations, which may tie
> into lock contention issues, how much difference does it make to the
> actual performance of the workload?

Thanks to Andrew for his reminder about the test results.  I found
that I had used an uncommon configuration to test kbuild in V1 of the
patchset, so I sent out V2 with only the test results and
documentation changed:

https://lore.kernel.org/linux-mm/20230926060911.266511-1-ying.huang@intel.com/

So, for performance data, please refer to V2 of the patchset.  For
this patch, the performance data are:

"
On a 2-socket Intel server with 224 logical CPUs, we run 8 kbuild
instances in parallel (each with `make -j 28`) in 8 cgroups.
This simulates the kbuild server that is used by the 0-Day kbuild
service.  With the patch, the number of pages allocated from the zone
(instead of from the PCP) decreases 21.4%.
"

I also showed the performance numbers for each optimization step as
follows (copied from the patchset V2 link above).

"
          build time   lock contend%   free_high   alloc_zone
          ----------   -------------   ---------   ----------
base           100.0            13.5       100.0        100.0
patch1          99.2            10.6        19.2         95.6
patch3          99.2            11.7         7.1         95.6
patch5          98.4            10.0         8.2         97.1
patch7          94.9             0.7         3.0         19.0
patch9          94.9             0.6         2.7         15.0   <-- this patch
patch10         94.9             0.9         8.8         18.6
"

Although I think the patch is helpful in avoiding unnecessary
pcp->high decay, and thus in reducing zone lock contention, there is
no visible benchmark score change for the patch itself.

--
Best Regards,
Huang, Ying