From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Mon, 21 Oct 2019 12:27:14 +0200
From: Michal Hocko
To: Mel Gorman
Cc: Andrew Morton, Vlastimil Babka, Thomas Gleixner, Matt Fleming,
 Borislav Petkov, Linux-MM, Linux Kernel Mailing List
Subject: Re: [PATCH 1/3] mm, meminit: Recalculate pcpu batch and high limits after init completes
Message-ID: <20191021102714.GH9379@dhcp22.suse.cz>
References: <20191021094808.28824-1-mgorman@techsingularity.net>
 <20191021094808.28824-2-mgorman@techsingularity.net>
In-Reply-To: <20191021094808.28824-2-mgorman@techsingularity.net>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
User-Agent: Mutt/1.10.1 (2018-07-13)

On Mon 21-10-19 10:48:06, Mel Gorman wrote:
> Deferred memory initialisation updates zone->managed_pages during
> the initialisation phase but before that finishes, the per-cpu page
> allocator (pcpu) calculates the number of pages allocated/freed in
> batches as well as the maximum number of pages allowed on a per-cpu list.
> As zone->managed_pages is not up to date yet, the pcpu initialisation
> calculates inappropriately low batch and high values.
>
> This increases zone lock contention quite severely in some cases with the
> degree of severity depending on how many CPUs share a local zone and the
> size of the zone. A private report indicated that kernel build times were
> excessive with extremely high system CPU usage. A perf profile indicated
> that a large chunk of time was lost on zone->lock contention.
>
> This patch recalculates the pcpu batch and high values after deferred
> initialisation completes for every populated zone in the system.
> It was tested on a 2-socket AMD EPYC 2 machine using a kernel compilation
> workload -- allmodconfig and all available CPUs.
>
> mmtests configuration: config-workload-kernbench-max
> Configuration was modified to build on a fresh XFS partition.
>
> kernbench
>                               5.4.0-rc3              5.4.0-rc3
>                                 vanilla           resetpcpu-v2
> Amean  user-256   13249.50 (   0.00%)   16401.31 * -23.79%*
> Amean  syst-256   14760.30 (   0.00%)    4448.39 *  69.86%*
> Amean  elsp-256     162.42 (   0.00%)     119.13 *  26.65%*
> Stddev user-256      42.97 (   0.00%)      19.15 (  55.43%)
> Stddev syst-256     336.87 (   0.00%)       6.71 (  98.01%)
> Stddev elsp-256       2.46 (   0.00%)       0.39 (  84.03%)
>
>                    5.4.0-rc3    5.4.0-rc3
>                      vanilla resetpcpu-v2
> Duration User       39766.24     49221.79
> Duration System     44298.10     13361.67
> Duration Elapsed      519.11       388.87
>
> The patch reduces system CPU usage by 69.86% and total build time by
> 26.65%. The variance of system CPU usage is also much reduced.
>
> Before the patch, the breakdown of batch and high values over all zones
> was:
>
>     256  batch: 1
>     256  batch: 63
>     512  batch: 7
>     256  high:  0
>     256  high:  378
>     512  high:  42
>
> 512 pcpu pagesets had a batch limit of 7 and a high limit of 42. After
> the patch:
>
>     256  batch: 1
>     768  batch: 63
>     256  high:  0
>     768  high:  378
>
> Cc: stable@vger.kernel.org # v4.1+
> Signed-off-by: Mel Gorman

Acked-by: Michal Hocko

> ---
>  mm/page_alloc.c | 8 ++++++++
>  1 file changed, 8 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index c0b2e0306720..f972076d0f6b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1947,6 +1947,14 @@ void __init page_alloc_init_late(void)
>  	/* Block until all are initialised */
>  	wait_for_completion(&pgdat_init_all_done_comp);
>  
> +	/*
> +	 * The number of managed pages has changed due to the initialisation
> +	 * so the pcpu batch and high limits needs to be updated or the limits
> +	 * will be artificially small.
> +	 */
> +	for_each_populated_zone(zone)
> +		zone_pcp_update(zone);
> +
>  	/*
>  	 * We initialized the rest of the deferred pages. Permanently disable
>  	 * on-demand struct page initialization.
> -- 
> 2.16.4

-- 
Michal Hocko
SUSE Labs