From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vlastimil Babka
Date: Wed, 16 Feb 2022 15:37:52 +0100
Message-ID: <5c4747b7-60b7-c7b4-3be6-4ecea92cf975@suse.cz>
In-Reply-To: <20220215145111.27082-6-mgorman@techsingularity.net>
References: <20220215145111.27082-1-mgorman@techsingularity.net>
 <20220215145111.27082-6-mgorman@techsingularity.net>
Subject: Re: [PATCH 5/5] mm/page_alloc: Limit number of high-order pages on
 PCP during bulk free
To: Mel Gorman, Andrew Morton
Cc: Aaron Lu, Dave Hansen, Michal Hocko, Jesper Dangaard Brouer, LKML,
 Linux-MM

On 2/15/22 15:51, Mel Gorman wrote:
> When a PCP is mostly used for frees then high-order pages can exist on PCP
> lists for some time. This is problematic when the allocation pattern is all
> allocations from one CPU and all frees from another, resulting in colder
> pages being used. When bulk freeing pages, limit the number of high-order
> pages that are stored on the PCP lists.
>
> Netperf running on localhost exhibits this pattern and while it does
> not matter for some machines, it does matter for others with smaller
> caches, where cache misses cause problems due to reduced page reuse.
> Pages freed directly to the buddy list may be reused quickly while still
> cache hot, whereas pages stored on the PCP lists may be cache-cold by
> the time free_pcppages_bulk() is called.
>
> Using perf kmem:mm_page_alloc, the 5 most used page frames were
>
> 5.17-rc3
>   13041 pfn=0x111a30
>   13081 pfn=0x5814d0
>   13097 pfn=0x108258
>   13121 pfn=0x689598
>   13128 pfn=0x5814d8
>
> 5.17-revert-highpcp
>  192009 pfn=0x54c140
>  195426 pfn=0x1081d0
>  200908 pfn=0x61c808
>  243515 pfn=0xa9dc20
>  402523 pfn=0x222bb8
>
> 5.17-full-series
>  142693 pfn=0x346208
>  162227 pfn=0x13bf08
>  166413 pfn=0x2711e0
>  166950 pfn=0x2702f8
>
> The spread is wider as there is still time before pages freed to one
> PCP get released, with a tradeoff between fast reuse and reduced zone
> lock acquisition.
>
> From the machine used to gather the traces, the headline performance
> was equivalent.
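
(Aside for anyone reproducing the measurement: per-pfn counts like the
above can be collected from the same kmem:mm_page_alloc tracepoint with
something along these lines -- the exact invocation and post-processing
are my guess, not necessarily what was used for the numbers quoted here:

  perf record -e kmem:mm_page_alloc -a -- sleep 30   # while netperf runs
  perf script | grep -o 'pfn=[0-9a-fx]*' | sort | uniq -c | sort -n | tail -5

i.e. record every page allocation system-wide, then count how often each
pfn shows up and keep the five most-reused frames.)
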
> netperf-tcp
>                           5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
>                              vanilla  mm-reverthighpcp-v1r1  mm-highpcplimit-v1r12
> Hmean     64      839.93 (   0.00%)      840.77 (   0.10%)      835.34 *  -0.55%*
> Hmean     128    1614.22 (   0.00%)     1622.07 *   0.49%*     1604.18 *  -0.62%*
> Hmean     256    2952.00 (   0.00%)     2953.19 (   0.04%)     2959.46 (   0.25%)
> Hmean     1024  10291.67 (   0.00%)    10239.17 (  -0.51%)    10287.05 (  -0.04%)
> Hmean     2048  17335.08 (   0.00%)    17399.97 (   0.37%)    17125.73 *  -1.21%*
> Hmean     3312  22628.15 (   0.00%)    22471.97 (  -0.69%)    22414.24 *  -0.95%*
> Hmean     4096  25009.50 (   0.00%)    24752.83 *  -1.03%*    24620.03 *  -1.56%*
> Hmean     8192  32745.01 (   0.00%)    31682.63 *  -3.24%*    32475.31 (  -0.82%)
> Hmean     16384 39759.59 (   0.00%)    36805.78 *  -7.43%*    39291.42 (  -1.18%)
>
> From a 1-socket skylake machine with a small CPU cache that suffers
> more if cache misses are too high
>
> netperf-tcp
>                           5.17.0-rc3             5.17.0-rc3             5.17.0-rc3
>                              vanilla    mm-reverthighpcp-v1     mm-highpcplimit-v1
> Min       64      935.38 (   0.00%)      939.40 (   0.43%)      940.11 (   0.51%)
> Min       128    1831.69 (   0.00%)     1856.15 (   1.34%)     1849.30 (   0.96%)
> Min       256    3560.61 (   0.00%)     3659.25 (   2.77%)     3654.12 (   2.63%)
> Min       1024  13165.24 (   0.00%)    13444.74 (   2.12%)    13281.71 (   0.88%)
> Min       2048  22706.44 (   0.00%)    23219.67 (   2.26%)    23027.31 (   1.41%)
> Min       3312  30960.26 (   0.00%)    31985.01 (   3.31%)    31484.40 (   1.69%)
> Min       4096  35149.03 (   0.00%)    35997.44 (   2.41%)    35891.92 (   2.11%)
> Min       8192  48064.73 (   0.00%)    49574.05 (   3.14%)    48928.89 (   1.80%)
> Min       16384 58017.25 (   0.00%)    60352.93 (   4.03%)    60691.14 (   4.61%)
> Hmean     64      938.95 (   0.00%)      941.50 *   0.27%*      940.47 (   0.16%)
> Hmean     128    1843.10 (   0.00%)     1857.58 *   0.79%*     1855.83 *   0.69%*
> Hmean     256    3573.07 (   0.00%)     3667.45 *   2.64%*     3662.08 *   2.49%*
> Hmean     1024  13206.52 (   0.00%)    13487.80 *   2.13%*    13351.11 *   1.09%*
> Hmean     2048  22870.23 (   0.00%)    23337.96 *   2.05%*    23149.68 *   1.22%*
> Hmean     3312  31001.99 (   0.00%)    32206.50 *   3.89%*    31849.40 *   2.73%*
> Hmean     4096  35364.59 (   0.00%)    36490.96 *   3.19%*    36112.91 *   2.12%*
> Hmean     8192  48497.71 (   0.00%)    49954.05 *   3.00%*    49384.50 *   1.83%*
> Hmean     16384 58410.86 (   0.00%)    60839.80 *   4.16%*    61362.12 *   5.05%*
>
> Note that this was a machine that did not benefit from caching high-order
> pages and performance is almost restored with the series applied. It's not
> fully restored as cache misses are still higher. This is a trade-off
> between optimising for a workload that does all allocs on one CPU and frees
> on another or more general workloads that need high-order pages for SLUB
> and benefit from avoiding zone->lock for every SLUB refill/drain.
>
> Signed-off-by: Mel Gorman

Reviewed-by: Vlastimil Babka

> ---
>  mm/page_alloc.c | 26 +++++++++++++++++++++-----
>  1 file changed, 21 insertions(+), 5 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 6881175b27df..cfb3cbad152c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -3314,10 +3314,15 @@ static bool free_unref_page_prepare(struct page *page, unsigned long pfn,
>  	return true;
>  }
>
> -static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch)
> +static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch,
> +		       bool free_high)
>  {
>  	int min_nr_free, max_nr_free;
>
> +	/* Free everything if batch freeing high-order pages. */
> +	if (unlikely(free_high))
> +		return pcp->count;
> +
>  	/* Check for PCP disabled or boot pageset */
>  	if (unlikely(high < batch))
>  		return 1;
> @@ -3338,11 +3343,12 @@ static int nr_pcp_free(struct per_cpu_pages *pcp, int high, int batch)
>  	return batch;
>  }
>
> -static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone)
> +static int nr_pcp_high(struct per_cpu_pages *pcp, struct zone *zone,
> +		       bool free_high)
>  {
>  	int high = READ_ONCE(pcp->high);
>
> -	if (unlikely(!high))
> +	if (unlikely(!high || free_high))
>  		return 0;
>
>  	if (!test_bit(ZONE_RECLAIM_ACTIVE, &zone->flags))
> @@ -3362,17 +3368,27 @@ static void free_unref_page_commit(struct page *page, unsigned long pfn,
>  	struct per_cpu_pages *pcp;
>  	int high;
>  	int pindex;
> +	bool free_high;
>
>  	__count_vm_event(PGFREE);
>  	pcp = this_cpu_ptr(zone->per_cpu_pageset);
>  	pindex = order_to_pindex(migratetype, order);
>  	list_add(&page->lru, &pcp->lists[pindex]);
>  	pcp->count += 1 << order;
> +
> +	/*
> +	 * As high-order pages other than THP's stored on PCP can contribute
> +	 * to fragmentation, limit the number stored when PCP is heavily
> +	 * freeing without allocation. The remainder after bulk freeing
> +	 * stops will be drained from vmstat refresh context.
> +	 */
> +	free_high = (pcp->free_factor && order && order <= PAGE_ALLOC_COSTLY_ORDER);
> +
> +	high = nr_pcp_high(pcp, zone, free_high);
>  	if (pcp->count >= high) {
>  		int batch = READ_ONCE(pcp->batch);
>
> -		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch), pcp, pindex);
> +		free_pcppages_bulk(zone, nr_pcp_free(pcp, high, batch, free_high), pcp, pindex);
>  	}
>  }
>
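
As a side note for readers less familiar with this path, below is a tiny
userspace model of the decision the patch adds. It is only a sketch:
struct pcp_model, the model_* helpers and the numbers are made-up
stand-ins, and the scaling the real nr_pcp_free() does between
min_nr_free and max_nr_free is elided -- only the free_high
short-circuits mirror the patch.

#include <stdio.h>
#include <stdbool.h>

#define PAGE_ALLOC_COSTLY_ORDER 3	/* matches the kernel's value */

struct pcp_model {
	int count;		/* pages currently on the PCP lists */
	int high;		/* stand-in for pcp->high watermark */
	int batch;		/* stand-in for pcp->batch */
	int free_factor;	/* non-zero while frees dominate allocs */
};

/* Model of nr_pcp_free(): free_high flushes the entire PCP. */
static int model_nr_pcp_free(const struct pcp_model *pcp, bool free_high)
{
	if (free_high)
		return pcp->count;	/* free everything */
	if (pcp->high < pcp->batch)
		return 1;		/* PCP disabled or boot pageset */
	return pcp->batch;		/* simplified: the kernel scales this */
}

/* Model of nr_pcp_high(): free_high makes the watermark 0. */
static int model_nr_pcp_high(const struct pcp_model *pcp, bool free_high)
{
	if (!pcp->high || free_high)
		return 0;
	return pcp->high;
}

int main(void)
{
	struct pcp_model pcp = {
		.count = 40, .high = 128, .batch = 16, .free_factor = 1
	};

	for (int order = 0; order <= PAGE_ALLOC_COSTLY_ORDER + 1; order++) {
		/* The patch's trigger: bulk freeing (free_factor != 0) of a
		 * non-zero order up to PAGE_ALLOC_COSTLY_ORDER, so THP-sized
		 * frees are unaffected. */
		bool free_high = pcp.free_factor && order &&
				 order <= PAGE_ALLOC_COSTLY_ORDER;
		int high = model_nr_pcp_high(&pcp, free_high);

		if (pcp.count >= high)
			printf("order %d: free_high=%d -> flush %d of %d pages\n",
			       order, (int)free_high,
			       model_nr_pcp_free(&pcp, free_high), pcp.count);
		else
			printf("order %d: free_high=%d -> keep caching (count %d < high %d)\n",
			       order, (int)free_high, pcp.count, high);
	}
	return 0;
}

Compiled and run, order-0 frees and frees above PAGE_ALLOC_COSTLY_ORDER
keep filling the PCP as before, while orders 1-3 flush the whole list as
soon as free_factor indicates bulk freeing.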