From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <owner-linux-mm@kvack.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 9AEC0EB64DA
	for <linux-mm@archiver.kernel.org>; Tue, 18 Jul 2023 12:34:36 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
	id DE7218D0001; Tue, 18 Jul 2023 08:34:35 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id D980B6B0074; Tue, 18 Jul 2023 08:34:35 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id C85F48D0001; Tue, 18 Jul 2023 08:34:35 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10])
	by kanga.kvack.org (Postfix) with ESMTP id BAA0D6B0071
	for <linux-mm@kvack.org>; Tue, 18 Jul 2023 08:34:35 -0400 (EDT)
Received: from smtpin20.hostedemail.com (a10.router.float.18 [10.200.18.1])
	by unirelay08.hostedemail.com (Postfix) with ESMTP id 8782914010B
	for <linux-mm@kvack.org>; Tue, 18 Jul 2023 12:34:35 +0000 (UTC)
X-FDA: 81024676110.20.B84E434
Received: from outbound-smtp01.blacknight.com (outbound-smtp01.blacknight.com [81.17.249.7])
	by imf23.hostedemail.com (Postfix) with ESMTP id 495FA14001A
	for <linux-mm@kvack.org>; Tue, 18 Jul 2023 12:34:32 +0000 (UTC)
Authentication-Results: imf23.hostedemail.com;
	dkim=none;
	spf=pass (imf23.hostedemail.com: domain of mgorman@techsingularity.net designates 81.17.249.7 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net;
	dmarc=none
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com;
	s=arc-20220608; t=1689683673;
	h=from:from:sender:reply-to:subject:subject:date:date:
	 message-id:message-id:to:to:cc:cc:mime-version:mime-version:
	 content-type:content-type:content-transfer-encoding:
	 in-reply-to:in-reply-to:references:references;
	bh=Eab4/Ty/Yw21V4h9xjirvlOdqc32FEXhfpD0JB8ypZw=;
	b=RcPGUORZtaU5cshZCXCAt5e3Zcj1Asn+oVAkYpuBdGFqvoJfOkrTdDqScvzWZEGDrMNnho
	jYW+2fWmMuA03Z1IU5XIAuwRX0dLGIqm3FQDGOTGgZ6PHo3Fv6a0arlgXat38PHdrMSyis
	80nwtrCXQnRqRxsMnJDxf/TPG+HGr/o=
ARC-Authentication-Results: i=1;
	imf23.hostedemail.com;
	dkim=none;
	spf=pass (imf23.hostedemail.com: domain of mgorman@techsingularity.net designates 81.17.249.7 as permitted sender) smtp.mailfrom=mgorman@techsingularity.net;
	dmarc=none
ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1689683673; a=rsa-sha256;
	cv=none;
	b=cXLj7WBgJNbHDmP5c6ubJv5YE+dZf+1KSb+FNHv7iy1Fq7Q8EEkRYmSyCKKvVF5kLj0R/0
	9C1sjjwZsiWy/0nFcxlisd3HJOQVcGCn2bpJM9kKa1bauDQjFjJqTjX5hm37kcW1tG8RSh
	bwQAakUEiulEJEuFLhGW6tX/HARjPHI=
Received: from mail.blacknight.com (pemlinmail03.blacknight.ie [81.17.254.16])
	by outbound-smtp01.blacknight.com (Postfix) with ESMTPS id 34C52C4A82
	for <linux-mm@kvack.org>; Tue, 18 Jul 2023 13:34:31 +0100 (IST)
Received: (qmail 26163 invoked from network); 18 Jul 2023 12:34:31 -0000
Received: from unknown (HELO techsingularity.net) (mgorman@techsingularity.net@[84.203.20.191])
  by 81.17.254.9 with ESMTPSA (AES256-SHA encrypted, authenticated); 18 Jul 2023 12:34:30 -0000
Date: Tue, 18 Jul 2023 13:34:28 +0100
From: Mel Gorman <mgorman@techsingularity.net>
To: "Huang, Ying" <ying.huang@intel.com>
Cc: Michal Hocko <mhocko@suse.com>, linux-mm@kvack.org,
	linux-kernel@vger.kernel.org,
	Arjan Van De Ven <arjan@linux.intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	David Hildenbrand <david@redhat.com>,
	Johannes Weiner <jweiner@redhat.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Pavel Tatashin <pasha.tatashin@soleen.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC 2/2] mm: alloc/free depth based PCP high auto-tuning
Message-ID: <20230718123428.jcy4avtjg3rhuh7i@techsingularity.net>
References: <20230710065325.290366-1-ying.huang@intel.com>
 <20230710065325.290366-3-ying.huang@intel.com>
 <ZK060sMG0GfC5gUS@dhcp22.suse.cz>
 <20230712090526.thk2l7sbdcdsllfi@techsingularity.net>
 <871qhcdwa1.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <20230714140710.5xbesq6xguhcbyvi@techsingularity.net>
 <87pm4qdhk4.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <20230717135017.7ro76lsaninbazvf@techsingularity.net>
 <87lefeca2z.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=iso-8859-15
Content-Disposition: inline
In-Reply-To: <87lefeca2z.fsf@yhuang6-desk2.ccr.corp.intel.com>
X-Rspamd-Queue-Id: 495FA14001A
X-Rspam-User: 
X-Stat-Signature: 8u5njxqkj4mg8eb5wqsio4xqy8ae4qxr
X-Rspamd-Server: rspam01
X-HE-Tag: 1689683672-643139
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Tue, Jul 18, 2023 at 08:55:16AM +0800, Huang, Ying wrote:
> Mel Gorman <mgorman@techsingularity.net> writes:
> 
> > On Mon, Jul 17, 2023 at 05:16:11PM +0800, Huang, Ying wrote:
> >> Mel Gorman <mgorman@techsingularity.net> writes:
> >> 
> >> > Batch should have a much lower maximum than high because it's a deferred cost
> >> > that gets assigned to an arbitrary task. The worst case is where a process
> >> > that is a light user of the allocator incurs the full cost of a refill/drain.
> >> >
> >> > Again, intuitively this may be a PID Control problem for the "Mix" case
> >> > to estimate the size of high required to minimise drains/allocs as each
> >> > drain/alloc is potentially a lock contention. The catchall for corner
> >> > cases would be to decay high from vmstat context based on pcp->expires. The
> >> > decay would prevent the "high" being pinned at an artificially high value
> >> > without any zone lock contention for prolonged periods of time and also
> >> > mitigate worst-case due to state being per-cpu. The downside is that "high"
> >> > would also oscillate for a continuous steady allocation pattern as the PID
> >> > control might pick an ideal value suitable for a long period of time with
> >> > the "decay" disrupting that ideal value.
> >> 
> >> Maybe we can track the minimal value of pcp->count.  If it's small
> >> enough recently, we can avoid decaying pcp->high, because the pages in
> >> PCP are being used for allocations instead of sitting idle.
> >
> > Implement as a separate patch. I suspect this type of heuristic will be
> > very benchmark specific and the complexity may not be worth it in the
> > general case.
> 
> OK.
> 
> >> Another question is as follows.
> >> 
> >> For example, on CPU A, a large number of pages are freed, and we
> >> maximize batch and high.  So, a large number of pages are put in PCP.
> >> Then, the possible situations may be,
> >> 
> >> a) a large number of pages are allocated on CPU A after some time
> >> b) a large number of pages are allocated on another CPU B
> >> 
> >> For a), we want the pages to be kept in the PCP of CPU A as long as possible.
> >> For b), we want the pages to be kept in the PCP of CPU A as briefly as possible.
> >> I think that we need to balance between them.  What is the reasonable
> >> time to keep pages in PCP without many allocations?
> >> 
> >
> > This would be a case where you're relying on vmstat to drain the PCP after
> > a period of time as it is a corner case.
> 
> Yes.  The remaining question is how long should "a period of time" be?

Match the time used for draining "remote" pages from the PCP lists. The
choice is arbitrary and no matter what value is chosen, it'll be possible
to build an adverse workload.
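
As a rough illustration of what I mean by decaying from vmstat context,
something along these lines. This is a userspace model only: the struct,
the field names and PCP_EXPIRE_TICKS are made up for the example and are
not the kernel's struct per_cpu_pages, and halving the boost is just one
possible decay step.

```c
#include <stdbool.h>

/* Hypothetical model of per-cpu state, for illustration only. */
struct pcp_model {
	int high;	/* current, possibly boosted, high mark */
	int high_min;	/* baseline that high decays towards */
	int expire;	/* idle ticks left before the next decay step */
	bool used;	/* set by alloc/free paths since the last tick */
};

/* Assumed value; the idea is to reuse the remote-drain period. */
#define PCP_EXPIRE_TICKS 3

/* Called once per vmstat-style tick. Returns true if high was decayed. */
static bool pcp_decay_tick(struct pcp_model *pcp)
{
	if (pcp->used) {
		/* Recent activity: rearm the timer, leave high alone. */
		pcp->used = false;
		pcp->expire = PCP_EXPIRE_TICKS;
		return false;
	}

	/* Still counting down idle ticks. */
	if (pcp->expire > 0 && --pcp->expire > 0)
		return false;

	/* Idle for a full period: halve the boost above the baseline. */
	pcp->high = pcp->high_min + (pcp->high - pcp->high_min) / 2;
	pcp->expire = PCP_EXPIRE_TICKS;
	return true;
}
```

With high at 1000 and a baseline of 200, three idle ticks in a row bring
high down to 600, and any alloc/free activity in between restarts the
countdown.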

> If it's long, the pages in PCP can be used for allocation after some
> time.  If it's short the pages can be put in buddy, so can be used by
> other workloads if needed.
> 

Assume that the main reason to expire pages and put them back on the buddy
list is to avoid premature allocation failures due to pages pinned on the
PCP. Once pages are going back onto the buddy list and the expiry is hit,
it might as well be assumed that the pages are cache-cold. Some bad corner
cases should be mitigated by disabling the adaptive sizing when reclaim is
active. The big remaining corner case to watch out for is where the sum
of the boosted pcp->high exceeds the low watermark.  If that should ever
happen then potentially a premature OOM happens because the watermarks
are fine so no reclaim is active but no pages are available. It may even
be the case that the sum of pcp->high should not exceed *min* as that
corner case means that processes may prematurely enter direct reclaim
(not as bad as OOM but still bad).
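
To make the constraint concrete, a minimal sketch in userspace C. The
helper name and parameters are hypothetical, and it assumes the simplest
possible policy of giving every CPU an equal share of the watermark. The
real thing would be per-zone and would need to pick between the low and
min watermark as discussed above.

```c
#include <stdbool.h>

/*
 * Hypothetical helper: pick the pcp->high to use on one CPU.
 *
 * wanted        - value the auto-tuning would like to use
 * base_high     - unboosted default for pcp->high
 * wmark_pages   - watermark (low or min) the sum must stay under
 * nr_cpus       - number of CPUs sharing the zone
 * reclaim_active - true if reclaim is running against the zone
 */
static long pcp_effective_high(long wanted, long base_high,
			       long wmark_pages, long nr_cpus,
			       bool reclaim_active)
{
	long share;

	/* No adaptive sizing at all while reclaim is active. */
	if (reclaim_active || nr_cpus <= 0)
		return base_high;

	/* Equal per-CPU share keeps the sum at or below the watermark. */
	share = wmark_pages / nr_cpus;

	/* The baseline already uses the whole share: no room to boost. */
	if (share < base_high)
		return base_high;

	/* Never drop below the baseline, never boost past the share. */
	if (wanted < base_high)
		return base_high;
	return wanted < share ? wanted : share;
}
```

When the share is at least base_high, every CPU ends up at or below
wmark_pages / nr_cpus, so the sum of the boosted values cannot exceed the
watermark.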

> Anyway, I will do some experiment for that.
> 
> > You cannot reasonably detect the pattern on two separate per-cpu lists
> > without either inspecting remote CPU state or maintaining global
> > state. Either would incur cache miss penalties that probably cost more
> > than the heuristic saves.
> 
> Yes.  Totally agree.
> 
> Best Regards,
> Huang, Ying

-- 
Mel Gorman
SUSE Labs