Date: Wed, 9 Nov 2022 09:05:28 +0100
From: Michal Hocko
To: Leonardo Brás
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Johannes Weiner,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Frederic Weisbecker, Phil Auld, Marcelo Tosatti,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus
References: <20221102020243.522358-1-leobras@redhat.com>
	<07810c49ef326b26c971008fb03adf9dc533a178.camel@redhat.com>
	<0183b60e79cda3a0f992d14b4db5a818cd096e33.camel@redhat.com>
	<3c4ae3bb70d92340d9aaaa1856928476641a8533.camel@redhat.com>
	<4a4a6c73f3776d65f70f7ca92eb26fc90ed3d51a.camel@redhat.com>
In-Reply-To: <4a4a6c73f3776d65f70f7ca92eb26fc90ed3d51a.camel@redhat.com>

On Tue 08-11-22 20:09:25, Leonardo Brás wrote:
[...]
> > Yes, with a notable difference that with your spin lock option there is
> > still a chance that the remote draining could influence the isolated CPU
> > workload through that said spinlock. If there is no pcp cache for that
> > cpu being used then there is no potential interaction at all.
> 
> I see.
> But the slow path is slow for some reason, right?
> Does it not make use of any locks also? So on normal operation there could be a
> potentially larger impact than a spinlock, even though there would be no
> scheduled draining.

Well, for the regular (try_charge) path that is essentially
page_counter_try_charge, which boils down to an atomic_long_add_return on
the memcg counter + all parents up the hierarchy, plus the high memory
limit evaluation (essentially 2 atomic_reads for the memcg + all parents
up the hierarchy). That is not a whole lot - especially when the memcg
hierarchy is not very deep. The per-cpu batch amortizes those
per-hierarchy updates as well as the atomic operations + cache line
bouncing on updates.

On the other hand, a spinlock would do the unconditional atomic updates as
well, and even much more on CONFIG_RT. A plus is that the update will be
mostly local, so cache line bouncing shouldn't be terrible - unless
somebody heavily triggers pcp cache draining, but that shouldn't be all
that common (e.g. when a memcg triggers its limit).
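To make that concrete, here is a rough userspace model of the two paths.
This is only an illustrative sketch, not the actual mm/page_counter.c or
mm/memcontrol.c code; the struct layout, function names and the batch size
are made up for the example:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a hierarchical page counter (memcg + its ancestors). */
struct counter {
	atomic_long usage;	/* pages currently charged at this level */
	long max;		/* hard limit for this level */
	struct counter *parent;	/* NULL at the root */
};

/* Uncharge bottom-up; this is also what draining a stock boils down to. */
static void counter_uncharge(struct counter *c, long nr_pages)
{
	for (; c; c = c->parent)
		atomic_fetch_sub(&c->usage, nr_pages);
}

/*
 * The "slow" path: no locks, just one atomic RMW per level of the
 * hierarchy, unwound again if any level would go over its limit.
 */
static bool counter_try_charge(struct counter *c, long nr_pages)
{
	struct counter *level, *undo;

	for (level = c; level; level = level->parent) {
		long new_usage = atomic_fetch_add(&level->usage, nr_pages) + nr_pages;

		if (new_usage > level->max) {
			atomic_fetch_sub(&level->usage, nr_pages);
			for (undo = c; undo != level; undo = undo->parent)
				atomic_fetch_sub(&undo->usage, nr_pages);
			return false;
		}
	}
	return true;
}

/* Arbitrary pre-charge batch for the sketch. */
#define STOCK_BATCH 64

/* Models one CPU's stock; the real thing is a per-cpu variable. */
struct cpu_stock {
	struct counter *cached;	/* memcg the pre-charge belongs to */
	long nr_pages;		/* pre-charged pages still available */
};

static bool stock_try_charge(struct cpu_stock *stock, struct counter *c,
			     long nr_pages)
{
	/* Fast path: purely CPU-local, no atomics, no locks. */
	if (stock->cached == c && stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		return true;
	}

	/* Return (drain) whatever is left of a previous pre-charge. */
	if (stock->cached && stock->nr_pages)
		counter_uncharge(stock->cached, stock->nr_pages);
	stock->cached = NULL;
	stock->nr_pages = 0;

	/* Refill: one batched walk of the hierarchy covers many charges. */
	if (counter_try_charge(c, nr_pages + STOCK_BATCH)) {
		stock->cached = c;
		stock->nr_pages = STOCK_BATCH;
		return true;
	}

	/* The batch did not fit under the limit; try the exact amount. */
	return counter_try_charge(c, nr_pages);
}

The point being that the uncontended cost of the "slow" path is a few
atomic RMWs proportional to the hierarchy depth, while the stock fast path
touches only CPU-local state - which is what the batching buys us and what
a shared, remotely drainable structure would partially give up.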
All that being said, I am still not convinced that bypassing the pcp cache
for isolated CPUs would make a dramatic difference, especially in the
context of workloads that tend to run on isolated CPUs and rarely enter
the kernel.

> > It is true that apart from user
> > space memory, which can be under full control of the userspace, there are
> > kernel allocations which can be done on behalf of the process and those
> > could be charged to memcg as well. So I can imagine the pcp cache could
> > be populated even if the process is not faulting anything in during the RT
> > sensitive phase.
> 
> Humm, I think I will apply the change and do comparative testing against
> upstream. This should bring good comparison results.

That would certainly be appreciated!

> > > On the other hand, compared to how it works now, this should be a more
> > > controllable way of introducing latency than a scheduled cache drain.
> > > 
> > > Your suggestion on no-stocks/caches in isolated CPUs would be great for
> > > predictability, but I am almost sure the cost in overall performance would not
> > > be fine.
> > 
> > It is hard to estimate the overhead without measuring it. Do you think
> > you can give it a try? If the performance is not really acceptable
> > (which would really surprise me) then we can think of a more complex
> > solution.
> 
> Sure, I can try that.
> Do you suggest any specific workload that happens to stress the percpu cache
> usage, with the usual drains and so on? Maybe I will also try with synthetic
> workloads.

I really think you want to test it on the isolcpu-aware workload.
Artificial benchmarks are not all that useful in this context.
-- 
Michal Hocko
SUSE Labs