Date: Wed, 9 Nov 2022 09:05:28 +0100
From: Michal Hocko
To: Leonardo Brás
Cc: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot,
	Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
	Daniel Bristot de Oliveira, Valentin Schneider, Johannes Weiner,
	Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton,
	Frederic Weisbecker, Phil Auld, Marcelo Tosatti,
	linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
	linux-mm@kvack.org
Subject: Re: [PATCH v1 0/3] Avoid scheduling cache draining to isolated cpus
References: <20221102020243.522358-1-leobras@redhat.com>
	<07810c49ef326b26c971008fb03adf9dc533a178.camel@redhat.com>
	<0183b60e79cda3a0f992d14b4db5a818cd096e33.camel@redhat.com>
	<3c4ae3bb70d92340d9aaaa1856928476641a8533.camel@redhat.com>
	<4a4a6c73f3776d65f70f7ca92eb26fc90ed3d51a.camel@redhat.com>
In-Reply-To: <4a4a6c73f3776d65f70f7ca92eb26fc90ed3d51a.camel@redhat.com>

On Tue 08-11-22 20:09:25, Leonardo Brás wrote:
[...]
> > Yes, with a notable difference that with your spin lock option there is
> > still a chance that the remote draining could influence the isolated CPU
> > workload through that said spinlock. If there is no pcp cache for that
> > cpu being used then there is no potential interaction at all.
> 
> I see.
> But the slow path is slow for some reason, right?
> Does it not make use of any locks also? So on normal operation there could be a
> potentially larger impact than a spinlock, even though there would be no
> scheduled draining.

Well, for the regular (try_charge) path that is essentially
page_counter_try_charge, which boils down to an atomic_long_add_return on
the memcg counter + all parents up the hierarchy, plus the high memory
limit evaluation (essentially 2 atomic_reads for the memcg + all parents
up the hierarchy). That is not a whole lot - especially when the memcg
hierarchy is not very deep. The per-cpu batch amortizes those
per-hierarchy updates as well as the atomic operations + cache line
bouncing on updates.

On the other hand, a spinlock would do the unconditional atomic updates as
well, and even much more on CONFIG_RT. A plus is that the update will be
mostly local, so cache line bouncing shouldn't be terrible - unless
somebody heavily triggers pcp cache draining, but that shouldn't be all
that common (e.g. when a memcg triggers its limit).
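To make that concrete, here is a rough userspace model of the two paths.
This is only an illustrative sketch, not the actual mm/page_counter.c or
mm/memcontrol.c code; the struct layout, function names and the batch size
are made up for the example:

#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a hierarchical page counter (memcg + its ancestors). */
struct counter {
	atomic_long usage;	/* pages currently charged at this level */
	long max;		/* hard limit for this level */
	struct counter *parent;	/* NULL at the root */
};

/* Uncharge bottom-up; this is also what draining a stock boils down to. */
static void counter_uncharge(struct counter *c, long nr_pages)
{
	for (; c; c = c->parent)
		atomic_fetch_sub(&c->usage, nr_pages);
}

/*
 * The "slow" path: no locks, just one atomic RMW per level of the
 * hierarchy, unwound again if any level would go over its limit.
 */
static bool counter_try_charge(struct counter *c, long nr_pages)
{
	struct counter *level, *undo;

	for (level = c; level; level = level->parent) {
		long new_usage = atomic_fetch_add(&level->usage, nr_pages) + nr_pages;

		if (new_usage > level->max) {
			atomic_fetch_sub(&level->usage, nr_pages);
			for (undo = c; undo != level; undo = undo->parent)
				atomic_fetch_sub(&undo->usage, nr_pages);
			return false;
		}
	}
	return true;
}

/* Arbitrary pre-charge batch for the sketch. */
#define STOCK_BATCH 64

/* Models one CPU's stock; the real thing is a per-cpu variable. */
struct cpu_stock {
	struct counter *cached;	/* memcg the pre-charge belongs to */
	long nr_pages;		/* pre-charged pages still available */
};

static bool stock_try_charge(struct cpu_stock *stock, struct counter *c,
			     long nr_pages)
{
	/* Fast path: purely CPU-local, no atomics, no locks. */
	if (stock->cached == c && stock->nr_pages >= nr_pages) {
		stock->nr_pages -= nr_pages;
		return true;
	}

	/* Return (drain) whatever is left of a previous pre-charge. */
	if (stock->cached && stock->nr_pages)
		counter_uncharge(stock->cached, stock->nr_pages);
	stock->cached = NULL;
	stock->nr_pages = 0;

	/* Refill: one batched walk of the hierarchy covers many charges. */
	if (counter_try_charge(c, nr_pages + STOCK_BATCH)) {
		stock->cached = c;
		stock->nr_pages = STOCK_BATCH;
		return true;
	}

	/* The batch did not fit under the limit; try the exact amount. */
	return counter_try_charge(c, nr_pages);
}

The point being that the uncontended cost of the "slow" path is a few
atomic RMWs proportional to the hierarchy depth, while the stock fast path
touches only CPU-local state - which is what the batching buys us and what
a shared, remotely drainable structure would partially give up.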
All that being said, I am still not convinced that bypassing the pcp cache
for isolated CPUs would make a dramatic difference, especially in the
context of workloads that tend to run on isolated CPUs and rarely enter
the kernel.

> > It is true that apart from user
> > space memory, which can be under full control of the userspace, there are
> > kernel allocations which can be done on behalf of the process and those
> > could be charged to memcg as well. So I can imagine the pcp cache could
> > be populated even if the process is not faulting anything in during the RT
> > sensitive phase.
> 
> Humm, I think I will apply the change and do comparative testing against
> upstream. This should bring good comparison results.

That would certainly be appreciated!

> > > On the other hand, compared to how it works now, this should be a more
> > > controllable way of introducing latency than a scheduled cache drain.
> > > 
> > > Your suggestion on no-stocks/caches in isolated CPUs would be great for
> > > predictability, but I am almost sure the cost in overall performance would not
> > > be fine.
> > 
> > It is hard to estimate the overhead without measuring it. Do you think
> > you can give it a try? If the performance is not really acceptable
> > (which would really surprise me) then we can think of a more complex
> > solution.
> 
> Sure, I can try that.
> Do you suggest any specific workload that happens to stress the percpu cache
> usage, with the usual drains and so on? Maybe I will also try with synthetic
> workloads.

I really think you want to test it on the isolcpu-aware workload.
Artificial benchmarks are not all that useful in this context.
-- 
Michal Hocko
SUSE Labs