Message-ID: <5ba79c4feb829ed75cfd98cf5c8042dcb2ddea91.camel@redhat.com>
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
From: Leonardo Brás
To: Marcelo Tosatti
Cc: Michal Hocko, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song,
 Andrew Morton, cgroups@vger.kernel.org, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
Date: Wed, 01 Feb 2023 01:36:07 -0300
References: <20230125073502.743446-1-leobras@redhat.com>
 <9e61ab53e1419a144f774b95230b789244895424.camel@redhat.com>
 <0122005439ffb7895efda7a1a67992cbe41392fe.camel@redhat.com>
User-Agent: Evolution 3.46.2

On Tue, 2023-01-31 at 08:35 -0300, Marcelo Tosatti wrote:
> On Fri, Jan 27, 2023 at 03:55:39AM -0300, Leonardo Brás wrote:
> > On Thu, 2023-01-26 at 20:13 +0100, Michal Hocko wrote:
> > > On Thu 26-01-23 15:14:25, Marcelo Tosatti wrote:
> > > > On Thu, Jan 26, 2023 at 08:45:36AM +0100, Michal Hocko wrote:
> > > > > On Wed 25-01-23 15:22:00, Marcelo Tosatti wrote:
> > > > > [...]
> > > > > > Remote draining reduces interruptions whether the CPU
> > > > > > is marked as isolated or not:
> > > > > >
> > > > > > - Allows isolated CPUs to benefit from pcp caching.
> > > > > > - Removes the interruption to non-isolated CPUs. See for example
> > > > > >
> > > > > >   https://lkml.org/lkml/2022/6/13/2769
> > > > >
> > > > > This is talking about the page allocator per-cpu caches, right? In this
> > > > > patch we are talking about memcg pcp caches. Are you sure the same
> > > > > applies here?
> > > >
> > > > Both can stall the users of the drain operation.
> > >
> > > Yes. But it is important to consider who those users are. We are
> > > draining when
> > > 	- we are charging and the limit is hit so that memory reclaim
> > > 	  has to be triggered.
> > > 	- hard, high limits are set and require memory reclaim.
> > > 	- force_empty - full memory reclaim for a memcg
> > > 	- memcg offlining - cgroup removal - quite a heavy operation as
> > > 	  well.
> > > All those could be really costly kernel operations, and they affect an
> > > isolated cpu only if the same memcg is used by both isolated and
> > > non-isolated cpus. In other words, those costly operations would have to
> > > be triggered from non-isolated cpus, and those are expected to be stalled.
> > > It is the side effect of the scheduled local cpu draining that affects
> > > the isolated cpu as well.
> > >
> > > Is that more clear?
> >
> > I think so, please help me check:

Michal, Roman: Could you please review my argumentation below, so I can
understand what exactly is wrong?

> >
> > IIUC, we can approach this by dividing the problem into two working modes:
> > 1 - Normal, meaning no drain_all_stock() running.
> > 2 - Draining, grouping together pre-OOM and userspace 'config': changing,
> >     destroying, reconfiguring a memcg.
> >
> > For (1), we will have (ideally) only the local cpu working on the percpu
> > struct. This mode will not have any kind of contention, because each CPU
> > will hold its own spinlock only.
> >
> > For (2), we will have a lot of drain_all_stock() running. This will mean a
> > lot of schedule_work_on() running (on upstream) or, on the patch proposal,
> > possibly contention, i.e. local cpus having to wait for a lock to get
> > their cache.
> >
> > Ok, given the above is correct:
> >
> > # Some arguments point out that (1) becomes slower with this patch.
> >
> > This is partially true: while test 2.2 showed that the running time of the
> > local cpu functions had become slower by a few cycles, test 2.4 shows that
> > the userspace perception of it was that syscalls and pagefaulting actually
> > became faster:
> >
> > During some debugging tests before getting the performance numbers of test
> > 2.4, I noticed that the 'syscall + write' test would call all those
> > functions that became slower in test 2.2. Those functions were called
> > multiple millions of times during a single test, and still the patched
> > version performed faster in test 2.4 than the upstream version. Maybe the
> > functions became slower, but overall their usage in the usual context
> > became faster.
> >
> > Is that not a small improvement?
> >
> > # Regarding (2), I notice that we fear contention
> >
> > While this seems to be the harder part of the discussion, I think we have
> > enough data to deal with it.
> >
> > In which case would contention be a big problem here?
> > IIUC it would be when a lot of drain_all_stock() calls get running because
> > the memory limit is getting near. I mean, having the user create / modify
> > a memcg multiple times a second for a while is not something that is
> > expected, IMHO.
>
> Considering that the use of spinlocks with remote draining is the more
> general solution, what would be a test-case to demonstrate a contention
> problem?

IIUC we could try to reproduce a memory-tight workload that keeps allocating /
freeing from different cpus (without hitting OOM).

Michal, Roman: Is that correct? Do you have any workload like that, so we can
test?

>
> > Now, if I assumed correctly and the case where contention could be a
> > problem is a memcg with high memory pressure, then we have the argument
> > that Marcelo Tosatti brought to the discussion [P1]: using spinlocks on
> > percpu caches for page allocation brought better results than local_locks
> > + schedule_work_on().
> >
> > I mean, while contention would cause the cpu to wait for a while before
> > getting the lock for allocating a page from the cache, something similar
> > would happen with schedule_work_on(), which would force the current task
> > to wait while the draining happens locally.
> >
> > What I am able to see is that, for each drain_all_stock(), for each cpu
> > getting drained, we have the option to (a) (sometimes) wait for a lock to
> > be freed, or (b) wait for a whole context switch to happen.
> > And IIUC, (b) is much slower than (a) on average, and this is what causes
> > the improved performance seen in [P1].
>
> Moreover, there is a delay for the remote CPU to execute a work item
> (there could be high priority tasks, IRQ handling on the remote CPU,
> which delays execution of the work item even further).
>
> Also, the other point is that the spinlock already exists for
> PREEMPT_RT (which means that any potential contention issue
> with the spinlock today is limited to PREEMPT_RT users).
>
> So it would be good to point out a specific problematic
> testcase/scenario with using the spinlock in this particular case?

Oh, that's a good point, but IIUC spin_lock() replaces local_lock() in memcg,
meaning it will always run on the same CPU, and there should only be contention
if the memcg local cpu functions get preempted by something that also calls a
memcg local cpu function. (A rough sketch of both draining schemes is appended
at the end of this mail.)

>
> > (I mean, waiting while drain_local_stock() runs on the local CPU vs
> > waiting for it to run on the remote CPU may not be that different, since
> > the cacheline is already written to by the remote cpu on upstream.)
> >
> > Also, according to test 2.2, for the patched version drain_local_stock()
> > has gotten faster (much faster for 128 cpus), even though it does all the
> > draining instead of just scheduling it on the other cpus.
> > I mean, adding that to the brief nature of the local cpu functions, we may
> > not hit contention as much as expected.
> >
> > ##
> >
> > Sorry for the long text.
> > I may be missing some point, please let me know if that's the case.
> >
> > Thanks a lot for reviewing!
> > Leo
> >
> > [P1]: https://lkml.org/lkml/2022/6/13/2769
> >
>

Thanks!
Leo
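
P.S.: to make the comparison above concrete, here is a rough C sketch of the
two draining schemes under discussion. This is illustrative only: the struct
and function names (pcp_stock, drain_all_via_work, drain_all_remote,
drain_local) are made up, and this is not the actual mm/memcontrol.c code;
locking is also simplified (no irq handling). The point is just the contrast
between scheduling a work item on each remote CPU, which forces a context
switch there, and draining remote stocks directly under a per-cpu spinlock.

#include <linux/percpu.h>
#include <linux/spinlock.h>
#include <linux/workqueue.h>
#include <linux/cpumask.h>

/* Hypothetical per-cpu stock, standing in for struct memcg_stock_pcp. */
struct pcp_stock {
	spinlock_t		lock;		/* patched scheme: may be taken remotely */
	unsigned int		nr_pages;	/* stand-in for the cached charge */
	struct work_struct	work;		/* upstream scheme: local drain work */
};
static DEFINE_PER_CPU(struct pcp_stock, pcp_stock);

/* What the work item would do on its own CPU in the upstream-like scheme;
 * each CPU's work would be set up with INIT_WORK(&stock->work, drain_local). */
static void drain_local(struct work_struct *w)
{
	struct pcp_stock *stock = container_of(w, struct pcp_stock, work);

	stock->nr_pages = 0;			/* stand-in for the real drain */
}

/* Upstream-like scheme: ask every CPU to drain its own stock. */
static void drain_all_via_work(void)
{
	int cpu;

	for_each_online_cpu(cpu)
		/* the remote CPU must context-switch to the work item */
		schedule_work_on(cpu, &per_cpu(pcp_stock, cpu).work);
}

/* Patched scheme: drain every stock directly, under its spinlock. */
static void drain_all_remote(void)
{
	int cpu;

	for_each_online_cpu(cpu) {
		struct pcp_stock *stock = &per_cpu(pcp_stock, cpu);

		spin_lock(&stock->lock);	/* worst case: brief contention */
		stock->nr_pages = 0;		/* stand-in for the real drain */
		spin_unlock(&stock->lock);
	}
}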