Date: Wed, 1 Feb 2023 10:31:40 -0800
From: Roman Gushchin <roman.gushchin@linux.dev>
To: Michal Hocko
Cc: Marcelo Tosatti, Leonardo Brás, Johannes Weiner, Shakeel Butt,
	Muchun Song, Andrew Morton, cgroups@vger.kernel.org,
	linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2 0/5] Introduce memcg_stock_pcp remote draining
References: <20230125073502.743446-1-leobras@redhat.com>
	<9e61ab53e1419a144f774b95230b789244895424.camel@redhat.com>

On Thu, Jan 26, 2023 at 08:20:46PM +0100, Michal Hocko wrote:
> On Thu 26-01-23 15:03:43, Marcelo Tosatti wrote:
> > On Thu, Jan 26, 2023 at 08:41:34AM +0100, Michal Hocko wrote:
> > > On Wed 25-01-23 15:14:48, Roman Gushchin wrote:
> > > > On Wed, Jan 25, 2023 at 03:22:00PM -0300, Marcelo Tosatti wrote:
> > > > > On Wed, Jan 25, 2023 at 08:06:46AM -0300, Leonardo Brás wrote:
> > > > > > On Wed, 2023-01-25 at 09:33 +0100, Michal Hocko wrote:
> > > > > > > On Wed 25-01-23 04:34:57, Leonardo Bras wrote:
> > > > > > > > Disclaimer:
> > > > > > > > a - The cover letter got bigger than expected, so I had to split it
> > > > > > > >     into sections to better organize myself. I am not very
> > > > > > > >     comfortable with it.
> > > > > > > > b - Performance numbers below do not include patch 5/5 (Remove flags
> > > > > > > >     from memcg_stock_pcp), which could further improve performance
> > > > > > > >     for drain_all_stock(), but I only noticed that optimization at
> > > > > > > >     the last minute.
> > > > > > > >
> > > > > > > > 0 - Motivation:
> > > > > > > > In the current codebase, when drain_all_stock() runs, it schedules
> > > > > > > > a drain_local_stock() on each cpu that has a percpu stock
> > > > > > > > associated with a descendant of the given root_memcg.
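(For context, here is a simplified sketch of the scheduling loop described
above. It is modeled on mm/memcontrol.c, but the rcu protection of
stock->cached and the details of the FLUSHING_CACHED_CHARGE handshake are
simplified, so treat it as illustrative rather than the exact upstream code.)

#include <linux/bitops.h>
#include <linux/cpumask.h>
#include <linux/memcontrol.h>
#include <linux/percpu.h>
#include <linux/preempt.h>
#include <linux/smp.h>
#include <linux/workqueue.h>

struct memcg_stock_pcp {
	struct mem_cgroup *cached;	/* memcg owning the cached charge */
	unsigned int nr_pages;		/* pre-charged pages held locally */
	struct work_struct work;	/* runs drain_local_stock() */
	unsigned long flags;
#define FLUSHING_CACHED_CHARGE	0
};
static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);

/* Flush the local stock back to the memcg counters (body elided). */
static void drain_local_stock(struct work_struct *dummy)
{
	struct memcg_stock_pcp *stock = this_cpu_ptr(&memcg_stock);

	/* ... return stock->nr_pages to stock->cached's page counters ... */
	stock->nr_pages = 0;
	clear_bit(FLUSHING_CACHED_CHARGE, &stock->flags);
}

static void drain_all_stock(struct mem_cgroup *root_memcg)
{
	int cpu, curcpu;

	/*
	 * Walks *every* online cpu, isolated ones included, and queues
	 * a drain work on each cpu whose stock caches a charge for a
	 * descendant of root_memcg.
	 */
	migrate_disable();
	curcpu = smp_processor_id();
	for_each_online_cpu(cpu) {
		struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);
		struct mem_cgroup *memcg = READ_ONCE(stock->cached);
		bool flush = memcg && READ_ONCE(stock->nr_pages) &&
			     mem_cgroup_is_descendant(memcg, root_memcg);

		/* The flag keeps a second drainer from double-queueing. */
		if (flush &&
		    !test_and_set_bit(FLUSHING_CACHED_CHARGE, &stock->flags)) {
			if (cpu == curcpu)
				drain_local_stock(&stock->work);
			else
				schedule_work_on(cpu, &stock->work);
		}
	}
	migrate_enable();
}

The schedule_work_on() calls on remote cpus are what end up disturbing
isolated cpus.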
> > > > Do you know what caused those drain_all_stock() calls? I wonder if we
> > > > should look into why we have so many of them and whether we really
> > > > need them.
> > > >
> > > > It's either some user action (e.g. reducing memory.max) or some memcg
> > > > entering pre-oom conditions. In the latter case a lot of drain calls
> > > > can be scheduled without a good reason (assuming the cgroup contains
> > > > multiple tasks running on multiple cpus).
> > >
> > > I believe I've never got a specific answer to that. We
> > > have discussed that in the previous version submission
> > > (20221102020243.522358-1-leobras@redhat.com and specifically
> > > Y2TQLavnLVd4qHMT@dhcp22.suse.cz). Leonardo has mentioned a mix of RT
> > > and isolcpus. I was wondering about using memcgs in RT workloads
> > > because that just sounds weird, but let's say this is indeed the case.
> >
> > This could be the case. Consider an "edge device" where it is
> > necessary to run an RT workload. It might also be useful to run
> > non-realtime applications on the same system.
> >
> > > Then an RT task, or whatever task is running on an isolated
> > > cpu, can have pcp charges.
> >
> > Usually the RT task (or, more specifically, the realtime-sensitive
> > loop of the application) runs entirely in userspace. But I suppose
> > there could be charges on application startup.
>
> What is the role of memcg then? If a memory limit is in place and the
> workload doesn't fit in it, then it will get reclaimed during startup
> and memory would need to be refaulted if not mlocked. If it is mlocked,
> then the limit cannot be enforced and the startup would likely fail as
> a result of the memcg oom killer.
>
> [...]
>
> > > > Overall I'm somewhat resistant to the idea of making the generic
> > > > allocation & free paths slower for an improvement in stock draining.
> > > > It's not a strong objection, but IMO we should avoid doing this
> > > > without a really strong reason.
> > >
> > > Are you OK with a simple opt-out on isolated CPUs? That would make
> > > charges slightly slower (atomic updates on the hierarchy counters vs.
> > > a single pcp adjustment), but it would guarantee that the isolated
> > > workload is predictable, which is the primary objective AFAICS.
> >
> > This would make isolated CPUs "second-class citizens": it would be
> > nice to be able to execute non-realtime apps on isolated CPUs as well
> > (think of different periods of the day, one where more realtime apps
> > are required, another where fewer are).
>
> The alternative requires turning the current lockless implementation
> into one that uses locks, introducing potential lock contention. That
> could be harmful to regular workloads. Not using pcp caching would make
> the fast path use a few atomics rather than a local pcp update. That is
> not a terrible cost to pay for the special-cased workloads which use
> isolcpus. Really, we are not talking about a massive cost to be paid.
> At least nobody has shown that in any numbers.

Couldn't agree more.

I also agree that the whole pcpu stock draining code can be improved,
but I believe we should go in a direction almost directly opposite to
the one proposed here.

Can we please return to the original problem this patchset aims to
solve? Is it the latency introduced by the execution of draining works
on isolated cpus? Maybe schedule these works with a delay, and cancel
them if the draining occurred naturally during the delay?

Thanks!
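P.S. To make that last suggestion concrete, below is a rough and completely
untested sketch of what I mean. It reuses the memcg_stock_pcp, memcg_stock
and drain_local_stock() names from mm/memcontrol.c, but the helpers
schedule_drain() / stock_emptied() and the delay value are made up for
illustration:

#include <linux/jiffies.h>
#include <linux/memcontrol.h>
#include <linux/percpu.h>
#include <linux/workqueue.h>

/* Arbitrary grace period before a remote drain actually runs. */
#define MEMCG_DRAIN_DELAY	msecs_to_jiffies(100)

struct memcg_stock_pcp {
	struct mem_cgroup *cached;	/* memcg owning the cached charge */
	unsigned int nr_pages;		/* pre-charged pages held locally */
	struct delayed_work work;	/* was a plain work_struct */
};
static DEFINE_PER_CPU(struct memcg_stock_pcp, memcg_stock);

/*
 * Instead of draining a remote cpu immediately, give its stock a
 * chance to be consumed or flushed naturally first.
 */
static void schedule_drain(int cpu)
{
	struct memcg_stock_pcp *stock = &per_cpu(memcg_stock, cpu);

	schedule_delayed_work_on(cpu, &stock->work, MEMCG_DRAIN_DELAY);
}

/*
 * Called on the local cpu whenever the stock becomes empty, e.g. when
 * a charge consumes it: the pending drain is now pointless, so cancel
 * it instead of waking up an isolated cpu for nothing.
 */
static void stock_emptied(struct memcg_stock_pcp *stock)
{
	cancel_delayed_work(&stock->work);
}

The point is just that a small delay gives charge/uncharge activity on the
target cpu a chance to empty the stock, so the work often never has to run
there at all.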