From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10DC0C3ABB6 for ; Mon, 5 May 2025 20:49:59 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 0F3976B0099; Mon, 5 May 2025 16:49:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 079706B009A; Mon, 5 May 2025 16:49:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E82846B009B; Mon, 5 May 2025 16:49:57 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id C76996B0099 for ; Mon, 5 May 2025 16:49:57 -0400 (EDT) Received: from smtpin10.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay03.hostedemail.com (Postfix) with ESMTP id 88DFFB9559 for ; Mon, 5 May 2025 20:49:58 +0000 (UTC) X-FDA: 83410046076.10.DA0FDDF Received: from out-174.mta1.migadu.com (out-174.mta1.migadu.com [95.215.58.174]) by imf10.hostedemail.com (Postfix) with ESMTP id A3C25C0009 for ; Mon, 5 May 2025 20:49:56 +0000 (UTC) Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=ahAcf858; spf=pass (imf10.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.174 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1746478197; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=OXnskhOWpeWI0e/0mm8lMZ5XqD+oRKL2LPwz4l/PMBI=; b=1/tsNvcGQMtl/iW3Rd6mXRO3/8Q50AqJb8fS+M4n8p8SrHHy6sFgNZhubszGYz4w6yBM6x 7GbzKsomZkLeIgTd1MQNr1RTmx9ylpTGrsQWNee4057oUXmJS+SOdT1hoKkoJXopIanD05 Qin6ju0ZloZasOGEMmlok44c++5Ei4w= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=linux.dev header.s=key1 header.b=ahAcf858; spf=pass (imf10.hostedemail.com: domain of shakeel.butt@linux.dev designates 95.215.58.174 as permitted sender) smtp.mailfrom=shakeel.butt@linux.dev; dmarc=pass (policy=none) header.from=linux.dev ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1746478197; a=rsa-sha256; cv=none; b=08Zg8DHXFtMQyVE9r4d03JDl+5S8vl3HVS2DAvnWX+jEDK/9gK71FUr92y/U++aeR22qKm ia6nVFghwVh0Vmi+2gfMv1qqnFzd09P9G9NY9eXZGw2BkqcCKj+oweth+gsWiKwCALdBOV 3igNwGjc0a7DxXd1e+XCxynq3QJX918= Date: Mon, 5 May 2025 13:49:46 -0700 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux.dev; s=key1; t=1746478194; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=OXnskhOWpeWI0e/0mm8lMZ5XqD+oRKL2LPwz4l/PMBI=; b=ahAcf858/5Hm3ezeiisZ4WTTLdRS5hJGHpFzbfKMCqqLIuBQj7tciw3L3RZcPfd5PO1gTK QWLKGxM8ABZv/l4euaYLcdZIjUPgP2i0nYO7qmvo4IZcAubQH3T6+48kFKU9m3oHGuVZdL mmRjo4RYH9HxPCzyMhWBRplB2m/AAzM= X-Report-Abuse: Please report any abuse attempt to abuse@migadu.com and include these headers. From: Shakeel Butt To: Vlastimil Babka Cc: Shakeel Butt , Alexei Starovoitov , Andrew Morton , Johannes Weiner , Michal Hocko , Roman Gushchin , Muchun Song , Alexei Starovoitov , linux-mm , "open list:CONTROL GROUP (CGROUP)" , bpf , LKML , Meta kernel team , Sebastian Andrzej Siewior , Eric Dumazet , Jakub Kicinski , Paolo Abeni , "David S. Miller" , netdev@vger.kernel.org Subject: Re: [PATCH v2 3/3] memcg: no irq disable for memcg stock lock Message-ID: References: <20250502001742.3087558-1-shakeel.butt@linux.dev> <20250502001742.3087558-4-shakeel.butt@linux.dev> <81a2e692-dd10-4253-afbc-062e0be67ca4@suse.cz> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: X-Migadu-Flow: FLOW_OUT X-Stat-Signature: opnktp3w16zne1fw88f61rjt8r1i7txq X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: A3C25C0009 X-Rspam-User: X-HE-Tag: 1746478196-481997 X-HE-Meta: U2FsdGVkX18PA/Am1cQG9XHNoP7AKF2KesNThj2lP3zljsYo8cuvxAAUJhBlqTPKkFHyA9oBsn8/+MayOFi6bAW/MNyN+ysKb1aP1ucT+KgzkpOChSXU6N8ZLNw0gCCU/j4qvdF8MpO8o/6Hd6Sc5RM691JVEqi62TZOfNfj66pchsKrQ64sBruxohahy5IBg8kts89MZ0vpx7V0Zuq9eN1BkNOUX3hQa7ANJ+ExI7zwTex253Y8f0aeB6JwYlwtHfnV+VAjWS4vr0n3fq2+8Xabuf8Sdm7BphtsfxqEekv8SyVtlOfIZIfH2fRJD3upQGbx9Cwi6dwP3cUsSGat+ZJcJ5Fu3gf+JcGG8q55lrRnj0kjj1fggExQKj0wbiwEuT6gHm1CeCNhHKbLfwWfwha5NeXmzaN/Hx4zjrLXc08sYnzgVWPZrPfmGmwS+MNWmEU1E36n+TIlgEaiXECGW9Xc5/26jomNLWXajazM39Zo/7JwalsLW2XCb5e/pWVZjIiaDfbwFlA/zls3Z/mmeujkr01P9cl75RgIMDAxCwXwC/O7q9+M7xhWpPJ4OjHhhqhnjhsURJY5GWjs0GttV/fX2kWLzEM8uLN2GGNvoiEOwOoHcncsGVPUBMGgWlClFKz7/11ZLxv0v9MXz91p2ZVsvz2zw702iFzzPPhi7Tp7Fh3xgkbx9PMrTHuY/dhcxiDIKqk562b7VAKbZ7hbOypnFDBNC9S7RE3EDbTCOKWB9LSYRw9MzX6kWlBEb0OATyoNiGZg8P99LV2baLcEtPX2LGDcAIttNj1K5zoOJJ/httzZFMgwYpIJOzxaUI6uLF/jg5QXot/kpsN6EgpQoqTwoB/sU7bHoFYm6ZwG52Cq/vrWq1JGotRPGQ2ZxgIWHUtA9yU5z/ffngYTvjYhM3/k/18XvTJZpad8RXIcWGvQOQZX7yJoqw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Mon, May 05, 2025 at 10:13:37AM -0700, Shakeel Butt wrote: > Ccing networking folks. > > Background: https://lore.kernel.org/dvyyqubghf67b3qsuoreegqk4qnuuqfkk7plpfhhrck5yeeuic@xbn4c6c7yc42/ > > On Mon, May 05, 2025 at 12:28:43PM +0200, Vlastimil Babka wrote: > > On 5/3/25 01:03, Shakeel Butt wrote: > > >> > index cd81c70d144b..f8b9c7aa6771 100644 > > >> > --- a/mm/memcontrol.c > > >> > +++ b/mm/memcontrol.c > > >> > @@ -1858,7 +1858,6 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages, > > >> > { > > >> > struct memcg_stock_pcp *stock; > > >> > uint8_t stock_pages; > > >> > - unsigned long flags; > > >> > bool ret = false; > > >> > int i; > > >> > > > >> > @@ -1866,8 +1865,8 @@ static bool consume_stock(struct mem_cgroup *memcg, unsigned int nr_pages, > > >> > return ret; > > >> > > > >> > if (gfpflags_allow_spinning(gfp_mask)) > > >> > - local_lock_irqsave(&memcg_stock.lock, flags); > > >> > - else if (!local_trylock_irqsave(&memcg_stock.lock, flags)) > > >> > + local_lock(&memcg_stock.lock); > > >> > + else if (!local_trylock(&memcg_stock.lock)) > > >> > return ret; > > >> > > >> I don't think it works. > > >> When there is a normal irq and something doing regular GFP_NOWAIT > > >> allocation gfpflags_allow_spinning() will be true and > > >> local_lock() will reenter and complain that lock->acquired is > > >> already set... but only with lockdep on. > > > > > > Yes indeed. I dropped the first patch and didn't fix this one > > > accordingly. I think the fix can be as simple as checking for > > > in_task() here instead of gfp_mask. That should work for both RT and > > > non-RT kernels. > > > > These in_task() checks seem hacky to me. I think the patch 1 in v1 was the > > correct way how to use the local_trylock() to avoid these. > > > > As for the RT concerns, AFAIK RT isn't about being fast, but about being > > preemptible, and the v1 approach didn't violate that - taking the slowpaths > > more often shouldn't be an issue. > > > > Let me quote Shakeel's scenario from the v1 thread: > > > > > I didn't really think too much about PREEMPT_RT kernels as I assume > > > performance is not top priority but I think I get your point. Let me > > > > Agreed. > > > > > explain and correct me if I am wrong. On PREEMPT_RT kernel, the local > > > lock is a spin lock which is actually a mutex but with priority > > > inheritance. A task having the local lock can still get context switched > > > > Let's say (seems implied already) this is a low prio task. > > > > > (but will remain on same CPU run queue) and the newer task can try to > > > > And this is a high prio task. > > > > > acquire the memcg stock local lock. If we just do trylock, it will > > > always go to the slow path but if we do local_lock() then it will sleeps > > > and possibly gives its priority to the task owning the lock and possibly > > > make that task to get the CPU. Later the task slept on memcg stock lock > > > will wake up and go through fast path. > > > > I think from RT latency perspective it could very much be better for the > > high prio task just skip the fast path and go for the slowpath, instead of > > going to sleep while boosting the low prio task to let the high prio task > > use the fast path later. It's not really a fast path anymore I'd say. > > Thanks Vlastimil, this is actually a very good point. Slow path of memcg > charging is couple of atomic operations while the alternative here is at > least two context switches (and possibly scheduler delay). So, it does > not seem like a fast path anymore. > > I have cc'ed networking folks to get their take as well. Orthogonally I > will do some netperf benchmarking on v1 with RT kernel. Let me share the result with PREEMPT_RT config on next-20250505 with and without the v1 of this series. I ran varying number of netperf clients in different cgroups on a 72 CPU machine. $ netserver -6 $ netperf -6 -H ::1 -l 60 -t TCP_SENDFILE -- -m 10K number of clients | Without series | With series 6 | 38559.1 Mbps | 38652.6 Mbps 12 | 37388.8 Mbps | 37560.1 Mbps 18 | 30707.5 Mbps | 31378.3 Mbps 24 | 25908.4 Mbps | 26423.9 Mbps 30 | 22347.7 Mbps | 22326.5 Mbps 36 | 20235.1 Mbps | 20165.0 Mbps I don't see any significant performance difference for the network intensive workload with this series. I am going to send out v3 which will be rebased version of v1 with all these details unless someone has concerns about this.