Date: Fri, 19 Apr 2024 09:11:35 -0700
From: Shakeel Butt <shakeel.butt@linux.dev>
To: Jesper Dangaard Brouer
Cc: Yosry Ahmed, tj@kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com,
    cgroups@vger.kernel.org, longman@redhat.com, netdev@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, kernel-team@cloudflare.com,
    Arnaldo Carvalho de Melo, Sebastian Andrzej Siewior, mhocko@kernel.org
Subject: Re: [PATCH v1 2/3] cgroup/rstat: convert cgroup_rstat_lock back to mutex
References: <171328983017.3930751.9484082608778623495.stgit@firesoul>
 <171328989335.3930751.3091577850420501533.stgit@firesoul>
 <651a52ac-b545-4b25-b82f-ad3a2a57bf69@kernel.org>
 <6392f7e8-d14c-40f4-8a19-110dfffb9707@kernel.org>
In-Reply-To: <6392f7e8-d14c-40f4-8a19-110dfffb9707@kernel.org>
On Fri, Apr 19, 2024 at 03:15:01PM +0200, Jesper Dangaard Brouer wrote:
>
> On 18/04/2024 22.39, Yosry Ahmed wrote:
> > On Thu, Apr 18, 2024 at 7:49 AM Shakeel Butt wrote:
> > >
> > > On Thu, Apr 18, 2024 at 11:02:06AM +0200, Jesper Dangaard Brouer wrote:
> > > >
> > > > On 18/04/2024 04.19, Yosry Ahmed wrote:
> > > [...]
> > > > > I will keep the high-level conversation about using the mutex here in
> > > > > the cover letter thread, but I am wondering why we are keeping the
> > > > > lock dropping logic here with the mutex?
> > > >
> > > > I agree that yielding the mutex in the loop makes less sense.
> > > > Especially since the raw_spin_unlock_irqrestore(cpu_lock, flags) call
> > > > will be a preemption point for my softirq. But I kept it because we
> > > > are running a CONFIG_PREEMPT_VOLUNTARY kernel, so I was still worried
> > > > that there was no sched point for other userspace processes while
> > > > holding the mutex; I don't fully know the sched implications of
> > > > holding a mutex.
> > >
> > > Are the softirqs you are interested in raised from the same CPU or a
> > > remote CPU? What about a local_softirq_pending() check in addition to
> > > the need_resched() and spin_needbreak() checks? If the softirq can only
> > > be raised on the local CPU, then convert the spin_lock to a non-irq one
> > > (please correct me if I am wrong, but on return from hard irq, when not
> > > within a bh- or irq-disabled spin_lock, the kernel will run the pending
> > > softirqs, right?). Did you get the chance to test these two changes, or
> > > something similar, in your prod environment?
> >
> > I tried making the spinlock a non-irq lock before, but Tejun objected [1].
> >
> > [1] https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/
>
> After reading [1], I think using a mutex is a better approach (than a
> non-irq spinlock).
>
> > Perhaps we could experiment with always dropping the lock at CPU
> > boundaries instead?
>
> I don't think this will be enough (always dropping the lock at CPU
> boundaries). My measured "lock-hold" times show IRQs (and softirqs) being
> blocked for too long, as seen in prod with my new cgroup tracepoint
> script [2]. When contention occurs, I see many Yields happening, of the
> same magnitude as Contended.
> But I still see events with long "lock-hold" times, even though yields
> are high.
>
> [2] https://github.com/xdp-project/xdp-project/blob/master/areas/latency/cgroup_rstat_tracepoint.bt
>
> Example output:
>
>  12:46:56 High Lock-contention: wait: 739 usec (0 ms) on CPU:56 comm:kswapd7
>  12:46:56 Long lock-hold time: 6381 usec (6 ms) on CPU:27 comm:kswapd3
>  12:46:56 Long lock-hold time: 18905 usec (18 ms) on CPU:100 comm:kworker/u261:12
>
>  12:46:56 time elapsed: 36 sec (interval = 1 sec)
>   Flushes(2051) 15/interval (avg 56/sec)
>   Locks(44464) 1340/interval (avg 1235/sec)
>   Yields(42413) 1325/interval (avg 1178/sec)
>   Contended(42112) 1322/interval (avg 1169/sec)
>
> There are 15 reported flushes/sec, but the locks are yielded quickly.
>
> More problematically (for softirq latency), we see long lock-hold times
> reaching 18 ms. For network RX softirq I need latency below 0.5 ms to
> avoid RX-ring HW queue overflows.
>
> --Jesper
> p.s. I'm seeing a pattern of kswapdN processes contending on this lock.
>
> @stack[697, kswapd3]:
>         __cgroup_rstat_lock+107
>         __cgroup_rstat_lock+107
>         cgroup_rstat_flush_locked+851
>         cgroup_rstat_flush+35
>         shrink_node+226
>         balance_pgdat+807
>         kswapd+521
>         kthread+228
>         ret_from_fork+48
>         ret_from_fork_asm+27
>
> @stack[698, kswapd4]:
>         __cgroup_rstat_lock+107
>         __cgroup_rstat_lock+107
>         cgroup_rstat_flush_locked+851
>         cgroup_rstat_flush+35
>         shrink_node+226
>         balance_pgdat+807
>         kswapd+521
>         kthread+228
>         ret_from_fork+48
>         ret_from_fork_asm+27
>
> @stack[699, kswapd5]:
>         __cgroup_rstat_lock+107
>         __cgroup_rstat_lock+107
>         cgroup_rstat_flush_locked+851
>         cgroup_rstat_flush+35
>         shrink_node+226
>         balance_pgdat+807
>         kswapd+521
>         kthread+228
>         ret_from_fork+48
>         ret_from_fork_asm+27

Can you simply replace mem_cgroup_flush_stats() in prepare_scan_control()
with the ratelimited version and see if the issue still persists with your
production traffic? Also, were you able to find out which specific stats
are getting the most updates?