From: Greg Thelen
Date: Thu, 20 Mar 2025 07:43:10 -0700
Subject: Re: [PATCH] cgroup/rstat: avoid disabling irqs for O(num_cpu)
To: Shakeel Butt
Cc: Tejun Heo, Johannes Weiner, Michal Koutný, Andrew Morton, Eric Dumazet, Yosry Ahmed, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <20250319071330.898763-1-gthelen@google.com> <54wer7lbgg4mgxv7ky5zzsgjv2vi4diu7clvcklxgmrp2u4gvn@tr2twe5xdtgt>
In-Reply-To: <54wer7lbgg4mgxv7ky5zzsgjv2vi4diu7clvcklxgmrp2u4gvn@tr2twe5xdtgt>

On Wed, Mar 19, 2025 at 10:53 AM Shakeel Butt wrote:
>
> On Wed, Mar 19, 2025 at 12:13:30AM -0700, Greg Thelen wrote:
> > From: Eric Dumazet
> >
> > cgroup_rstat_flush_locked() grabs the irq safe cgroup_rstat_lock while
> > iterating all possible cpus. It only drops the lock if there is
> > scheduler or spin lock contention. If neither, then interrupts can be
> > disabled for a long time.
> > On large machines this can disable interrupts
> > for a long enough time to drop network packets. On 400+ CPU machines
> > I've seen interrupt disabled for over 40 msec.
>
> Which kernel was this observed on in production?
>
> >
> > Prevent rstat from disabling interrupts while processing all possible
> > cpus. Instead drop and reacquire cgroup_rstat_lock for each cpu.
>
> Doing for each cpu might be too extreme. Have you tried with some
> batching?
>
> > This
> > approach was previously discussed in
> > https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
> > though this was in the context of an non-irq rstat spin lock.
> >
> > Benchmark this change with:
> > 1) a single stat_reader process with 400 threads, each reading a test
> >    memcg's memory.stat repeatedly for 10 seconds.
> > 2) 400 memory hog processes running in the test memcg and repeatedly
> >    charging memory until oom killed. Then they repeat charging and oom
> >    killing.
> >
>
> Though this benchmark seems too extreme but userspace holding off irqs
> for that long time is bad. BTW are these memory hoggers, creating anon
> memory or file memory? Is [z]swap enabled?

The memory hoggers were anon, without any form of swap.

I think the other questions were answered in other replies, but feel free to
re-ask and I'll provide details.

> For the long term, I think we can use make this work without disabling
> irqs, similar to how networking manages sock lock.
>
> > v6.14-rc6 with CONFIG_IRQSOFF_TRACER with stat_reader and hogs, finds
> > interrupts are disabled by rstat for 45341 usec:
> >   #  => started at: _raw_spin_lock_irq
> >   #  => ended at:   cgroup_rstat_flush
> >   #
> >   #
> >   #                    _------=> CPU#
> >   #                   / _-----=> irqs-off/BH-disabled
> >   #                  | / _----=> need-resched
> >   #                  || / _---=> hardirq/softirq
> >   #                  ||| / _--=> preempt-depth
> >   #                  |||| / _-=> migrate-disable
> >   #                  ||||| /     delay
> >   #  cmd     pid     |||||| time  |   caller
> >   #     \   /        ||||||  \    |    /
> >   stat_rea-96532    52d....     0us*: _raw_spin_lock_irq
> >   stat_rea-96532    52d.... 45342us : cgroup_rstat_flush
> >   stat_rea-96532    52d.... 45342us : tracer_hardirqs_on <-cgroup_rstat_flush
> >   stat_rea-96532    52d.... 45343us :
> >    => memcg1_stat_format
> >    => memory_stat_format
> >    => memory_stat_show
> >    => seq_read_iter
> >    => vfs_read
> >    => ksys_read
> >    => do_syscall_64
> >    => entry_SYSCALL_64_after_hwframe
> >
> > With this patch the CONFIG_IRQSOFF_TRACER doesn't find rstat to be the
> > longest holder. The longest irqs-off holder has irqs disabled for
> > 4142 usec, a huge reduction from previous 45341 usec rstat finding.
> >
> > Running stat_reader memory.stat reader for 10 seconds:
> > - without memory hogs: 9.84M accesses => 12.7M accesses
> > -    with memory hogs: 9.46M accesses => 11.1M accesses
> > The throughput of memory.stat access improves.
> >
> > The mode of memory.stat access latency after grouping by of 2 buckets:
> > - without memory hogs: 64 usec => 16 usec
> > -    with memory hogs: 64 usec =>  8 usec
> > The memory.stat latency improves.
>
> So, things are improving even without batching. I wonder if there are
> less readers then how will this look like. Can you try with single
> reader as well?
>
> >
> > Signed-off-by: Eric Dumazet
> > Signed-off-by: Greg Thelen
> > Tested-by: Greg Thelen
>
> Reviewed-by: Shakeel Butt
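
For anyone following the thread, the lock handling being discussed can be
sketched in userspace roughly as below. This is only an illustration, not
the kernel patch: threading.Lock stands in for cgroup_rstat_lock (taking it
does not disable interrupts), and flush_all_cpus, flush_one_cpu and batch
are hypothetical names. batch=1 corresponds to dropping and reacquiring the
lock for each cpu as the patch describes; larger values correspond to the
batching question raised above and are not part of the posted patch.

```python
import threading


def flush_all_cpus(num_cpus, flush_one_cpu, lock, batch=1):
    """Flush per-cpu state while holding `lock` for at most `batch` cpus.

    Hypothetical userspace analogy: the lock is released and reacquired
    between batches, so no single hold covers all cpus.
    Returns the number of times the lock was acquired.
    """
    if batch < 1:
        raise ValueError("batch must be >= 1")
    acquisitions = 0
    for start in range(0, num_cpus, batch):
        with lock:  # bounded critical section: at most `batch` cpus
            acquisitions += 1
            for cpu in range(start, min(start + batch, num_cpus)):
                flush_one_cpu(cpu)
        # lock is dropped here, giving other waiters a chance to run
    return acquisitions


if __name__ == "__main__":
    lock = threading.Lock()
    flushed = []
    print(flush_all_cpus(400, flushed.append, lock))            # per-cpu
    print(flush_all_cpus(400, lambda cpu: None, lock, batch=32))  # batched
```

The trade-off is the one already raised in the review: a smaller batch bounds
the time spent under the lock more tightly, at the cost of more
acquire/release cycles.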