From: Greg Thelen
Date: Wed, 19 Mar 2025 00:17:12 -0700
Subject: Re: [PATCH] cgroup/rstat: avoid disabling irqs for O(num_cpu)
To: Tejun Heo, Johannes Weiner, Michal Koutný
Cc: Andrew Morton, Yosry Ahmed, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Eric Dumazet
References: <20250319071330.898763-1-gthelen@google.com>
In-Reply-To: <20250319071330.898763-1-gthelen@google.com>

(fix mistyped CC address to Eric)

On Wed, Mar 19, 2025 at 12:13 AM Greg Thelen wrote:
>
> From: Eric Dumazet
>
> cgroup_rstat_flush_locked() grabs the irq-safe cgroup_rstat_lock while
> iterating all possible cpus. It only drops the lock if there is
> scheduler or spin lock contention. If there is neither, interrupts can
> stay disabled for a long time. On large machines this can disable
> interrupts for long enough to drop network packets.
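The difference between the old yield-only-on-contention policy and the drop-per-cpu policy can be modelled in userspace C. This is a hypothetical sketch, not kernel code: a pthread mutex stands in for the irq-disabling cgroup_rstat_lock, sched_yield() stands in for cond_resched(), the contended() callback stands in for need_resched() || spin_needbreak(), and the names flush_sketch(), fill_deltas(), percpu_delta[], NR_CPUS_SKETCH and never_contended() are invented for illustration. It records the longest run of CPUs flushed under a single lock hold, the quantity that bounds how long interrupts would stay off.

```c
/*
 * Hypothetical userspace model of the two yield policies discussed for
 * cgroup_rstat_flush_locked(). Not kernel code: a pthread mutex stands in
 * for the irq-disabling cgroup_rstat_lock, sched_yield() for cond_resched().
 */
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_CPUS_SKETCH 8 /* hypothetical CPU count */

static pthread_mutex_t rstat_lock = PTHREAD_MUTEX_INITIALIZER;
static long percpu_delta[NR_CPUS_SKETCH]; /* pending per-CPU updates */
static long total;                        /* flushed aggregate */

/* Longest run of CPUs flushed while the lock was continuously held. */
static size_t max_cpus_per_hold;

/* Set every per-CPU delta to v (test helper). */
static void fill_deltas(long v)
{
	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++)
		percpu_delta[cpu] = v;
}

/*
 * Fold all per-CPU deltas into total. With drop_every_cpu == false the lock
 * is released only when contended() says so (the old policy); with true it
 * is released after every CPU (the policy in the patch).
 */
static void flush_sketch(bool drop_every_cpu, bool (*contended)(void))
{
	size_t run = 0;

	max_cpus_per_hold = 0;
	pthread_mutex_lock(&rstat_lock);
	for (size_t cpu = 0; cpu < NR_CPUS_SKETCH; cpu++) {
		total += percpu_delta[cpu];
		percpu_delta[cpu] = 0;
		if (++run > max_cpus_per_hold)
			max_cpus_per_hold = run;

		if (drop_every_cpu || contended()) {
			pthread_mutex_unlock(&rstat_lock);
			sched_yield();
			pthread_mutex_lock(&rstat_lock);
			run = 0; /* a new hold starts here */
		}
	}
	pthread_mutex_unlock(&rstat_lock);
}

/* Models a machine with no scheduler or lock contention. */
static bool never_contended(void)
{
	return false;
}
```

With no contention, the old policy flushes every CPU in one hold, so max_cpus_per_hold equals NR_CPUS_SKETCH; with the per-CPU drop it is 1. The cost is one extra unlock/lock pair per CPU per flush.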
> On 400+ CPU machines I've seen interrupts disabled for over 40 msec.
>
> Prevent rstat from disabling interrupts while processing all possible
> cpus. Instead, drop and reacquire cgroup_rstat_lock for each cpu. This
> approach was previously discussed in
> https://lore.kernel.org/lkml/ZBz%2FV5a7%2F6PZeM7S@slm.duckdns.org/,
> though that discussion concerned a non-irq rstat spin lock.
>
> Benchmark this change with:
> 1) a single stat_reader process with 400 threads, each reading a test
>    memcg's memory.stat repeatedly for 10 seconds.
> 2) 400 memory hog processes running in the test memcg and repeatedly
>    charging memory until oom killed. The cycle of charging and oom
>    killing then repeats.
>
> On v6.14-rc6 with CONFIG_IRQSOFF_TRACER, running stat_reader and the
> hogs shows rstat disabling interrupts for 45341 usec:
> #  => started at: _raw_spin_lock_irq
> #  => ended at:   cgroup_rstat_flush
> #
> #
> #                    _------=> CPU#
> #                   / _-----=> irqs-off/BH-disabled
> #                  | / _----=> need-resched
> #                  || / _---=> hardirq/softirq
> #                  ||| / _--=> preempt-depth
> #                  |||| / _-=> migrate-disable
> #                  ||||| /     delay
> #  cmd     pid     |||||| time  |   caller
> #     \   /        ||||||  \    |    /
> stat_rea-96532    52d....    0us*: _raw_spin_lock_irq
> stat_rea-96532    52d.... 45342us : cgroup_rstat_flush
> stat_rea-96532    52d.... 45342us : tracer_hardirqs_on <-cgroup_rstat_flush
> stat_rea-96532    52d.... 45343us :
>  => memcg1_stat_format
>  => memory_stat_format
>  => memory_stat_show
>  => seq_read_iter
>  => vfs_read
>  => ksys_read
>  => do_syscall_64
>  => entry_SYSCALL_64_after_hwframe
>
> With this patch, CONFIG_IRQSOFF_TRACER no longer finds rstat to be the
> longest irqs-off holder. The longest holder now has irqs disabled for
> 4142 usec, a large reduction from the 45341 usec previously attributed
> to rstat.
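As a rough worked check, assuming both tracer figures come from the same stat_reader-plus-hogs workload: since the 4142 usec holder after the patch is no longer rstat, rstat's own worst case is at most 4142 usec, so its longest irqs-off window shrank by a factor of at least

```latex
\frac{45341\ \mu\mathrm{s}}{4142\ \mu\mathrm{s}} \approx 10.9
```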
>
> Running the stat_reader memory.stat reader for 10 seconds:
> - without memory hogs: 9.84M accesses => 12.7M accesses
> - with memory hogs:    9.46M accesses => 11.1M accesses
> The throughput of memory.stat access improves.
>
> The mode of memory.stat access latency after grouping into power-of-2
> buckets:
> - without memory hogs: 64 usec => 16 usec
> - with memory hogs:    64 usec =>  8 usec
> The memory.stat latency improves.
>
> Signed-off-by: Eric Dumazet
> Signed-off-by: Greg Thelen
> Tested-by: Greg Thelen
> ---
>  kernel/cgroup/rstat.c | 12 +++++-------
>  1 file changed, 5 insertions(+), 7 deletions(-)
>
> diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
> index aac91466279f..976c24b3671a 100644
> --- a/kernel/cgroup/rstat.c
> +++ b/kernel/cgroup/rstat.c
> @@ -323,13 +323,11 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
>  			rcu_read_unlock();
>  		}
>  
> -		/* play nice and yield if necessary */
> -		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
> -			__cgroup_rstat_unlock(cgrp, cpu);
> -			if (!cond_resched())
> -				cpu_relax();
> -			__cgroup_rstat_lock(cgrp, cpu);
> -		}
> +		/* play nice and avoid disabling interrupts for a long time */
> +		__cgroup_rstat_unlock(cgrp, cpu);
> +		if (!cond_resched())
> +			cpu_relax();
> +		__cgroup_rstat_lock(cgrp, cpu);
>  	}
>  }
>  
> --
> 2.49.0.rc1.451.g8f38331e32-goog
>
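The latency figures in the commit message are reported as the mode after grouping samples into power-of-2 buckets. The message does not show how that was computed, so the following is a hypothetical sketch under one assumption: each bucket is labelled by its lower bound, the largest power of 2 not exceeding the sample in usec. The names pow2_bucket(), latency_mode_pow2() and NR_BUCKETS are invented for illustration.

```c
/*
 * Hypothetical sketch: bucket latency samples (usec) by power of 2 and
 * return the lower bound of the most populated bucket (the mode).
 * Lower-bound labelling is an assumption, not taken from the patch.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NR_BUCKETS 64 /* one bucket per bit of a uint64_t */

/* floor(log2(usec)) for usec > 0; a 0 usec sample lands in bucket 0. */
static unsigned int pow2_bucket(uint64_t usec)
{
	unsigned int b = 0;

	while (usec >>= 1)
		b++;
	return b;
}

/*
 * Returns the lower bound (usec) of the most common bucket, or 0 when
 * n == 0. Ties resolve to the smaller bucket.
 */
static uint64_t latency_mode_pow2(const uint64_t *usec, size_t n)
{
	size_t count[NR_BUCKETS] = { 0 };
	unsigned int best = 0;

	if (n == 0)
		return 0;
	for (size_t i = 0; i < n; i++) {
		unsigned int b = pow2_bucket(usec[i]);

		assert(b < NR_BUCKETS);
		count[b]++;
	}
	for (unsigned int b = 1; b < NR_BUCKETS; b++)
		if (count[b] > count[best])
			best = b;
	return (uint64_t)1 << best;
}
```

For example, samples of 70, 100, 65, 20 and 9 usec fall into the 64, 64, 64, 16 and 8 usec buckets, so the reported mode is 64 usec.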