Date: Mon, 3 Mar 2025 16:19:21 +0100
From: Michal Koutný
To: inwardvessel
Cc: tj@kernel.org, shakeel.butt@linux.dev, yosryahmed@google.com,
	mhocko@kernel.org, hannes@cmpxchg.org, akpm@linux-foundation.org,
	linux-mm@kvack.org, cgroups@vger.kernel.org, kernel-team@meta.com
Subject: Re: [PATCH 0/4 v2] cgroup: separate rstat trees
References: <20250227215543.49928-1-inwardvessel@gmail.com>
In-Reply-To: <20250227215543.49928-1-inwardvessel@gmail.com>

Hello JP.

On Thu, Feb 27, 2025 at 01:55:39PM -0800, inwardvessel wrote:
> From: JP Kobryn
>
> The current design of rstat takes the approach that if one subsystem is
> to be flushed, all other subsystems with pending updates should also be
> flushed. It seems that over time, the stat-keeping of some subsystems
> has grown in size to the extent that they are noticeably slowing down
> others. This has been most observable in situations where the memory
> controller is enabled. One big area where the issue comes up is system
> telemetry, where programs periodically sample cpu stats. It would be a
> benefit for programs like this if the overhead of having to flush memory
> stats (and others) could be eliminated. It would save cpu cycles for
> existing cpu-based telemetry programs and improve scalability in terms
> of sampling frequency and volume of hosts.

> This series changes the approach of "flush all subsystems" to "flush
> only the requested subsystem".
...

> before:
> sizeof(struct cgroup_rstat_cpu) =~ 176 bytes /* can vary based on config */
>
> nr_cgroups * sizeof(struct cgroup_rstat_cpu)
> nr_cgroups * 176 bytes
>
> after:
...
> nr_cgroups * (176 + 16 * 2)
> nr_cgroups * 208 bytes

~ 32B/cgroup/cpu

> With regard to validation, there is a measurable benefit when reading
> stats with this series. A test program was made to loop 1M times while
> reading all four of the files cgroup.stat, cpu.stat, io.stat,
> memory.stat of a given parent cgroup each iteration. This test program
> has been run in the experiments that follow.

Thanks for looking into this and running experiments on the behavior of
split rstat trees.

> The first experiment consisted of a parent cgroup with memory.swap.max=0
> and memory.max=1G. On a 52-cpu machine, 26 child cgroups were created
> and within each child cgroup a process was spawned to frequently update
> the memory cgroup stats by creating and then reading a file of size 1T
> (encouraging reclaim). The test program was run alongside these 26 tasks
> in parallel. The results showed a benefit in both time elapsed and perf
> data of the test program.
>
> time before:
> real    0m44.612s
> user    0m0.567s
> sys     0m43.887s
>
> perf before:
> 27.02% mem_cgroup_css_rstat_flush
>  6.35% __blkcg_rstat_flush
>  0.06% cgroup_base_stat_cputime_show
>
> time after:
> real    0m27.125s
> user    0m0.544s
> sys     0m26.491s

So this shows that flushing the rstat trees one by one (as the test
program reads each *.stat file) is quicker than flushing them all at
once (followed by idle reads of the remaining *.stat files).
Interesting. I would not have bet on that at first, but it is a
convincing argument in favor of the separate trees approach.

> perf after:
>  6.03% mem_cgroup_css_rstat_flush
>  0.37% blkcg_print_stat
>  0.11% cgroup_base_stat_cputime_show

I'd understand why the series reduces the time spent in
mem_cgroup_flush_stats(), but what does the lower proportion of
mem_cgroup_css_rstat_flush() show?

> Another experiment was setup on the same host using a parent cgroup with
> two child cgroups. The same swap and memory max were used as the
> previous experiment. In the two child cgroups, kernel builds were done
> in parallel, each using "-j 20". The perf comparison of the test program
> was very similar to the values in the previous experiment. The time
> comparison is shown below.
>
> before:
> real    1m2.077s
> user    0m0.784s
> sys     1m0.895s

Is this 1M loops of the stats-reading program, like before? I.e. if
this should be analogous to the 0m44.612s above, why isn't it the same?
(I'm thinking of more frequent updates in the latter test.)

> after:
> real    0m32.216s
> user    0m0.709s
> sys     0m31.256s

What was the impact on the kernel build workloads
(cgroup_rstat_updated)?

(Perhaps the saved 30s of CPU work (if potentially moved from readers
to writers) would be spread too thin across the two 20-parallel kernel
builds, right?)

...

> For the final experiment, perf events were recorded during a kernel
> build with the same host and cgroup setup. The builds took place in the
> child node. Control and experimental sides both showed similar in cycles
> spent on cgroup_rstat_updated() and appeared insignificant compared among
> the events recorded with the workload.

What is the change between control and experiment? Running in the root
cgroup vs nested? Or running without *.stat readers vs with them
against the kernel build?
(This clarification would likely answer my question above.)

Michal
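
For readers following the thread, the reader-side test program described
in the quoted cover letter (loop N times, reading cgroup.stat, cpu.stat,
io.stat and memory.stat of one cgroup on each pass) could look roughly
like the sketch below. This is not the program that was actually run;
only the four file names come from the cover letter, while the function
name, its arguments and the return value are assumptions made for
illustration.

```python
import os
import time

# The four file names come from the quoted cover letter. Everything else
# here (names, signature, return value) is hypothetical.
STAT_FILES = ("cgroup.stat", "cpu.stat", "io.stat", "memory.stat")


def read_stats_loop(cgroup_dir, iterations):
    """Read each file in STAT_FILES `iterations` times.

    Returns a tuple (total_bytes_read, elapsed_seconds).
    """
    paths = [os.path.join(cgroup_dir, name) for name in STAT_FILES]
    total = 0
    start = time.perf_counter()
    for _ in range(iterations):
        for path in paths:
            # Re-open on every pass so that each read asks the kernel for
            # fresh content; that read is the operation whose cost the
            # thread is measuring.
            with open(path, "rb") as f:
                total += len(f.read())
    return total, time.perf_counter() - start
```

Calling `read_stats_loop()` with the path of the parent cgroup and
1000000 iterations would approximate the "1M loops" run, and wrapping
the script in time(1) or perf would give numbers of the same kind as
those quoted above.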