Date: Thu, 27 Mar 2025 15:38:50 +0100
From: Mateusz Guzik
To: Yosry Ahmed
Cc: Greg Thelen, Tejun Heo, Johannes Weiner, Michal Koutný, Andrew Morton, Eric Dumazet, cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Eric Dumazet
Subject: Re: [PATCH] cgroup/rstat: avoid disabling irqs for O(num_cpu)
Message-ID: <2vznaaotzkgkrfoi2qitiwdjinpl7ozhpz7w6n7577kaa2hpki@okh2mkqqhbkq>
References: <20250319071330.898763-1-gthelen@google.com>
On Wed, Mar 19, 2025 at 05:18:05PM +0000, Yosry Ahmed wrote:
> On Wed, Mar 19, 2025 at 11:47:32AM +0100, Mateusz Guzik wrote:
> > Is not this going a little too far?
> >
> > the lock + irq trip is quite expensive in its own right and now is
> > going to be paid for each cpu, as in the total time spent executing
> > cgroup_rstat_flush_locked is going to go up.
> >
> > Would your problem go away toggling this every -- say -- 8 cpus?
>
> I was concerned about this too, and about more lock bouncing, but the
> testing suggests that this actually overall improves the latency of
> cgroup_rstat_flush_locked() (at least on tested HW).
>
> So I don't think we need to do something like this unless a regression
> is observed.
>

As I read it, what the patch reduces is the maximum time spent with IRQs disabled, which of course it does, given that it toggles them for every CPU. As I said in my other e-mail in this thread, the IRQ + lock round trips remain expensive, at least on Sapphire Rapids. In the testing outlined below I see an 11% increase in total execution time when the IRQ + lock round trip is done for every CPU in a 24-way VM.
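For illustration, the two policies being compared differ only in how often the flush loop drops and retakes the lock. Below is a minimal userspace sketch, assuming a pthread mutex as a stand-in for cgroup_rstat_lock (IRQ disabling has no userspace equivalent). flush_all(), flush_one_cpu() and the batch parameter are hypothetical names, not kernel APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Userspace sketch only: a pthread mutex stands in for cgroup_rstat_lock.
 * The kernel code also disables IRQs while holding that lock, which has no
 * userspace equivalent. flush_one_cpu() and flush_all() are hypothetical.
 */
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical per-CPU flush; here it only counts how many CPUs were done. */
static void flush_one_cpu(size_t cpu, unsigned long *flushed)
{
	(void)cpu;
	(*flushed)++;
}

/*
 * Flush ncpus CPUs, dropping and retaking the lock after every `batch` CPUs.
 * batch == 1 relocks after each CPU (the behavior under discussion); a larger
 * batch amortizes the unlock/lock round trip. Returns the number of round trips.
 */
static size_t flush_all(size_t ncpus, size_t batch, unsigned long *flushed)
{
	size_t relocks = 0;

	assert(batch > 0);
	pthread_mutex_lock(&stats_lock);
	for (size_t cpu = 0; cpu < ncpus; cpu++) {
		flush_one_cpu(cpu, flushed);
		if ((cpu + 1) % batch == 0) {
			/* Give lock waiters (and, in the kernel, pending IRQs) a window. */
			pthread_mutex_unlock(&stats_lock);
			pthread_mutex_lock(&stats_lock);
			relocks++;
		}
	}
	pthread_mutex_unlock(&stats_lock);
	return relocks;
}
```

With 24 CPUs, batch = 1 does 24 unlock/lock round trips per flush while batch = 8 does 3. The trade-off is that, in the kernel, IRQs would stay disabled for up to 8 CPUs' worth of flushing at a time instead of one.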
So I still stand by doing this every n CPUs instead, call it 8 or whatever.

How to repro:

I employed a poor man's profiler like so:

bpftrace -e '
kprobe:cgroup_rstat_flush_locked {
	@start[tid] = nsecs;
}

kretprobe:cgroup_rstat_flush_locked /@start[tid]/ {
	print(nsecs - @start[tid]);
	delete(@start[tid]);
}

interval:s:60 {
	exit();
}'

With or without this patch, execution time varies wildly even while the box is idle. The above runs for a minute and collects 23 samples (you may get "lucky" and get an extra one; in that case, drop it for the comparison).

I added a sysctl to toggle between the new and the old behavior; the patch is at the end. "enabled" (1) means the new behavior, "disabled" (0) means the old one.

Sum of nsecs (results piped to: awk '{ sum += $1 } END { print sum }'):
disabled: 903610
enabled: 1006833 (+11.4%)

Toggle at runtime with:
sysctl fs.magic_tunable=0 # disabled, no mandatory relocks
sysctl fs.magic_tunable=1 # enabled, relock for every CPU

I attached the stats I got for reference.

I patched v6.14 with the following:

diff --git a/fs/file_table.c b/fs/file_table.c
index c04ed94cdc4b..441f89421413 100644
--- a/fs/file_table.c
+++ b/fs/file_table.c
@@ -106,6 +106,8 @@ static int proc_nr_files(const struct ctl_table *table, int write, void *buffer,
 	return proc_doulongvec_minmax(table, write, buffer, lenp, ppos);
 }
 
+unsigned long magic_tunable;
+
 static const struct ctl_table fs_stat_sysctls[] = {
 	{
 		.procname	= "file-nr",
@@ -123,6 +125,16 @@ static const struct ctl_table fs_stat_sysctls[] = {
 		.extra1		= SYSCTL_LONG_ZERO,
 		.extra2		= SYSCTL_LONG_MAX,
 	},
+	{
+		.procname	= "magic_tunable",
+		.data		= &magic_tunable,
+		.maxlen		= sizeof(magic_tunable),
+		.mode		= 0644,
+		.proc_handler	= proc_doulongvec_minmax,
+		.extra1		= SYSCTL_LONG_ZERO,
+		.extra2		= SYSCTL_LONG_MAX,
+	},
+
 	{
 		.procname	= "nr_open",
 		.data		= &sysctl_nr_open,
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index 3e01781aeb7b..f6444bf25b2f 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -299,6 +299,8 @@ static inline void __cgroup_rstat_unlock(struct cgroup *cgrp, int cpu_in_loop)
 	spin_unlock_irq(&cgroup_rstat_lock);
 }
 
+extern unsigned long magic_tunable;
+
 /* see cgroup_rstat_flush() */
 static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
 	__releases(&cgroup_rstat_lock) __acquires(&cgroup_rstat_lock)
@@ -323,12 +325,18 @@ static void cgroup_rstat_flush_locked(struct cgroup *cgrp)
 			rcu_read_unlock();
 		}
 
-		/* play nice and yield if necessary */
-		if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
+		if (READ_ONCE(magic_tunable)) {
 			__cgroup_rstat_unlock(cgrp, cpu);
 			if (!cond_resched())
 				cpu_relax();
 			__cgroup_rstat_lock(cgrp, cpu);
+		} else {
+			if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) {
+				__cgroup_rstat_unlock(cgrp, cpu);
+				if (!cond_resched())
+					cpu_relax();
+				__cgroup_rstat_lock(cgrp, cpu);
+			}
 		}
 	}
 }

Attachment "disabled":
69869
30473
64670
30544
30950
36445
36235
29920
51179
35760
33424
42426
30177
31211
44974
34450
37871
72642
33016
29518
31800
35730
30326

Attachment "enabled":
63507
50113
36280
35148
63329
41232
51265
41341
41418
42824
35200
35550
54684
41597
55325
36120
48675
41179
39339
35794
38826
37411
40676
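For anyone re-checking the numbers, here is a small C sketch (not part of the original testing) that sums the attached samples the same way the awk one-liner does, computes the relative change, and also reports the median, which is less sensitive to the large outliers visible in both runs. The helper names (sum_ns(), pct_change(), median_ns()) are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SAMPLES 64

/* Per-call nsecs from the "disabled" (old) and "enabled" (new) attachments. */
static const unsigned long disabled_ns[] = {
	69869, 30473, 64670, 30544, 30950, 36445, 36235, 29920,
	51179, 35760, 33424, 42426, 30177, 31211, 44974, 34450,
	37871, 72642, 33016, 29518, 31800, 35730, 30326,
};
static const unsigned long enabled_ns[] = {
	63507, 50113, 36280, 35148, 63329, 41232, 51265, 41341,
	41418, 42824, 35200, 35550, 54684, 41597, 55325, 36120,
	48675, 41179, 39339, 35794, 38826, 37411, 40676,
};

/* Same as the awk one-liner: total nsecs across all samples. */
static unsigned long sum_ns(const unsigned long *v, size_t n)
{
	unsigned long sum = 0;

	for (size_t i = 0; i < n; i++)
		sum += v[i];
	return sum;
}

/* Relative change of `after` with respect to `before`, in percent. */
static double pct_change(unsigned long before, unsigned long after)
{
	assert(before > 0);
	return 100.0 * ((double)after - (double)before) / (double)before;
}

/* qsort() comparator for unsigned long, written to avoid overflow. */
static int cmp_ul(const void *a, const void *b)
{
	unsigned long x = *(const unsigned long *)a;
	unsigned long y = *(const unsigned long *)b;

	return (x > y) - (x < y);
}

/* Median of n samples; sorts a copy so the input arrays stay untouched. */
static double median_ns(const unsigned long *v, size_t n)
{
	unsigned long tmp[MAX_SAMPLES];

	assert(n > 0 && n <= MAX_SAMPLES);
	memcpy(tmp, v, n * sizeof(*v));
	qsort(tmp, n, sizeof(*tmp), cmp_ul);
	if (n % 2)
		return (double)tmp[n / 2];
	return ((double)tmp[n / 2 - 1] + (double)tmp[n / 2]) / 2.0;
}
```

Running sum_ns() over the two arrays gives 903610 and 1006833, matching the awk totals, and pct_change() on those gives roughly +11.4%.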