From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0EFE7C3600B for ; Thu, 27 Mar 2025 17:48:13 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id F24C6280112; Thu, 27 Mar 2025 13:48:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id EAE052800FF; Thu, 27 Mar 2025 13:48:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D27BA280112; Thu, 27 Mar 2025 13:48:11 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id AA6EC2800FF for ; Thu, 27 Mar 2025 13:48:11 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id E3A8112144F for ; Thu, 27 Mar 2025 17:48:12 +0000 (UTC) X-FDA: 83268064824.13.F046F16 Received: from mail-ed1-f48.google.com (mail-ed1-f48.google.com [209.85.208.48]) by imf20.hostedemail.com (Postfix) with ESMTP id F2FF71C0002 for ; Thu, 27 Mar 2025 17:48:10 +0000 (UTC) Authentication-Results: imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=iXPGWP7J; spf=pass (imf20.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.48 as permitted sender) smtp.mailfrom=mjguzik@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1743097691; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=NHz+Ccq4y4In5QM8QHHqPDgkxTMlvKloGXrTWL6Om+g=; b=G03sIIYs8o12jLaCJiBaM87K2irMp7ZLlUGQQSQjibeNL6xTHSzio80/swFCXbiUhooxhN EpmzkecuPiKDuqF7EBma+7rS/lEilbyYBdT4Q/3i4L0K923ICzpkxY4FIzwSimz3ik9P5c yWnQ6DvItRKG1aa43SD6sKqTma/UKcU= ARC-Authentication-Results: i=1; imf20.hostedemail.com; dkim=pass header.d=gmail.com header.s=20230601 header.b=iXPGWP7J; spf=pass (imf20.hostedemail.com: domain of mjguzik@gmail.com designates 209.85.208.48 as permitted sender) smtp.mailfrom=mjguzik@gmail.com; dmarc=pass (policy=none) header.from=gmail.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1743097691; a=rsa-sha256; cv=none; b=USZQfDqYkuggQ2nJZPf5IthdxVFiKc2ottEsnkDaBhNJvm4kV5fT+rr9vEBrruaDrypv3K he1SCrrvwafzrKqu3Nj2migZpK4o1Djj1h+6rUom7PhOPll31x33MZHu+XygWZsTqJHTJg epZmmdnREKVwimfF3X5Rsxa33JlTxXw= Received: by mail-ed1-f48.google.com with SMTP id 4fb4d7f45d1cf-5e5e22e6ed2so1737524a12.3 for ; Thu, 27 Mar 2025 10:48:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20230601; t=1743097689; x=1743702489; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=NHz+Ccq4y4In5QM8QHHqPDgkxTMlvKloGXrTWL6Om+g=; b=iXPGWP7J8+fH3oPKJJMMepp8etEI1DJookhI4qlslYkXOu8NvZ5PKmwhLj33UfATUI aM4vQ3rvVA3iwVNPqgp6muqGCXDdVdw/kWdLWL6q9zvpqzEbcA7bXd2o0308dcV1Ra0r ZTDB01jnvGarsIKukn2npCMkxTVAgWIA9RmNgXmiyECy4obbNeScAmWDL/BIQwUVQx55 BC2TJ9Ptdbu24Q6agkXoIzuxABgHljxPkctj6FnQe9C3shC7rzfgYT164/KcjnJNRMyk 6nQUASoVGsofG/jPndXNKVVdFEJGH8tZYXJaQHU/LdvgJ9gDukjYdvtq9kV6X9kva4ow 6DRw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1743097689; x=1743702489; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=NHz+Ccq4y4In5QM8QHHqPDgkxTMlvKloGXrTWL6Om+g=; b=PQfJcPmgenc2EjZyWa+2smdypsXFA+3bIHhdjdlm0PlVVg7/eQYXaujW4UOXxI+6f2 LmxdsVd3MLyQuwuk6ekJeBE5BT0LgqIL9lqpgPmMHGdAdP/t1x1T76AAVJO6RhJhDuJQ +bre5K4BYTd4giWdfKuSeWc6m5bgPLLrpzVWjZeTi4nXwWiKAyYQtWSE9e7EyL0vXcOR aELuU0NzsQtO4XP3utft22YJA+ExEyQOp4N81EDwn8TCT4ePO01+jAhZo4Gwh16z4eaw Km/+lZqSvRKDrrM/boiH9LAWawRFqw9HF4MMjgMHCuDiAqKd0McSBDaUiK+NhVJGLhA7 5JxQ== X-Forwarded-Encrypted: i=1; AJvYcCV7My/RBMKv6kGF6nwZ+I2mzhzSFE+7vzi/SgmALjK2uh6u77SKY6u4HY6oEJajaMWAuy7HjY1S5g==@kvack.org X-Gm-Message-State: AOJu0YzYHUeK+1Q6sFEqm/D++xZ797mifZPnFEFvgV+FSiROHkv/56Yc 8iKdGXD9BBSJJrRZAmBRdKHdCo0WFY+NEYcXd5rNimBX9H0wpbPFbA5kitQhDJ6CqbYVoTm08oT pfsL6NzlmgOqE97Dga/0/4Sd5hxU= X-Gm-Gg: ASbGncuw6qsliS3b81u4nr4UWrbn8tLpYPUTK/dvJUHiDo8p4TBGRhoeEvOWlDsUJNB yenDwkhnzsEAxG2JSxD5J2HJ0td1GqnA2j7yxAawRYafvulmh66RYoVZxe2H8bx4dnKq1IgGT2K hYikuQamxCNrFyInUSJN1JeidEn08KR4b/Dqo= X-Google-Smtp-Source: AGHT+IHLz7Hd7zwr8CdH3MUmiQ27N1XbY21K7o2Ht6T9pVr9uanxh8tkVxriW4wUWJJi1IrGKk7/tBQ/Dwyel4aAsZI= X-Received: by 2002:a05:6402:278a:b0:5e7:f707:d7c4 with SMTP id 4fb4d7f45d1cf-5ed8f902a81mr3792714a12.31.1743097688940; Thu, 27 Mar 2025 10:48:08 -0700 (PDT) MIME-Version: 1.0 References: <20250319071330.898763-1-gthelen@google.com> <2vznaaotzkgkrfoi2qitiwdjinpl7ozhpz7w6n7577kaa2hpki@okh2mkqqhbkq> In-Reply-To: From: Mateusz Guzik Date: Thu, 27 Mar 2025 18:47:56 +0100 X-Gm-Features: AQ5f1JrXAHRzHRyquIDDNJZAWzDXofV8J8rFba84g_qbEzkzCrXyFGy0GURok_Y Message-ID: Subject: Re: [PATCH] cgroup/rstat: avoid disabling irqs for O(num_cpu) To: Yosry Ahmed Cc: Greg Thelen , Tejun Heo , Johannes Weiner , =?UTF-8?Q?Michal_Koutn=C3=BD?= , Andrew Morton , Eric Dumazet , cgroups@vger.kernel.org, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Eric Dumazet Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Queue-Id: F2FF71C0002 X-Rspamd-Server: rspam05 X-Rspam-User: X-Stat-Signature: 556enabdqedtg79f1ibbf89tp4bpxixa X-HE-Tag: 1743097690-417029 X-HE-Meta: U2FsdGVkX19WtDRgwbVtOSySrTp9lvK0oB1CABjU7Dh4oqgm5a4djwa99K3Q61Bnu3KHJI4/SEx08fE+X6IbHBw7IaeNyy2cQbO8DIkn7OuAGAJKAsPrZznGtwkzgwdJRCzfBy2lrVak862EM4925A8BD3AjvnmlIZ6lKVmsdZWOu9QGBv+8Z+/KvWDsdGNbA6HikoitMOvA5b6q8lJp4mjEowtwXJ5YhqJjHkrISJsBLeB2J0uZ1euHoUflBHiyPaubQZGAPEP4GcFUH/Fqgvkxk7Mun8jzxfr6m0DzHT9ILRi+wvMRhSV0s3HDIqrnOI2w0B2fW29vIEPv1cbJQS6KYDW1LsZ3wbykdeBJexmg0VsSD2kIbQwETFpElsWgGaZSaTdKt3+ngCiDCOtCHTTra3skC7WTocP3ueRo1uXXMzswYoUbTpvaxAIiwDrv1BrcaW7l9k+aZ00tqRr7/dilHnsQCNkP1w9pTt3d9azbypUzjd4fF/ZKifCE7f7daW9oz0+ilF44AYERmMpARGDJxH7ubToMRF2FIOTFJ9AA7C2H3uKbO+okbLaNd/IwfFWIarYYxzediOC40Ej/qH4DmmKVvm/5gckupoNv+LFz3iMIQNcsFDtjI05L72mfBAt/ydxuChRkCzi7r1mvh3OEKppzS9e2Y7YRRiWHZ5Sw05CAC65TQKGnxS7CjZ4BWF2mcuH9BtT9GeeF8TKaqCIVuWZzo9jkA7lxaXdsWbKHPnkU4VUJTplIcHEOedW0Ej9Bn4wCLUD4aeBCheQmc8di9vxNU4Gd9oxEmmCaRfVJxqB+jdusTzpogqccy+AEnsJxjEXL5TBpX+b8PDuQvAAersSs5db5rRd7qsfbGOHDv2Zpl1fRwIKT6JEOUUBiHdMthO9hT84ktZ/2Dh3n7pe5HlhzVya49mepimibkfoi78oHQJ9TW+RrpTHZ7BjD4SXAb1eGIUGmcUM5Zp3 7f7/Un7G FzwsE2OLHB2Ov2Crk+iZzLg7KW7g4saUofixY+hVTxiNJrTWvvPJF4FMQq6AG16n5M8r/Xzw6Re9s/gj/mBROP8JBv/gbI3O7AqHN5YLN6JhnPz6AAqcnZL5YbbvP8MVHzfYuBRByvNZztPc1BRMAM5pXdBu+yv6rzXJJF+QEd2A+AG5JyvdcpWv19V0vjbDAoEXyinNoWJ47Og4pxQPl2dOZ17sVrXGWAn9HbQBod/PcLqaT+3qkRjVdkGX/FA4KAz1LGcH606dUP25nK7LWXCkq+9X8uSS7ms8+XXigZh45Cz0hF1JH/xMUxusHMwYzIUDq35cQhtche121JD9hOKOrPsDpnZVv7kE97z6qzSQ2VR0m49NzQ+D3wxxYFmlRgC0r6xggydRuTUw= X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Thu, Mar 27, 2025 at 6:17=E2=80=AFPM Yosry Ahmed = wrote: > > On Thu, Mar 27, 2025 at 03:38:50PM +0100, Mateusz Guzik wrote: > > On Wed, Mar 19, 2025 at 05:18:05PM +0000, Yosry Ahmed wrote: > > > On Wed, Mar 19, 2025 at 11:47:32AM +0100, Mateusz Guzik wrote: > > > > Is not this going a little too far? > > > > > > > > the lock + irq trip is quite expensive in its own right and now is > > > > going to be paid for each cpu, as in the total time spent executing > > > > cgroup_rstat_flush_locked is going to go up. > > > > > > > > Would your problem go away toggling this every -- say -- 8 cpus? > > > > > > I was concerned about this too, and about more lock bouncing, but the > > > testing suggests that this actually overall improves the latency of > > > cgroup_rstat_flush_locked() (at least on tested HW). > > > > > > So I don't think we need to do something like this unless a regressio= n > > > is observed. > > > > > > > To my reading it reduces max time spent with irq disabled, which of > > course it does -- after all it toggles it for every CPU. > > > > Per my other e-mail in the thread the irq + lock trips remain not cheap > > at least on Sapphire Rapids. > > > > In my testing outlined below I see 11% increase in total execution time > > with the irq + lock trip for every CPU in a 24-way vm. > > > > So I stand by instead doing this every n CPUs, call it 8 or whatever. > > > > How to repro: > > > > I employed a poor-man's profiler like so: > > > > bpftrace -e 'kprobe:cgroup_rstat_flush_locked { @start[tid] =3D nsecs; = } kretprobe:cgroup_rstat_flush_locked /@start[tid]/ { print(nsecs - @start[= tid]); delete(@start[tid]); } interval:s:60 { exit(); }' > > > > This patch or not, execution time varies wildly even while the box is i= dle. > > > > The above runs for a minute, collecting 23 samples (you may get > > "lucky" and get one extra, in that case remove it for comparison). > > > > A sysctl was added to toggle the new behavior vs old one. Patch at the > > end. > > > > "enabled"(1) means new behavior, "disabled"(0) means the old one. > > > > Sum of nsecs (results piped to: awk '{ sum +=3D $1 } END { print sum }'= ): > > disabled: 903610 > > enabled: 1006833 (+11.4%) > > IIUC this calculates the amount of elapsed time between start and > finish, not necessarily the function's own execution time. Is it > possible that the increase in time is due to more interrupts arriving > during the function execution (which is what we want), rather than more > time being spent on disabling/enabling IRQs? I can agree irq handlers have more opportunities to execute in the toggling case and that the time accounted in the way above will include them. I don't think explains it, but why not, let's test without this problem. I feel compelled to note atomics on x86-64 were expensive for as long as the architecture was around so I'm confused what's up with the resistance to the notion that they remain costly even with modern uarchs. If anything, imo claims that they are cheap require strong evidence. That said, I modified the patch to add a section which issues conditional relock if needed and smp_mb otherwise -- irqs remain disabled, but we are still paying for a full fence. smp_mb is a lock add $0 on the stack pointer. Note this has less work to do than what was added in your patch. It looks like this: switch (READ_ONCE(magic_tunable)) { case 1: __cgroup_rstat_unlock(cgrp, cpu); if (!cond_resched()) cpu_relax(); __cgroup_rstat_lock(cgrp, cpu); break; case 2: if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) { __cgroup_rstat_unlock(cgrp, cpu); if (!cond_resched()) cpu_relax(); __cgroup_rstat_lock(cgrp, cpu); } else { smp_mb(); } break; default: if (need_resched() || spin_needbreak(&cgroup_rstat_lock)) { __cgroup_rstat_unlock(cgrp, cpu); if (!cond_resched()) cpu_relax(); __cgroup_rstat_lock(cgrp, cpu); } break; } With this in place I'm seeing about 4% increase in execution time measured the same way, so irq handlers sneaking in don't explain it. Note smp_mb() alone is a smaller cost than the locked instruction + func calls + irq trips. I also state I'm running this in a VM (24-way), where paravirt spinlocks also issue a lock-prefixed instruction to release the lock. I would say this very much justifies the original claim of 11% with the patch as proposed. --=20 Mateusz Guzik