From: Shakeel Butt
To: Yosry Ahmed
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
Date: Mon, 22 Jul 2024 14:32:03 -0700

On Mon, Jul 22, 2024 at 01:12:35PM GMT, Yosry Ahmed wrote:
> On Mon, Jul 22, 2024 at 1:02 PM Shakeel Butt wrote:
> >
> > On Fri, Jul 19, 2024 at 09:52:17PM GMT, Yosry Ahmed wrote:
> > > On Fri, Jul 19, 2024 at 3:48 PM Shakeel Butt wrote:
> > > >
> > > > On Fri, Jul 19, 2024 at 09:54:41AM GMT, Jesper Dangaard Brouer wrote:
> > > > >
> > > > >
> > > > > On 19/07/2024 02.40, Shakeel Butt wrote:
> > > > > > Hi Jesper,
> > > > > >
> > > > > > On Wed, Jul 17, 2024 at 06:36:28PM GMT, Jesper Dangaard Brouer wrote:
> > > > > > >
> > > > > > [...]
> > > > > > >
> > > > > > >
> > > > > > > Looking at the production numbers for the time the lock is held for level 0:
> > > > > > >
> > > > > > > @locked_time_level[0]:
> > > > > > > [4M, 8M)     623 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
> > > > > > > [8M, 16M)    860 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > > > > > > [16M, 32M)   295 |@@@@@@@@@@@@@@@@@                                   |
> > > > > > > [32M, 64M)   275 |@@@@@@@@@@@@@@@@                                    |
> > > > > > >
> > > > > >
> > > > > > Is it possible to get the above histogram for other levels as well?
> > > > >
> > > > > Data from other levels available in [1]:
> > > > > [1]
> > > > > https://lore.kernel.org/all/8c123882-a5c5-409a-938b-cb5aec9b9ab5@kernel.org/
> > > > >
> > > > > IMHO the data shows we will get most out of skipping level-0 root-cgroup
> > > > > flushes.
> > > > >
> > > >
> > > > Thanks a lot of the data. Are all or most of these locked_time_level[0]
> > > > from kswapds?
> > > > This just motivates me to strongly push the ratelimited
> > > > flush patch of mine (which would be orthogonal to your patch series).
> > >
> > > Jesper and I were discussing a better ratelimiting approach, whether
> > > it's measuring the time since the last flush, or only skipping if we
> > > have a lot of flushes in a specific time frame (using __ratelimit()).
> > > I believe this would be better than the current memcg ratelimiting
> > > approach, and we can remove the latter.
> > >
> > > WDYT?
> >
> > The last statement gives me the impression that you are trying to fix
> > something that is not broken. The current ratelimiting users are ok, the
> > issue is with the sync flushers. Or maybe you are suggesting that the new
> > ratelimiting will be used for all sync flushers and current ratelimiting
> > users and the new ratelimiting will make a good tradeoff between the
> > accuracy and potential flush stall?
>
> The latter. Basically the idea is to have more informed and generic
> ratelimiting logic in the core rstat flushing code (e.g. using
> __ratelimit()), which would apply to ~all flushers*. Then, we ideally
> wouldn't need mem_cgroup_flush_stats_ratelimited() at all.
>

I wonder if we really need a universal ratelimit. As you noted below
there are cases where we want exact stats and then we know there are
cases where accurate stats are not needed but they are very performance
sensitive. Aiming to have a solution which will ignore such differences
might be a futile effort.

> *The obvious exception is the force flushing case we discussed for
> cgroup_rstat_exit().
>
> In fact, I think we need that even with the ongoing flusher
> optimization, because I think there is a slight chance that a flush is
> missed. It wouldn't be problematic for other flushers, but it
> certainly can be for cgroup_rstat_exit() as the stats will be
> completely dropped.
>
> The scenario I have in mind is:
> - CPU 1 starts a flush of cgroup A.
>   Flushing complete, but waiters are
>   not woke up yet.
> - CPU 2 updates the stats of cgroup A after it is flushed by CPU 1.
> - CPU 3 calls cgroup_rstat_exit(), sees the ongoing flusher and waits.
> - CPU 1 wakes up the waiters.
> - CPU 3 proceeds to destroy cgroup A, and the updates made by CPU 2 are lost.
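
The interleaving quoted above can be replayed in a small userspace sketch.
This is not kernel code and does not use the rstat API: toy_cgroup,
toy_update(), toy_flush(), toy_exit() and replay() are hypothetical names
made up for illustration, and the three CPUs' steps are simply run in the
quoted order on one thread. It only shows why an exit path that trusts the
flush it waited on can drop a late update, and why a forced flush at exit
would avoid that, assuming the model below.

```c
#include <assert.h>

/* Toy model of one cgroup's stats (hypothetical, not the kernel layout). */
struct toy_cgroup {
	long pending;	/* updates not yet folded into the total */
	long flushed;	/* total visible after a flush */
};

/* An updater adds a delta that stays pending until the next flush. */
static void toy_update(struct toy_cgroup *cg, long delta)
{
	cg->pending += delta;
}

/* A flush folds everything pending into the flushed total. */
static void toy_flush(struct toy_cgroup *cg)
{
	cg->flushed += cg->pending;
	cg->pending = 0;
}

/*
 * Exit path. With force == 0 it relies on the flush it waited on;
 * with force != 0 it flushes again itself before tearing down.
 * Returns the amount of stats dropped at destruction.
 */
static long toy_exit(struct toy_cgroup *cg, int force)
{
	if (force)
		toy_flush(cg);
	return cg->pending;	/* anything still pending is lost */
}

/* Replays the quoted scenario in order and returns the lost amount. */
static long replay(int force)
{
	struct toy_cgroup a = { .pending = 5, .flushed = 0 };

	toy_flush(&a);		/* CPU 1: flush done, waiters not woken yet */
	assert(a.pending == 0);
	toy_update(&a, 3);	/* CPU 2: update lands after the flush */
	/* CPU 3: waited on CPU 1's flush, is woken, destroys cgroup A */
	return toy_exit(&a, force);
}
```

In this model replay(0) reports the 3 units added by "CPU 2" as dropped,
while replay(1) reports nothing dropped, which is the argument for keeping
a forced flush in the exit path regardless of any ratelimiting.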