From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 22 Jul 2024 23:24:14 -0700
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Jul 22, 2024 at 3:59 PM Shakeel Butt wrote:
>
> On Mon, Jul 22, 2024 at 02:32:03PM GMT, Shakeel Butt wrote:
> > On Mon, Jul 22, 2024 at 01:12:35PM GMT, Yosry Ahmed wrote:
> > > On Mon, Jul 22, 2024 at 1:02 PM Shakeel Butt wrote:
> > > >
> > > > On Fri, Jul 19, 2024 at 09:52:17PM GMT, Yosry Ahmed wrote:
> > > > > On Fri, Jul 19, 2024 at 3:48 PM Shakeel Butt wrote:
> > > > > >
> > > > > > On Fri, Jul 19, 2024 at 09:54:41AM GMT, Jesper Dangaard Brouer wrote:
> > > > > > >
> > > > > > >
> > > > > > > On 19/07/2024 02.40, Shakeel Butt wrote:
> > > > > > > > Hi Jesper,
> > > > > > > >
> > > > > > > > On Wed, Jul 17, 2024 at 06:36:28PM GMT, Jesper Dangaard Brouer wrote:
> > > > > > > > >
> > > > > > > > [...]
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > Looking at the production numbers for the time the lock is held for level 0:
> > > > > > > > >
> > > > > > > > > @locked_time_level[0]:
> > > > > > > > > [4M, 8M)     623 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
> > > > > > > > > [8M, 16M)    860 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > > > > > > > > [16M, 32M)   295 |@@@@@@@@@@@@@@@@@                                   |
> > > > > > > > > [32M, 64M)   275 |@@@@@@@@@@@@@@@@                                    |
> > > > > > > > >
> > > > > > > >
> > > > > > > > Is it possible to get the above histogram for other levels as well?
> > > > > > >
> > > > > > > Data from other levels available in [1]:
> > > > > > > [1]
> > > > > > > https://lore.kernel.org/all/8c123882-a5c5-409a-938b-cb5aec9b9ab5@kernel.org/
> > > > > > >
> > > > > > > IMHO the data shows we will get most out of skipping level-0 root-cgroup
> > > > > > > flushes.
> > > > > > >
> > > > > >
> > > > > > Thanks a lot for the data. Are all or most of these locked_time_level[0]
> > > > > > from kswapds? This just motivates me to strongly push the ratelimited
> > > > > > flush patch of mine (which would be orthogonal to your patch series).
> > > > >
> > > > > Jesper and I were discussing a better ratelimiting approach, whether
> > > > > it's measuring the time since the last flush, or only skipping if we
> > > > > have a lot of flushes in a specific time frame (using __ratelimit()).
> > > > > I believe this would be better than the current memcg ratelimiting
> > > > > approach, and we can remove the latter.
> > > > >
> > > > > WDYT?
> > > >
> > > > The last statement gives me the impression that you are trying to fix
> > > > something that is not broken. The current ratelimiting users are ok, the
> > > > issue is with the sync flushers.
> > > > Or maybe you are suggesting that the new
> > > > ratelimiting will be used for all sync flushers and current ratelimiting
> > > > users and the new ratelimiting will make a good tradeoff between the
> > > > accuracy and potential flush stall?
> > >
> > > The latter. Basically the idea is to have more informed and generic
> > > ratelimiting logic in the core rstat flushing code (e.g. using
> > > __ratelimit()), which would apply to ~all flushers*. Then, we ideally
> > > wouldn't need mem_cgroup_flush_stats_ratelimited() at all.
> > >
> >
> > I wonder if we really need a universal ratelimit. As you noted below
> > there are cases where we want exact stats and then we know there are
> > cases where accurate stats are not needed but they are very performance
> > sensitive. Aiming to have a solution which will ignore such differences
> > might be a futile effort.
> >
>
> BTW I am not against it. If we can achieve this with minimal regression
> and maintenance burden then it would be preferable.

It is possible that it is a futile effort, but if it works, the memcg
flushing interface will be much better and we won't have to evaluate
whether ratelimiting is needed on a case-by-case basis. According to
Jesper's data, allowing at most one flush every 50ms may be reasonable,
which means we can ratelimit flushes to 20 flushes per second or
similar. I think on average, this should provide enough accuracy for
most use cases, and it should also reduce flushes in the cases that
Jesper presented. It's probably worth a try, especially since it does
not involve changing user-visible ABIs, so we can always go back to
what we have today.
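To make the "at most one flush every 50ms" versus "20 flushes per second" arithmetic concrete, here is a minimal userspace sketch of the two ratelimiting policies discussed in the thread: skipping a flush if the previous one was too recent, and allowing a fixed burst of flushes per time window. Everything here is hypothetical: the struct and function names are made up for illustration, time is passed in as monotonic milliseconds rather than jiffies, and the window/burst variant only loosely mimics the semantics of the kernel's __ratelimit(). It is not the rstat or memcg code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical userspace model of two flush-ratelimiting policies.
 * Time is passed in explicitly as monotonic milliseconds (the kernel
 * would use jiffies). None of these names exist in the kernel.
 */

/* Policy A: skip the flush if the last one was less than min_gap_ms ago. */
struct flush_gap_state {
	uint64_t min_gap_ms;    /* e.g. 50 => at most ~20 flushes per second */
	uint64_t last_flush_ms; /* time of the last allowed flush */
	bool flushed_once;      /* false until the first flush is allowed */
};

void flush_gap_init(struct flush_gap_state *s, uint64_t min_gap_ms)
{
	assert(min_gap_ms > 0);
	s->min_gap_ms = min_gap_ms;
	s->last_flush_ms = 0;
	s->flushed_once = false;
}

/* Returns true if the caller should flush now; false means "use stale stats". */
bool flush_gap_allow(struct flush_gap_state *s, uint64_t now_ms)
{
	if (s->flushed_once && now_ms - s->last_flush_ms < s->min_gap_ms)
		return false;
	s->last_flush_ms = now_ms;
	s->flushed_once = true;
	return true;
}

/*
 * Policy B: allow at most `burst` flushes per `interval_ms` window, then
 * skip until the window expires (loosely mimics __ratelimit() semantics).
 */
struct flush_burst_state {
	uint64_t interval_ms;  /* window length, e.g. 1000 */
	unsigned int burst;    /* flushes allowed per window, e.g. 20 */
	uint64_t begin_ms;     /* start of the current window */
	unsigned int used;     /* flushes allowed so far in this window */
	bool started;          /* false until the first window is opened */
};

void flush_burst_init(struct flush_burst_state *s, uint64_t interval_ms,
		      unsigned int burst)
{
	assert(interval_ms > 0 && burst > 0);
	s->interval_ms = interval_ms;
	s->burst = burst;
	s->begin_ms = 0;
	s->used = 0;
	s->started = false;
}

bool flush_burst_allow(struct flush_burst_state *s, uint64_t now_ms)
{
	/* Open a new window on first use or once the current one has expired. */
	if (!s->started || now_ms - s->begin_ms >= s->interval_ms) {
		s->begin_ms = now_ms;
		s->used = 0;
		s->started = true;
	}
	if (s->used < s->burst) {
		s->used++;
		return true;
	}
	return false;
}
```

With one flush request per millisecond for one second, flush_gap_init(&s, 50) and flush_burst_init(&s, 1000, 20) both allow 20 flushes, but they distribute them differently: the gap policy spaces flushes 50ms apart, while the window/burst policy lets the first 20 requests through back-to-back and then skips everything until the window expires. In this sketch, that means a burst of concurrent flushers (for example several kswapd threads at once) can still all flush under policy B, whereas policy A bounds staleness more evenly.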