From: Yosry Ahmed
Date: Tue, 25 Jun 2024 14:24:35 -0700
Subject: Re: [PATCH V2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue, Jun 25, 2024 at 2:20 PM Shakeel Butt wrote:
>
> On Tue, Jun 25, 2024 at 01:45:00PM GMT, Yosry Ahmed wrote:
> > On Tue, Jun 25, 2024 at 9:21 AM Shakeel Butt wrote:
> > >
> > > On Tue, Jun 25, 2024 at 09:00:03AM GMT, Yosry Ahmed wrote:
> > > [...]
> > > >
> > > > My point is not about accuracy, although I think it's a reasonable
> > > > argument on its own (a lot of things could change in a short amount of
> > > > time, which is why I prefer magnitude-based ratelimiting).
> > > >
> > > > My point is about logical ordering.
> > > > If a userspace program reads the
> > > > stats *after* an event occurs, it expects to get a snapshot of the
> > > > system state after that event. Two examples are:
> > > >
> > > > - A proactive reclaimer reading the stats after a reclaim attempt to
> > > > check if it needs to reclaim more memory or fall back.
> > > > - A userspace OOM killer reading the stats after a usage spike to
> > > > decide which workload to kill.
> > > >
> > > > I listed such examples with more detail in [1], when I removed
> > > > stats_flush_ongoing from the memcg code.
> > > >
> > > > [1] https://lore.kernel.org/lkml/20231129032154.3710765-6-yosryahmed@google.com/
> > >
> > > You are kind of arbitrarily adding restrictions and rules here. Why not
> > > follow the rules of a well-established and battle-tested stats infra
> > > used by everyone, i.e. vmstats? There is no sync flush and there are
> > > frequent async flushes. I think that is what Jesper wants as well.
> >
> > That's how the memcg stats worked previously, since before rstat and
> > until the introduction of stats_flush_ongoing AFAICT. We saw an actual
> > behavioral change when we were moving from a pre-rstat kernel to a
> > kernel with stats_flush_ongoing. This was the rationale when I removed
> > stats_flush_ongoing in [1]. It's not a new argument, I am just
> > reiterating what we discussed back then.
>
> In my reply above, I am not arguing to go back to the older
> stats_flush_ongoing situation. Rather, I am discussing what the best
> eventual solution should be. From the vmstats infra, we can learn that
> with frequent async flushes and no sync flush, users are fine with the
> 'non-determinism'. Of course, cgroup stats are different from vmstats,
> i.e. they are hierarchical, but I think we can try out this approach and
> see if it works or not.

If we do not do sync flushing, then the same problem that happened with
stats_flush_ongoing could occur again, right?
Userspace could read the stats after an event and get a snapshot of the
system from before that event.

Perhaps this is fine for vmstats if it has always been like that (I
have no idea), or if no users make assumptions about this. But for
cgroup stats, we have use cases that rely on this behavior.

>
> BTW it seems like this topic should be discussed
> face-to-face over VC or at LPC. What do you folks think?

I am not going to be at LPC, but I am happy to discuss this over VC.
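
To make the ordering concern concrete, here is a toy model in Python. It is an illustrative sketch only, not the rstat or memcg implementation. It assumes per-CPU counters that buffer stat deltas until a flush, an async flush path ratelimited by a hypothetical magnitude threshold (FLUSH_THRESHOLD), and two hypothetical read paths: one that relies only on async flushes and one that flushes synchronously before reading. All names here are made up for illustration.

```python
# Hypothetical toy model (not kernel code): per-CPU counters buffer stat
# updates, and a flush folds them into the total that readers observe.
NR_CPUS = 4
FLUSH_THRESHOLD = 64  # hypothetical magnitude-based ratelimit threshold


class ToyStats:
    def __init__(self) -> None:
        self.pending = [0] * NR_CPUS  # per-CPU deltas not yet flushed
        self.pending_magnitude = 0    # sum of |delta| since the last flush
        self.flushed_total = 0        # the value readers see

    def update(self, cpu: int, delta: int) -> None:
        self.pending[cpu] += delta
        self.pending_magnitude += abs(delta)

    def flush(self) -> None:
        self.flushed_total += sum(self.pending)
        self.pending = [0] * NR_CPUS
        self.pending_magnitude = 0

    def periodic_flush(self) -> None:
        # Async path with magnitude-based ratelimiting: skip the flush
        # unless enough change has accumulated since the last one.
        if self.pending_magnitude > FLUSH_THRESHOLD:
            self.flush()

    def read_without_sync_flush(self) -> int:
        # May return a snapshot from before an event that already happened.
        return self.flushed_total

    def read_with_sync_flush(self) -> int:
        # Flushing before reading ensures that updates made before the
        # read are reflected in the result.
        self.flush()
        return self.flushed_total


stats = ToyStats()
stats.update(cpu=0, delta=1000)  # baseline usage
stats.flush()

# "Event": a proactive reclaim attempt frees 32 units on CPU 2.
stats.update(cpu=2, delta=-32)
stats.periodic_flush()  # 32 <= 64, so the ratelimited async flush is skipped

stale = stats.read_without_sync_flush()
fresh = stats.read_with_sync_flush()
print(stale, fresh)  # 1000 968
```

In this model the reader without a sync flush still sees 1000 after the reclaim event, while the sync-flushing reader sees 968. The trade-off being debated is cost: a sync flush on every read is presumably what piles up when many readers flush at once, so dropping it trades that overhead for the possibility of stale reads.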