From: Yosry Ahmed
Date: Tue, 25 Jun 2024 13:45:00 -0700
Subject: Re: [PATCH V2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Tue, Jun 25, 2024 at 9:21 AM Shakeel Butt wrote:
>
> On Tue, Jun 25, 2024 at 09:00:03AM GMT, Yosry Ahmed wrote:
> [...]
> >
> > My point is not about accuracy, although I think it's a reasonable
> > argument on its own (a lot of things could change in a short amount of
> > time, which is why I prefer magnitude-based ratelimiting).
> >
> > My point is about logical ordering. If a userspace program reads the
> > stats *after* an event occurs, it expects to get a snapshot of the
> > system state after that event. Two examples are:
> >
> > - A proactive reclaimer reading the stats after a reclaim attempt to
> > check if it needs to reclaim more memory or fallback.
> > - A userspace OOM killer reading the stats after a usage spike to
> > decide which workload to kill.
> >
> > I listed such examples with more detail in [1], when I removed
> > stats_flush_ongoing from the memcg code.
> >
> > [1]https://lore.kernel.org/lkml/20231129032154.3710765-6-yosryahmed@google.com/
>
> You are kind of arbitrarily adding restrictions and rules here. Why not
> follow the rules of a well established and battle tested stats infra
> used by everyone i.e. vmstats? There is no sync flush and there are
> frequent async flushes. I think that is what Jesper wants as well.

That's how the memcg stats worked previously, since before rstat and
until the introduction of stats_flush_ongoing, AFAICT. We saw an actual
behavioral change when we were moving from a pre-rstat kernel to a
kernel with stats_flush_ongoing. This was the rationale when I removed
stats_flush_ongoing in [1]. It's not a new argument; I am just
reiterating what we discussed back then.

We saw an actual change in the proactive reclaimer, as sometimes the
stats read after the reclaim attempt did not reflect the actual state
of the system. Sometimes the proactive reclaimer would back off when it
shouldn't, because it thought it hadn't reclaimed memory when it
actually had.

Upon further investigation, we realized that this could also affect the
userspace OOM killer, because it uses the memcg stats to figure out
which memcg would free the most memory if it were killed (by looking at
the anon stats, among others). If a memory usage spike occurs, and we
read stats from before the spike, we may kill the wrong memcg.

So as you said, we can experiment with in-kernel flushers, but let's
keep userspace flushing consistent.
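
To make the ordering concern concrete, here is a toy sketch in Python.
It is not the kernel's rstat code: ToyMemcgStats, reclaimed_enough(),
the sync_flush parameter, and the numbers are all hypothetical and only
model the idea that updates sit in per-CPU deltas until a flush folds
them into the values a reader sees.

```python
from dataclasses import dataclass, field


@dataclass
class ToyMemcgStats:
    """Toy model (hypothetical): readers only see values that were flushed."""
    flushed: dict = field(default_factory=dict)   # reader-visible totals
    pending: list = field(default_factory=list)   # one delta dict per CPU

    def update(self, cpu: int, name: str, delta: int) -> None:
        # Grow the per-CPU list on demand, then record the unflushed delta.
        while len(self.pending) <= cpu:
            self.pending.append({})
        self.pending[cpu][name] = self.pending[cpu].get(name, 0) + delta

    def flush(self) -> None:
        # Fold every per-CPU delta into the reader-visible totals.
        for per_cpu in self.pending:
            for name, delta in per_cpu.items():
                self.flushed[name] = self.flushed.get(name, 0) + delta
            per_cpu.clear()

    def read(self, name: str, sync_flush: bool) -> int:
        # sync_flush=True models a read that flushes before returning;
        # sync_flush=False models a read that relies on some earlier flush.
        if sync_flush:
            self.flush()
        return self.flushed.get(name, 0)


def reclaimed_enough(before: int, after: int, target: int) -> bool:
    """Hypothetical proactive-reclaimer check: did usage drop by `target`?"""
    return before - after >= target


stats = ToyMemcgStats()
stats.update(cpu=0, name="anon", delta=1000)
before = stats.read("anon", sync_flush=True)

# A reclaim attempt frees 300 units; the update sits in a per-CPU delta.
stats.update(cpu=1, name="anon", delta=-300)

stale = stats.read("anon", sync_flush=False)  # read without flushing
fresh = stats.read("anon", sync_flush=True)   # read after a sync flush

print(reclaimed_enough(before, stale, 300))
print(reclaimed_enough(before, fresh, 300))
```

In this model, the read that skips the flush still reports the
pre-reclaim value, so the reclaimer's check fails even though the
reclaim succeeded. The same stale read would mislead a userspace OOM
killer comparing anon usage across memcgs after a spike.
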

Taking a step back, I just want to clarify that my arguments for the
flushing changes, whether it's in this patch or your ratelimiting
patch, are from a purely technical perspective. I am making suggestions
that I believe may be better. I am not trying to stop any progress in
this area or stand in the way. The only thing I really don't want is
affecting userspace flushers as I described above.

[1]https://lore.kernel.org/lkml/20231129032154.3710765-6-yosryahmed@google.com/