References: <171923011608.1500238.3591002573732683639.stgit@firesoul>
From: Yosry Ahmed
Date: Mon, 24 Jun 2024 12:37:30 -0700
Subject: Re: [PATCH V2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Jun 24, 2024 at 12:29 PM Shakeel Butt wrote:
>
> On Mon, Jun 24, 2024 at 10:40:48AM GMT, Yosry Ahmed wrote:
> > On Mon, Jun 24, 2024 at 10:32 AM Shakeel Butt wrote:
> > >
> > > On Mon, Jun 24, 2024 at 05:46:05AM GMT, Yosry Ahmed wrote:
> > > > On Mon, Jun 24, 2024 at 4:55 AM Jesper Dangaard Brouer wrote:
> > > >
> > > [...]
> > > > I am assuming this supersedes your other patch titled "[PATCH RFC]
> > > > cgroup/rstat: avoid thundering herd problem on root cgrp", so I will
> > > > only respond here.
> > > >
> > > > I have two comments:
> > > > - There is no reason why this should be limited to the root cgroup. We
> > > > can keep track of the cgroup being flushed, and use
> > > > cgroup_is_descendant() to find out if the cgroup we want to flush is a
> > > > descendant of it. We can use a pointer and cmpxchg primitives instead
> > > > of the atomic here IIUC.
> > > >
> > > > - More importantly, I am not a fan of skipping the flush if there is
> > > > an ongoing one. For all we know, the ongoing flush could have just
> > > > started and the stats have not been flushed yet. This is another
> > > > example of non deterministic behavior that could be difficult to
> > > > debug.
> > >
> > > Even with the flush, there will almost always per-cpu updates which will
> > > be missed. This can not be fixed unless we block the stats updaters as
> > > well (which is not going to happen). So, we are already ok with this
> > > level of non-determinism. Why skipping flushing would be worse? One may
> > > argue 'time window is smaller' but this still does not cap the amount of
> > > updates. So, unless there is concrete data that this skipping flushing
> > > is detrimental to the users of stats, I don't see an issue in the
> > > presense of periodic flusher.
> >
> > As you mentioned, the updates that happen during the flush are
> > unavoidable anyway, and the window is small. On the other hand, we
> > should be able to maintain the current behavior that at least all the
> > stat updates that happened *before* the call to cgroup_rstat_flush()
> > are flushed after the call.
> >
> > The main concern here is that the stats read *after* an event occurs
> > should reflect the system state at that time. For example, a proactive
> > reclaimer reading the stats after writing to memory.reclaim should
> > observe the system state after the reclaim operation happened.
>
> What about the in-kernel users like kswapd?
> I don't see any before or
> after events for the in-kernel users.

The example I can think of off the top of my head is the cache trim mode
scenario I mentioned when discussing your patch (i.e. not realizing
that file memory had already been reclaimed). There is also a
heuristic in zswap that may write back more (or fewer) pages than it
should to the swap device if the stats are significantly stale.

I did not take a closer look to find more examples, but I think we
need to respect this condition at least for userspace readers.
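
To make the first comment quoted above more concrete, here is a rough userspace sketch of tracking the cgroup being flushed with a pointer and a compare-and-exchange, then using a descendant check to decide whether an ongoing flush already covers the caller. Everything in it is an assumption for illustration: `struct cg`, `cg_is_descendant()`, `flush_decide()`, `flush_done()` and the `ongoing_flusher` variable are hypothetical names, it uses C11 atomics rather than the kernel's cmpxchg helpers, and it is not the rstat code or the patch under discussion. It only models the decision; the waiting itself is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace model only: a hypothetical stand-in, not the kernel's struct cgroup. */
struct cg {
    struct cg *parent; /* NULL for the root */
};

/* Same idea as cgroup_is_descendant(): true if cg is ancestor or lies below it. */
static bool cg_is_descendant(const struct cg *cg, const struct cg *ancestor)
{
    for (; cg; cg = cg->parent)
        if (cg == ancestor)
            return true;
    return false;
}

/* The cgroup whose subtree is currently being flushed, or NULL if none. */
static _Atomic(struct cg *) ongoing_flusher;

enum flush_action {
    FLUSH_SELF,            /* nobody was flushing; the caller claimed the slot */
    WAIT_FOR_ONGOING,      /* an ancestor (or cg itself) is being flushed: wait, do not skip */
    FLUSH_AFTER_UNRELATED  /* ongoing flush does not cover cg; caller must still flush */
};

/* Decide what a would-be flusher of cg should do. */
static enum flush_action flush_decide(struct cg *cg)
{
    struct cg *expected = NULL;

    /* Try to become the ongoing flusher: NULL -> cg. */
    if (atomic_compare_exchange_strong(&ongoing_flusher, &expected, cg))
        return FLUSH_SELF;

    /* On failure, expected holds the cgroup that is being flushed right now. */
    if (cg_is_descendant(cg, expected))
        return WAIT_FOR_ONGOING;

    return FLUSH_AFTER_UNRELATED;
}

/* Called by the flusher that got FLUSH_SELF once its flush has completed. */
static void flush_done(struct cg *cg)
{
    struct cg *expected = cg;
    bool released = atomic_compare_exchange_strong(&ongoing_flusher, &expected, NULL);

    assert(released); /* only the claiming flusher may release the slot */
    (void)released;
}
```

In this model, a caller that gets `WAIT_FOR_ONGOING` would block until the ongoing flusher calls `flush_done()` instead of returning early, which is what keeps the property that updates made before the flush call are visible after it returns.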