From: Yosry Ahmed
Date: Mon, 24 Jun 2024 14:43:02 -0700
Subject: Re: [PATCH V2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org,
 hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com,
 kernel-team@cloudflare.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <171923011608.1500238.3591002573732683639.stgit@firesoul>

On Mon, Jun 24, 2024 at 1:18 PM Shakeel Butt wrote:
>
> On Mon, Jun 24, 2024 at 12:37:30PM GMT, Yosry Ahmed wrote:
> > On Mon, Jun 24, 2024 at 12:29 PM Shakeel Butt wrote:
> > >
> > > On Mon, Jun 24, 2024 at 10:40:48AM GMT, Yosry Ahmed wrote:
> > > > On Mon, Jun 24, 2024 at 10:32 AM Shakeel Butt wrote:
> > > > >
> > > > > On Mon, Jun 24, 2024 at 05:46:05AM GMT, Yosry Ahmed wrote:
> > > > > > On Mon, Jun 24, 2024 at 4:55 AM Jesper Dangaard Brouer wrote:
> > > > > >
> > > > > [...]
> > > > > > I am assuming this supersedes your other patch titled "[PATCH RFC]
> > > > > > cgroup/rstat: avoid thundering herd problem on root cgrp", so I will
> > > > > > only respond here.
> > > > > >
> > > > > > I have two comments:
> > > > > > - There is no reason why this should be limited to the root cgroup. We
> > > > > > can keep track of the cgroup being flushed, and use
> > > > > > cgroup_is_descendant() to find out if the cgroup we want to flush is a
> > > > > > descendant of it. We can use a pointer and cmpxchg primitives instead
> > > > > > of the atomic here IIUC.
> > > > > >
> > > > > > - More importantly, I am not a fan of skipping the flush if there is
> > > > > > an ongoing one. For all we know, the ongoing flush could have just
> > > > > > started and the stats have not been flushed yet. This is another
> > > > > > example of non-deterministic behavior that could be difficult to
> > > > > > debug.
> > > > >
> > > > > Even with the flush, there will almost always be per-cpu updates which will
> > > > > be missed. This cannot be fixed unless we block the stats updaters as
> > > > > well (which is not going to happen). So, we are already ok with this
> > > > > level of non-determinism. Why would skipping the flush be worse? One may
> > > > > argue 'time window is smaller' but this still does not cap the amount of
> > > > > updates. So, unless there is concrete data that skipping the flush
> > > > > is detrimental to the users of stats, I don't see an issue in the
> > > > > presence of the periodic flusher.
> > > >
> > > > As you mentioned, the updates that happen during the flush are
> > > > unavoidable anyway, and the window is small. On the other hand, we
> > > > should be able to maintain the current behavior that at least all the
> > > > stat updates that happened *before* the call to cgroup_rstat_flush()
> > > > are flushed after the call.
> > > >
> > > > The main concern here is that the stats read *after* an event occurs
> > > > should reflect the system state at that time. For example, a proactive
> > > > reclaimer reading the stats after writing to memory.reclaim should
> > > > observe the system state after the reclaim operation happened.
> > >
> > > What about the in-kernel users like kswapd? I don't see any before or
> > > after events for the in-kernel users.
> >
> > The example I can think of off the top of my head is the cache trim
> > mode scenario I mentioned when discussing your patch (i.e. not
> > realizing that file memory had already been reclaimed).
>
> Kswapd has some kind of cache trim failure mode where it decides to skip
> the cache trim heuristic. Also, for global reclaim there are a couple more
> conditions in play as well.

I was mostly concerned about entering cache trim mode when we
shouldn't, not vice versa, as I explained in the other thread. Anyway,
I think the problem of missing stat updates for events is more
pronounced with userspace reads.

>
> > There is also
> > a heuristic in zswap that may write back more (or fewer) pages than it
> > should to the swap device if the stats are significantly stale.
> >
>
> Is this the ratio of MEMCG_ZSWAP_B and MEMCG_ZSWAPPED in
> zswap_shrinker_count()? There is already a target memcg flush in that
> function and I don't expect root memcg flush from there.

I was thinking of the generic approach I suggested, where we can avoid
contending on the lock if the cgroup is a descendant of the cgroup
being flushed, regardless of whether or not it's the root memcg. I
think this would be more beneficial than just focusing on root
flushes.
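
To make the generic approach a bit more concrete, here is a rough
userspace sketch of the decision it implies. It is not kernel code and
not a proposed patch: struct cg, cg_is_descendant(), ongoing_flusher,
flush_begin() and flush_end() are all hypothetical names made up for
illustration, and C11 atomics stand in for the kernel's cmpxchg. It
only shows how a caller could find out that its cgroup is already
covered by an ongoing flush; what a covered caller does next (wait for
that flush to finish, or return early) is exactly the open question in
this thread and is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup: only the parent link matters. */
struct cg {
	const char *name;
	struct cg *parent;	/* NULL for the root */
};

/*
 * Userspace analogue of cgroup_is_descendant(): true if @cg is @ancestor
 * itself or sits anywhere below it in the hierarchy.
 */
static bool cg_is_descendant(const struct cg *cg, const struct cg *ancestor)
{
	for (; cg; cg = cg->parent)
		if (cg == ancestor)
			return true;
	return false;
}

/* Cgroup whose subtree is currently being flushed, NULL if there is none. */
static _Atomic(struct cg *) ongoing_flusher;

enum flush_action {
	FLUSH_SELF,	/* we became the ongoing flusher */
	FLUSH_COVERED,	/* an ongoing flush already covers our subtree */
	FLUSH_CONTEND,	/* unrelated flush in progress, take the lock as usual */
};

/*
 * Try to publish @cg as the ongoing flusher. If somebody else is already
 * flushing, report whether their subtree includes @cg.
 *
 * Assumption: the pointer stays valid while we inspect it. A real
 * implementation would have to deal with cgroup lifetime, and with the
 * ongoing flush finishing right after the failed compare-and-exchange.
 */
static enum flush_action flush_begin(struct cg *cg)
{
	struct cg *expected = NULL;

	assert(cg != NULL);
	if (atomic_compare_exchange_strong(&ongoing_flusher, &expected, cg))
		return FLUSH_SELF;

	/* On failure, @expected holds the current ongoing flusher. */
	if (cg_is_descendant(cg, expected))
		return FLUSH_COVERED;
	return FLUSH_CONTEND;
}

/* Clear the ongoing flusher, but only if @cg is the one that set it. */
static void flush_end(struct cg *cg)
{
	struct cg *expected = cg;

	assert(cg != NULL);
	atomic_compare_exchange_strong(&ongoing_flusher, &expected,
				       (struct cg *)NULL);
}
```

With a hierarchy root -> a -> b plus a sibling c under root, a flush of
a that is in progress would make flush_begin(b) report FLUSH_COVERED,
while flush_begin(c) and flush_begin(root) would report FLUSH_CONTEND.
Nothing here is specific to the root, which is the point of tracking a
pointer rather than a single root-only flag.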