Date: Mon, 24 Jun 2024 13:18:10 -0700
From: Shakeel Butt
To: Yosry Ahmed
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org,
 hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com,
 kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH V2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
References: <171923011608.1500238.3591002573732683639.stgit@firesoul>

On Mon, Jun 24, 2024 at 12:37:30PM GMT, Yosry Ahmed wrote:
> On Mon, Jun 24, 2024 at 12:29 PM Shakeel Butt wrote:
> >
> > On Mon, Jun 24, 2024 at 10:40:48AM GMT, Yosry Ahmed wrote:
> > > On Mon, Jun 24, 2024 at 10:32 AM Shakeel Butt wrote:
> > > >
> > > > On Mon, Jun 24, 2024 at 05:46:05AM GMT, Yosry Ahmed wrote:
> > > > > On Mon, Jun 24, 2024 at 4:55 AM Jesper Dangaard Brouer wrote:
> > > > > >
> > > [...]
> > > > > I am assuming this supersedes your other patch titled "[PATCH RFC]
> > > > > cgroup/rstat: avoid thundering herd problem on root cgrp", so I will
> > > > > only respond here.
> > > > >
> > > > > I have two comments:
> > > > > - There is no reason why this should be limited to the root cgroup. We
> > > > > can keep track of the cgroup being flushed, and use
> > > > > cgroup_is_descendant() to find out if the cgroup we want to flush is a
> > > > > descendant of it. We can use a pointer and cmpxchg primitives instead
> > > > > of the atomic here IIUC.
> > > > >
> > > > > - More importantly, I am not a fan of skipping the flush if there is
> > > > > an ongoing one. For all we know, the ongoing flush could have just
> > > > > started and the stats have not been flushed yet. This is another
> > > > > example of non-deterministic behavior that could be difficult to
> > > > > debug.
> > > >
> > > > Even with the flush, there will almost always be per-cpu updates which
> > > > will be missed. This cannot be fixed unless we block the stats updaters
> > > > as well (which is not going to happen). So, we are already ok with this
> > > > level of non-determinism. Why would skipping the flush be worse? One may
> > > > argue 'time window is smaller' but this still does not cap the amount of
> > > > updates.
> > > > So, unless there is concrete data that skipping the flush
> > > > is detrimental to the users of stats, I don't see an issue in the
> > > > presence of the periodic flusher.
> > >
> > > As you mentioned, the updates that happen during the flush are
> > > unavoidable anyway, and the window is small. On the other hand, we
> > > should be able to maintain the current behavior that at least all the
> > > stat updates that happened *before* the call to cgroup_rstat_flush()
> > > are flushed after the call.
> > >
> > > The main concern here is that the stats read *after* an event occurs
> > > should reflect the system state at that time. For example, a proactive
> > > reclaimer reading the stats after writing to memory.reclaim should
> > > observe the system state after the reclaim operation happened.
> >
> > What about the in-kernel users like kswapd? I don't see any before or
> > after events for the in-kernel users.
>
> The example I can think of off the top of my head is the cache trim
> mode scenario I mentioned when discussing your patch (i.e. not
> realizing that file memory had already been reclaimed).

Kswapd has some kind of cache trim failure mode where it decides to skip
the cache trim heuristic. Also, for global reclaim there are a couple more
conditions in play.

> There is also
> a heuristic in zswap that may write back more (or fewer) pages than it
> should to the swap device if the stats are significantly stale.
>

Is this the ratio of MEMCG_ZSWAP_B and MEMCG_ZSWAPPED in
zswap_shrinker_count()? There is already a target memcg flush in that
function and I don't expect a root memcg flush from there.

> I did not take a closer look to find more examples, but I think we
> need to respect this condition at least for userspace readers.
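
To make the pointer-plus-cmpxchg idea quoted above concrete, here is a
minimal userspace sketch, not kernel code and not taken from any posted
patch. It assumes C11 atomics as a stand-in for the kernel's cmpxchg(), and
struct cg, cg_is_descendant(), ongoing_flusher, flush_begin() and
flush_end() are all hypothetical names made up for illustration. It only
classifies the situation a would-be flusher finds itself in; whether a
"covered" caller should skip or wait for the ongoing flush is exactly the
open question in this thread, so the sketch leaves that to the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cgroup: only the parent link matters. */
struct cg {
    struct cg *parent;
};

/*
 * Userspace analogue of cgroup_is_descendant(): true if cg is ancestor
 * itself or sits somewhere below it in the hierarchy.
 */
static bool cg_is_descendant(const struct cg *cg, const struct cg *ancestor)
{
    assert(ancestor != NULL);
    for (; cg; cg = cg->parent)
        if (cg == ancestor)
            return true;
    return false;
}

/* The cgroup whose subtree is currently being flushed, or NULL if none. */
static _Atomic(struct cg *) ongoing_flusher;

enum flush_action {
    FLUSH_SELF,    /* we became the ongoing flusher and must flush */
    FLUSH_COVERED, /* an ongoing flush of an ancestor (or of cg) covers us */
    FLUSH_CONTEND, /* an unrelated subtree is being flushed */
};

static enum flush_action flush_begin(struct cg *cg)
{
    struct cg *expected = NULL;

    /* Try to install ourselves as the ongoing flusher. */
    if (atomic_compare_exchange_strong(&ongoing_flusher, &expected, cg))
        return FLUSH_SELF;

    /* On failure, expected holds the cgroup that is being flushed. */
    if (expected && cg_is_descendant(cg, expected))
        return FLUSH_COVERED;

    return FLUSH_CONTEND;
}

static void flush_end(struct cg *cg)
{
    struct cg *expected = cg;

    /* Clear the marker only if we are the one who set it. */
    atomic_compare_exchange_strong(&ongoing_flusher, &expected, NULL);
}
```

A caller that gets FLUSH_SELF does the flush and then calls flush_end().
For FLUSH_COVERED, skipping gives the behavior of the patch under
discussion, while waiting for the ongoing flusher to finish would keep the
"updates before the call are flushed after the call" guarantee.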