From: Yosry Ahmed
Date: Mon, 24 Jun 2024 10:40:48 -0700
Subject: Re: [PATCH V2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org,
 hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com,
 kernel-team@cloudflare.com, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org
References: <171923011608.1500238.3591002573732683639.stgit@firesoul>

On Mon, Jun 24, 2024 at 10:32 AM Shakeel Butt wrote:
>
> On Mon, Jun 24, 2024 at 05:46:05AM GMT, Yosry Ahmed wrote:
> > On Mon, Jun 24, 2024 at 4:55 AM Jesper Dangaard Brouer wrote:
> > > [...]
> > I am assuming this supersedes your other patch titled "[PATCH RFC]
> > cgroup/rstat: avoid thundering herd problem on root cgrp", so I will
> > only respond here.
> >
> > I have two comments:
> > - There is no reason why this should be limited to the root cgroup.
> > We can keep track of the cgroup being flushed, and use
> > cgroup_is_descendant() to find out if the cgroup we want to flush is a
> > descendant of it. We can use a pointer and cmpxchg primitives instead
> > of the atomic here IIUC.
> >
> > - More importantly, I am not a fan of skipping the flush if there is
> > an ongoing one. For all we know, the ongoing flush could have just
> > started and the stats have not been flushed yet. This is another
> > example of non-deterministic behavior that could be difficult to
> > debug.
>
> Even with the flush, there will almost always be per-cpu updates which will
> be missed. This cannot be fixed unless we block the stats updaters as
> well (which is not going to happen). So, we are already ok with this
> level of non-determinism. Why would skipping flushing be worse? One may
> argue 'time window is smaller' but this still does not cap the amount of
> updates. So, unless there is concrete data that this skipping flushing
> is detrimental to the users of stats, I don't see an issue in the
> presence of periodic flusher.

As you mentioned, the updates that happen during the flush are
unavoidable anyway, and the window is small. On the other hand, we
should be able to maintain the current behavior that at least all the
stat updates that happened *before* the call to cgroup_rstat_flush()
are flushed once the call returns.

The main concern here is that the stats read *after* an event occurs
should reflect the system state at that time. For example, a proactive
reclaimer reading the stats after writing to memory.reclaim should
observe the system state after the reclaim operation happened.

Please see [1] for more details about why this is important, which was
the rationale for removing stats_flush_ongoing in the first place.

[1] https://lore.kernel.org/lkml/20231129032154.3710765-6-yosryahmed@google.com/

>
> >
> > I tried a similar approach before where we sleep and wait for the
> > ongoing flush to complete instead, without contending on the lock,
> > using completions [1]. Although that patch has a lot of complexity,
>
> We can definitely add complexity but only if there are no simple good
> enough mitigations.

I agree that my patch was complicated. I am hoping we can have a
simpler version here that just waits for ongoing flushers if the
cgroup is a descendant of the cgroup already being flushed.
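
For illustration only, the "track the cgroup being flushed with a
pointer and cmpxchg, then check descendancy" idea could look roughly
like the userspace sketch below. This is not kernel code and not a
patch. Every name in it (toy_cgroup, flushing_cgrp, toy_is_descendant,
toy_flush_begin, toy_flush_end, the flush_action values) is made up
for the sketch; C11 atomics stand in for the kernel's cmpxchg
primitives, and toy_is_descendant() stands in for
cgroup_is_descendant(). The waiting itself is deliberately left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct cgroup: only the parent link matters here. */
struct toy_cgroup {
	struct toy_cgroup *parent;
};

/* Hypothetical global: cgroup whose subtree is being flushed, or NULL. */
static _Atomic(struct toy_cgroup *) flushing_cgrp;

/* Toy stand-in for cgroup_is_descendant(): a cgroup counts as its own descendant. */
bool toy_is_descendant(struct toy_cgroup *cgrp, struct toy_cgroup *ancestor)
{
	for (; cgrp; cgrp = cgrp->parent)
		if (cgrp == ancestor)
			return true;
	return false;
}

enum flush_action {
	FLUSH_OWNER,	/* we became the tracked flusher */
	FLUSH_WAIT,	/* an ongoing flush covers us: wait for it, do not skip */
	FLUSH_CONTEND,	/* ongoing flush does not cover us: take the lock as usual */
};

/* Decide what a caller that wants to flush cgrp should do. */
enum flush_action toy_flush_begin(struct toy_cgroup *cgrp)
{
	struct toy_cgroup *ongoing = NULL;

	assert(cgrp != NULL);

	/* cmpxchg: install cgrp only if nobody is currently tracked. */
	if (atomic_compare_exchange_strong(&flushing_cgrp, &ongoing, cgrp))
		return FLUSH_OWNER;

	/* On failure, 'ongoing' holds the cgroup that is being flushed. */
	if (toy_is_descendant(cgrp, ongoing))
		return FLUSH_WAIT;
	return FLUSH_CONTEND;
}

/* Clear the tracked cgroup, but only if we are the one who installed it. */
void toy_flush_end(struct toy_cgroup *cgrp)
{
	struct toy_cgroup *expected = cgrp;

	atomic_compare_exchange_strong(&flushing_cgrp, &expected, NULL);
}
```

With a hierarchy root -> a -> a1 and root -> b, a flusher of "a" gets
FLUSH_OWNER, a concurrent flusher of "a1" gets FLUSH_WAIT, and a
flusher of "b" or "root" gets FLUSH_CONTEND. FLUSH_WAIT is where a
completion or similar mechanism would go; the sketch only shows the
decision, not how the waiter sleeps or is woken.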