From: Yosry Ahmed
Date: Tue, 22 Aug 2023 08:43:13 -0700
Subject: Re: [PATCH 0/3] memcg: non-unified flushing for userspace stats
To: Michal Koutný
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Shakeel Butt, Muchun Song, Ivan Babrou, Tejun Heo,
    linux-mm@kvack.org, cgroups@vger.kernel.org,
    linux-kernel@vger.kernel.org
References: <20230821205458.1764662-1-yosryahmed@google.com>


On Tue, Aug 22, 2023 at 6:01 AM Michal Koutný wrote:
>
> Hello.
>
> On Mon, Aug 21, 2023 at 08:54:55PM +0000, Yosry Ahmed wrote:
> > For userspace reads, unified flushing leads to non-deterministic stats
> > staleness and reading cost.
>
> I only skimmed previous threads but I don't remember if it was resolved:
> a) periodic flushing was too much spaced for user space readers (i.e. 2s
>    delay is too much [1]),
> b) periodic flushing didn't catch up (i.e. full tree flush can
>    occasionally take more than 2s) leading to extra staleness?

The idea is that stats that are anywhere between 0 and 2 seconds stale
are inconsistent, especially when compared against other values (such
as memory.usage_in_bytes), and together they can give an inconsistent
and stale view of the system. For some readers, a 2s interval is too
long (we have readers that read every second). A time-only bound is
scary because on large systems a lot can happen in a second. There will
always be a delay anyway, but ideally we want to minimize it.

I think 2s is also not a strict bound (e.g. flushing is taking a lot of
time, a flush started but the cgroup you care about hasn't been flushed
yet, etc).

There is also the problem of the variable cost of reading.

>
> [1] Assuming that nr_cpus*MEMCG_CHARGE_BATCH error bound is also too
>     much for userspace readers, correct?

I can't tell for sure, to be honest, but given the continuously
increasing number of CPUs on modern systems, and the fact that the
nr_cpus*MEMCG_CHARGE_BATCH error can be localized in one cgroup or
spread across the hierarchy, I think it's better if we drop it for
userspace reads if possible.

>
> > The cost of userspace reads is now deterministic, and depends on the
> > size of the subtree being read. This should fix both the *sometimes*
> > expensive reads (due to flushing the entire tree) and occasional
> > staleness (due to skipping flushing).
>
> This is nice, thanks to the atomic removal in the commit 0a2dc6ac3329
> ("cgroup: remove cgroup_rstat_flush_atomic()"). I think the smaller
> chunks with yielding could be universally better (last words :-).

I was hoping we could remove unified flushing completely, but stress
testing with hundreds of concurrent reclaimers shows a lot of
regression. Maybe it doesn't matter in practice, but I don't want to
pull that trigger :)

FWIW, with unified flushing we are getting great concurrency for
in-kernel flushers like reclaim or refault, but at the expense of stats
staleness. I really wonder what hidden cost we are paying due to the
stale stats. I would imagine any problems that manifest from this would
be difficult to tie back to the stale stats.

>
> Thanks,
> Michal
>
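
For a sense of scale on the nr_cpus*MEMCG_CHARGE_BATCH bound from [1],
here is a rough back-of-the-envelope sketch. It is only an illustration:
the batch size of 64 and the 4 KiB page size are assumptions that do not
come from this thread and depend on kernel version and architecture, and
it treats the bound as being counted in pages, which may not hold for
every stat. The function name is made up for the example.

```python
# Rough estimate of how much pending (unflushed) stats error the
# nr_cpus * MEMCG_CHARGE_BATCH bound could allow, expressed in bytes.
# Assumptions (not from this thread): batch of 64, 4 KiB pages, and a
# bound counted in pages.

MEMCG_CHARGE_BATCH = 64   # assumed batch size, in pages
PAGE_SIZE = 4096          # assumed page size, in bytes


def max_unflushed_error_bytes(nr_cpus: int,
                              batch: int = MEMCG_CHARGE_BATCH,
                              page_size: int = PAGE_SIZE) -> int:
    """Upper bound on pending updates across all CPUs, in bytes."""
    if nr_cpus < 0:
        raise ValueError("nr_cpus must be non-negative")
    return nr_cpus * batch * page_size


if __name__ == "__main__":
    # The bound grows linearly with the CPU count.
    for cpus in (16, 64, 256):
        mib = max_unflushed_error_bytes(cpus) / (1 << 20)
        print(f"{cpus:4d} CPUs: up to {mib:.0f} MiB of pending updates")
```

Under those assumptions the bound scales linearly with the number of
CPUs, and, as noted above, the whole amount may land in a single cgroup
or be spread across the hierarchy.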