From: Yosry Ahmed
Date: Mon, 28 Aug 2023 09:15:04 -0700
Subject: Re: [PATCH 3/3] mm: memcg: use non-unified stats flushing for userspace reads
To: Michal Hocko
Cc: Andrew Morton, Johannes Weiner, Roman Gushchin, Shakeel Butt, Muchun Song, Ivan Babrou, Tejun Heo, linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
References: <20230821205458.1764662-4-yosryahmed@google.com>

On Mon, Aug 28, 2023 at 8:47 AM Michal Hocko wrote:
>
> Done my homework and studied the rstat code more (sorry should have done
> that earlier).
>
> On Fri 25-08-23 08:14:54, Yosry Ahmed wrote:
> [...]
> > I guess what I am trying to say is, breaking down that lock is a major
> > surgery that might require re-designing or re-implementing some parts
> > of rstat. I would be extremely happy to be proven wrong.
> > If we can break down that lock then there is no need for unified
> > flushing even for in-kernel contexts, and we can all live happily
> > ever after with cheap(ish) and accurate stats flushing.
>
> Yes, this seems like a big change and also overcomplicating the whole
> thing. I am not sure this is worth it.
>
> > I really hope we can move forward with the problems at hand (sometimes
> > reads are expensive, sometimes reads are stale), and not block fixing
> > them until we can come up with an alternative to that global lock
> > (unless, of course, there is a simpler way of doing that).
>
> Well, I really have to say that I do not like the notion that reading
> stats is unpredictable. This just makes it really hard to use. If
> the precision is to be sacrificed then this should be preferable over
> potentially high global lock contention. We already have that model in
> place for /proc/vmstat (configurable timeout for flusher and a way to
> flush explicitly). I appreciate you would like to have a better
> precision but as you have explored the locking is really hard to get rid
> of here.

Reading the stats *is* unpredictable today, in terms of both
accuracy/staleness and cost. Avoiding the flush entirely on the read
path will surely make the cost very stable and cheap, but will make
accuracy even less predictable.

> So from my POV I would prefer to avoid flushing from the stats reading
> path and implement force flushing by writing to stat file. If the 2s
> flushing interval is considered too coarse I would be OK to allow setting
> it from userspace. This way this would be more in line with /proc/vmstat
> which seems to be working quite well.
>
> If this is not acceptable or deemed a wrong approach long term then it
> would be good to reconsider the current cgroup_rstat_lock at least.
> Either by turning it into mutex or by dropping the yielding code which
> can severely affect the worst case latency AFAIU.

Honestly I think it's better if we do it the other way around.
We make flushing on the stats reading path non-unified and
deterministic. That model also exists and is used for cpu.stat. If we
find a problem with the locking being held from userspace, we can then
remove flushing from the read path and add interface(s) to configure
the periodic flusher and do a force flush.

I would like to avoid introducing additional interfaces and
configuration knobs unless it's necessary. Also, if we remove the
flush entirely the cost will become really cheap. We will have a hard
time reversing that in the future if we want to change the
implementation.

IOW, moving forward with this change seems much more reversible than
adopting the /proc/vmstat model.

If using a mutex will make things better, we can do that now. It
doesn't introduce performance issues in my testing. My only concern is
someone sleeping or getting preempted while holding the mutex, so I
would prefer disabling preemption while we flush if that doesn't cause
problems.

Thanks!