From: Yosry Ahmed
Date: Tue, 15 Aug 2023 19:19:20 -0700
Subject: Re: [PATCH] mm: memcg: provide accurate stats for userspace reads
To: Shakeel Butt
Cc: Tejun Heo, Michal Hocko, Johannes Weiner, Roman Gushchin,
    Andrew Morton, Muchun Song, cgroups@vger.kernel.org,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ivan Babrou

On Tue, Aug 15, 2023 at 6:14 PM Shakeel Butt wrote:
>
> On Tue, Aug 15, 2023 at 5:29 PM Yosry Ahmed wrote:
> > > [...]
> > >
> > > I thought we already reached the decision on how to proceed here. Let
> > > me summarize what I think we should do:
> > >
> > > 1. Completely remove the sync flush from stat files read from userspace.
> > > 2. Provide a separate way/interface to explicitly flush stats for
> > > users who want more accurate stats and can pay the cost. This is
> > > similar to the stat_refresh interface.
> > > 3. Keep the 2 sec periodic stats flusher.
> >
> > I think this solution is suboptimal to be honest, I think we can do better.
> >
> > With recent improvements to spinlocks/mutexes, and flushers becoming
> > sleepable, I think a better solution would be to remove unified
> > flushing and let everyone only flush the subtree they care about. Sync
> > flushing becomes much better (unless you're flushing root ofc), and
> > concurrent flushing wouldn't cause too many problems (ideally no
> > thundering herd, and rstat lock can be dropped at cpu boundaries in
> > cgroup_rstat_flush_locked()).
> >
> > If we do this, stat reads can be much faster as Ivan demonstrated with
> > his patch that only flushes the cgroup being read, and we do not
> > sacrifice accuracy as we never skip flushing. We also do not need a
> > separate interface for explicit refresh.
> >
> > In all cases, we need to keep the 2 sec periodic flusher. What we need
> > to figure out if we remove unified flushing is:
> >
> > 1. Handling stats_flush_threshold.
> > 2. Handling flush_next_time.
> >
> > Both of these are global now, and will need to be adapted to
> > non-unified non-global flushing.
>
> The only thing we are disagreeing on is (1) the complete removal of
> sync flush and an explicit flush interface versus (2) keep doing the
> sync flush of the subtree.
>
> To me (1) seems more optimal particularly for the server use-case
> where a node controller reads stats of root and as well as cgroups of
> a couple of top levels (we actually do this internally). Doing flush
> once explicitly and then reading the stats for all such cgroups seems
> better to me.

The problem with (1) is that, first of all, it's a behavioral change: we
start having explicit staleness in the stats, and userspace needs to
adapt by explicitly requesting a flush. A node controller can be
enlightened to do so, but on a system with a lot of cgroups, if you
flush once explicitly and iterate through all cgroups, the flushed
stats will be stale by the time you reach the last cgroup.
Keep in mind there are also users that read their own stats; figuring
out which users need to flush explicitly vs. read cached stats is a
problem.

Taking a step back, the total work that needs to be done does not
change with (2). A node controller iterating cgroups and reading their
stats will do the same amount of flushing; it will just be distributed
across multiple read syscalls, so each one spends a shorter interval in
kernel space.

There are also in-kernel flushers (e.g. reclaim and dirty throttling)
that will benefit from (2) by reading more accurate stats without
having to flush the entire tree. The behavior is currently
nondeterministic: you may get fresh or stale stats, and you may flush
one cgroup or 100 cgroups.

I think with (2) we make fewer compromises in terms of accuracy and
determinism, and it's a less disruptive change to userspace.
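
To make the staleness and total-work arguments concrete, here is a toy
userspace model. It is only a sketch and not kernel code: the Cgroup
class, workload_tick() and read_all() are hypothetical names, the
cgroups are flat siblings rather than a tree, and the model assumes one
update lands in every cgroup between two consecutive reads. It counts
flush operations only and ignores locking and per-CPU costs.

```python
class Cgroup:
    """Hypothetical stand-in for a memcg with deferred stat updates."""

    def __init__(self, name):
        self.name = name
        self.pending = 0  # updates not yet folded into the readable value
        self.flushed = 0  # value a stat read would report

    def flush(self):
        self.flushed += self.pending
        self.pending = 0


def workload_tick(cgroups):
    """Assumed background activity: one new update per cgroup."""
    for cg in cgroups:
        cg.pending += 1


def read_all(cgroups, flush_once):
    """Read every cgroup; return (updates missed per read, flush count).

    flush_once=True models option (1): one explicit flush up front,
    then cached reads. flush_once=False models option (2): each read
    flushes only the cgroup being read.
    """
    flushes = 0
    missed = []
    if flush_once:
        for cg in cgroups:
            cg.flush()
            flushes += 1
    for cg in cgroups:
        if not flush_once:
            cg.flush()
            flushes += 1
        missed.append(cg.pending)  # updates this read does not see
        workload_tick(cgroups)     # time passes before the next read
    return missed, flushes


def make_cgroups(n):
    cgroups = [Cgroup(f"cg{i}") for i in range(n)]
    workload_tick(cgroups)  # start with some unflushed updates
    return cgroups


stale_1, work_1 = read_all(make_cgroups(5), flush_once=True)
stale_2, work_2 = read_all(make_cgroups(5), flush_once=False)
print(stale_1, work_1)  # [0, 1, 2, 3, 4] 5
print(stale_2, work_2)  # [0, 0, 0, 0, 0] 5
```

Under these assumptions both strategies perform the same number of
per-cgroup flushes, but with (1) the number of missed updates grows
with the position of the cgroup in the iteration, while with (2) every
read sees fresh stats.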