From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 4 Dec 2023 15:49:01 -0800
Subject: Re: [mm-unstable v4 5/5] mm: memcg: restore subtree stats flushing
To: Wei Xu
Cc: Shakeel Butt, Andrew Morton, Johannes Weiner, Michal Hocko,
 Roman Gushchin, Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný,
 Waiman Long, kernel-team@cloudflare.com, Greg Thelen,
 Domenico Cerasuolo, linux-mm@kvack.org, cgroups@vger.kernel.org,
 linux-kernel@vger.kernel.org
References: <20231129032154.3710765-1-yosryahmed@google.com>
 <20231129032154.3710765-6-yosryahmed@google.com>
 <20231202083129.3pmds2cddy765szr@google.com>
On Mon, Dec 4, 2023 at 3:46 PM Wei Xu wrote:
>
> On Mon, Dec 4, 2023 at 3:31 PM Shakeel Butt wrote:
> >
> > On Mon, Dec 4, 2023 at 1:38 PM Yosry Ahmed wrote:
> > >
> > > On Mon, Dec 4, 2023 at 12:12 PM Yosry Ahmed wrote:
> > > >
> > > > On Sat, Dec 2, 2023 at 12:31 AM Shakeel Butt wrote:
> > > > >
> > > > > On Wed, Nov 29, 2023 at 03:21:53AM +0000, Yosry Ahmed wrote:
> > > > > [...]
> > > > > > +void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
> > > > > >  {
> > > > > > -	if (memcg_should_flush_stats(root_mem_cgroup))
> > > > > > -		do_flush_stats();
> > > > > > +	static DEFINE_MUTEX(memcg_stats_flush_mutex);
> > > > > > +
> > > > > > +	if (mem_cgroup_disabled())
> > > > > > +		return;
> > > > > > +
> > > > > > +	if (!memcg)
> > > > > > +		memcg = root_mem_cgroup;
> > > > > > +
> > > > > > +	if (memcg_should_flush_stats(memcg)) {
> > > > > > +		mutex_lock(&memcg_stats_flush_mutex);
> > > > >
> > > > > What's the point of this mutex now? What is it providing? I understand
> > > > > we cannot try_lock here due to targeted flushing. Why not just let the
> > > > > global rstat lock serialize the flushes? Actually, this mutex can cause
> > > > > latency hiccups, as the mutex owner can get rescheduled during a flush
> > > > > and then no one can flush for a potentially long time.
> > > >
> > > > I was hoping this was clear from the commit message and code comments,
> > > > but apparently I was wrong, sorry. Let me give more context.
> > > >
> > > > In previous versions and/or series, the mutex was only used for
> > > > flushes from userspace, to guard in-kernel flushers against high
> > > > contention from userspace. Later on, I kept the mutex for all memcg
> > > > flushers for the following reasons:
> > > >
> > > > (a) Allow waiters to sleep:
> > > > Unlike other flushers, the memcg flushing path can see a lot of
> > > > concurrency. The mutex avoids having a lot of CPUs spinning (e.g.
> > > > concurrent reclaimers) by allowing waiters to sleep.
> > > >
> > > > (b) Check the threshold under lock but before calling cgroup_rstat_flush():
> > > > The calls to cgroup_rstat_flush() are not very cheap even if there's
> > > > nothing to flush, as we still need to iterate all CPUs. If flushers
> > > > contend directly on the rstat lock, overlapping flushes will
> > > > unnecessarily do the percpu iteration once they hold the lock. With
> > > > the mutex, they will check the threshold again once they hold the
> > > > mutex.
> > > >
> > > > (c) Protect non-memcg flushers from contention from memcg flushers.
> > > > This is not as strong of an argument as protecting in-kernel flushers
> > > > from userspace flushers.
> > > >
> > > > There have been discussions before about changing the rstat lock itself
> > > > to be a mutex, which would resolve (a), but there are concerns about
> > > > priority inversions if a low-priority task holds the mutex and gets
> > > > preempted, as well as the amount of time the rstat lock holder keeps
> > > > the lock for:
> > > > https://lore.kernel.org/lkml/ZO48h7c9qwQxEPPA@slm.duckdns.org/
> > > >
> > > > I agree about possible hiccups due to the inner lock being dropped
> > > > while the mutex is held. Running a synthetic test with high
> > > > concurrency between reclaimers (in-kernel flushers) and stats readers
> > > > shows no material performance difference with or without the mutex.
> > > > Maybe things cancel out, or don't really matter in practice.
> > > >
> > > > I would prefer to keep the current code as I think (a) and (b) could
> > > > cause problems in the future, and the current form of the code (with
> > > > the mutex) has already seen mileage with production workloads.
> > >
> > > Correction: the priority inversion is possible on the memcg side due
> > > to the mutex in this patch. Also, for point (a), the spinners will
> > > eventually sleep once they hold the lock and hit the first CPU
> > > boundary -- because of the lock dropping and cond_resched(). So
> > > eventually, all spinners should be able to sleep, although it will be
> > > a while until they do. With the mutex, they all sleep from the
> > > beginning. Point (b) still holds, though.
> > >
> > > I am slightly inclined to keep the mutex, but I can send a small fixlet
> > > to remove it if others think otherwise.
> > >
> > > Shakeel, Wei, any preferences?
> >
> > My preference is to avoid the issue we know we see in production a lot,
> > i.e. priority inversion.
> >
> > In the future, if you see issues with spinning, you can come up with a
> > lockless flush mechanism at that time.
>
> Given that the synthetic high-concurrency test doesn't show a material
> performance difference between the mutex and non-mutex versions, I
> agree that the mutex can be taken out of this patch set (one less
> global mutex to worry about).

Thanks Wei and Shakeel for your input.

Andrew, could you please squash in the fixlet below and remove the
paragraph starting with "Add a mutex to.." from the commit message?
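For anyone skimming the thread: the pattern the fixlet removes is the
classic double-checked flush. A minimal userspace sketch of the idea,
with hypothetical names and pthreads instead of kernel primitives (not
the actual implementation):

/*
 * Double-checked flush: cheap unlocked threshold test, then re-check
 * under a sleeping lock before doing the expensive work.
 */
#include <pthread.h>
#include <stdatomic.h>

#define FLUSH_THRESHOLD 1024

static pthread_mutex_t flush_mutex = PTHREAD_MUTEX_INITIALIZER;
static atomic_long pending_updates;	/* bumped by stat updaters */

static int should_flush(void)
{
	return atomic_load(&pending_updates) > FLUSH_THRESHOLD;
}

static void do_flush(void)
{
	/* Expensive even when there is little to do: think of a walk
	 * over every CPU's pending deltas. */
	atomic_store(&pending_updates, 0);
}

void flush_stats(void)
{
	if (!should_flush())		/* unlocked fast path */
		return;

	pthread_mutex_lock(&flush_mutex);	/* waiters sleep, not spin */
	/*
	 * Re-check under the mutex: an overlapping flusher may have
	 * already done the work while we were waiting.
	 */
	if (should_flush())
		do_flush();
	pthread_mutex_unlock(&flush_mutex);
}

The re-check after taking the mutex is what point (b) above buys: a
waiter that lost the race bails out without touching the per-CPU data.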
From 19af26e01f93cbf0806d75a234b78e48c1ce9d80 Mon Sep 17 00:00:00 2001
From: Yosry Ahmed <yosryahmed@google.com>
Date: Mon, 4 Dec 2023 23:43:29 +0000
Subject: [PATCH] mm: memcg: remove stats flushing mutex

The mutex was intended to make the waiters sleep instead of spin, and
to allow checking the update thresholds again after acquiring the
mutex. However, the mutex has a risk of priority inversion, especially
since the underlying rstat lock can be dropped while the mutex is held.

Synthetic testing with high concurrency of flushers shows no
regressions without the mutex, so remove it.

Suggested-by: Shakeel Butt
Signed-off-by: Yosry Ahmed
---
 mm/memcontrol.c | 11 ++---------
 1 file changed, 2 insertions(+), 9 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 5d300318bf18a..0563625767349 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -749,21 +749,14 @@ static void do_flush_stats(struct mem_cgroup *memcg)
  */
 void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
 {
-	static DEFINE_MUTEX(memcg_stats_flush_mutex);
-
 	if (mem_cgroup_disabled())
 		return;
 
 	if (!memcg)
 		memcg = root_mem_cgroup;
 
-	if (memcg_should_flush_stats(memcg)) {
-		mutex_lock(&memcg_stats_flush_mutex);
-		/* Check again after locking, another flush may have occurred */
-		if (memcg_should_flush_stats(memcg))
-			do_flush_stats(memcg);
-		mutex_unlock(&memcg_stats_flush_mutex);
-	}
+	if (memcg_should_flush_stats(memcg))
+		do_flush_stats(memcg);
 }
 
 void mem_cgroup_flush_stats_ratelimited(struct mem_cgroup *memcg)
-- 
2.43.0.rc2.451.g8631bc7472-goog
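P.S. For readers wondering about the "spinners will eventually sleep"
remark above: the point is that a flush loop of this kind periodically
releases the lock and reschedules between per-CPU passes. A rough
illustrative sketch of that shape, with a hypothetical flush_one_cpu()
and lock (not the actual kernel/cgroup/rstat.c code):

/* Illustrative sketch only -- not the actual rstat flush code. */
static DEFINE_SPINLOCK(flush_lock);

static void flush_all_cpus(void)
{
	int cpu;

	spin_lock_irq(&flush_lock);
	for_each_possible_cpu(cpu) {
		flush_one_cpu(cpu);	/* hypothetical per-CPU flush */

		/*
		 * Drop the lock at CPU boundaries so the holder can be
		 * rescheduled and spinning waiters can acquire the lock
		 * (and later sleep at their own CPU boundaries).
		 */
		if (need_resched() || spin_needbreak(&flush_lock)) {
			spin_unlock_irq(&flush_lock);
			cond_resched();
			spin_lock_irq(&flush_lock);
		}
	}
	spin_unlock_irq(&flush_lock);
}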