From: Shakeel Butt
Date: Mon, 4 Dec 2023 15:31:17 -0800
Subject: Re: [mm-unstable v4 5/5] mm: memcg: restore subtree stats flushing
To: Yosry Ahmed
Cc: Andrew Morton, Johannes Weiner, Michal Hocko, Roman Gushchin,
    Muchun Song, Ivan Babrou, Tejun Heo, Michal Koutný, Waiman Long,
    kernel-team@cloudflare.com, Wei Xu, Greg Thelen, Domenico Cerasuolo,
    linux-mm@kvack.org, cgroups@vger.kernel.org, linux-kernel@vger.kernel.org

On Mon, Dec 4, 2023 at 1:38 PM Yosry Ahmed wrote:
>
> On Mon, Dec 4, 2023 at 12:12 PM Yosry Ahmed wrote:
> >
> > On Sat, Dec 2, 2023 at 12:31 AM Shakeel Butt wrote:
> > >
> > > On Wed, Nov 29, 2023 at 03:21:53AM +0000, Yosry Ahmed wrote:
> > > [...]
> > > > +void mem_cgroup_flush_stats(struct mem_cgroup *memcg)
> > > >  {
> > > > -       if (memcg_should_flush_stats(root_mem_cgroup))
> > > > -               do_flush_stats();
> > > > +       static DEFINE_MUTEX(memcg_stats_flush_mutex);
> > > > +
> > > > +       if (mem_cgroup_disabled())
> > > > +               return;
> > > > +
> > > > +       if (!memcg)
> > > > +               memcg = root_mem_cgroup;
> > > > +
> > > > +       if (memcg_should_flush_stats(memcg)) {
> > > > +               mutex_lock(&memcg_stats_flush_mutex);
> > >
> > > What's the point of this mutex now? What is it providing? I understand
> > > we can not try_lock here due to targeted flushing. Why not just let the
> > > global rstat serialize the flushes? Actually this mutex can cause
> > > latency hiccups as the mutex owner can get resched during flush and then
> > > no one can flush for a potentially long time.
> >
> > I was hoping this was clear from the commit message and code comments,
> > but apparently I was wrong, sorry. Let me give more context.
> >
> > In previous versions and/or series, the mutex was only used with
> > flushes from userspace to guard in-kernel flushers against high
> > contention from userspace. Later on, I kept the mutex for all memcg
> > flushers for the following reasons:
> >
> > (a) Allow waiters to sleep:
> > Unlike other flushers, the memcg flushing path can see a lot of
> > concurrency. The mutex avoids having a lot of CPUs spinning (e.g.
> > concurrent reclaimers) by allowing waiters to sleep.
> >
> > (b) Check the threshold under lock but before calling cgroup_rstat_flush():
> > The calls to cgroup_rstat_flush() are not very cheap even if there's
> > nothing to flush, as we still need to iterate all CPUs. If flushers
> > contend directly on the rstat lock, overlapping flushes will
> > unnecessarily do the percpu iteration once they hold the lock. With
> > the mutex, they will check the threshold again once they hold the
> > mutex.
> >
> > (c) Protect non-memcg flushers from contention from memcg flushers.
> > This is not as strong of an argument as protecting in-kernel flushers
> > from userspace flushers.
> >
> > There have been discussions before about changing the rstat lock itself
> > to be a mutex, which would resolve (a), but there are concerns about
> > priority inversions if a low priority task holds the mutex and gets
> > preempted, as well as the amount of time the rstat lock holder keeps
> > the lock for:
> > https://lore.kernel.org/lkml/ZO48h7c9qwQxEPPA@slm.duckdns.org/
> >
> > I agree about possible hiccups due to the inner lock being dropped
> > while the mutex is held. Running a synthetic test with high
> > concurrency between reclaimers (in-kernel flushers) and stats readers
> > shows no material performance difference with or without the mutex.
> > Maybe things cancel out, or don't really matter in practice.
> >
> > I would prefer to keep the current code as I think (a) and (b) could
> > cause problems in the future, and the current form of the code (with
> > the mutex) has already seen mileage with production workloads.
>
> Correction: The priority inversion is possible on the memcg side due
> to the mutex in this patch. Also, for point (a), the spinners will
> eventually sleep once they hold the lock and hit the first CPU
> boundary -- because of the lock dropping and cond_resched(). So
> eventually, all spinners should be able to sleep, although it will be
> a while until they do. With the mutex, they all sleep from the
> beginning. Point (b) still holds though.
>
> I am slightly inclined to keep the mutex but I can send a small fixlet
> to remove it if others think otherwise.
>
> Shakeel, Wei, any preferences?

My preference is to avoid the issue we know we see in production a lot,
i.e. priority inversion. In the future, if you see issues with spinning,
you can come up with a lockless flush mechanism at that time.
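
For reference, point (b) above (re-checking the threshold once the mutex
is held, so that overlapping flushers skip the expensive work) can be
sketched in userspace C roughly as below. This is only an illustration
under assumptions, not the code from the patch: FLUSH_THRESHOLD,
pending_updates, flush_count, record_updates(), expensive_flush() and
flush_stats() are hypothetical names, and a pthread mutex stands in for
memcg_stats_flush_mutex.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Assumed value, chosen only for illustration. */
#define FLUSH_THRESHOLD 64

/* Hypothetical stand-in for the pending stats updates counter. */
static atomic_long pending_updates;
/* Counts how many times the expensive flush actually ran. */
static long flush_count;
/* Userspace stand-in for memcg_stats_flush_mutex. */
static pthread_mutex_t flush_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Cheap check, safe to call with or without the mutex held. */
static bool should_flush(void)
{
	return atomic_load(&pending_updates) > FLUSH_THRESHOLD;
}

/* Hypothetical helper: account n new stat updates. */
static void record_updates(long n)
{
	atomic_fetch_add(&pending_updates, n);
}

/*
 * Stand-in for the costly flush (the per-CPU iteration in the real
 * code). Must be called with flush_mutex held.
 */
static void expensive_flush(void)
{
	flush_count++;
	atomic_store(&pending_updates, 0);
}

/* Returns true if this caller performed the flush. */
static bool flush_stats(void)
{
	bool flushed = false;

	/* Fast path: nothing worth flushing, do not touch the mutex. */
	if (!should_flush())
		return false;

	/* Waiters sleep here instead of spinning. */
	pthread_mutex_lock(&flush_mutex);

	/*
	 * Re-check under the mutex: a flusher that held the mutex
	 * before us may already have brought the counter back under
	 * the threshold, in which case we skip the expensive work.
	 */
	if (should_flush()) {
		expensive_flush();
		flushed = true;
	}

	pthread_mutex_unlock(&flush_mutex);
	return flushed;
}
```

A caller that loses the race simply returns without flushing, which is
the saving described in (b). The sketch does not address the priority
inversion concern: a preempted mutex holder still blocks every other
caller of flush_stats().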