From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id E43E7C3064D for ; Tue, 2 Jul 2024 12:00:46 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 83F4A6B009E; Tue, 2 Jul 2024 08:00:46 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 7ED5D6B009F; Tue, 2 Jul 2024 08:00:46 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6B54D6B00A0; Tue, 2 Jul 2024 08:00:46 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0012.hostedemail.com [216.40.44.12]) by kanga.kvack.org (Postfix) with ESMTP id 4E8796B009E for ; Tue, 2 Jul 2024 08:00:46 -0400 (EDT) Received: from smtpin14.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay05.hostedemail.com (Postfix) with ESMTP id E36AB4212E for ; Tue, 2 Jul 2024 12:00:45 +0000 (UTC) X-FDA: 82294670850.14.4DD74FA Received: from mail-ej1-f44.google.com (mail-ej1-f44.google.com [209.85.218.44]) by imf17.hostedemail.com (Postfix) with ESMTP id 9D1994004E for ; Tue, 2 Jul 2024 12:00:43 +0000 (UTC) Authentication-Results: imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Vm2Qv2xr; spf=pass (imf17.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1719921613; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=EZ7jeVzIv0vrTPCXNsfT+SHNwN0Ej4JGm3GWkjnRfu4=; b=0qGoRqLeyqkkNGHTkr9+UJaioAZ88mQOOtvjQWWQAOxRmdg2bTbTYhJBBabCw9Jj+Szuy+ 8MhMALFKTaixWzIRn7H/mDEQjQMCoor89ekxF6BO1IciVDuXLwb7M6YRQid9mTunkwVfc5 W/mFUKiaeMnos7S5aylA9HF3gnCjqH4= ARC-Authentication-Results: i=1; imf17.hostedemail.com; dkim=pass header.d=google.com header.s=20230601 header.b=Vm2Qv2xr; spf=pass (imf17.hostedemail.com: domain of yosryahmed@google.com designates 209.85.218.44 as permitted sender) smtp.mailfrom=yosryahmed@google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1719921613; a=rsa-sha256; cv=none; b=it4aci8aAhvL9USAcy+uCTMTW91EMuyM8h/nWixz80XFrL6HMkKy/7yHkMiaFXPapyl6U7 weeHVNqQoC3Hfug4AY7DtYk3C6CUuIEW8+W4QlMpFfHDpdIbful1tfiTQXNCIo5W7nW/iI NXUL87n6OAyf5aYqFUl+q93eH8qFNSU= Received: by mail-ej1-f44.google.com with SMTP id a640c23a62f3a-a7527afa23cso249351066b.2 for ; Tue, 02 Jul 2024 05:00:43 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20230601; t=1719921642; x=1720526442; darn=kvack.org; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=EZ7jeVzIv0vrTPCXNsfT+SHNwN0Ej4JGm3GWkjnRfu4=; b=Vm2Qv2xrx8ZHkCAwAp1J6e1DyRAs4+va0udpv8wRHQvTGYv+7Y5wez4Gpg/TccQI2M 60SE76TAlaolNsOIOaQuAGng1bZsybSG/xBYwQyS/FjfPS2QK1qN6K65yvD01CnGdsKM RgPFNKWUEuiENWtuEAlTjWJoDb7biPmk2qoa3BXNHEF7kRx9DnDoM/QQe2K7cfC2Uc95 XsNc8NWO/fGKn9ZZgWGhHY4BtJlunXD2EwpwKdX/3MxR25dRuMXh3Ralvsq1FVJBdDPQ ZuSqKHjQULKpaZZYGrXtNXUKe0yXlf6lyU+/LqpsSWTMbdp73oX2XvTdChWbsaiuOYBC HHbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1719921642; x=1720526442; h=content-transfer-encoding:cc:to:subject:message-id:date:from :in-reply-to:references:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=EZ7jeVzIv0vrTPCXNsfT+SHNwN0Ej4JGm3GWkjnRfu4=; b=AOpR7MNhK6tJe/3n6LDVi7EM0b8ElhJnQRILxU7Pdqox2rFwtev5f1A0qz9gFoC7Ff XjpAaeBIbaEx0oM2iXW91PKE2OxcTegqI0TrgKUHaLTiOX0Yxl1o+3KEB9+i6ak3OEJ4 r+HOBAT4Oq6urc10KcjXGODunyW3fdzqLfQ2PQdSiHHkYUCAIccqzbZSmNHDsiJu+U32 N0FYjgqHmrKTj6QxrIif4ihZA+puhy7Y00f9GNMCf6Fwfw8QN2OgROGSwuV5jGrq+QUG i+w+XhGR+EB16yFyV4/mbYRuwZxoLLC4/1CnVigigrvn9inBPWWJQEUeGi8vgbvJ7RxX uyOg== X-Forwarded-Encrypted: i=1; AJvYcCX+fkpPBU/uPw0HrrD5duHGMLuGLslKkNKcUcrTveCqLbB2qxJ2K59UumvyjJcHEplv4rDIZPtGMahvhzZA+ULkr+8= X-Gm-Message-State: AOJu0YwfvRUoT6XQO3cyIkgfb73ACwiNn+t+NdyWOeXwn5TsHVLc5uph A472dFEqcIWyMOyl9odyikxOWm+9WOqztLxu3CTwz2qt7+J5R0SJWu5+XUGWJDlkO6mPEAPVC/K 7SXYOu2WPn4yAfKEcgRMgQS5ntgWUSXddWcRg X-Google-Smtp-Source: AGHT+IFPDU49eK6Qn1vTi7quTRJvUyfAtJV+lSnJs1GLv+/TXU+UgqL1l8LiV5fSOSvIiLkCuEQyAZurNkyMSf357hE= X-Received: by 2002:a17:906:c141:b0:a68:b73d:30d0 with SMTP id a640c23a62f3a-a751446804amr698380466b.6.1719921641704; Tue, 02 Jul 2024 05:00:41 -0700 (PDT) MIME-Version: 1.0 References: <171952310959.1810550.17003659816794335660.stgit@firesoul> <171952312320.1810550.13209360603489797077.stgit@firesoul> <4n3qu75efpznkomxytm7irwfiq44hhi4hb5igjbd55ooxgmvwa@tbgmwvcqsy75> <7ecdd625-37a0-49f1-92fc-eef9791fbe5b@kernel.org> <849e7b86-b971-47d7-8e31-7eee0918ea33@kernel.org> In-Reply-To: <849e7b86-b971-47d7-8e31-7eee0918ea33@kernel.org> From: Yosry Ahmed Date: Tue, 2 Jul 2024 05:00:05 -0700 Message-ID: Subject: Re: [PATCH V4 2/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes To: Jesper Dangaard Brouer Cc: Shakeel Butt , tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 9D1994004E X-Stat-Signature: 7x3r9jxet847tratx14obqz5osxs8pxe X-Rspam-User: X-HE-Tag: 1719921643-422861 X-HE-Meta: U2FsdGVkX1/zRb4jtKUQGZTc6kn6nYC5epxC2lVhCX4TvKPEzIdQ4Pnr+h0AdyfKh+VV9kdd1TFF8Es5lOA5U5IaWCzuXIr0zbofgIA9tZDDLxUwe5wi6t0Zrj6UdU9LClm98/GqF/ChSCNyb/r/UViF9DbQB2yQ22FPhIxJDNg1+VZeA7WYhHnjzFbDBy+Wm7xfmuR+KcgYKnbCu9uZpMsF95Adyh1Fb0ExXUFmYsk9qO73CwwlANdJ7YSQNYAbshrN0Sy/l59PrjI1LoSSEG2fR36rJN5+TFr0P6G/ODJUU9CDoOkmIkINWmNzIj5gOT7ZclV1oWKXn7rbbQa0XSN/5hrSGkqQSkj+gb82iMfQIWS0rtZ2EocT1ifULFnhPZCyEgYcRluhMvJ9sS2mrIka5bL/0z9Qt1EiiLHxFAoojhbVjT1P1Tm7nt+GNPK9AIdiXA87Mmw4/BdP431icFa63yKwEHxvqoe3+ICRc2eZhTNQ6kAqFDljQXhsiVsZJRt1aD6pD7pMueMMRprmebUXPC/P5nRpc96OPDpymNC4porWyWg6ZIq7rL4YXomwxTFrc9oDNREZShlnFB7GIvYBpaVjA6Kxk//XkJyumnsnvSnTLc5+RziJyB3UJ/OOL7EVmqbDGsAto0uAJz4yP9qP53GZFnXYtypxPiVuKN0DKmhJZt16vR6JNy5yypWCjr87mygyXd9jbfHCMwKgNjwVyWDoKxEPFVvDQZ0J1CklRzH8lFEr6SrMQ9FAAwEJFO1XCsU/mG2NGlbcClk5uMMLBxV1UpL0sAdrgGIprVp44Inx5qhyJ7JRaH+WA1vO3ozOE6bcQ/QGmTB9Z6i+q1Uv5MGPIPsQMYjfkorF2RF47cpbQbTNndUHmz+NIRH4quluL462E5sO0iW8QPSQ/SmM4wcOe6XLx8StvFduDCAIzPUrkJb2z+zd08zlxFZeAqYzZRIn464iPM/50hK gdUqu07R JXFgT5Xxzue2cDWlC6IYe7qMWigCn8slqpz88BAExGWmkgyyu5jgFJ9Id8xO4g0KQZ2H9/Gsqmn+CBdqcoI6S9k5D8yf4MJMcFwt0QrmnzIxt8T6JGWzQ9rZHeD5E8OOlH8US8B+0o1xGncfCK8KXNLfIBAXa08yncYk5jpgxh1Uigto9OnvRpY1y7LJbDPKD2GDAq32o+YUO6kWHtTOEox1Uvt9p/eCNW83j X-Bogosity: Ham, tests=bogofilter, spamicity=0.250194, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On Tue, Jul 2, 2024 at 3:35=E2=80=AFAM Jesper Dangaard Brouer wrote: > > > > On 29/06/2024 00.15, Yosry Ahmed wrote: > > [..] > >>>> + /* Obtained lock, record this cgrp as the ongoing flusher */ > >>>> + if (!READ_ONCE(cgrp_rstat_ongoing_flusher)) { > >>> > >>> Can the above condition will ever be false? > >>> > >> > >> Yes, I think so, because I realized that cgroup_rstat_flush_locked() c= an > >> release/"yield" the lock. Thus, other CPUs/threads have a chance to > >> call cgroup_rstat_flush, and try to become the "ongoing-flusher". > > > > Right, there may actually be multiple ongoing flushers. I am now > > wondering if it would be better if we drop cgrp_rstat_ongoing_flusher > > completely, add a per-cgroup under_flush boolean/flag, and have the > > cgroup iterate its parents here to check if any of them is under_flush > > and wait for it instead. > > > > Yes, we have to add parent iteration here, but I think it may be fine > > because the flush path is already expensive. This will allow us to > > detect if any ongoing flush is overlapping with us, not just the one > > that happened to update cgrp_rstat_ongoing_flusher first. > > > > WDYT? > > No, I don't think we should complicate the code to "support" multiple > ongoing flushers (there is no parallel execution of these). The lock > yielding cause the (I assume) unintended side-effect that multiple > ongoing flushers can exist. We should work towards only having a single > ongoing flusher. > > With the current kswapd rstat contention issue, yielding the lock in the > loop, creates the worst possible case of cache-line trashing, as these > kthreads run on 12 different NUMA nodes. > > I'm working towards changing rstat lock to a mutex. When doing so, we > should not yield the lock in the loop. This will guarantee only having > a single ongoing flusher, and reduce cache-line trashing. If the direction we are heading in is not supporting multiple ongoing flushers then sure, that makes sense. But if we plan to continue supporting multiple ongoing flushers, then I think we should fully commit to it. Let's just avoid a halfway support.