From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id CDF42C3271E for ; Mon, 8 Jul 2024 15:26:51 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 57FF26B0096; Mon, 8 Jul 2024 11:26:51 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 52FFD6B0098; Mon, 8 Jul 2024 11:26:51 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 445D06B0099; Mon, 8 Jul 2024 11:26:51 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 280D96B0096 for ; Mon, 8 Jul 2024 11:26:51 -0400 (EDT) Received: from smtpin22.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay07.hostedemail.com (Postfix) with ESMTP id C6C30161355 for ; Mon, 8 Jul 2024 15:26:50 +0000 (UTC) X-FDA: 82316962980.22.B5A9A96 Received: from dfw.source.kernel.org (dfw.source.kernel.org [139.178.84.217]) by imf24.hostedemail.com (Postfix) with ESMTP id EF57F18002C for ; Mon, 8 Jul 2024 15:26:48 +0000 (UTC) Authentication-Results: imf24.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=UFTDBjFU; spf=pass (imf24.hostedemail.com: domain of hawk@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=hawk@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1720452393; a=rsa-sha256; cv=none; b=3vj4wwaMqdxssCj9+yDE0znZ+nAlURVkCNPWkCuS8rSHGjMx3xfnQpfjBCh71edjR5gHnE n7Bf/3iWTeafw+yxTqey/lHuMPlpQqSCP3x87qqvXXMFbpeyyYxbBbmB1kt27xdqP/hbSQ K+2KGov1EPQjjl+6iYYwv1WBESS9Jl4= ARC-Authentication-Results: i=1; imf24.hostedemail.com; dkim=pass header.d=kernel.org header.s=k20201202 header.b=UFTDBjFU; spf=pass (imf24.hostedemail.com: domain of hawk@kernel.org designates 139.178.84.217 as permitted sender) smtp.mailfrom=hawk@kernel.org; dmarc=pass (policy=none) header.from=kernel.org ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1720452393; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=5hmqF7+mJQHl0483Jfd7w9gfbYYOgGS7V15ROMcow8g=; b=MRsP9QZl+cklJspnSUq2ytqU1gvfb+ZJt6XiMSe3xCxR/9PKOi11EkYFH17gv87toiZPT4 XclXo6ubrOLN/SWSxh4vds2m9DXqtuaAJfq6m6ZWdqZv+ameNmZt1svS/FlC71/EjyHrVE d1SxjLhWS8dvmMwEvNcg1ZMhdSU6V2A= Received: from smtp.kernel.org (transwarp.subspace.kernel.org [100.75.92.58]) by dfw.source.kernel.org (Postfix) with ESMTP id DF38660EAA; Mon, 8 Jul 2024 15:26:47 +0000 (UTC) Received: by smtp.kernel.org (Postfix) with ESMTPSA id B86E7C116B1; Mon, 8 Jul 2024 15:26:45 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org; s=k20201202; t=1720452407; bh=cjsprHnjp5XNdWf8EStlLIvj8tCDoS/WMOdakGDpGjQ=; h=Date:Subject:From:To:Cc:References:In-Reply-To:From; b=UFTDBjFUIPjatPoSN6tTKvG49TYK5WSHWkE0aUx1HmML6OYv2xpDhfINMaaGi8A6O ljJd5IR7M+FwCnZf/NSOAsDnp08L/bQHG4ethTQia8mpITFcN2Lv/sGb3ltKjIB4dQ aApEV+sEJmqn6Mz8SkKwZIA/kf6y92ZvNU3PWuH33QXbitQu43G72/ZDynPmse30fJ pAiY/qIREXjZxF4R9hjK3Uc0/kiYjWG7Eopo2BQd1wv0YRE31eK0Bvq9EoT140la/Z 6NKCVwQZTSfAQtnVFQNg0YqVmEV2q5xp4zJmRHFXkrLqrH6x3n/SvJQjOAwtQTssOQ 5/keIjNZ2aENw== Message-ID: <9a7930b9-dec0-418c-8475-5a7e18b3ec68@kernel.org> Date: Mon, 8 Jul 2024 17:26:44 +0200 MIME-Version: 1.0 User-Agent: Mozilla Thunderbird Subject: Re: [PATCH V4 2/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes From: Jesper Dangaard Brouer To: Shakeel Butt Cc: tj@kernel.org, cgroups@vger.kernel.org, yosryahmed@google.com, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org References: <171952310959.1810550.17003659816794335660.stgit@firesoul> <171952312320.1810550.13209360603489797077.stgit@firesoul> <4n3qu75efpznkomxytm7irwfiq44hhi4hb5igjbd55ooxgmvwa@tbgmwvcqsy75> <7ecdd625-37a0-49f1-92fc-eef9791fbe5b@kernel.org> Content-Language: en-US In-Reply-To: <7ecdd625-37a0-49f1-92fc-eef9791fbe5b@kernel.org> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit X-Stat-Signature: i5q1hpgqsyt8hnmbjdf6ih8mhm8kc1e3 X-Rspamd-Queue-Id: EF57F18002C X-Rspam-User: X-Rspamd-Server: rspam10 X-HE-Tag: 1720452408-841040 X-HE-Meta: U2FsdGVkX1+Q+HMqL+xZtK+AHVAcM3CEbJySc0Ll+EkOTMzR697+JT9MV2K/qW27mSIhIBYnwSuBqIHQxMWqx6N2XEhgAzhYt6Mk0fE1uPJecftOhGI7sGt2b/OUvXRM2IjaFXSVqBjdNG1GjEDE1f5aechoVP5Bh8o9uzpz0gMBIcT1f4OOB8IUS9NJ/p5Dr1ZHUiNosmjGCDgV6eNwEQRsyT531PckV6ULxnHQzm2GStKZzScErBz0JVFXUJL/T0bNI/ahRSeat8WY4Os8U/zsFh+W48K+aJTNnrp7/PkPrDEsKEyEvZqmFB/T6t0odGQbRf15Ev0XTfypUUOB6L6riTjvjPVVwJ76eTth4Hq/rlIUsgp4Rkn50zOZ3Nqekr+Pc3TZvSCllvmW/Vk0CteyRAlApURvRlDUc/MTPTbG2atzs5AlUjeXyGz5UcTe27H29ZDkZib/V9OJOfxgtPt/8y35LweINwVcl8sGgBznFj/CQrrrjPm25wjQr/UuzicF6cJj5oDGAEgBhgM2zxTO0FO74hMJFgIcx/rHfuoCUfUWtMIfL7wQY2iYnUrpZ0H8r6CV63gLO21uIhi6J1gkcfMzjVmWbyI5UnFm/ry5OQHPAemSX17dsu//kCxVRHtFWnr8Klz9ixRD7Q+6fO2GzPt+rsfeQM7JCFR8A++D3eFVz6pT/DafWNaaVDqdoef11YVXDlbAt1UbgXi/4xxmOZCLqxulu+86tdIUN9mmw/oT26JbKocapvKGxggZfRCeGaWIqTnrYYB6f2RLMVc8FePoKrhLPVzd3lI57inUTMPR2QwDM1D1+KPh8WHykxdGg1tnfzfk/An963CB4oCUA/0r6BND6+dKPr4rVLLL5gh8myDJU/Ex0yhutEugaX51ra+fahCN77FekLUar5p7Yu4Ctvk6Y8Y8cO7cruWyjcLHJW995xo9jWIic5FmEDAqR+YtLUoe0vGKhAl Xv901f3+ 2QGEXTTP3fbVIWs34aC7ohGljqhmyo4HMZpPraXqG6jfzocq7juSq/6XZAdYaw1SgcynVxMz38avbSjyxGZbNyPByR8ViGs9R0eavgnP31XJHVhTH9djgGV5XwoUxo88i5l02/MnvskFFdq8gvPm2atpNoU1txLJPVX2FktEKBX4Ck+3G46hNImxOJZgoxAtHYPePkcC63LzUsHlDST/3hRWhaSs8ltdMRg0haIN6BXY7qJJMR17hFxR37xNGoqPn1W2N X-Bogosity: Ham, tests=bogofilter, spamicity=0.002008, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: List-Subscribe: List-Unsubscribe: On 28/06/2024 11.39, Jesper Dangaard Brouer wrote: > > > On 28/06/2024 01.34, Shakeel Butt wrote: >> On Thu, Jun 27, 2024 at 11:18:56PM GMT, Jesper Dangaard Brouer wrote: >>> Avoid lock contention on the global cgroup rstat lock caused by kswapd >>> starting on all NUMA nodes simultaneously. At Cloudflare, we observed >>> massive issues due to kswapd and the specific mem_cgroup_flush_stats() >>> call inlined in shrink_node, which takes the rstat lock. >>> [...] >>>   static void cgroup_base_stat_flush(struct cgroup *cgrp, int cpu); >>> @@ -312,6 +315,45 @@ static inline void __cgroup_rstat_unlock(struct >>> cgroup *cgrp, int cpu_in_loop) >>>       spin_unlock_irq(&cgroup_rstat_lock); >>>   } >>> +#define MAX_WAIT    msecs_to_jiffies(100) >>> +/* Trylock helper that also checks for on ongoing flusher */ >>> +static bool cgroup_rstat_trylock_flusher(struct cgroup *cgrp) >>> +{ >>> +    bool locked = __cgroup_rstat_trylock(cgrp, -1); >>> +    if (!locked) { >>> +        struct cgroup *cgrp_ongoing; >>> + >>> +        /* Lock is contended, lets check if ongoing flusher is already >>> +         * taking care of this, if we are a descendant. >>> +         */ >>> +        cgrp_ongoing = READ_ONCE(cgrp_rstat_ongoing_flusher); >>> +        if (cgrp_ongoing && cgroup_is_descendant(cgrp, cgrp_ongoing)) { >> >> I wonder if READ_ONCE() and cgroup_is_descendant() needs to happen >> within in rcu section. On a preemptable kernel, let's say we got >> preempted in between them, the flusher was unrelated and got freed >> before we get the CPU. In that case we are accessing freed memory. >> > > I have to think about this some more. > I don't think this is necessary. We are now waiting (for completion) and not skipping flush, because as part of take down function cgroup_rstat_exit() is called, which will call cgroup_rstat_flush(). void cgroup_rstat_exit(struct cgroup *cgrp) { int cpu; cgroup_rstat_flush(cgrp); >>> +            wait_for_completion_interruptible_timeout( >>> +                &cgrp_ongoing->flush_done, MAX_WAIT); >>> + >>> +            return false; >>> +        } >>> +        __cgroup_rstat_lock(cgrp, -1, false); >>> +    } >>> +    /* Obtained lock, record this cgrp as the ongoing flusher */ >>> +    if (!READ_ONCE(cgrp_rstat_ongoing_flusher)) {