From: Yosry Ahmed
Date: Thu, 18 Jul 2024 20:11:32 -0700
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt
Cc: Jesper Dangaard Brouer, tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org
References: <172070450139.2992819.13210624094367257881.stgit@firesoul> <100caebf-c11c-45c9-b864-d8562e2a5ac5@kernel.org>

On Thu, Jul 18, 2024 at 5:41 PM Shakeel Butt wrote:
>
> Hi Jesper,
>
> On Wed, Jul 17, 2024 at 06:36:28PM GMT, Jesper Dangaard Brouer wrote:
> >
> [...]
> >
> >
> > Looking at the production numbers for the time the lock is held for level 0:
> >
> > @locked_time_level[0]:
> > [4M, 8M)     623 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
> > [8M, 16M)    860 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > [16M, 32M)   295 |@@@@@@@@@@@@@@@@@                                   |
> > [32M, 64M)   275 |@@@@@@@@@@@@@@@@                                    |
> >
>
> Is it possible to get the above histogram for other levels as well? I
> know this is 12 numa node machine, how many total CPUs are there?
>
> > The time is in nanosec, so M corresponds to ms (milliseconds).
> >
> > With 36 flushes per second (as shown earlier) this is a flush every
> > 27.7ms. It is not unreasonable (from above data) that the flush time
> > also spend 27ms, which means that we spend a full CPU second flushing.
> > That is spending too much time flushing.
>
> One idea to further reduce this time is more fine grained flush
> skipping. At the moment we either skip the whole flush or not. How
> about we make this decision per-cpu? We already have per-cpu updates
> data and if it is less than MEMCG_CHARGE_BATCH, skip flush on that cpu.

Good idea. I think we would need a per-subsystem callback to decide
whether we want to flush the cgroup or not. This needs to happen in
the core rstat flushing code (not the memcg flushing code), as we need
to make sure we do not remove the cgroup from the per-cpu updated tree
if we don't flush it.

More generally, I think we should be able to have a "force" flush API
that skips all optimizations and ensures that a flush occurs. I think
this will be needed in the cgroup_rstat_exit() path, where stats of a
cgroup being freed must be propagated to its parent, no matter how
insignificant they may be, to avoid inconsistencies.
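
To make the idea concrete, here is a rough userspace sketch of what I
mean. It is a toy model, not kernel code: struct cg_model, cg_update(),
cg_flush(), memcg_should_flush() and the CHARGE_BATCH value are all
made-up names for illustration and do not exist in the tree. It only
shows the two properties discussed above: a per-subsystem callback that
decides per-cpu whether to flush, with a skipped cpu staying marked as
updated, and a "force" mode that bypasses the callback.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS       4
#define CHARGE_BATCH  64	/* stand-in for MEMCG_CHARGE_BATCH; value is illustrative */

/* Toy model of one cgroup's per-cpu rstat state (not the kernel layout). */
struct cg_model {
	long pending[NR_CPUS];		/* per-cpu updates not yet propagated */
	bool on_updated[NR_CPUS];	/* is the cgroup on this cpu's updated tree? */
	long total;			/* aggregated value after flushing */
};

/* Hypothetical per-subsystem hook: is this cpu's delta worth flushing? */
typedef bool (*should_flush_fn)(const struct cg_model *cg, int cpu);

bool memcg_should_flush(const struct cg_model *cg, int cpu)
{
	return labs(cg->pending[cpu]) >= CHARGE_BATCH;
}

/* Record an update on one cpu and mark the cgroup as updated there. */
void cg_update(struct cg_model *cg, int cpu, long delta)
{
	cg->pending[cpu] += delta;
	cg->on_updated[cpu] = true;
}

/*
 * Core flush loop. The skip decision lives here, not in the subsystem,
 * so that a skipped cpu keeps its "updated" mark and its pending delta.
 * With force == true the callback is ignored and everything is flushed.
 * Returns the number of cpus that were actually flushed.
 */
int cg_flush(struct cg_model *cg, should_flush_fn should_flush, bool force)
{
	int flushed = 0;

	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (!cg->on_updated[cpu])
			continue;
		if (!force && should_flush && !should_flush(cg, cpu))
			continue;	/* skipped: stays on the updated tree */
		cg->total += cg->pending[cpu];
		cg->pending[cpu] = 0;
		cg->on_updated[cpu] = false;
		flushed++;
	}
	return flushed;
}
```

In this model a normal flush would pass force == false, while a
teardown path like cgroup_rstat_exit() would pass force == true so that
even a small leftover delta reaches the aggregate.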