From: Yosry Ahmed
Date: Mon, 22 Jul 2024 13:12:35 -0700
Subject: Re: [PATCH V7 1/2] cgroup/rstat: Avoid thundering herd problem by kswapd across NUMA nodes
To: Shakeel Butt, Jesper Dangaard Brouer
Cc: tj@kernel.org, cgroups@vger.kernel.org, hannes@cmpxchg.org, lizefan.x@bytedance.com, longman@redhat.com, kernel-team@cloudflare.com, linux-mm@kvack.org, linux-kernel@vger.kernel.org

On Mon, Jul 22, 2024 at 1:02 PM Shakeel Butt wrote:
>
> On Fri, Jul 19, 2024 at 09:52:17PM GMT, Yosry Ahmed wrote:
> > On Fri, Jul 19, 2024 at 3:48 PM Shakeel Butt wrote:
> > >
> > > On Fri, Jul 19, 2024 at 09:54:41AM GMT, Jesper Dangaard Brouer wrote:
> > > >
> > > >
> > > > On 19/07/2024 02.40, Shakeel Butt wrote:
> > > > > Hi Jesper,
> > > > >
> > > > > On Wed, Jul 17, 2024 at 06:36:28PM GMT, Jesper Dangaard Brouer wrote:
> > > > > >
> > > > > [...]
> > > > > >
> > > > > >
> > > > > > Looking at the production numbers for the time the lock is held for level 0:
> > > > > >
> > > > > > @locked_time_level[0]:
> > > > > > [4M, 8M)      623 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@               |
> > > > > > [8M, 16M)     860 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@|
> > > > > > [16M, 32M)    295 |@@@@@@@@@@@@@@@@@                                   |
> > > > > > [32M, 64M)    275 |@@@@@@@@@@@@@@@@                                    |
> > > > > >
> > > > >
> > > > > Is it possible to get the above histogram for other levels as well?
> > > >
> > > > Data from other levels available in [1]:
> > > >  [1]
> > > > https://lore.kernel.org/all/8c123882-a5c5-409a-938b-cb5aec9b9ab5@kernel.org/
> > > >
> > > > IMHO the data shows we will get most out of skipping level-0 root-cgroup
> > > > flushes.
> > > >
> > >
> > > Thanks a lot of the data. Are all or most of these locked_time_level[0]
> > > from kswapds? This just motivates me to strongly push the ratelimited
> > > flush patch of mine (which would be orthogonal to your patch series).
> >
> > Jesper and I were discussing a better ratelimiting approach, whether
> > it's measuring the time since the last flush, or only skipping if we
> > have a lot of flushes in a specific time frame (using __ratelimit()).
> > I believe this would be better than the current memcg ratelimiting
> > approach, and we can remove the latter.
> >
> > WDYT?
>
> The last statement gives me the impression that you are trying to fix
> something that is not broken. The current ratelimiting users are ok, the
> issue is with the sync flushers. Or maybe you are suggesting that the new
> ratelimiting will be used for all sync flushers and current ratelimiting
> users and the new ratelimiting will make a good tradeoff between the
> accuracy and potential flush stall?

The latter. Basically the idea is to have more informed and generic
ratelimiting logic in the core rstat flushing code (e.g. using
__ratelimit()), which would apply to ~all flushers*.
Then, we ideally wouldn't need mem_cgroup_flush_stats_ratelimited() at
all.

*The obvious exception is the force flushing case we discussed for
cgroup_rstat_exit(). In fact, I think we need that even with the
ongoing flusher optimization, because there is a slight chance that a
flush is missed. It wouldn't be problematic for other flushers, but it
certainly can be for cgroup_rstat_exit(), as the stats will be
completely dropped.

The scenario I have in mind is:
- CPU 1 starts a flush of cgroup A. Flushing completes, but the waiters
  have not been woken up yet.
- CPU 2 updates the stats of cgroup A after they were flushed by CPU 1.
- CPU 3 calls cgroup_rstat_exit(), sees the ongoing flusher, and waits.
- CPU 1 wakes up the waiters.
- CPU 3 proceeds to destroy cgroup A, and the updates made by CPU 2 are
  lost.