From: Yosry Ahmed
Date: Tue, 24 Oct 2023 23:22:30 -0700
Subject: Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg
To: Oliver Sang, Shakeel Butt, Johannes Weiner
Cc: Feng Tang, "oe-lkp@lists.linux.dev", lkp, "cgroups@vger.kernel.org",
 "linux-mm@kvack.org", "Huang, Ying", "Yin, Fengwei", Andrew Morton,
 Michal Hocko, Roman Gushchin, Muchun Song, Ivan Babrou, Tejun Heo,
 Michal Koutný, Waiman Long, "kernel-team@cloudflare.com", Wei Xu,
 Greg Thelen, "linux-kernel@vger.kernel.org", Domenico Cerasuolo
References: <20231010032117.1577496-4-yosryahmed@google.com>
 <202310202303.c68e7639-oliver.sang@intel.com>

On Tue, Oct 24, 2023 at 11:09 PM Oliver Sang wrote:
>
> hi, Yosry Ahmed,
>
> On Tue, Oct 24, 2023 at 12:14:42AM -0700, Yosry Ahmed wrote:
> > On Mon, Oct 23, 2023 at 11:56 PM Oliver Sang wrote:
> > >
> > > hi, Yosry Ahmed,
> > >
> > > On Mon, Oct 23, 2023 at 07:13:50PM -0700, Yosry Ahmed wrote:
> > >
> > > ...
> > >
> > > >
> > > > I still could not run the benchmark, but I used a version of
> > > > fallocate1.c that does 1 million iterations. I ran 100 in parallel.
> > > > This showed ~13% regression with the patch, so not the same as the
> > > > will-it-scale version, but it could be an indicator.
> > > >
> > > > With that, I did not see any improvement with the fixlet above or
> > > > ___cacheline_aligned_in_smp. So you can scratch that.
> > > >
> > > > I did, however, see some improvement with reducing the indirection
> > > > layers by moving stats_updates directly into struct mem_cgroup. The
> > > > regression in my manual testing went down to 9%. Still not great, but
> > > > I am wondering how this reflects on the benchmark. If you're able to
> > > > test it that would be great, the diff is below. Meanwhile I am still
> > > > looking for other improvements that can be made.
> > >
> > > we applied previous patch-set as below:
> > >
> > > c5f50d8b23c79 (linux-review/Yosry-Ahmed/mm-memcg-change-flush_next_time-to-flush_last_time/20231010-112257) mm: memcg: restore subtree stats flushing
> > > ac8a48ba9e1ca mm: workingset: move the stats flush into workingset_test_recent()
> > > 51d74c18a9c61 mm: memcg: make stats flushing threshold per-memcg
> > > 130617edc1cd1 mm: memcg: move vmstats structs definition above flushing code
> > > 26d0ee342efc6 mm: memcg: change flush_next_time to flush_last_time
> > > 25478183883e6 Merge branch 'mm-nonmm-unstable' into mm-everything   <---- the base our tool picked for the patch set
> > >
> > > I tried to apply below patch to either 51d74c18a9c61 or c5f50d8b23c79,
> > > but failed. could you guide how to apply this patch?
> > > Thanks
> > >
> >
> > Thanks for looking into this. I rebased the diff on top of
> > c5f50d8b23c79. Please find it attached.
>
> from our tests, this patch has little impact.
>
> it was applied as below ac6a9444dec85:
>
> ac6a9444dec85 (linux-devel/fixup-c5f50d8b23c79) memcg: move stats_updates to struct mem_cgroup
> c5f50d8b23c79 (linux-review/Yosry-Ahmed/mm-memcg-change-flush_next_time-to-flush_last_time/20231010-112257) mm: memcg: restore subtree stats flushing
> ac8a48ba9e1ca mm: workingset: move the stats flush into workingset_test_recent()
> 51d74c18a9c61 mm: memcg: make stats flushing threshold per-memcg
> 130617edc1cd1 mm: memcg: move vmstats structs definition above flushing code
> 26d0ee342efc6 mm: memcg: change flush_next_time to flush_last_time
> 25478183883e6 Merge branch 'mm-nonmm-unstable' into mm-everything
>
> for the first regression reported in original report, data are very close
> for 51d74c18a9c61, c5f50d8b23c79 (patch-set tip, parent of ac6a9444dec85),
> and ac6a9444dec85.
> full comparison is as [1]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/thread/100%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/fallocate1/will-it-scale
>
> 130617edc1cd1ba1 51d74c18a9c61e7ee33bc90b522 c5f50d8b23c7982ac875791755b ac6a9444dec85dc50c6bfbc4ee7
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>      36509           -25.8%      27079           -25.2%      27305           -25.0%      27383        will-it-scale.per_thread_ops
>
> for the second regression reported in original report, seems a small impact
> from ac6a9444dec85.
> full comparison is as [2]
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
>   gcc-12/performance/x86_64-rhel-8.3/thread/50%/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/fallocate1/will-it-scale
>
> 130617edc1cd1ba1 51d74c18a9c61e7ee33bc90b522 c5f50d8b23c7982ac875791755b ac6a9444dec85dc50c6bfbc4ee7
> ---------------- --------------------------- --------------------------- ---------------------------
>          %stddev     %change         %stddev     %change         %stddev     %change         %stddev
>              \          |                \          |                \          |                \
>      76580           -30.0%      53575           -28.9%      54415           -26.7%      56152        will-it-scale.per_thread_ops
>
> [1]
>

Thanks Oliver for running the numbers. If I understand correctly, the
will-it-scale.fallocate1 microbenchmark is the only one showing a
significant regression here, is this correct?

In my runs, other, more representative benchmarks like netperf and
will-it-scale.page_fault* show minimal regression. I would expect
practical workloads to have high concurrency of page faults or
networking, but maybe not of fallocate/ftruncate.

Oliver, in your experience, how often does such a regression in such a
microbenchmark translate to a real regression that people care about?
(or how often do people dismiss it?)

I tried optimizing this further for the fallocate/ftruncate case, but
without luck. I even tried moving stats_updates into cgroup core
(struct cgroup_rstat_cpu) to reuse the existing loop in
cgroup_rstat_updated(), but it somehow made things worse.

On the other hand, we do have some machines in production running this
series together with a previous optimization for non-hierarchical
stats [1] on an older kernel, and we do see a significant reduction in
CPU time spent on reading the stats.
Domenico did a similar experiment with only this series and reported
similar results [2].

Shakeel, Johannes, (and other memcg folks), I personally think the
benefits here outweigh a regression in this particular benchmark, but
I am obviously biased. What do you think?

[1] https://lore.kernel.org/lkml/20230726153223.821757-2-yosryahmed@google.com/
[2] https://lore.kernel.org/lkml/CAFYChMv_kv_KXOMRkrmTN-7MrfgBHMcK3YXv0dPYEL7nK77e2A@mail.gmail.com/
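
For anyone who wants to sanity-check the %change columns quoted in this
thread, they can be recomputed from the will-it-scale.per_thread_ops
values, taking 130617edc1cd1 as the base. The snippet below is only a
minimal sketch of that arithmetic. It is not part of LKP or of the patch
set, and the names pct_change() and print_row() are made up for
illustration.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Relative change of `value` against `base`, in percent.
 * Hypothetical helper: negative results mean a regression
 * (fewer operations per thread than the base commit).
 */
double pct_change(double base, double value)
{
	assert(base != 0.0);	/* a zero base has no meaningful relative change */
	return (value - base) / base * 100.0;
}

/*
 * Print one commit's result next to its change versus the base,
 * e.g. the per_thread_ops of 51d74c18a9c61 versus 130617edc1cd1.
 */
void print_row(const char *commit, double base, double value)
{
	printf("%-14s %8.0f  %+6.1f%%\n", commit, value,
	       pct_change(base, value));
}
```

With the 100% nr_task numbers from the report, pct_change(36509, 27079)
comes out at roughly -25.8%, and with the 50% nr_task numbers,
pct_change(76580, 56152) comes out at roughly -26.7%, which agrees with
the quoted tables.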