From: Yu Zhao
Date: Thu, 8 Jun 2023 10:33:26 -0600
Subject: Re: [PATCH] mm: convert mm's rss stats into percpu_counter
To: Jan Kara, Shakeel Butt, Wei Xu
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, mhocko@suse.cz, vbabka@suse.cz, regressions@lists.linux.dev, Yu Ma
In-Reply-To: <20230608111408.s2minsenlcjow7q3@quack3>
References: <20221024052841.3291983-1-shakeelb@google.com> <20230608111408.s2minsenlcjow7q3@quack3>

On Thu, Jun 8, 2023 at 5:14 AM Jan Kara wrote:
>
> On Mon 24-10-22 05:28:41, Shakeel Butt wrote:
> > Currently mm_struct maintains rss_stats which are updated on page fault
> > and the unmapping codepaths. For page fault codepath the updates are
> > cached per thread with the batch of TASK_RSS_EVENTS_THRESH which is 64.
> > The reason for caching is performance for multithreaded applications
> > otherwise the rss_stats updates may become hotspot for such
> > applications.
> >
> > However this optimization comes with the cost of error margin in the rss
> > stats. The rss_stats for applications with large number of threads can
> > be very skewed. At worst the error margin is (nr_threads * 64) and we
> > have a lot of applications with 100s of threads, so the error margin can
> > be very high. Internally we had to reduce TASK_RSS_EVENTS_THRESH to 32.
> >
> > Recently we started seeing the unbounded errors for rss_stats for
> > specific applications which use TCP rx0cp. It seems like
> > vm_insert_pages() codepath does not sync rss_stats at all.
> >
> > This patch converts the rss_stats into percpu_counter to convert the
> > error margin from (nr_threads * 64) to approximately (nr_cpus ^ 2).
> > However this conversion enable us to get the accurate stats for
> > situations where accuracy is more important than the cpu cost. Though
> > this patch does not make such tradeoffs.
> >
> > Signed-off-by: Shakeel Butt
>
> Somewhat late to the game but our performance testing grid has noticed this
> commit causes a performance regression on shell-heavy workloads. For
> example running 'make test' in git sources on our test machine with 192
> CPUs takes about 4% longer, system time is increased by about 9%:
>
>                     before (9cd6ffa6025)    after (f1a7941243c1)
> Amean     User      471.12 *   0.30%*       481.77 *  -1.96%*
> Amean     System    244.47 *   0.90%*       269.13 *  -9.09%*
> Amean     Elapsed   709.22 *   0.45%*       742.27 *  -4.19%*
> Amean     CPU       100.00 (   0.20%)       101.00 *  -0.80%*
>
> Essentially this workload spawns in sequence a lot of short-lived tasks and
> the task startup + teardown cost is what this patch increases. To
> demonstrate this more clearly, I've written trivial (and somewhat stupid)
> benchmark shell_bench.sh:
>
> for (( i = 0; i < 20000; i++ )); do
>         /bin/true
> done
>
> And when run like:
>
> numactl -C 1 ./shell_bench.sh
>
> (I've forced physical CPU binding to avoid task migrating over the machine
> and cpu frequency scaling interfering which makes the numbers much more
> noisy) I get the following elapsed times:
>
>             9cd6ffa6025    f1a7941243c1
> Avg         6.807429       7.631571
> Stddev      0.021797       0.016483
>
> So some 12% regression in elapsed time. Just to be sure I've verified that
> per-cpu allocator patch [1] does not improve these numbers in any
> significant way.
>
> Where do we go from here? I think in principle the problem could be fixed
> by being clever and when the task has only a single thread, we don't bother
> with allocating pcpu counter (and summing it at the end) and just account
> directly in mm_struct. When the second thread is spawned, we bite the
> bullet, allocate pcpu counter and start with more scalable accounting.
> These shortlived tasks in shell workloads or similar don't spawn any
> threads so this should fix the regression. But this is obviously easier
> said than done...
>
>                                                                 Honza
>
> [1] https://lore.kernel.org/all/20230606125404.95256-1-yu.ma@intel.com/

Another regression reported earlier:
https://lore.kernel.org/linux-mm/202301301057.e55dad5b-oliver.sang@intel.com/