Date: Tue, 31 Jan 2023 05:23:52 +0000
Message-ID: <20230131052352.5qnqegzwmt7akk7t@google.com>
References: <202301301057.e55dad5b-oliver.sang@intel.com>
Subject: Re: [linus:master] [mm] f1a7941243: unixbench.score -19.2% regression
From: Shakeel Butt
To: Matthew Wilcox
Cc: kernel test robot, oe-lkp@lists.linux.dev, lkp@intel.com,
 linux-kernel@vger.kernel.org, Andrew Morton, Marek Szyprowski,
 linux-mm@kvack.org, linux-trace-kernel@vger.kernel.org,
 ying.huang@intel.com, feng.tang@intel.com, zhengjun.xing@linux.intel.com,
 fengwei.yin@intel.com
On Mon, Jan 30, 2023 at 04:15:09AM +0000, Matthew Wilcox wrote:
> On Mon, Jan 30, 2023 at 10:32:56AM +0800, kernel test robot wrote:
> > FYI, we noticed a -19.2% regression of unixbench.score due to commit:
> >
> > commit: f1a7941243c102a44e8847e3b94ff4ff3ec56f25 ("mm: convert mm's rss stats into percpu_counter")
> > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> >
> > in testcase: unixbench
> > on test machine: 128 threads 4 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
> > with following parameters:
> >
> > 	runtime: 300s
> > 	nr_task: 30%
> > 	test: spawn
> > 	cpufreq_governor: performance
> >
> > ...
> >
> >   9cd6ffa60256e931  f1a7941243c102a44e8847e3b94
> >   ----------------  ---------------------------
> >        %stddev      %change         %stddev
> >            \           |                \
> >      11110            -19.2%       8974        unixbench.score
> >    1090843            -12.2%     957314        unixbench.time.involuntary_context_switches
> >    4243909 ± 6%       -32.4%    2867136 ± 5%   unixbench.time.major_page_faults
> >      10547            -12.6%       9216        unixbench.time.maximum_resident_set_size
> >  9.913e+08            -19.6%  7.969e+08        unixbench.time.minor_page_faults
> >       5638            +19.1%       6714        unixbench.time.system_time
> >       5502            -20.7%       4363        unixbench.time.user_time
>
> So we're spending a lot more time in the kernel and correspondingly less
> time in userspace.
>
> >   67991885           -16.9%   56507507        unixbench.time.voluntary_context_switches
> >   46198768           -19.1%   37355723        unixbench.workload
> >  1.365e+08           -12.5%  1.195e+08 ± 7%   cpuidle..usage
> >    1220612 ± 4%      -38.0%     757009 ± 28%  meminfo.Active
> >    1220354 ± 4%      -38.0%     756754 ± 28%  meminfo.Active(anon)
> >       0.50 ± 2%       -0.1        0.45 ± 4%   mpstat.cpu.all.soft%
> >       1.73            -0.2        1.52 ± 2%   mpstat.cpu.all.usr%
> >     532266           -18.4%     434559        vmstat.system.cs
> >     495826           -12.2%     435455 ± 8%   vmstat.system.in
> >   1.36e+08           -13.2%   1.18e+08 ± 9%   turbostat.C1
> >      68.80            +0.8       69.60        turbostat.C1%
> >  1.663e+08           -12.1%  1.462e+08 ± 8%   turbostat.IRQ
> >      15.54 ± 20%     -49.0%       7.93 ± 24%  sched_debug.cfs_rq:/.runnable_avg.min
> >      13.26 ± 19%     -46.6%       7.08 ± 29%  sched_debug.cfs_rq:/.util_avg.min
> >      48.96 ± 8%      +51.5%      74.20 ± 13%  sched_debug.cfs_rq:/.util_est_enqueued.avg
> >     138.00 ± 5%      +28.9%     177.87 ± 7%   sched_debug.cfs_rq:/.util_est_enqueued.stddev
> >     228060 ± 3%      +13.3%     258413 ± 4%   sched_debug.cpu.avg_idle.stddev
> >     432533 ± 5%      -16.4%     361517 ± 4%   sched_debug.cpu.nr_switches.min
> >  2.665e+08           -18.9%  2.162e+08        numa-numastat.node0.local_node
> >  2.666e+08           -18.9%  2.163e+08        numa-numastat.node0.numa_hit
> >  2.746e+08           -20.9%  2.172e+08        numa-numastat.node1.local_node
> >  2.747e+08           -20.9%  2.172e+08        numa-numastat.node1.numa_hit
> >  2.602e+08           -17.4%  2.149e+08        numa-numastat.node2.local_node
> >  2.603e+08           -17.4%  2.149e+08        numa-numastat.node2.numa_hit
> >  2.423e+08           -15.0%   2.06e+08        numa-numastat.node3.local_node
> >  2.424e+08           -15.0%  2.061e+08        numa-numastat.node3.numa_hit
>
> So we're going off-node a lot more for ... something.
>
> >  2.666e+08           -18.9%  2.163e+08        numa-vmstat.node0.numa_hit
> >  2.665e+08           -18.9%  2.162e+08        numa-vmstat.node0.numa_local
> >  2.747e+08           -20.9%  2.172e+08        numa-vmstat.node1.numa_hit
> >  2.746e+08           -20.9%  2.172e+08        numa-vmstat.node1.numa_local
> >  2.603e+08           -17.4%  2.149e+08        numa-vmstat.node2.numa_hit
> >  2.602e+08           -17.4%  2.149e+08        numa-vmstat.node2.numa_local
> >  2.424e+08           -15.0%  2.061e+08        numa-vmstat.node3.numa_hit
> >  2.423e+08           -15.0%   2.06e+08        numa-vmstat.node3.numa_local
> >     304947 ± 4%      -38.0%     189144 ± 28%  proc-vmstat.nr_active_anon
>
> Umm.  Are we running vmstat a lot during this test?  The commit says:
>
>     At the moment the readers are either procfs interface, oom_killer and
>     memory reclaim which I think are not performance critical and should
>     be ok with slow read.  However I think we can make that change in a
>     separate patch.
>
> This would explain the increased cross-NUMA references (we're going to
> the other nodes to collect the stats), and the general slowdown.  But I
> don't think it reflects a real workload; it's reflecting that the
> monitoring of this workload that we're doing is now more accurate and
> more expensive.

Thanks Willy for taking a stab at this issue. The numa_hit stat is updated
on allocations, so I don't think stat collection would increase those
counters.

I looked at the "spawn" workload in UnixBench: it is a simple fork
ping-pong, i.e. the process forks and then waits for the child, while the
child simply exits (a minimal sketch of the loop is below). I ran perf,
and the percpu counter allocation appears to be the additional cost
introduced by this patch; see the report further down. However, when I
made spawn a bit more sophisticated by adding an mmap() of a GiB, the page
table copy became the dominant cost, and there was no measurable
difference with or without the given patch.
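For reference, the ping-pong is roughly the following loop. This is my own
minimal reconstruction for illustration, not the actual UnixBench source;
the real test runs for a fixed duration and counts iterations rather than
using a fixed loop count:

#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	/* fork ping-pong: parent forks, child exits, parent reaps, repeat */
	for (int i = 0; i < 100000; i++) {
		pid_t pid = fork();

		if (pid < 0)
			exit(1);		/* fork failed */
		if (pid == 0)
			_exit(0);		/* child: exit immediately */
		waitpid(pid, NULL, 0);		/* parent: reap the child */
	}
	return 0;
}

Each iteration exercises the fork path (copy_process/copy_mm) and almost
nothing else, which is why any constant cost added to mm setup shows up so
prominently in this benchmark.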
I am now wondering: is this fork ping-pong really an important enough
workload that we should revert the patch, or should we ignore it for now
and instead work on improving the performance of the __alloc_percpu_gfp
code?

-   90.97%     0.06%  spawn  [kernel.kallsyms]  [k] entry_SYSCALL_64_after_hwframe
   - 90.91% entry_SYSCALL_64_after_hwframe
      - 90.86% do_syscall_64
         - 80.03% __x64_sys_clone
            - 79.98% kernel_clone
               - 75.97% copy_process
                  + 46.04% perf_event_init_task
                  - 21.50% copy_mm
                     - 10.05% mm_init
 ---------------------->  - 8.92% __percpu_counter_init
                              - 8.67% __alloc_percpu_gfp
                                 - 5.70% pcpu_alloc
                                      1.29% _find_next_bit
                                   2.57% memset_erms
                        + 0.96% pgd_alloc
                     + 6.16% copy_page_range
                     + 1.72% anon_vma_fork
                     + 0.87% mas_store
                       0.72% kmem_cache_alloc
                  + 2.71% dup_task_struct
                  + 1.37% perf_event_fork
                    0.63% alloc_pid
                    0.51% copy_files
               + 3.71% wake_up_new_task
         + 7.40% __x64_sys_exit_group
         + 2.32% __x64_sys_wait4
         + 1.03% syscall_exit_to_user_mode
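For context on where that cost comes from: with the commit, struct
mm_struct carries one percpu_counter per RSS stat, and mm_init() in the
fork path initializes each of them, so every fork() now pays a handful of
percpu allocations. A simplified sketch of that shape (a paraphrase of the
fork-path change, not the exact kernel source; the helper name here is
made up):

/*
 * Sketch of the fork-path cost added by f1a7941243c1: one
 * percpu_counter_init() per RSS counter, each of which ends up
 * in __alloc_percpu_gfp() -- the hot spot in the profile above.
 */
static int mm_init_rss_counters(struct mm_struct *mm)
{
	int i;

	for (i = 0; i < NR_MM_COUNTERS; i++)
		if (percpu_counter_init(&mm->rss_stat[i], 0, GFP_KERNEL))
			goto undo;
	return 0;

undo:
	while (--i >= 0)
		percpu_counter_destroy(&mm->rss_stat[i]);
	return -ENOMEM;
}

One possible direction, if we keep the patch, would be to batch the
per-counter allocations into a single percpu allocation so that fork()
pays the pcpu_alloc() setup cost once instead of once per counter.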