From: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
To: kernel test robot <oliver.sang@intel.com>
Cc: oe-lkp@lists.linux.dev, lkp@intel.com,
	linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mark Brown <broonie@kernel.org>,
	"Liam R. Howlett" <Liam.Howlett@oracle.com>,
	Vlastimil Babka <vbabka@suse.cz>,
	Bert Karwatzki <spasswolf@web.de>, Jeff Xu <jeffxu@chromium.org>,
	Jiri Olsa <olsajiri@gmail.com>, Kees Cook <kees@kernel.org>,
	Lorenzo Stoakes <lstoakes@gmail.com>,
	Matthew Wilcox <willy@infradead.org>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	Paul Moore <paul@paul-moore.com>,
	Sidhartha Kumar <sidhartha.kumar@oracle.com>,
	Suren Baghdasaryan <surenb@google.com>,
	linux-mm@kvack.org, ying.huang@intel.com, feng.tang@intel.com,
	fengwei.yin@intel.com
Subject: Re: [linus:master] [mm]  cacded5e42:  aim9.brk_test.ops_per_sec -5.0% regression
Date: Mon, 30 Sep 2024 09:21:52 +0100
Message-ID: <77321196-5812-4e5b-be4c-20930e6f22bc@lucifer.local>
In-Reply-To: <202409301043.629bea78-oliver.sang@intel.com>

On Mon, Sep 30, 2024 at 10:21:27AM GMT, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a -5.0% regression of aim9.brk_test.ops_per_sec on:
>
>
> commit: cacded5e42b9609b07b22d80c10f0076d439f7d1 ("mm: avoid using vma_merge() for new VMAs")
> https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
>
> testcase: aim9
> test machine: 48 threads 2 sockets Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz (Ivy Bridge-EP) with 64G memory

Hm, quite an old microarchitecture, no?

Would it be possible to try this on a range of uarches, especially more
recent ones, with some repeated runs to rule out statistical noise? Much
appreciated!

> parameters:
>
> 	testtime: 300s
> 	test: brk_test
> 	cpufreq_governor: performance
>
>
>
>
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <oliver.sang@intel.com>
> | Closes: https://lore.kernel.org/oe-lkp/202409301043.629bea78-oliver.sang@intel.com
>
>
> Details are as below:
> -------------------------------------------------------------------------------------------------->
>
>
> The kernel config and materials to reproduce are available at:
> https://download.01.org/0day-ci/archive/20240930/202409301043.629bea78-oliver.sang@intel.com
>
> =========================================================================================
> compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
>   gcc-12/performance/x86_64-rhel-8.3/debian-12-x86_64-20240206.cgz/lkp-ivb-2ep2/brk_test/aim9/300s
>
> commit:
>   fc21959f74 ("mm: abstract vma_expand() to use vma_merge_struct")
>   cacded5e42 ("mm: avoid using vma_merge() for new VMAs")

Yup, this results in a different code path for brk(), but local testing
indicated no regression. A prior revision of the series had encountered
one, so I carefully assessed this, found the bug, and saw no clear
regression after that, though there was a lot of variance in the numbers.

>
> fc21959f74bc1138 cacded5e42b9609b07b22d80c10
> ---------------- ---------------------------
>          %stddev     %change         %stddev
>              \          |                \
>    1322908            -5.0%    1256536        aim9.brk_test.ops_per_sec

Unfortunately there's no stddev figure here, and 5% feels borderline on
noise. As above, it'd be great to get multiple runs going to rule that
out. Thanks!
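
A minimal sketch of the kind of noise check being asked for, in Python.
It is illustrative only: the helper names and the 2x-stddev threshold are
assumptions made for this example, not anything LKP or aim9 provides. The
only real figures used are the two ops_per_sec values quoted above.

```python
import statistics


def percent_change(base, new):
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


def exceeds_noise(base_runs, new_runs, k=2.0):
    """Crude check (assumed threshold, not an LKP rule): is the gap between
    the two means larger than k times the larger sample stddev?"""
    gap = abs(statistics.mean(new_runs) - statistics.mean(base_runs))
    spread = max(statistics.stdev(base_runs), statistics.stdev(new_runs))
    return gap > k * spread


# The two aggregate figures from the report (fc21959f74 vs cacded5e42).
print(f"{percent_change(1322908, 1256536):+.1f}%")
```

With per-run ops_per_sec values for each commit (at least two runs each),
exceeds_noise(base_runs, new_runs) gives a rough indication of whether the
gap stands clear of run-to-run variance. A proper significance test would
be better, this is only a first-pass filter.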

>     201.54            +2.9%     207.44        aim9.time.system_time
>      97.58            -6.0%      91.75        aim9.time.user_time
>       0.04 ± 82%    -100.0%       0.00        perf-sched.sch_delay.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.10 ± 60%    -100.0%       0.00        perf-sched.sch_delay.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.04 ± 82%    -100.0%       0.00        perf-sched.wait_time.avg.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.10 ± 60%    -100.0%       0.00        perf-sched.wait_time.max.ms.__cond_resched.down_write.do_brk_flags.__do_sys_brk.do_syscall_64
>   8.33e+08            +3.9%  8.654e+08        perf-stat.i.branch-instructions
>       1.15            -0.1        1.09        perf-stat.i.branch-miss-rate%
>   12964626            -1.9%   12711922        perf-stat.i.branch-misses
>       1.11            -7.4%       1.03        perf-stat.i.cpi
>  3.943e+09            +6.0%   4.18e+09        perf-stat.i.instructions
>       0.91            +7.9%       0.98        perf-stat.i.ipc
>       0.29 ±  2%      -9.1%       0.27 ±  4%  perf-stat.overall.MPKI
>       1.56            -0.1        1.47        perf-stat.overall.branch-miss-rate%
>       1.08            -6.8%       1.01        perf-stat.overall.cpi
>       0.92            +7.2%       0.99        perf-stat.overall.ipc
>  8.303e+08            +3.9%  8.627e+08        perf-stat.ps.branch-instructions
>   12931205            -2.0%   12678170        perf-stat.ps.branch-misses
>   3.93e+09            +6.0%  4.167e+09        perf-stat.ps.instructions
>  1.184e+12            +6.1%  1.256e+12        perf-stat.total.instructions
>       7.16 ±  2%      -0.4        6.76 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSRETQ_unsafe_stack.brk
>       5.72 ±  2%      -0.4        5.35 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap_event.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64
>       6.13 ±  2%      -0.3        5.84 ±  3%  perf-profile.calltrace.cycles-pp.perf_event_mmap.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.83 ± 11%      -0.1        0.71 ±  5%  perf-profile.calltrace.cycles-pp.__vm_enough_memory.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       0.00            +0.6        0.58 ±  5%  perf-profile.calltrace.cycles-pp.mas_leaf_max_gap.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range
>      16.73 ±  2%      +0.6       17.34        perf-profile.calltrace.cycles-pp.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.7        0.66 ±  6%  perf-profile.calltrace.cycles-pp.mas_wr_store_type.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags
>      24.21            +0.7       24.90        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>      23.33            +0.7       24.05 ±  2%  perf-profile.calltrace.cycles-pp.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe.brk
>       0.00            +0.8        0.82 ±  4%  perf-profile.calltrace.cycles-pp.vma_complete.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +0.9        0.87 ±  5%  perf-profile.calltrace.cycles-pp.mas_update_gap.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags
>       0.00            +1.1        1.07 ±  9%  perf-profile.calltrace.cycles-pp.vma_prepare.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +1.1        1.10 ±  6%  perf-profile.calltrace.cycles-pp.mas_preallocate.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +2.3        2.26 ±  5%  perf-profile.calltrace.cycles-pp.mas_store_prealloc.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk
>       0.00            +7.6        7.56 ±  3%  perf-profile.calltrace.cycles-pp.vma_expand.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64
>       0.00            +8.6        8.62 ±  4%  perf-profile.calltrace.cycles-pp.vma_merge_new_range.do_brk_flags.__do_sys_brk.do_syscall_64.entry_SYSCALL_64_after_hwframe
>       7.74 ±  2%      -0.4        7.30 ±  4%  perf-profile.children.cycles-pp.entry_SYSRETQ_unsafe_stack
>       5.81 ±  2%      -0.4        5.43 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap_event
>       6.18 ±  2%      -0.3        5.88 ±  3%  perf-profile.children.cycles-pp.perf_event_mmap
>       3.93            -0.2        3.73 ±  3%  perf-profile.children.cycles-pp.perf_iterate_sb
>       0.22 ± 29%      -0.1        0.08 ± 17%  perf-profile.children.cycles-pp.may_expand_vm
>       0.96 ±  3%      -0.1        0.83 ±  4%  perf-profile.children.cycles-pp.vma_complete
>       0.61 ± 14%      -0.1        0.52 ±  7%  perf-profile.children.cycles-pp.percpu_counter_add_batch
>       0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.children.cycles-pp.brk_test
>       0.08 ± 11%      +0.0        0.12 ± 14%  perf-profile.children.cycles-pp.mas_prev_setup
>       0.17 ± 12%      +0.1        0.27 ± 10%  perf-profile.children.cycles-pp.mas_wr_store_entry
>       0.00            +0.2        0.15 ± 11%  perf-profile.children.cycles-pp.mas_next_range
>       0.19 ±  8%      +0.2        0.38 ± 10%  perf-profile.children.cycles-pp.mas_next_slot
>       0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.children.cycles-pp.mas_prev_slot
>      23.40            +0.7       24.12 ±  2%  perf-profile.children.cycles-pp.__do_sys_brk
>       0.00            +7.6        7.59 ±  3%  perf-profile.children.cycles-pp.vma_expand
>       0.00            +8.7        8.66 ±  4%  perf-profile.children.cycles-pp.vma_merge_new_range
>       1.61 ± 10%      -0.9        0.69 ±  8%  perf-profile.self.cycles-pp.do_brk_flags
>       7.64 ±  2%      -0.4        7.20 ±  4%  perf-profile.self.cycles-pp.entry_SYSRETQ_unsafe_stack
>       0.22 ± 30%      -0.1        0.08 ± 17%  perf-profile.self.cycles-pp.may_expand_vm
>       0.57 ± 15%      -0.1        0.46 ±  6%  perf-profile.self.cycles-pp.percpu_counter_add_batch
>       0.15 ±  7%      -0.1        0.08 ± 20%  perf-profile.self.cycles-pp.brk_test
>       0.20 ±  5%      -0.0        0.18 ±  4%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
>       0.07 ± 18%      +0.0        0.10 ± 18%  perf-profile.self.cycles-pp.mas_prev_setup
>       0.00            +0.1        0.09 ± 12%  perf-profile.self.cycles-pp.mas_next_range
>       0.36 ±  8%      +0.1        0.45 ±  6%  perf-profile.self.cycles-pp.perf_event_mmap
>       0.15 ± 13%      +0.1        0.25 ± 14%  perf-profile.self.cycles-pp.mas_wr_store_entry
>       0.17 ± 11%      +0.2        0.37 ± 11%  perf-profile.self.cycles-pp.mas_next_slot
>       0.34 ± 17%      +0.3        0.64 ±  6%  perf-profile.self.cycles-pp.mas_prev_slot
>       0.00            +0.3        0.33 ±  5%  perf-profile.self.cycles-pp.vma_merge_new_range
>       0.00            +0.8        0.81 ±  9%  perf-profile.self.cycles-pp.vma_expand
>
>
>
>
> Disclaimer:
> Results have been estimated based on internal Intel analysis and are provided
> for informational purposes only. Any difference in system hardware or software
> design or configuration may affect actual performance.
>
>
> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests/wiki
>

Overall, we previously special-cased brk() to avoid a regression, but that
special-casing is horribly duplicative and bug-prone. So while we could
revert to doing that again, I'd really, really like to avoid it if we
possibly can :)

