Re: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: "Huang, Ying" <ying.huang@intel.com>
To: kernel test robot <yujie.liu@intel.com>
Cc: Rik van Riel <riel@surriel.com>,  <lkp@lists.01.org>,
	 <lkp@intel.com>, Andrew Morton <akpm@linux-foundation.org>,
	 Yang Shi <shy828301@gmail.com>,
	 Matthew Wilcox <willy@infradead.org>,
	<linux-kernel@vger.kernel.org>,  <linux-mm@kvack.org>,
	<feng.tang@intel.com>,  <zhengjun.xing@linux.intel.com>,
	<fengwei.yin@intel.com>
Subject: Re: [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression
Date: Wed, 19 Oct 2022 10:05:50 +0800	[thread overview]
Message-ID: <87edv4r2ip.fsf@yhuang6-desk2.ccr.corp.intel.com> (raw)
In-Reply-To: <202210181535.7144dd15-yujie.liu@intel.com> (kernel test robot's message of "Tue, 18 Oct 2022 16:44:59 +0800")

Hi, Yujie,

>      32528  48%    +147.6%      80547  38%  numa-meminfo.node0.AnonHugePages
>      92821  23%     +59.3%     147839  28%  numa-meminfo.node0.AnonPages

The Anon pages allocated are much more than the parent commit.  This is
expected, because THP instead of normal page will be allocated for
aligned memory area.

>      95.23           -79.8       15.41   6%  perf-profile.calltrace.cycles-pp.__munmap
>      95.08           -79.7       15.40   6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
>      95.02           -79.6       15.39   6%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.96           -79.6       15.37   6%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.95           -79.6       15.37   6%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
>      94.86           -79.5       15.35   6%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
>      94.38           -79.2       15.22   6%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
>      42.74           -42.7        0.00        perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
>      42.74           -42.7        0.00        perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
>      42.72           -42.7        0.00        perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
>      41.84           -41.8        0.00        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
>      41.70           -41.7        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
>      41.62           -41.6        0.00        perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
>      41.55           -41.6        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
>      41.52           -41.5        0.00        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
>      41.28           -41.3        0.00        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush

In the parent commit, most CPU cycles are used for contention on LRU lock.

>       0.00            +4.8        4.82   7%  perf-profile.calltrace.cycles-pp.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
>       0.00            +4.9        4.88   7%  perf-profile.calltrace.cycles-pp.zap_huge_pmd.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
>       0.00            +8.2        8.22   8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist
>       0.00            +8.2        8.23   8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
>       0.00            +8.3        8.35   8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages
>       0.00            +8.3        8.35   8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush
>       0.00            +8.4        8.37   8%  perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
>       0.00            +9.6        9.60   6%  perf-profile.calltrace.cycles-pp.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
>       0.00           +65.5       65.48   2%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
>       0.00           +72.5       72.51   2%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault

With the commit, most CPU cycles are consumed for clear huge page.  This
is expected.  We allocate more pages, so, we need more cycles to clear
them.

Check the source code of test case (will-it-scale/malloc1), I found that
it will allocate some memory with malloc() then free it.

In the parent commit, because the virtual memory address isn't aligned
with 2M, normal page will be allocated.  With the commit, THP will be
allocated, so more page clearing and less LRU lock contention.  I think
this is the expected behavior of the commit.  And the test case isn't so
popular (malloc() then free() but don't access the memory allocated).  So
this regression isn't important.  We can just ignore it.

Best Regards,
Huang, Ying

next prev parent reply	other threads:[~2022-10-19  2:06 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-18  8:44 kernel test robot
2022-10-19  2:05 ` Huang, Ying [this message]
2022-10-20  4:23   ` Nathan Chancellor
2022-10-20  5:07     ` Huang, Ying
2022-10-20 15:28       ` Rik van Riel
2022-10-20 17:16         ` Nathan Chancellor
2022-11-28  6:40           ` Nathan Chancellor
2022-12-01 18:33             ` Thorsten Leemhuis
2022-12-01 20:29               ` Rik van Riel
2022-12-01 21:22                 ` Andrew Morton
2022-12-01 21:44                   ` Yang Shi
2022-12-02  8:46                   ` Thorsten Leemhuis
2022-12-02 18:44                     ` Andrew Morton
2022-12-02 19:37                       ` Thorsten Leemhuis
2022-12-01 21:35                 ` Nathan Chancellor
2022-12-16 11:48                 ` Yin, Fengwei
2022-10-20 16:40       ` Yujie Liu
2022-11-29  8:59     ` [mm] f35b5d7d67: will-it-scale.per_process_ops -95.5% regression #forregzbot Thorsten Leemhuis
2022-12-02  6:43       ` Thorsten Leemhuis

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87edv4r2ip.fsf@yhuang6-desk2.ccr.corp.intel.com \
    --to=ying.huang@intel.com \
    --cc=akpm@linux-foundation.org \
    --cc=feng.tang@intel.com \
    --cc=fengwei.yin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lkp@intel.com \
    --cc=lkp@lists.01.org \
    --cc=riel@surriel.com \
    --cc=shy828301@gmail.com \
    --cc=willy@infradead.org \
    --cc=yujie.liu@intel.com \
    --cc=zhengjun.xing@linux.intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox