From: Yang Shi <shy828301@gmail.com>
To: "Christoph Lameter (Ampere)" <cl@linux.com>
Cc: Yin Fengwei <fengwei.yin@intel.com>,
kernel test robot <oliver.sang@intel.com>,
Rik van Riel <riel@surriel.com>,
oe-lkp@lists.linux.dev, lkp@intel.com,
Linux Memory Management List <linux-mm@kvack.org>,
Andrew Morton <akpm@linux-foundation.org>,
Matthew Wilcox <willy@infradead.org>,
ying.huang@intel.com, feng.tang@intel.com
Subject: Re: [linux-next:master] [mm] 1111d46b5c: stress-ng.pthread.ops_per_sec -84.3% regression
Date: Wed, 20 Dec 2023 12:14:09 -0800 [thread overview]
Message-ID: <CAHbLzkrRfOdDx2OTB0kW49FxE+HtntKBihp2Qh-UrKXM7r8C8g@mail.gmail.com> (raw)
In-Reply-To: <edb35574-e8be-adc8-a756-96bcbab2f0af@linux.com>
On Wed, Dec 20, 2023 at 7:42 AM Christoph Lameter (Ampere) <cl@linux.com> wrote:
>
> On Wed, 20 Dec 2023, Yin Fengwei wrote:
>
> >> Interesting, wasn't the same regression seen last time? And I'm a
> >> little bit confused about how pthread got regressed. I didn't see the
> >> pthread benchmark do any intensive memory alloc/free operations. Do
> >> the pthread APIs do any intensive memory operations? I saw the
> >> benchmark does allocate memory for thread stack, but it should be just
> >> 8K per thread, so it should not trigger what this patch does. With
> >> 1024 threads, the thread stacks may get merged into one single VMA (8M
> >> total), but that may happen even when the patch is not applied.
> > stress-ng.pthread test code is strange here:
> >
> > https://github.com/ColinIanKing/stress-ng/blob/master/stress-pthread.c#L573
> >
> > Even though it allocates its own stack, that attr is not passed
> > to pthread_create. So it's still glibc that allocates the stack for
> > the pthread, which is 8M in size. This is why this patch can impact
> > the stress-ng.pthread test.
>
> Hmmm... The use of calloc() for 8M triggers an mmap I guess.
>
> Why is that memory slower if we align the address to a 2M boundary? Because
> THP can act faster and creates more overhead?
glibc calls madvise() to free the unused stack, which may have a
higher cost due to THP (splitting the PMD, the deferred split queue,
etc.).
>
> > while this time, the hotspot is in (pmd_lock from do_madvise I suppose):
> > - 55.02% zap_pmd_range.isra.0
> > - 53.42% __split_huge_pmd
> > - 51.74% _raw_spin_lock
> > - 51.73% native_queued_spin_lock_slowpath
> > + 3.03% asm_sysvec_call_function
> > - 1.67% __split_huge_pmd_locked
> > - 0.87% pmdp_invalidate
> > + 0.86% flush_tlb_mm_range
> > - 1.60% zap_pte_range
> > - 1.04% page_remove_rmap
> > 0.55% __mod_lruvec_page_state
>
> Ok so we have 2M mappings and they are split because of some action on 4K
> segments? Guess because of the guard pages?
It should not be related to guard pages; it's due to freeing the
unused stack, which may be a partial 2M range.
>
> >> More time spent in madvise and munmap. but I'm not sure whether this
> >> is caused by tearing down the address space when exiting the test. If
> >> so it should not count in the regression.
> > It's not the whole address space being torn down. It's the pthread
> > stack being torn down when the pthread exits (can that be treated as
> > address space teardown? I suppose so).
> >
> > https://github.com/lattera/glibc/blob/master/nptl/allocatestack.c#L384
> > https://github.com/lattera/glibc/blob/master/nptl/pthread_create.c#L576
> >
> > Another question is whether it's worthwhile to make stacks use THP.
> > It may be useful for some apps which need a large stack size.
>
> No can do since a calloc is used to allocate the stack. How can the kernel
> distinguish the allocation?
By VM_GROWSDOWN | VM_GROWSUP. Userspace needs to tell the kernel
this area is a stack by setting the proper flags. For example:
ffffca1df000-ffffca200000 rw-p 00000000 00:00 0 [stack]
Size: 132 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Rss: 60 kB
Pss: 60 kB
Pss_Dirty: 60 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 60 kB
Referenced: 60 kB
Anonymous: 60 kB
LazyFree: 0 kB
AnonHugePages: 0 kB
ShmemPmdMapped: 0 kB
FilePmdMapped: 0 kB
Shared_Hugetlb: 0 kB
Private_Hugetlb: 0 kB
Swap: 0 kB
SwapPss: 0 kB
Locked: 0 kB
THPeligible: 0
VmFlags: rd wr mr mw me gd ac
The "gd" flag means GROWSDOWN. But it entirely depends on how glibc
treats the "stack" area, and glibc just uses calloc() to allocate the
stack area.
>
>