From: "Yin, Fengwei" <fengwei.yin@intel.com>
To: Nathan Chancellor <nathan@kernel.org>
Cc: <riel@surriel.com>, <akpm@linux-foundation.org>,
<shy828301@gmail.com>, <willy@infradead.org>,
<kirill.shutemov@intel.com>, <ying.huang@intel.com>,
<feng.tang@intel.com>, <zhengjun.xing@linux.intel.com>,
<linux-mm@kvack.org>
Subject: Re: [RFC PATCH] mm/thp: check and bail out if page in deferred queue already
Date: Thu, 29 Dec 2022 09:14:28 +0800 [thread overview]
Message-ID: <f4cdc9fa-b2fe-13aa-29a1-2d6944635802@intel.com> (raw)
In-Reply-To: <Y6tSUutBqeXeOrcR@dev-arch.thelio-3990X>
On 12/28/2022 4:15 AM, Nathan Chancellor wrote:
> On Fri, Dec 23, 2022 at 09:52:07PM +0800, Yin Fengwei wrote:
>> Kernel build regression with LLVM was reported here:
>> https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/
>> with commit f35b5d7d676e ("mm: align larger anonymous mappings on THP
>> boundaries"). And the commit f35b5d7d676e was reverted.
>>
>> It turned out the regression is related with madvise(MADV_DONTNEED)
>> was used by ld.lld. But with none PMD_SIZE aligned parameter len.
>> trace-bpfcc captured:
>> 531607 531732 ld.lld do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
>> 531607 531793 ld.lld do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4
>>
>> If the underneath physical page is THP, the madvise(MADV_DONTNNED) can
>> trigger split_queue_lock contention raised significantly. perf showed
>> following data:
>> 14.85% 0.00% ld.lld [kernel.kallsyms] [k]
>> entry_SYSCALL_64_after_hwframe
>> 11.52%
>> entry_SYSCALL_64_after_hwframe
>> do_syscall_64
>> __x64_sys_madvise
>> do_madvise.part.0
>> zap_page_range
>> unmap_single_vma
>> unmap_page_range
>> page_remove_rmap
>> deferred_split_huge_page
>> __lock_text_start
>> native_queued_spin_lock_slowpath
>>
>> If THP can't be removed from rmap as whole THP, partial THP will be
>> removed from rmap by removing sub-pages from rmap. Even the THP
>> head page is added to deferred queue already, the split_queue_lock
>> will be acquired and check whether the THP head page is in the queue
>> already. Thus, the contention of split_queue_lock is raised.
>>
>> Before acquire split_queue_lock, check and bail out early if the THP
>> head page is in the queue already. The checking without holding
>> split_queue_lock could race with deferred_split_scan, but it doesn't
>> impact the correctness here.
>>
>> Test result of building kernel with ld.lld:
>> commit 7b5a0b664ebe (parent commit of f35b5d7d676e):
>> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>> 6:07.99 real, 26367.77 user, 5063.35 sys
>>
>> commit f35b5d7d676e:
>> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>> 7:22.15 real, 26235.03 user, 12504.55 sys
>>
>> commit f35b5d7d676e with the fixing patch:
>> time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
>> 6:08.49 real, 26520.15 user, 5047.91 sys
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>
> I cannot say whether or not this is a good idea or not but it does
> resolve the regression I reported:
>
> Benchmark 1: x86_64 allmodconfig (GCC + ld.lld) @ 1b929c02afd3 ("Linux 6.2-rc1") on 6.0.0-rc3-debug-00016-g7b5a0b664ebe
> Time (mean ± σ): 383.003 s ± 0.680 s [User: 34737.850 s, System: 7287.079 s]
> Range (min … max): 382.218 s … 383.413 s 3 runs
>
> Benchmark 1: x86_64 allmodconfig (GCC + ld.lld) @ 1b929c02afd3 ("Linux 6.2-rc1") on 6.0.0-rc3-debug-00017-gf35b5d7d676e
> Time (mean ± σ): 437.886 s ± 1.030 s [User: 35888.658 s, System: 14048.871 s]
> Range (min … max): 436.865 s … 438.924 s 3 runs
>
> Benchmark 1: x86_64 allmodconfig (GCC + ld.lld) @ 1b929c02afd3 ("Linux 6.2-rc1") on 6.0.0-rc3-debug-00017-gf35b5d7d676e-dirty
> Time (mean ± σ): 384.371 s ± 1.004 s [User: 35402.880 s, System: 6401.691 s]
> Range (min … max): 383.547 s … 385.489 s 3 runs
>
> Tested-by: Nathan Chancellor <nathan@kernel.org>
Thanks for testing the patch.
Regards
Yin, Fengwei
next prev parent reply other threads:[~2022-12-29 1:14 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-12-23 13:52 Yin Fengwei
2022-12-27 20:15 ` Nathan Chancellor
2022-12-29 1:14 ` Yin, Fengwei [this message]
2022-12-28 1:24 ` David Rientjes
2022-12-29 1:15 ` Yin, Fengwei
2022-12-28 2:06 ` Huang, Ying
2022-12-29 1:15 ` Yin, Fengwei
2022-12-28 23:33 ` Andrew Morton
2022-12-29 1:29 ` Yin, Fengwei
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f4cdc9fa-b2fe-13aa-29a1-2d6944635802@intel.com \
--to=fengwei.yin@intel.com \
--cc=akpm@linux-foundation.org \
--cc=feng.tang@intel.com \
--cc=kirill.shutemov@intel.com \
--cc=linux-mm@kvack.org \
--cc=nathan@kernel.org \
--cc=riel@surriel.com \
--cc=shy828301@gmail.com \
--cc=willy@infradead.org \
--cc=ying.huang@intel.com \
--cc=zhengjun.xing@linux.intel.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox