* [RFC PATCH] mm/thp: check and bail out if page in deferred queue already
@ 2022-12-23 13:52 Yin Fengwei
  2022-12-27 20:15 ` Nathan Chancellor
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Yin Fengwei @ 2022-12-23 13:52 UTC (permalink / raw)
  To: riel, nathan, akpm, shy828301, willy, kirill.shutemov,
	ying.huang, feng.tang, zhengjun.xing, linux-mm
  Cc: fengwei.yin

A kernel build regression with LLVM was reported here:
https://lore.kernel.org/all/Y1GCYXGtEVZbcv%2F5@dev-arch.thelio-3990X/
against commit f35b5d7d676e ("mm: align larger anonymous mappings on THP
boundaries"), and that commit was reverted.

It turned out the regression is related to ld.lld's use of
madvise(MADV_DONTNEED) with a len parameter that is not PMD_SIZE aligned.
trace-bpfcc captured:
531607  531732  ld.lld          do_madvise.part.0 start: 0x7feca9000000, len: 0x7fb000, behavior: 0x4
531607  531793  ld.lld          do_madvise.part.0 start: 0x7fec86a00000, len: 0x7fb000, behavior: 0x4

If the underlying physical page is a THP, madvise(MADV_DONTNEED) can
raise split_queue_lock contention significantly. perf showed the
following data:
    14.85%     0.00%  ld.lld           [kernel.kallsyms]           [k]
       entry_SYSCALL_64_after_hwframe
           11.52%
                entry_SYSCALL_64_after_hwframe
                do_syscall_64
                __x64_sys_madvise
                do_madvise.part.0
                zap_page_range
                unmap_single_vma
                unmap_page_range
                page_remove_rmap
                deferred_split_huge_page
                __lock_text_start
                native_queued_spin_lock_slowpath

If a THP can't be removed from the rmap as a whole, it is removed
partially by removing its sub-pages from the rmap one by one. Even if
the THP head page has already been added to the deferred queue,
split_queue_lock is acquired just to check whether the head page is in
the queue. Thus the contention on split_queue_lock is raised.

Before acquiring split_queue_lock, check and bail out early if the THP
head page is already in the queue. The check without holding
split_queue_lock could race with deferred_split_scan(), but it doesn't
impact correctness here.

Test result of building kernel with ld.lld:
commit 7b5a0b664ebe (parent commit of f35b5d7d676e):
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        6:07.99 real,   26367.77 user,  5063.35 sys

commit f35b5d7d676e:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        7:22.15 real,   26235.03 user,  12504.55 sys

commit f35b5d7d676e with the fixing patch:
time -f "\t%E real,\t%U user,\t%S sys" make LD=ld.lld -skj96 allmodconfig all
        6:08.49 real,   26520.15 user,  5047.91 sys

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
My first thought was to change the per-node deferred queue to a per-CPU
one, but that is complicated and may be overkill.

For the race with the lockless check, I didn't see an obvious issue.
But I could be missing something; let me know if so. Thanks.


 mm/huge_memory.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index abe6cfd92ffa..7cde9f702e63 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2837,6 +2837,9 @@ void deferred_split_huge_page(struct page *page)
 	if (PageSwapCache(page))
 		return;
 
+	if (!list_empty(page_deferred_list(page)))
+		return;
+
 	spin_lock_irqsave(&ds_queue->split_queue_lock, flags);
 	if (list_empty(page_deferred_list(page))) {
 		count_vm_event(THP_DEFERRED_SPLIT_PAGE);

base-commit: 8395ae05cb5a2e31d36106e8c85efa11cda849be
-- 
2.34.1



Thread overview: 9+ messages
2022-12-23 13:52 [RFC PATCH] mm/thp: check and bail out if page in deferred queue already Yin Fengwei
2022-12-27 20:15 ` Nathan Chancellor
2022-12-29  1:14   ` Yin, Fengwei
2022-12-28  1:24 ` David Rientjes
2022-12-29  1:15   ` Yin, Fengwei
2022-12-28  2:06 ` Huang, Ying
2022-12-29  1:15   ` Yin, Fengwei
2022-12-28 23:33 ` Andrew Morton
2022-12-29  1:29   ` Yin, Fengwei
