linux-mm.kvack.org archive mirror
* Hugepage program taking forever to exit
@ 2024-09-10 18:21 Jens Axboe
  2024-09-10 19:33 ` Johannes Weiner
  2024-09-10 20:17 ` Yu Zhao
  0 siblings, 2 replies; 9+ messages in thread
From: Jens Axboe @ 2024-09-10 18:21 UTC
  To: Linux-MM; +Cc: Yu Zhao, Johannes Weiner, Andrew Morton, Muchun Song

Hi,

While investigating another issue, I wrote a simple program that allocates
and faults in 500 1GB huge pages, and then registers them with io_uring. Each
step is timed:

Got 500 huge pages (each 1024MB) in 0 msec
Faulted in 500 huge pages in 38632 msec
Registered 500 pages in 867 msec

and as expected, faulting in the pages takes (by far) the longest. From
the above, you'd also expect the total runtime to be around 39 seconds.
But it is not... In fact, it takes 82 seconds in total for this program
to exit. Looking at why, I see:

[<0>] __wait_rcu_gp+0x12b/0x160
[<0>] synchronize_rcu_normal.part.0+0x2a/0x30
[<0>] hugetlb_vmemmap_restore_folios+0x22/0xe0
[<0>] update_and_free_pages_bulk+0x4c/0x220
[<0>] return_unused_surplus_pages+0x80/0xa0
[<0>] hugetlb_acct_memory.part.0+0x2dd/0x3b0
[<0>] hugetlb_vm_op_close+0x160/0x180
[<0>] remove_vma+0x20/0x60
[<0>] exit_mmap+0x199/0x340
[<0>] mmput+0x49/0x110
[<0>] do_exit+0x261/0x9b0
[<0>] do_group_exit+0x2c/0x80
[<0>] __x64_sys_exit_group+0x14/0x20
[<0>] x64_sys_call+0x714/0x720
[<0>] do_syscall_64+0x5b/0x160
[<0>] entry_SYSCALL_64_after_hwframe+0x4b/0x53

and yes, it does look like the program is idle most of the time while
returning these huge pages. The trace also tells us exactly why we're
sitting idle: it's waiting for an RCU grace period.

With the quick change below, the program's total runtime is pretty much
just the sum of its timed steps, as you can see from the full output
after the change:

axboe@r7525 ~> time sudo ./reg-huge
Got 500 huge pages (each 1024MB) in 0 msec
Faulted in 500 huge pages in 38632 msec
Registered 500 pages in 867 msec

________________________________________________________
Executed in   39.53 secs      fish           external
   usr time    4.88 millis  238.00 micros    4.64 millis
   sys time    0.00 millis    0.00 micros    0.00 millis

where 38632 + 867 msec == 39.50s.

Looks like this was introduced by:

commit bd225530a4c717714722c3731442b78954c765b3
Author: Yu Zhao <yuzhao@google.com>
Date:   Thu Jun 27 16:27:05 2024 -0600

    mm/hugetlb_vmemmap: fix race with speculative PFN walkers


diff --git a/mm/hugetlb_vmemmap.c b/mm/hugetlb_vmemmap.c
index 0c3f56b3578e..95f6ad8f8232 100644
--- a/mm/hugetlb_vmemmap.c
+++ b/mm/hugetlb_vmemmap.c
@@ -517,7 +517,7 @@ long hugetlb_vmemmap_restore_folios(const struct hstate *h,
 	long ret = 0;
 
 	/* avoid writes from page_ref_add_unless() while unfolding vmemmap */
-	synchronize_rcu();
+	synchronize_rcu_expedited();
 
 	list_for_each_entry_safe(folio, t_folio, folio_list, lru) {
 		if (folio_test_hugetlb_vmemmap_optimized(folio)) {
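
For reference, here is a minimal sketch of the allocate-and-fault steps of a
program like the one described above. It is not the original reg-huge
program: the helper names (elapsed_msec, alloc_and_fault) are hypothetical,
one mmap() per huge page is an assumption, and the io_uring registration
step is left out (it could be done with liburing's
io_uring_register_buffers(), not shown here).

```c
/* Sketch only: NOT the original reg-huge program from this mail. */
#define _GNU_SOURCE
#include <stddef.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>

/* Fallbacks for older libc headers that lack the 1GB size encoding. */
#ifndef MAP_HUGE_SHIFT
#define MAP_HUGE_SHIFT	26
#endif
#ifndef MAP_HUGE_1GB
#define MAP_HUGE_1GB	(30 << MAP_HUGE_SHIFT)	/* log2(1GB) == 30 */
#endif

#define HUGE_SIZE	(1024UL * 1024 * 1024)

/* Milliseconds between two CLOCK_MONOTONIC samples. */
static long elapsed_msec(const struct timespec *s, const struct timespec *e)
{
	return (e->tv_sec - s->tv_sec) * 1000L +
	       (e->tv_nsec - s->tv_nsec) / 1000000L;
}

/*
 * Map nr 1GB huge pages (assumption: one mmap() per page), then touch
 * each one so it is faulted in. Both steps are timed. Returns 0 on
 * success, -1 if a mapping fails (e.g. no 1GB pages available).
 */
static int alloc_and_fault(void **pages, size_t nr)
{
	struct timespec start, end;
	size_t i;

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < nr; i++) {
		pages[i] = mmap(NULL, HUGE_SIZE, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS |
				MAP_HUGETLB | MAP_HUGE_1GB, -1, 0);
		if (pages[i] == MAP_FAILED) {
			perror("mmap");
			return -1;
		}
	}
	clock_gettime(CLOCK_MONOTONIC, &end);
	printf("Got %zu huge pages (each %luMB) in %ld msec\n",
	       nr, HUGE_SIZE >> 20, elapsed_msec(&start, &end));

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (i = 0; i < nr; i++)
		*(volatile char *)pages[i] = 0;	/* one write faults the page */
	clock_gettime(CLOCK_MONOTONIC, &end);
	printf("Faulted in %zu huge pages in %ld msec\n",
	       nr, elapsed_msec(&start, &end));
	return 0;
}
```

Calling alloc_and_fault() with a 500-entry array from main() and then simply
returning should exercise the same exit path as in the trace above, assuming
enough 1GB pages can be allocated; the mappings are deliberately left for
exit_mmap() to tear down.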


-- 
Jens Axboe




end of thread, other threads:[~2024-09-11 22:08 UTC | newest]

Thread overview: 9+ messages
2024-09-10 18:21 Hugepage program taking forever to exit Jens Axboe
2024-09-10 19:33 ` Johannes Weiner
2024-09-10 20:17 ` Yu Zhao
2024-09-10 23:08   ` Jens Axboe
2024-09-11  3:42     ` Andrew Morton
2024-09-11 13:22       ` Jens Axboe
2024-09-11 16:23         ` Yu Zhao
2024-09-11 18:38         ` Andrew Morton
2024-09-11 22:08           ` Yu Zhao
