Subject: Re: Very slow page fault caused by migrate_pages
From: Hillf Danton @ 2021-05-16 8:56 UTC
To: Hu Weiwen
Cc: Hillf Danton, linux-kernel, linux-mm
On Sun, 16 May 2021 15:39:56 +0800 Hu Weiwen wrote:
>On Sun, May 16, 2021 at 10:50:35AM +0800, Hillf Danton wrote:
>> On Sun, 16 May 2021 01:01:04 +0800 Hu Weiwen wrote:
>> > Hi all,
>> >
>> > We have observed our program sometimes running over 1000x slower than normal
>> > and cannot find the root cause. I would be grateful if someone could give me a
>> > hint. I also searched this mailing list but failed to find anything related.
>> >
>> > The affected process spends almost 100% of its CPU time in the kernel. First I
>> > ran `perf' to identify where the time is spent: it turns out 99.93% of the time
>> > is spent in do_page_fault, and 98.16% in migrate_pages. Also, `perf record -e
>> > faults -t <tid>` shows about 3 seconds per fault, which seems far too long to me.
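
A quick cross-check for that figure is interval counting with perf stat; a
minimal sketch, where <pid> is a placeholder for the stalled task:

    # print the task's page-fault count once per second (sketch; <pid> is a placeholder)
    sudo perf stat -e page-faults -p <pid> -I 1000 -- sleep 10

If each fault really takes ~3 seconds, the per-interval counts should hover
near zero or one.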
>> >
>> > Then I ran `perf record -e compaction:*` and saw about 12k
>> > compaction:mm_compaction_migratepages events per second; almost all of them
>> > have nr_migrated=0 and nr_failed greater than 0. So a lot of migration
>> > failures seem to be happening, but I cannot understand what causes them. More
>> > complete perf output is attached below.
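
The same interval trick can confirm the ~12k events/s figure system-wide; a
sketch:

    # count migratepages trace events across all CPUs, once per second (sketch)
    sudo perf stat -e compaction:mm_compaction_migratepages -a -I 1000 -- sleep 10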
>> >
>> > This system is used to run machine-learning training programs. It has 2 Intel
>> > Xeon CPUs, 512GB of RAM and 8 NVIDIA GPUs attached, and runs Ubuntu 20.04 with
>> > kernel version 5.4.0-72-generic.
>>
>> Given the do_huge_pmd_anonymous_page in your perf report below, what you
>> need is likely commit facdaa917c4d ("mm: proactive compaction").
>>
>> You need to port it back to the linux-5.4 kernel powering your box if there is
>> anything that prevents you from using the mainline kernel. Feel free to report
>> again if it does not work for you.
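
For reference, facdaa917c4d also adds a vm.compaction_proactiveness sysctl
(range 0-100, default 20) controlling how aggressively kcompactd compacts in
the background. Once the backport is in place, something like the following
turns it up; a sketch, and the value 50 is only a guess to be tuned per workload:

    # ask kcompactd to compact more proactively (sketch; value is a guess)
    sudo sysctl -w vm.compaction_proactiveness=50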
>
>Thank you for your reply. I will try it, but it will be hard to tell whether it
>works: we can only reproduce this about once per month across 10+ similar boxes.
>
>But I think things still do not add up.
>
>The last time I saw this, I left the system in this state for 6 hours, but it
>still did not get better. If it were just fragmented memory causing too much
>compaction on huge page allocation, I'd expect it to resolve itself in minutes.
Sounds reasonable, given that "367GB of RAM is available when we observe the
problem" (free -g shows the available RAM).
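
Free memory alone does not rule out fragmentation, though; the per-order free
lists show whether high-order blocks are actually left. A quick check:

    # column N is the number of free blocks of order N in each zone;
    # near-zero counts at the high orders indicate fragmentation
    cat /proc/buddyinfo
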
>The commit message of facdaa917c4d reports 700GB of memory allocated in 27
>minutes without that patch, but we hardly made any progress in 6 hours.
>
>Also, the commit message of facdaa917c4d reports at most 33ms of allocation
>latency without that patch, but we see 3 seconds per page fault.
>
>Over 99% of the pages passed to `migrate_pages' fail to migrate. This is
>concerning, and it might relate to our issue. I plan to investigate this further
>the next time I see the issue. Any tips on how to find out why migration failed
>would be appreciated.
No tips that I know of, short of reading mm/migrate.c itself.
If d479960e44f2 ("mm: disable LRU pagevec during the migration temporarily")
is not a cure, then try to see which commit helps with the GPU module unloaded.
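
That said, one low-effort probe, assuming debugfs is mounted in the usual
place: the migrate:mm_migrate_pages tracepoint logs the succeeded/failed counts
and the migrate reason for every migrate_pages() call, which at least narrows
down who is requesting the failing migrations. A sketch, to be run as root:

    # stream per-call migration results together with the migrate reason (sketch)
    echo 1 > /sys/kernel/debug/tracing/events/migrate/mm_migrate_pages/enable
    cat /sys/kernel/debug/tracing/trace_pipe
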
>
>> > Load average is 13, and 367GB of RAM is available when we observe the problem
>> > this time. We have observed the very same problem on multiple nodes, running
>> > different programs, with different models of CPU and GPU.
>> >
>> > Currently, only rebooting the system resolves this. When it happens, every
>> > process on the system is affected, and restarting the affected process does
>> > not help.
>> >
>> > Thanks,
>> > Hu Weiwen
>> > SCUT student
>> >
>> >
>> > `perf report` output, collected with `sudo perf record -F 99 --pid 2413158 -g -- sleep 10`:
>> >
>> > Samples: 993 of event 'cycles', Event count (approx.): 36810932529
>> > Children Self Command Shared Object Symbol
>> > - 99.93% 0.00% python libc-2.31.so [.] 0x00007ff5fcc78984
>> > 0x7ff5fcc78984
>> > page_fault
>> > do_page_fault
>> > __do_page_fault
>> > do_user_addr_fault
>> > handle_mm_fault
>> > __handle_mm_fault
>> > do_huge_pmd_anonymous_page
>> > alloc_pages_vma
>> > __alloc_pages_nodemask
>> > __alloc_pages_slowpath
>> > __alloc_pages_direct_compact
>> > try_to_compact_pages
>> > compact_zone_order
>> > - compact_zone
>> > - 98.16% migrate_pages
>> > - 93.30% try_to_unmap
>> > - rmap_walk
>> > - 92.12% rmap_walk_file
>> > - 90.45% try_to_unmap_one
>> > - 83.48% ptep_clear_flush
>> > - 76.33% flush_tlb_mm_range
>> > - 74.35% native_flush_tlb_others
>> > - 73.75% on_each_cpu_cond_mask
>> > - 70.85% on_each_cpu_mask
>> > - 69.57% smp_call_function_many
>> > - 67.20% smp_call_function_single
>> > 4.93% llist_add_batch
>> > + 1.09% generic_exec_single
>> > 1.19% smp_call_function_single
>> > + 2.01% zalloc_cpumask_var
>> > 1.45% page_vma_mapped_walk
>> > 0.78% down_read
>> > 0.79% up_read
>> > + 3.29% remove_migration_ptes
>> > 0.59% move_to_new_page
>> >
>> >
>> > `perf report` output, collected with `sudo perf record -e compaction:* -a -- sleep 10`:
>> >
>> > Samples: 133K of event 'compaction:mm_compaction_migratepages', Event count (approx.): 133394
>> > Overhead Trace output
>> > 26.25% nr_migrated=0 nr_failed=32
>> > 5.00% nr_migrated=0 nr_failed=2
>> > 4.64% nr_migrated=0 nr_failed=4
>> > 4.24% nr_migrated=0 nr_failed=3
>> > 4.20% nr_migrated=0 nr_failed=6
>> > 3.83% nr_migrated=0 nr_failed=1
>> > 3.79% nr_migrated=0 nr_failed=5
>> > 3.62% nr_migrated=0 nr_failed=7
>> > 3.45% nr_migrated=0 nr_failed=8
>> > 3.06% nr_migrated=0 nr_failed=9
>> > 2.68% nr_migrated=0 nr_failed=10
>> > 2.68% nr_migrated=0 nr_failed=11
>> > 2.42% nr_migrated=0 nr_failed=12
>> > 2.35% nr_migrated=0 nr_failed=13
>> > 2.11% nr_migrated=0 nr_failed=14
>> > 1.93% nr_migrated=0 nr_failed=16
>> > 1.90% nr_migrated=0 nr_failed=15
>> > 1.64% nr_migrated=0 nr_failed=17
>> > 1.50% nr_migrated=0 nr_failed=18
>> > 1.25% nr_migrated=0 nr_failed=19
>> > 1.25% nr_migrated=0 nr_failed=20
>> > 1.22% nr_migrated=0 nr_failed=21
>> > ...
>> > 0.00% nr_migrated=0 nr_failed=207
>> > 0.00% nr_migrated=0 nr_failed=258
>> > 0.00% nr_migrated=0 nr_failed=273
>> > 0.00% nr_migrated=0 nr_failed=277
>> > 0.00% nr_migrated=10 nr_failed=22
>> > 0.00% nr_migrated=13 nr_failed=19
>> > 0.00% nr_migrated=2 nr_failed=2
>> > 0.00% nr_migrated=3 nr_failed=13
>> > 0.00% nr_migrated=3 nr_failed=17
>> > 0.00% nr_migrated=3 nr_failed=25
>> > 0.00% nr_migrated=3 nr_failed=27
>> > 0.00% nr_migrated=3 nr_failed=28
>> > 0.00% nr_migrated=4 nr_failed=10
>> > 0.00% nr_migrated=4 nr_failed=15
>> > 0.00% nr_migrated=4 nr_failed=24
>> > 0.00% nr_migrated=5 nr_failed=10
>> > 0.00% nr_migrated=5 nr_failed=15
>> > 0.00% nr_migrated=6 nr_failed=20
>> > 0.00% nr_migrated=6 nr_failed=21
>> > 0.00% nr_migrated=6 nr_failed=23
>> > 0.00% nr_migrated=8 nr_failed=19
>
>