linux-mm.kvack.org archive mirror
* [Question]: major faults are still triggered after mlockall when numa balancing
@ 2023-11-09 13:47 zhangpeng (AS)
From: zhangpeng (AS) @ 2023-11-09 13:47 UTC
  To: linux-mm, linux-kernel
  Cc: akpm, Matthew Wilcox, lstoakes, hughd, david, fengwei.yin,
	vbabka, peterz, mgorman, mingo, riel, ying.huang, hannes,
	Nanyong Sun, Kefeng Wang

Hi everyone,

We have recently been troubled by a performance issue.
It reproduces on the latest mainline kernel (Linux 6.6).

We call mlockall(MCL_CURRENT | MCL_FUTURE) in our user-mode process
to avoid the performance problems caused by major faults.
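
For readers who want to observe this from user space, here is a minimal
sketch (not taken from our actual program) that locks memory with
mlockall() and counts major faults through getrusage(). The helper names
lock_all_memory(), major_fault_count(), major_faults_during() and
touch_table(), and the table[] array, are made up for illustration; only
mlockall(), getrusage() and ru_majflt are real interfaces.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>

/* Illustrative global: non-zero initializer, so it lives in the
 * file-backed .data segment rather than in .bss. */
volatile int table[4096] = { 1 };

/* Hypothetical helper: pin current and future mappings.
 * Returns 0 on success, -1 on failure (e.g. RLIMIT_MEMLOCK too low). */
int lock_all_memory(void)
{
	return mlockall(MCL_CURRENT | MCL_FUTURE);
}

/* Hypothetical helper: major faults charged to this process so far,
 * or -1 if getrusage() fails. */
long major_fault_count(void)
{
	struct rusage ru;

	if (getrusage(RUSAGE_SELF, &ru) != 0)
		return -1;
	return ru.ru_majflt;
}

/* Hypothetical workload: write every element of the global table. */
void touch_table(void)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
		table[i] += 1;
}

/* Hypothetical helper: run work() and return the number of major
 * faults taken while it ran, or -1 if the counter could not be read. */
long major_faults_during(void (*work)(void))
{
	long before = major_fault_count();

	work();

	long after = major_fault_count();

	if (before < 0 || after < 0)
		return -1;
	assert(after >= before);	/* the counter only grows */
	return after - before;
}
```

Calling lock_all_memory() once at startup and then logging
major_faults_during(touch_table) periodically is one way to notice the
unexpected major faults described in this report.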

During a NUMA hinting fault there is a window in do_numa_page() where the
pte is zero: ptep_modify_prot_start() clears vmf->pte, and it stays cleared
until ptep_modify_prot_commit() writes the new value to vmf->pte.

The data segment of a user-mode program, which holds its global variables,
is a private file mapping. Once the file page has been loaded into the
page cache, a write triggers COW and produces a private anonymous page.
mlockall() locks the COW pages (anonymous pages), but the original file
pages are not locked and may be reclaimed. If a global variable (private
anon page) is accessed while vmf->pte is zero because of a concurrent NUMA
fault, a file page fault is triggered.

By then the original file page may already have been reclaimed.
If it is no longer in the page cache, the fault becomes a major fault
and the file has to be read again, which adds overhead.
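
Whether such a fault is minor or major depends on whether the file page
is still in the page cache. As a rough user-space probe (again only a
sketch), mincore() can report residency for a file mapping. The helper
name count_resident_pages() is made up for illustration. Note also, as an
assumption about kernel behaviour that should be checked against the
running version, that recent kernels may report every page as resident
when the caller neither owns the file nor can write to it.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical helper: count how many pages of the file at `path` are
 * resident according to mincore(). Returns 0 on success and fills
 * *resident and *total; returns -1 on any error. */
int count_resident_pages(const char *path, size_t *resident, size_t *total)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	int fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	struct stat st;
	if (fstat(fd, &st) != 0 || st.st_size <= 0) {
		close(fd);
		return -1;
	}

	size_t len = (size_t)st.st_size;
	size_t pages = (len + (size_t)page - 1) / (size_t)page;

	/* Map without touching, so the probe itself faults nothing in. */
	void *map = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);
	close(fd);
	if (map == MAP_FAILED)
		return -1;

	unsigned char *vec = malloc(pages);
	if (vec == NULL) {
		munmap(map, len);
		return -1;
	}

	int rc = mincore(map, len, vec);
	size_t in_core = 0;
	if (rc == 0) {
		/* Bit 0 of each byte is set when the page is resident. */
		for (size_t i = 0; i < pages; i++)
			in_core += vec[i] & 1;
	}

	free(vec);
	munmap(map, len);
	if (rc != 0)
		return -1;

	*resident = in_core;
	*total = pages;
	return 0;
}
```

Running this against the program's own binary before and after memory
pressure gives a hint of whether the file pages backing the data segment
have been reclaimed.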

Our problem scenario is as follows:

task 1                      task 2
------                      ------
/* scan global variables */
do_numa_page()
   spin_lock(vmf->ptl)
   ptep_modify_prot_start()
   /* set vmf->pte as null */
                             /* Access global variables */
                             handle_pte_fault()
                               /* no pte lock */
                               do_pte_missing()
                                 do_fault()
                                   do_read_fault()
   ptep_modify_prot_commit()
   /* ptep update done */
   pte_unmap_unlock(vmf->pte, vmf->ptl)
                                     do_fault_around()
                                     __do_fault()
                                       filemap_fault()
                                         /* page cache is not available
                                         and a major fault is triggered */
                                         do_sync_mmap_readahead()
                                         /* page_not_uptodate and goto
                                         out_retry. */

Is there any way to avoid such a major fault?

-- 
Best Regards,
Peng




