linux-mm.kvack.org archive mirror
* [stable-6.6.y] mm: khugepaged refuses to freeze
@ 2026-02-06  2:47 Sergey Senozhatsky
  2026-02-06  3:33 ` Baolin Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2026-02-06  2:47 UTC (permalink / raw)
  To: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Zi Yan
  Cc: Baolin Wang, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, linux-mm, linux-kernel

Greetings,

I'm looking at a slightly unusual issue where khugepaged refuses to
freeze during system suspend:

...
 PM: suspend entry (s2idle)
 Filesystems sync: 0.003 seconds
 Freezing user space processes
 Freezing user space processes completed (elapsed 0.003 seconds)
 OOM killer disabled.
 Freezing remaining freezable tasks
 Freezing remaining freezable tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
 task:khugepaged      state:D stack:0     pid:1345  ppid:2      flags:0x00004000
 Call Trace:
  <TASK>
  schedule+0x523/0x16a0
  ? sysvec_apic_timer_interrupt+0xf/0x90
  ? asm_sysvec_apic_timer_interrupt+0x16/0x20
  ? wait_for_completion_io_timeout+0xc5/0x170
  schedule_timeout+0x23b/0x6e0
  ? __pfx_process_timeout+0x10/0x10
  ? wait_for_completion_io_timeout+0xc5/0x170
  io_schedule_timeout+0x3f/0x80
  wait_for_completion_io_timeout+0xe4/0x170
  submit_bio_wait+0x79/0xc0
  swap_readpage+0x150/0x2d0
  ? __pfx_submit_bio_wait_endio+0x10/0x10
  swap_cluster_readahead+0x3be/0x750
  ? __pfx_workingset_update_node+0x10/0x10
  shmem_swapin+0xa7/0x100
  shmem_swapin_folio+0xcd/0x2e0
  shmem_get_folio+0x237/0x580
  collapse_file+0x247/0x1280
  hpage_collapse_scan_file+0x26e/0x380
  khugepaged+0x43b/0x810
  kthread+0xfb/0x120
  ? __pfx_khugepaged+0x10/0x10
  ? __pfx_kthread+0x10/0x10
  ret_from_fork+0x38/0x50
  ? __pfx_kthread+0x10/0x10
  ret_from_fork_asm+0x1b/0x30
  </TASK>
...

The system is using zram swap.  I wonder if khugepaged should
be suspend/freeze aware.  Does something like below make sense?
Or is the problem elsewhere?

---
 mm/khugepaged.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eff9e3061925..fa6a018b20a8 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1894,6 +1894,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
 		xas_set(&xas, index);
 		folio = xas_load(&xas);
 
+		if (try_to_freeze())
+			goto xa_unlocked;
+
 		VM_BUG_ON(index != xas.xa_index);
 		if (is_shmem) {
 			if (!folio) {
-- 
2.53.0.rc2.204.g2597b5adb4-goog




* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  2:47 [stable-6.6.y] mm: khugepaged refuses to freeze Sergey Senozhatsky
@ 2026-02-06  3:33 ` Baolin Wang
  2026-02-06  3:38   ` Sergey Senozhatsky
  0 siblings, 1 reply; 13+ messages in thread
From: Baolin Wang @ 2026-02-06  3:33 UTC (permalink / raw)
  To: Sergey Senozhatsky, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Zi Yan
  Cc: Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel



On 2/6/26 10:47 AM, Sergey Senozhatsky wrote:
> Greetings,
> 
> I'm looking at a slightly unusual issue where khugepaged refuses to
> freeze during system suspend:
> 
> ...
>   PM: suspend entry (s2idle)
>   Filesystems sync: 0.003 seconds
>   Freezing user space processes
>   Freezing user space processes completed (elapsed 0.003 seconds)
>   OOM killer disabled.
>   Freezing remaining freezable tasks
>   Freezing remaining freezable tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
>   task:khugepaged      state:D stack:0     pid:1345  ppid:2      flags:0x00004000
>   Call Trace:
>    <TASK>
>    schedule+0x523/0x16a0
>    ? sysvec_apic_timer_interrupt+0xf/0x90
>    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
>    ? wait_for_completion_io_timeout+0xc5/0x170
>    schedule_timeout+0x23b/0x6e0
>    ? __pfx_process_timeout+0x10/0x10
>    ? wait_for_completion_io_timeout+0xc5/0x170
>    io_schedule_timeout+0x3f/0x80
>    wait_for_completion_io_timeout+0xe4/0x170
>    submit_bio_wait+0x79/0xc0
>    swap_readpage+0x150/0x2d0
>    ? __pfx_submit_bio_wait_endio+0x10/0x10
>    swap_cluster_readahead+0x3be/0x750
>    ? __pfx_workingset_update_node+0x10/0x10
>    shmem_swapin+0xa7/0x100
>    shmem_swapin_folio+0xcd/0x2e0
>    shmem_get_folio+0x237/0x580
>    collapse_file+0x247/0x1280
>    hpage_collapse_scan_file+0x26e/0x380
>    khugepaged+0x43b/0x810
>    kthread+0xfb/0x120
>    ? __pfx_khugepaged+0x10/0x10
>    ? __pfx_kthread+0x10/0x10
>    ret_from_fork+0x38/0x50
>    ? __pfx_kthread+0x10/0x10
>    ret_from_fork_asm+0x1b/0x30
>    </TASK>
> ...
> 
> The system is using zram swap.  I wonder if khugepaged should
> be suspend/freeze aware.  Does something like below make sense?
> Or is the problem elsewhere?
> 
> ---
>   mm/khugepaged.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index eff9e3061925..fa6a018b20a8 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1894,6 +1894,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>   		xas_set(&xas, index);
>   		folio = xas_load(&xas);
>   
> +		if (try_to_freeze())
> +			goto xa_unlocked;
> +
>   		VM_BUG_ON(index != xas.xa_index);
>   		if (is_shmem) {
>   			if (!folio) {

Your analysis is reasonable. When the system is freezing, khugepaged is
still trying to swap in shmem folios to collapse them, which prevents the
system from entering the suspend state. However, shmem is not the only
case: collapsing anonymous folios may also trigger swap-in operations.

Therefore, I think we should skip all collapse scans for anonymous and 
file pages in the main scan function khugepaged_do_scan() if the system 
is attempting to freeze.

Some sample code is as follows:

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa1e57fd2c46..cfa7882585ad 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2560,9 +2560,18 @@ static void khugepaged_do_scan(struct collapse_control *cc)
         lru_add_drain_all();

         while (true) {
+               bool was_frozen;
+
                 cond_resched();

-               if (unlikely(kthread_should_stop()))
+               if (unlikely(kthread_freezable_should_stop(&was_frozen)))
+                       break;
+
+               /*
+                * We can speed up thawing tasks if we don't call khugepaged_scan_mm_slot()
+                * after returning from the refrigerator
+                */
+               if (was_frozen)
                         break;

                 spin_lock(&khugepaged_mm_lock);



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  3:33 ` Baolin Wang
@ 2026-02-06  3:38   ` Sergey Senozhatsky
  2026-02-06  4:31     ` Sergey Senozhatsky
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2026-02-06  3:38 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Sergey Senozhatsky, Andrew Morton, David Hildenbrand,
	Lorenzo Stoakes, Zi Yan, Liam R. Howlett, Nico Pache,
	Ryan Roberts, Dev Jain, Barry Song, Lance Yang, linux-mm,
	linux-kernel

On (26/02/06 11:33), Baolin Wang wrote:
> >   Freezing remaining freezable tasks failed after 20.004 seconds (1 tasks refusing to freeze, wq_busy=0):
> >   task:khugepaged      state:D stack:0     pid:1345  ppid:2      flags:0x00004000
> >   Call Trace:
> >    <TASK>
> >    schedule+0x523/0x16a0
> >    ? sysvec_apic_timer_interrupt+0xf/0x90
> >    ? asm_sysvec_apic_timer_interrupt+0x16/0x20
> >    ? wait_for_completion_io_timeout+0xc5/0x170
> >    schedule_timeout+0x23b/0x6e0
> >    ? __pfx_process_timeout+0x10/0x10
> >    ? wait_for_completion_io_timeout+0xc5/0x170
> >    io_schedule_timeout+0x3f/0x80
> >    wait_for_completion_io_timeout+0xe4/0x170
> >    submit_bio_wait+0x79/0xc0
> >    swap_readpage+0x150/0x2d0
> >    ? __pfx_submit_bio_wait_endio+0x10/0x10
> >    swap_cluster_readahead+0x3be/0x750
> >    ? __pfx_workingset_update_node+0x10/0x10
> >    shmem_swapin+0xa7/0x100
> >    shmem_swapin_folio+0xcd/0x2e0
> >    shmem_get_folio+0x237/0x580
> >    collapse_file+0x247/0x1280
> >    hpage_collapse_scan_file+0x26e/0x380
> >    khugepaged+0x43b/0x810
> >    kthread+0xfb/0x120
> >    ? __pfx_khugepaged+0x10/0x10
> >    ? __pfx_kthread+0x10/0x10
> >    ret_from_fork+0x38/0x50
> >    ? __pfx_kthread+0x10/0x10
> >    ret_from_fork_asm+0x1b/0x30
> >    </TASK>
> > ...
> > 
> > The system is using zram swap.  I wonder if khugepaged should
> > be suspend/freeze aware.  Does something like below make sense?
> > Or is the problem elsewhere?
> > 
> > ---
> >   mm/khugepaged.c | 3 +++
> >   1 file changed, 3 insertions(+)
> > 
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index eff9e3061925..fa6a018b20a8 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -1894,6 +1894,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> >   		xas_set(&xas, index);
> >   		folio = xas_load(&xas);
> > +		if (try_to_freeze())
> > +			goto xa_unlocked;
> > +
> >   		VM_BUG_ON(index != xas.xa_index);
> >   		if (is_shmem) {
> >   			if (!folio) {
> 
> Your analysis is reasonable. When the system is freezing, khugepaged is
> still trying to swap-in shmem to collapse, which prevents the system from
> entering suspend state. However, it’s not only shmem that will swap in,
> collapsing anonymous folios may also trigger swap-in operations.

Right, I thought about it but wasn't sure.  Could the inner loop (e.g.
collapse_file() in this particular case) loop long enough to fail suspend
w/o ever giving the outer loop (khugepaged_do_scan()) a chance to freeze?
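
As a rough illustration of that concern, the following toy userspace model
(not kernel code; freeze_latency() and its parameters are made up for this
sketch) counts how many units of work a worker still performs after a
freeze request arrives, depending on whether the request is checked only in
the outer loop or in the inner loop as well.

```c
#include <stdbool.h>

/*
 * Toy userspace model, NOT kernel code. A worker runs `outer` iterations,
 * each made of `inner` units of work. A freeze request arrives once
 * `freeze_at` units have been completed. The return value is the number of
 * extra units executed between the request and the first freeze check the
 * worker reaches.
 */
unsigned long freeze_latency(unsigned long outer, unsigned long inner,
			     unsigned long freeze_at, bool check_inner)
{
	unsigned long done = 0;

	for (unsigned long i = 0; i < outer; i++) {
		/* Outer-loop check, in the spirit of khugepaged_do_scan(). */
		if (done >= freeze_at)
			return done - freeze_at;

		for (unsigned long j = 0; j < inner; j++) {
			/* Optional inner-loop check before each unit of work. */
			if (check_inner && done >= freeze_at)
				return done - freeze_at;
			done++;	/* one unit of work, e.g. one swap-in */
		}
	}

	/* All work finished; the worker would freeze at its next check. */
	return done > freeze_at ? done - freeze_at : 0;
}
```

With 512 work units per outer iteration and a request arriving after 10
units, freeze_latency(4, 512, 10, false) reports 502 leftover units, while
freeze_latency(4, 512, 10, true) reports 0. How long a single unit takes in
the real collapse_file() path is not modeled here.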



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  3:38   ` Sergey Senozhatsky
@ 2026-02-06  4:31     ` Sergey Senozhatsky
  2026-02-06  5:12       ` Baolin Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2026-02-06  4:31 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Zi Yan,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel, Sergey Senozhatsky

On (26/02/06 12:38), Sergey Senozhatsky wrote:
[..]
> > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > > index eff9e3061925..fa6a018b20a8 100644
> > > --- a/mm/khugepaged.c
> > > +++ b/mm/khugepaged.c
> > > @@ -1894,6 +1894,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
> > >   		xas_set(&xas, index);
> > >   		folio = xas_load(&xas);
> > > +		if (try_to_freeze())
> > > +			goto xa_unlocked;
> > > +
> > >   		VM_BUG_ON(index != xas.xa_index);
> > >   		if (is_shmem) {
> > >   			if (!folio) {
> > 
> > Your analysis is reasonable. When the system is freezing, khugepaged is
> > still trying to swap-in shmem to collapse, which prevents the system from
> > entering suspend state. However, it’s not only shmem that will swap in,
> > collapsing anonymous folios may also trigger swap-in operations.
> 
> Right, I thought about it but wasn't sure.  Could the inner loop (e.g.
> collapse_file() in this particular case) loop long enough to fail suspend
> w/o ever giving the outer loop (khugepaged_do_scan()) a chance to freeze?

For inner loops I wondered if cond_resched() could be an indicator of
where try_to_freeze() should be placed.  Those cond_resched() calls
are there for a reason, after all.   E.g. something like:

---

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa6a018b20a8..cee08466a069 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2431,6 +2431,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 		unsigned long hstart, hend;
 
 		cond_resched();
+		if (try_to_freeze())
+			break;
+
 		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
 			progress++;
 			break;
@@ -2453,6 +2456,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
 			bool mmap_locked = true;
 
 			cond_resched();
+			if (try_to_freeze())
+				goto breakouterloop;
+
 			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
 				goto breakouterloop;
 



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  4:31     ` Sergey Senozhatsky
@ 2026-02-06  5:12       ` Baolin Wang
  2026-02-06  8:36         ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 13+ messages in thread
From: Baolin Wang @ 2026-02-06  5:12 UTC (permalink / raw)
  To: Sergey Senozhatsky
  Cc: Andrew Morton, David Hildenbrand, Lorenzo Stoakes, Zi Yan,
	Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain, Barry Song,
	Lance Yang, linux-mm, linux-kernel



On 2/6/26 12:31 PM, Sergey Senozhatsky wrote:
> On (26/02/06 12:38), Sergey Senozhatsky wrote:
> [..]
>>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>>> index eff9e3061925..fa6a018b20a8 100644
>>>> --- a/mm/khugepaged.c
>>>> +++ b/mm/khugepaged.c
>>>> @@ -1894,6 +1894,9 @@ static enum scan_result collapse_file(struct mm_struct *mm, unsigned long addr,
>>>>    		xas_set(&xas, index);
>>>>    		folio = xas_load(&xas);
>>>> +		if (try_to_freeze())
>>>> +			goto xa_unlocked;
>>>> +
>>>>    		VM_BUG_ON(index != xas.xa_index);
>>>>    		if (is_shmem) {
>>>>    			if (!folio) {
>>>
>>> Your analysis is reasonable. When the system is freezing, khugepaged is
>>> still trying to swap-in shmem to collapse, which prevents the system from
>>> entering suspend state. However, it’s not only shmem that will swap in,
>>> collapsing anonymous folios may also trigger swap-in operations.
>>
>> Right, I thought about it but wasn't sure.  Could the inner loop (e.g.
>> collapse_file() in this particular case) loop long enough to fail suspend
>> w/o ever giving the outer loop (khugepaged_do_scan()) a chance to freeze?

Yes, that’s possible. However, if we add a try_to_freeze() check in the 
inner loop, we need to consider various scenarios (such as anonymous 
folio swap-in and other potential cases?), which feels too hacky to me.

> For inner loops I wondered if cond_resched() could be an indicator of
> where try_to_freeze() should be placed.  Those cond_resched() calls
> are there for a reason, after all.   E.g. something like:
> 
> ---
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index fa6a018b20a8..cee08466a069 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -2431,6 +2431,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
>   		unsigned long hstart, hend;
>   
>   		cond_resched();
> +		if (try_to_freeze())
> +			break;
> +
>   		if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
>   			progress++;
>   			break;
> @@ -2453,6 +2456,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
>   			bool mmap_locked = true;
>   
>   			cond_resched();
> +			if (try_to_freeze())
> +				goto breakouterloop;
> +
>   			if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
>   				goto breakouterloop;

This looks better than the previous version. Let’s also wait to see if 
others have any better suggestions.



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  5:12       ` Baolin Wang
@ 2026-02-06  8:36         ` David Hildenbrand (Arm)
  2026-02-06  8:55           ` Baolin Wang
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06  8:36 UTC (permalink / raw)
  To: Baolin Wang, Sergey Senozhatsky
  Cc: Andrew Morton, Lorenzo Stoakes, Zi Yan, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel

On 2/6/26 06:12, Baolin Wang wrote:
> 
> 
> On 2/6/26 12:31 PM, Sergey Senozhatsky wrote:
>> On (26/02/06 12:38), Sergey Senozhatsky wrote:
>> [..]
>>>
>>> Right, I thought about it but wasn't sure.  Could the inner loop (e.g.
>>> collapse_file() in this particular case) loop long enough to fail 
>>> suspend
>>> w/o ever giving the outer loop (khugepaged_do_scan()) a chance to 
>>> freeze?
> 
> Yes, that’s possible. However, if we add a try_to_freeze() check in the 
> inner loop, we need to consider various scenarios (such as anonymous 
> folio swap-in and other potential cases?), which feels too hacky to me.
> 
>> For inner loops I wondered if cond_resched() could be an indicator of
>> where try_to_freeze() should be placed.  Those cond_resched() calls
>> are there for a reason, after all.   E.g. something like:
>>
>> ---
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index fa6a018b20a8..cee08466a069 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -2431,6 +2431,9 @@ static unsigned int 
>> khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
>>           unsigned long hstart, hend;
>>           cond_resched();
>> +        if (try_to_freeze())
>> +            break;
>> +
>>           if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
>>               progress++;
>>               break;
>> @@ -2453,6 +2456,9 @@ static unsigned int 
>> khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
>>               bool mmap_locked = true;
>>               cond_resched();
>> +            if (try_to_freeze())
>> +                goto breakouterloop;
>> +
>>               if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
>>                   goto breakouterloop;
> 
> This looks better than the previous version. Let’s also wait to see if 
> others have any better suggestions.

What prevents other callpaths (faults, read(), write(), etc) from 
similarly triggering swapin?

I recall that there is a notifier when the system is preparing to sleep 
(pm notifier or something). Could we simply hook into that to tell 
khugepaged to suspend+resume?

Essentially, making hpage_collapse_test_exit_or_disable() break out for us.

-- 
Cheers,

David



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  8:36         ` David Hildenbrand (Arm)
@ 2026-02-06  8:55           ` Baolin Wang
  2026-02-06  9:00             ` David Hildenbrand (Arm)
  0 siblings, 1 reply; 13+ messages in thread
From: Baolin Wang @ 2026-02-06  8:55 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Sergey Senozhatsky
  Cc: Andrew Morton, Lorenzo Stoakes, Zi Yan, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel



On 2/6/26 4:36 PM, David Hildenbrand (Arm) wrote:
> On 2/6/26 06:12, Baolin Wang wrote:
>>
>>
>> On 2/6/26 12:31 PM, Sergey Senozhatsky wrote:
>>> On (26/02/06 12:38), Sergey Senozhatsky wrote:
>>> [..]
>>>>
>>>> Right, I thought about it but wasn't sure.  Could the inner loop (e.g.
>>>> collapse_file() in this particular case) loop long enough to fail 
>>>> suspend
>>>> w/o ever giving the outer loop (khugepaged_do_scan()) a chance to 
>>>> freeze?
>>
>> Yes, that’s possible. However, if we add a try_to_freeze() check in 
>> the inner loop, we need to consider various scenarios (such as 
>> anonymous folio swap-in and other potential cases?), which feels too 
>> hacky to me.
>>
>>> For inner loops I wondered if cond_resched() could be an indicator of
>>> where try_to_freeze() should be placed.  Those cond_resched() calls
>>> are there for a reason, after all.   E.g. something like:
>>>
>>> ---
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index fa6a018b20a8..cee08466a069 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -2431,6 +2431,9 @@ static unsigned int 
>>> khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
>>>           unsigned long hstart, hend;
>>>           cond_resched();
>>> +        if (try_to_freeze())
>>> +            break;
>>> +
>>>           if (unlikely(hpage_collapse_test_exit_or_disable(mm))) {
>>>               progress++;
>>>               break;
>>> @@ -2453,6 +2456,9 @@ static unsigned int 
>>> khugepaged_scan_mm_slot(unsigned int pages, enum scan_result
>>>               bool mmap_locked = true;
>>>               cond_resched();
>>> +            if (try_to_freeze())
>>> +                goto breakouterloop;
>>> +
>>>               if (unlikely(hpage_collapse_test_exit_or_disable(mm)))
>>>                   goto breakouterloop;
>>
>> This looks better than the previous version. Let’s also wait to see if 
>> others have any better suggestions.
> 
> What prevents other callpaths (faults, read(), write(), etc) from 
> similarly triggering swapin?

Usually it’s just a userspace process triggering a single page fault to
swap a page in and then returning to userspace. There aren’t other kernel
threads that, like khugepaged, continuously swap pages in within a loop.

> I recall that there is a notifier when the system is preparing to sleep 
> (pm notifier or something). Could we simply hook into that to tell 
> khugepaged to suspend+resume?

Do you mean “struct dev_pm_ops”, which is used to register PM callbacks 
for devices? However, I don’t know how to use it with a kernel thread.

Also, look at how kswapd does it: kswapd also uses
kthread_freezable_should_stop() to check the freeze state.


> Essentially, making hpage_collapse_test_exit_or_disable() break out for us.

Ah, yes, even better:)



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  8:55           ` Baolin Wang
@ 2026-02-06  9:00             ` David Hildenbrand (Arm)
  2026-02-10  3:21               ` Sergey Senozhatsky
  0 siblings, 1 reply; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-06  9:00 UTC (permalink / raw)
  To: Baolin Wang, Sergey Senozhatsky
  Cc: Andrew Morton, Lorenzo Stoakes, Zi Yan, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel

>> I recall that there is a notifier when the system is preparing to 
>> sleep (pm notifier or something). Could we simply hook into that to 
>> tell khugepaged to suspend+resume?
> 
> Do you mean “struct dev_pm_ops”, which is used to register PM callbacks 
> for devices? However, I don’t know how to use it with a kernel thread.
> 
> Also look at how kswapd does it, kswapd also uses 
> kthread_freezable_should_stop() to check the freeze state.

Right, mimicking what kswapd does sounds reasonable!

-- 
Cheers,

David



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-06  9:00             ` David Hildenbrand (Arm)
@ 2026-02-10  3:21               ` Sergey Senozhatsky
  2026-02-10 10:07                 ` Baolin Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10  3:21 UTC (permalink / raw)
  To: David Hildenbrand (Arm)
  Cc: Baolin Wang, Sergey Senozhatsky, Andrew Morton, Lorenzo Stoakes,
	Zi Yan, Liam R. Howlett, Nico Pache, Ryan Roberts, Dev Jain,
	Barry Song, Lance Yang, linux-mm, linux-kernel

On (26/02/06 10:00), David Hildenbrand (Arm) wrote:
> > > I recall that there is a notifier when the system is preparing to
> > > sleep (pm notifier or something). Could we simply hook into that to
> > > tell khugepaged to suspend+resume?
> > 
> > Do you mean “struct dev_pm_ops”, which is used to register PM callbacks
> > for devices? However, I don’t know how to use it with a kernel thread.
> > 
> > Also look at how kswapd does it, kswapd also uses
> > kthread_freezable_should_stop() to check the freeze state.
> 
> Right, mimicking what kswapd does sounds reasonable!

I may be missing something, as I'm not seeing dev_pm_ops in vmscan code.
Would something like this work?

---

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index fa6a018b20a8..c5d89ec223d3 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -394,8 +394,12 @@ static inline int hpage_collapse_test_exit(struct mm_struct *mm)
 
 static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
 {
+	bool was_frozen;
+	int ret = kthread_freezable_should_stop(&was_frozen);
+
 	return hpage_collapse_test_exit(mm) ||
-		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
+		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm) ||
+		was_frozen || ret;
 }
 
 static bool hugepage_pmd_enabled(void)



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-10  3:21               ` Sergey Senozhatsky
@ 2026-02-10 10:07                 ` Baolin Wang
  2026-02-10 10:12                   ` Sergey Senozhatsky
  2026-02-10 10:21                   ` David Hildenbrand (Arm)
  0 siblings, 2 replies; 13+ messages in thread
From: Baolin Wang @ 2026-02-10 10:07 UTC (permalink / raw)
  To: Sergey Senozhatsky, David Hildenbrand (Arm)
  Cc: Andrew Morton, Lorenzo Stoakes, Zi Yan, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel



On 2/10/26 11:21 AM, Sergey Senozhatsky wrote:
> On (26/02/06 10:00), David Hildenbrand (Arm) wrote:
>>>> I recall that there is a notifier when the system is preparing to
>>>> sleep (pm notifier or something). Could we simply hook into that to
>>>> tell khugepaged to suspend+resume?
>>>
>>> Do you mean “struct dev_pm_ops”, which is used to register PM callbacks
>>> for devices? However, I don’t know how to use it with a kernel thread.
>>>
>>> Also look at how kswapd does it, kswapd also uses
>>> kthread_freezable_should_stop() to check the freeze state.
>>
>> Right, mimicking what kswapd does sounds reasonable!
> 
> I may be missing something, as I'm not seeing dev_pm_ops in vmscan code.
> Would something like this work?
> 
> ---
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index fa6a018b20a8..c5d89ec223d3 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -394,8 +394,12 @@ static inline int hpage_collapse_test_exit(struct mm_struct *mm)
>   
>   static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
>   {
> +	bool was_frozen;
> +	int ret = kthread_freezable_should_stop(&was_frozen);
> +
>   	return hpage_collapse_test_exit(mm) ||
> -		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
> +		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm) ||
> +		was_frozen || ret;
>   }

hpage_collapse_test_exit_or_disable() can also be called from
madvise_collapse(), which does not run in a kernel thread. So I think
using try_to_freeze() is enough? Or pass cc->is_khugepaged to check
whether the current thread is khugepaged.



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-10 10:07                 ` Baolin Wang
@ 2026-02-10 10:12                   ` Sergey Senozhatsky
  2026-02-10 10:21                   ` David Hildenbrand (Arm)
  1 sibling, 0 replies; 13+ messages in thread
From: Sergey Senozhatsky @ 2026-02-10 10:12 UTC (permalink / raw)
  To: Baolin Wang
  Cc: Sergey Senozhatsky, David Hildenbrand (Arm),
	Andrew Morton, Lorenzo Stoakes, Zi Yan, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel

On (26/02/10 18:07), Baolin Wang wrote:
> > > > Do you mean “struct dev_pm_ops”, which is used to register PM callbacks
> > > > for devices? However, I don’t know how to use it with a kernel thread.
> > > > 
> > > > Also look at how kswapd does it, kswapd also uses
> > > > kthread_freezable_should_stop() to check the freeze state.
> > > 
> > > > Right, mimicking what kswapd does sounds reasonable!
> > 
> > I may be missing something, as I'm not seeing dev_pm_ops in vmscan code.
> > Would something like this work?
> > 
> > ---
> > 
> > diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> > index fa6a018b20a8..c5d89ec223d3 100644
> > --- a/mm/khugepaged.c
> > +++ b/mm/khugepaged.c
> > @@ -394,8 +394,12 @@ static inline int hpage_collapse_test_exit(struct mm_struct *mm)
> >   static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm)
> >   {
> > +	bool was_frozen;
> > +	int ret = kthread_freezable_should_stop(&was_frozen);
> > +
> >   	return hpage_collapse_test_exit(mm) ||
> > -		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
> > +		mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm) ||
> > +		was_frozen || ret;
> >   }
> 
> Since the hpage_collapse_test_exit_or_disable() can be called by
> madvise_collapse(), which is not a kernel thread. So I think using the
> try_to_freeze() is enough?

I guess try_to_freeze() should work.

> or pass the cc->is_khugepaged to check if current thread is khugepaged.

Or I guess I can check `current->flags & PF_KTHREAD` in
hpage_collapse_test_exit_or_disable().



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-10 10:07                 ` Baolin Wang
  2026-02-10 10:12                   ` Sergey Senozhatsky
@ 2026-02-10 10:21                   ` David Hildenbrand (Arm)
  2026-02-11  1:03                     ` Baolin Wang
  1 sibling, 1 reply; 13+ messages in thread
From: David Hildenbrand (Arm) @ 2026-02-10 10:21 UTC (permalink / raw)
  To: Baolin Wang, Sergey Senozhatsky
  Cc: Andrew Morton, Lorenzo Stoakes, Zi Yan, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel

On 2/10/26 11:07, Baolin Wang wrote:
> 
> 
> On 2/10/26 11:21 AM, Sergey Senozhatsky wrote:
>> On (26/02/06 10:00), David Hildenbrand (Arm) wrote:
>>>
>>> Right, mimicking what kswapd does sounds reasonable!
>>
>> I may be missing something, as I'm not seeing dev_pm_ops in vmscan code.
>> Would something like this work?
>>
>> ---
>>
>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>> index fa6a018b20a8..c5d89ec223d3 100644
>> --- a/mm/khugepaged.c
>> +++ b/mm/khugepaged.c
>> @@ -394,8 +394,12 @@ static inline int hpage_collapse_test_exit(struct 
>> mm_struct *mm)
>>   static inline int hpage_collapse_test_exit_or_disable(struct 
>> mm_struct *mm)
>>   {
>> +    bool was_frozen;
>> +    int ret = kthread_freezable_should_stop(&was_frozen);
>> +
>>       return hpage_collapse_test_exit(mm) ||
>> -        mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
>> +        mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm) ||
>> +        was_frozen || ret;
>>   }
> 
> Since the hpage_collapse_test_exit_or_disable() can be called by 
> madvise_collapse(), which is not a kernel thread.

Which raises the question whether we should forward that context 
(khugepaged vs. madvise) to hpage_collapse_test_exit_or_disable().

-- 
Cheers,

David



* Re: [stable-6.6.y] mm: khugepaged refuses to freeze
  2026-02-10 10:21                   ` David Hildenbrand (Arm)
@ 2026-02-11  1:03                     ` Baolin Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Baolin Wang @ 2026-02-11  1:03 UTC (permalink / raw)
  To: David Hildenbrand (Arm), Sergey Senozhatsky
  Cc: Andrew Morton, Lorenzo Stoakes, Zi Yan, Liam R. Howlett,
	Nico Pache, Ryan Roberts, Dev Jain, Barry Song, Lance Yang,
	linux-mm, linux-kernel



On 2/10/26 6:21 PM, David Hildenbrand (Arm) wrote:
> On 2/10/26 11:07, Baolin Wang wrote:
>>
>>
>> On 2/10/26 11:21 AM, Sergey Senozhatsky wrote:
>>> On (26/02/06 10:00), David Hildenbrand (Arm) wrote:
>>>>
>>>> Right, mimicking what kswapd does sounds reasonable!
>>>
>>> I may be missing something, as I'm not seeing dev_pm_ops in vmscan code.
>>> Would something like this work?
>>>
>>> ---
>>>
>>> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
>>> index fa6a018b20a8..c5d89ec223d3 100644
>>> --- a/mm/khugepaged.c
>>> +++ b/mm/khugepaged.c
>>> @@ -394,8 +394,12 @@ static inline int 
>>> hpage_collapse_test_exit(struct mm_struct *mm)
>>>   static inline int hpage_collapse_test_exit_or_disable(struct 
>>> mm_struct *mm)
>>>   {
>>> +    bool was_frozen;
>>> +    int ret = kthread_freezable_should_stop(&was_frozen);
>>> +
>>>       return hpage_collapse_test_exit(mm) ||
>>> -        mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm);
>>> +        mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm) ||
>>> +        was_frozen || ret;
>>>   }
>>
>> Since the hpage_collapse_test_exit_or_disable() can be called by 
>> madvise_collapse(), which is not a kernel thread.
> 
> Which raises the question whether we should forward that context 
> (khugepaged vs. madvise) to hpage_collapse_test_exit_or_disable().

Passing in the 'cc' pointer looks fine to me. Something like:

static inline int hpage_collapse_test_exit_or_disable(struct mm_struct *mm,
                                                 struct collapse_control *cc)
{
         bool was_frozen = false;

         if (cc->is_khugepaged &&
             unlikely(kthread_freezable_should_stop(&was_frozen)))
                 return 1;

         return hpage_collapse_test_exit(mm) ||
                 mm_flags_test(MMF_DISABLE_THP_COMPLETELY, mm) ||
                 was_frozen;
}

Sergey, could you submit a formal patch for review?
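
The intent of forwarding the context can be sketched with a small
userspace model. This is not kernel code: struct mm_model, the model_*
globals and model_freezable_should_stop() are hypothetical stand-ins that
only mimic the documented behaviour of kthread_freezable_should_stop()
(freeze if requested, report that via was_frozen, return whether the
thread was asked to stop).

```c
#include <stdbool.h>

/* Hypothetical stand-ins for kernel state; none of this is kernel code. */
struct collapse_control { bool is_khugepaged; };
struct mm_model { bool exiting; bool thp_disabled; };

bool model_stop_requested;	/* models kthread_should_stop() */
bool model_freeze_pending;	/* models a pending freeze request */

/* Models kthread_freezable_should_stop(): "freeze" if asked, report it. */
bool model_freezable_should_stop(bool *was_frozen)
{
	*was_frozen = false;
	if (model_freeze_pending) {
		/* Pretend the thread entered the refrigerator and was thawed. */
		model_freeze_pending = false;
		*was_frozen = true;
	}
	return model_stop_requested;
}

/* Same shape as the proposed helper: only khugepaged consults the freezer. */
int model_test_exit_or_disable(const struct mm_model *mm,
			       const struct collapse_control *cc)
{
	bool was_frozen = false;

	if (cc->is_khugepaged && model_freezable_should_stop(&was_frozen))
		return 1;

	return mm->exiting || mm->thp_disabled || was_frozen;
}
```

In this model a caller with is_khugepaged == false never touches the
freezer state, whereas a khugepaged caller bails out once right after
being thawed and then resumes normal scanning on the next call.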



Thread overview: 13+ messages
2026-02-06  2:47 [stable-6.6.y] mm: khugepaged refuses to freeze Sergey Senozhatsky
2026-02-06  3:33 ` Baolin Wang
2026-02-06  3:38   ` Sergey Senozhatsky
2026-02-06  4:31     ` Sergey Senozhatsky
2026-02-06  5:12       ` Baolin Wang
2026-02-06  8:36         ` David Hildenbrand (Arm)
2026-02-06  8:55           ` Baolin Wang
2026-02-06  9:00             ` David Hildenbrand (Arm)
2026-02-10  3:21               ` Sergey Senozhatsky
2026-02-10 10:07                 ` Baolin Wang
2026-02-10 10:12                   ` Sergey Senozhatsky
2026-02-10 10:21                   ` David Hildenbrand (Arm)
2026-02-11  1:03                     ` Baolin Wang
