* [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: cgel.zte @ 2022-02-21 11:17 UTC
To: akpm, naoya.horiguchi, mhocko, minchan, hannes
Cc: rogerq, linux-mm, linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai,
    Jiang Xuexin, Yang Yang

From: Guo Ziliang <guo.ziliang@zte.com.cn>

In our testing, a deadloop task was found. Through sysrq printing, the
same stack was found every time, as follows:

  __swap_duplicate+0x58/0x1a0
  swapcache_prepare+0x24/0x30
  __read_swap_cache_async+0xac/0x220
  read_swap_cache_async+0x58/0xa0
  swapin_readahead+0x24c/0x628
  do_swap_page+0x374/0x8a0
  __handle_mm_fault+0x598/0xd60
  handle_mm_fault+0x114/0x200
  do_page_fault+0x148/0x4d0
  do_translation_fault+0xb0/0xd4
  do_mem_abort+0x50/0xb0

The reason for the deadloop is that swapcache_prepare() always returns
-EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so the task
cannot jump out of the loop. We suspect that the task that clears the
SWAP_HAS_CACHE flag never gets a chance to run. We tried lowering the
priority of the task stuck in the deadloop so that the task that clears
the SWAP_HAS_CACHE flag could run; the system returned to normal after
the priority was lowered.

In our testing, multiple real-time tasks are bound to the same core, and
the task in the deadloop is the highest-priority task on that core, so
the deadloop task cannot be preempted.

Although cond_resched() is used in __read_swap_cache_async(), it is an
empty function on a preemptive kernel and cannot actually release the
CPU. A high-priority task cannot release the CPU unless preempted by a
higher-priority task; but when this task is already the highest-priority
task on its core, no other task can be scheduled. So we think we should
replace cond_resched() with schedule_timeout_uninterruptible(1):
schedule_timeout_uninterruptible() calls set_current_state() first to set
the task state, so the task is removed from the run queue, giving up the
CPU and preventing the task from running in kernel mode for too long.

Reported-by: Zeal Robot <zealci@zte.com.cn>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: Guo Ziliang <guo.ziliang@zte.com.cn>
---
 mm/swap_state.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 8d4104242100..ee67164531c0 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
 		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
 		 * in swap_map, but not yet added its page to swap cache.
 		 */
-		cond_resched();
+		schedule_timeout_uninterruptible(1);
 	}

 	/*
--
2.15.2
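For context, the loop being patched in mm/swap_state.c's
__read_swap_cache_async() looks roughly like the sketch below. This is
abridged and paraphrased from the v5.17-era source, not quoted verbatim,
so treat details as approximate; the -EEXIST branch at the bottom is
where the reported task spins:

    /* Abridged sketch of the __read_swap_cache_async() loop, ~v5.17. */
    for (;;) {
            int err;

            /* Is the page already in the swap cache? If so, done. */
            page = find_get_page(swap_address_space(entry),
                                 swp_offset(entry));
            if (page)
                    return page;

            /* Allocate the page we hope to read into. */
            page = alloc_page_vma(gfp_mask, vma, addr);
            if (!page)
                    return NULL;

            /* Try to claim the entry by setting SWAP_HAS_CACHE. */
            err = swapcache_prepare(entry);
            if (!err)
                    break;          /* we won: add to cache, start read */

            put_page(page);
            if (err != -EEXIST)
                    return NULL;    /* entry was freed meanwhile */

            /*
             * -EEXIST: another task holds SWAP_HAS_CACHE but has not
             * yet added its page to the swap cache (or cleared the
             * flag on its error path).  Retry.  This is the iteration
             * that never terminates when that other task is starved
             * of CPU time.
             */
            cond_resched();
    }

Nothing here waits on the task that owns SWAP_HAS_CACHE; the loop just
re-polls swap_map and hopes the owner eventually runs — the busy-wait
Andrew objects to below.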
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Andrew Morton @ 2022-02-26  1:24 UTC
To: cgel.zte
Cc: naoya.horiguchi, mhocko, minchan, hannes, rogerq, linux-mm,
    linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai, Jiang Xuexin,
    Yang Yang, Hugh Dickins

On Mon, 21 Feb 2022 11:17:49 +0000 cgel.zte@gmail.com wrote:

> From: Guo Ziliang <guo.ziliang@zte.com.cn>
>
> In our testing, a deadloop task was found. Through sysrq printing, the
> same stack was found every time, as follows:
>   __swap_duplicate+0x58/0x1a0
>   swapcache_prepare+0x24/0x30
>   __read_swap_cache_async+0xac/0x220
>   read_swap_cache_async+0x58/0xa0
>   swapin_readahead+0x24c/0x628
>   do_swap_page+0x374/0x8a0
>   __handle_mm_fault+0x598/0xd60
>   handle_mm_fault+0x114/0x200
>   do_page_fault+0x148/0x4d0
>   do_translation_fault+0xb0/0xd4
>   do_mem_abort+0x50/0xb0
>
> The reason for the deadloop is that swapcache_prepare() always returns
> -EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so the task
> cannot jump out of the loop. We suspect that the task that clears the
> SWAP_HAS_CACHE flag never gets a chance to run. We tried lowering the
> priority of the task stuck in the deadloop so that the task that clears
> the SWAP_HAS_CACHE flag could run; the system returned to normal after
> the priority was lowered.
>
> In our testing, multiple real-time tasks are bound to the same core, and
> the task in the deadloop is the highest-priority task on that core, so
> the deadloop task cannot be preempted.
>
> Although cond_resched() is used in __read_swap_cache_async(), it is an
> empty function on a preemptive kernel and cannot actually release the
> CPU. A high-priority task cannot release the CPU unless preempted by a
> higher-priority task; but when this task is already the highest-priority
> task on its core, no other task can be scheduled. So we think we should
> replace cond_resched() with schedule_timeout_uninterruptible(1):
> schedule_timeout_uninterruptible() calls set_current_state() first to set
> the task state, so the task is removed from the run queue, giving up the
> CPU and preventing the task from running in kernel mode for too long.
>
> ...
>
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
>  		 * in swap_map, but not yet added its page to swap cache.
>  		 */
> -		cond_resched();
> +		schedule_timeout_uninterruptible(1);
>  	}
>
>  	/*

Sigh. I guess yes, we should do this, at least in a short-term,
backportable-to-stable way.

But busy-waiting while hoping that someone else will save us isn't an
attractive design. Hugh, have you ever thought about something more
deterministic in there?

Thanks.
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Hugh Dickins @ 2022-03-01  4:07 UTC
To: Andrew Morton
Cc: cgel.zte, naoya.horiguchi, mhocko, minchan, hannes, rogerq, linux-mm,
    linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai, Jiang Xuexin,
    Yang Yang, Hugh Dickins

On Fri, 25 Feb 2022, Andrew Morton wrote:
> On Mon, 21 Feb 2022 11:17:49 +0000 cgel.zte@gmail.com wrote:
> > From: Guo Ziliang <guo.ziliang@zte.com.cn>
> >
> > In our testing, a deadloop task was found. Through sysrq printing, the
> > same stack was found every time, as follows:
> > [...]
> > The reason for the deadloop is that swapcache_prepare() always returns
> > -EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so the task
> > cannot jump out of the loop.
> > [...]
> > --- a/mm/swap_state.c
> > +++ b/mm/swap_state.c
> > @@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> >  		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
> >  		 * in swap_map, but not yet added its page to swap cache.
> >  		 */
> > -		cond_resched();
> > +		schedule_timeout_uninterruptible(1);
> >  	}
> >
> >  	/*
>
> Sigh. I guess yes, we should do this, at least in a short-term,
> backportable-to-stable way.
>
> But busy-waiting while hoping that someone else will save us isn't an
> attractive design. Hugh, have you ever thought about something more
> deterministic in there?

Not something more deterministic, no: I think that would entail heavier
locking, perhaps slowing down hotter paths, just to avoid this swap
oddity.

This loop was written long before there was a preemptive kernel: it was
appropriate then, and almost never needed more than one retry to
complete; but preemption changed the story without us realizing.

Sigh here too. I commend the thread on it from July 2018:
https://lore.kernel.org/linux-mm/2018072514403228778860@wingtech.com/

There the 4.9-stable user proposed preempt_disable(); I agreed but found
the patch provided insufficient, and offered another 4.9 patch further
down the thread. Your preference at the time was msleep(1).

I was working on a similar patch for 4.18, but have not completed it
yet ;) and don't remember how satisfied or not I was with that one; and
wonder if I'm any more likely to get it finished by 2026. It's clear
that I put much more thought into it back then than just now.

Maybe someone else would have a go: my 4.9 patch in that thread shows
most of it, but might need a lot of work to update to 5.17.

And it also gathered some temporary debug stats on how often this
happens: I'm not conscious of using RT at all, but was disturbed to see
how long an ordinary preemptive kernel was sometimes spinning there. So
I think I agree with you more than Michal on that: RT just makes the bad
behaviour more obvious.

Hugh
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Andrew Morton @ 2022-03-02  0:32 UTC
To: Hugh Dickins
Cc: cgel.zte, naoya.horiguchi, mhocko, minchan, hannes, rogerq, linux-mm,
    linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai, Jiang Xuexin,
    Yang Yang

On Mon, 28 Feb 2022 20:07:33 -0800 (PST) Hugh Dickins <hughd@google.com> wrote:

> > > --- a/mm/swap_state.c
> > > +++ b/mm/swap_state.c
> > > @@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
> > >  		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
> > >  		 * in swap_map, but not yet added its page to swap cache.
> > >  		 */
> > > -		cond_resched();
> > > +		schedule_timeout_uninterruptible(1);
> > >  	}
> > >
> > >  	/*
> >
> > Sigh. I guess yes, we should do this, at least in a short-term,
> > backportable-to-stable way.
> >
> > But busy-waiting while hoping that someone else will save us isn't an
> > attractive design. Hugh, have you ever thought about something more
> > deterministic in there?
>
> Not something more deterministic, no: I think that would entail heavier
> locking, perhaps slowing down hotter paths, just to avoid this swap
> oddity.
>
> This loop was written long before there was a preemptive kernel: it was
> appropriate then, and almost never needed more than one retry to
> complete; but preemption changed the story without us realizing.
>
> Sigh here too. I commend the thread on it from July 2018:
> https://lore.kernel.org/linux-mm/2018072514403228778860@wingtech.com/
>
> There the 4.9-stable user proposed preempt_disable(); I agreed but found
> the patch provided insufficient, and offered another 4.9 patch further
> down the thread. Your preference at the time was msleep(1).
>
> I was working on a similar patch for 4.18, but have not completed it
> yet ;) and don't remember how satisfied or not I was with that one; and
> wonder if I'm any more likely to get it finished by 2026. It's clear
> that I put much more thought into it back then than just now.
>
> Maybe someone else would have a go: my 4.9 patch in that thread shows
> most of it, but might need a lot of work to update to 5.17.
>
> And it also gathered some temporary debug stats on how often this
> happens: I'm not conscious of using RT at all, but was disturbed to see
> how long an ordinary preemptive kernel was sometimes spinning there. So
> I think I agree with you more than Michal on that: RT just makes the bad
> behaviour more obvious.

Thanks as always.

Using msleep() seems pretty pointless, so I plan to go ahead with the
patch as-is, with a cc:stable. None of it is pretty, but it's better
than what we have now, yes?
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Hugh Dickins @ 2022-03-02 19:31 UTC
To: Andrew Morton
Cc: Hugh Dickins, cgel.zte, naoya.horiguchi, mhocko, minchan, hannes,
    rogerq, linux-mm, linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai,
    Jiang Xuexin, Yang Yang

On Tue, 1 Mar 2022, Andrew Morton wrote:
>
> Using msleep() seems pretty pointless, so I plan to go ahead with the
> patch as-is, with a cc:stable. None of it is pretty, but it's better
> than what we have now, yes?

Yes, I've no objection to that.

Hugh
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Michal Hocko @ 2022-02-28  7:57 UTC
To: cgel.zte
Cc: akpm, naoya.horiguchi, minchan, hannes, rogerq, linux-mm,
    linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai, Jiang Xuexin,
    Yang Yang

On Mon 21-02-22 11:17:49, cgel.zte@gmail.com wrote:
> From: Guo Ziliang <guo.ziliang@zte.com.cn>
>
> In our testing, a deadloop task was found. Through sysrq printing, the
> same stack was found every time, as follows:
>   __swap_duplicate+0x58/0x1a0
>   swapcache_prepare+0x24/0x30
>   __read_swap_cache_async+0xac/0x220
>   read_swap_cache_async+0x58/0xa0
>   swapin_readahead+0x24c/0x628
>   do_swap_page+0x374/0x8a0
>   __handle_mm_fault+0x598/0xd60
>   handle_mm_fault+0x114/0x200
>   do_page_fault+0x148/0x4d0
>   do_translation_fault+0xb0/0xd4
>   do_mem_abort+0x50/0xb0
>
> The reason for the deadloop is that swapcache_prepare() always returns
> -EEXIST, indicating that SWAP_HAS_CACHE has not been cleared, so the task
> cannot jump out of the loop. We suspect that the task that clears the
> SWAP_HAS_CACHE flag never gets a chance to run. We tried lowering the
> priority of the task stuck in the deadloop so that the task that clears
> the SWAP_HAS_CACHE flag could run; the system returned to normal after
> the priority was lowered.
>
> In our testing, multiple real-time tasks are bound to the same core, and
> the task in the deadloop is the highest-priority task on that core, so
> the deadloop task cannot be preempted.
>
> Although cond_resched() is used in __read_swap_cache_async(), it is an
> empty function on a preemptive kernel and cannot actually release the
> CPU. A high-priority task cannot release the CPU unless preempted by a
> higher-priority task; but when this task is already the highest-priority
> task on its core, no other task can be scheduled. So we think we should
> replace cond_resched() with schedule_timeout_uninterruptible(1):
> schedule_timeout_uninterruptible() calls set_current_state() first to set
> the task state, so the task is removed from the run queue, giving up the
> CPU and preventing the task from running in kernel mode for too long.

I am sorry, but I really do not see how this case is any different from
any other kernel code path being hogged by an RT task. We surely
shouldn't put sleeps into all random paths which are doing cond_resched
at the moment.

> Reported-by: Zeal Robot <zealci@zte.com.cn>
> Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
> Reviewed-by: Jiang Xuexin <jiang.xuexin@zte.com.cn>
> Reviewed-by: Yang Yang <yang.yang29@zte.com.cn>
> Signed-off-by: Guo Ziliang <guo.ziliang@zte.com.cn>
> ---
>  mm/swap_state.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/mm/swap_state.c b/mm/swap_state.c
> index 8d4104242100..ee67164531c0 100644
> --- a/mm/swap_state.c
> +++ b/mm/swap_state.c
> @@ -478,7 +478,7 @@ struct page *__read_swap_cache_async(swp_entry_t entry, gfp_t gfp_mask,
>  		 * __read_swap_cache_async(), which has set SWAP_HAS_CACHE
>  		 * in swap_map, but not yet added its page to swap cache.
>  		 */
> -		cond_resched();
> +		schedule_timeout_uninterruptible(1);
>  	}
>
>  	/*
> --
> 2.15.2

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Andrew Morton @ 2022-02-28 15:33 UTC
To: Michal Hocko
Cc: cgel.zte, naoya.horiguchi, minchan, hannes, rogerq, linux-mm,
    linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai, Jiang Xuexin,
    Yang Yang

On Mon, 28 Feb 2022 08:57:49 +0100 Michal Hocko <mhocko@suse.com> wrote:

> On Mon 21-02-22 11:17:49, cgel.zte@gmail.com wrote:
> > From: Guo Ziliang <guo.ziliang@zte.com.cn>
> >
> > In our testing, a deadloop task was found. Through sysrq printing, the
> > same stack was found every time, as follows:
> > [...]
> > So we think we should replace cond_resched() with
> > schedule_timeout_uninterruptible(1):
> > schedule_timeout_uninterruptible() calls set_current_state() first to set
> > the task state, so the task is removed from the run queue, giving up the
> > CPU and preventing the task from running in kernel mode for too long.
>
> I am sorry, but I really do not see how this case is any different from
> any other kernel code path being hogged by an RT task. We surely
> shouldn't put sleeps into all random paths which are doing cond_resched
> at the moment.

But this cond_resched() is different from most. This one is attempting
to yield the CPU so this task can make progress. And cond_resched()
simply isn't an appropriate way of doing this because, under this fairly
common situation, it's a no-op.
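The contrast Andrew is drawing can be seen in the two primitives
themselves. Simplified from kernel sources of roughly this era, and
ignoring the CONFIG_PREEMPT_DYNAMIC wrinkle — a sketch, not the exact
build-time logic:

    /*
     * On a fully preemptive kernel, explicit rescheduling points are
     * unnecessary (the scheduler can preempt almost anywhere), so
     * cond_resched() compiles down to nothing:
     */
    #ifdef CONFIG_PREEMPTION
    static inline int _cond_resched(void)
    {
            return 0;
    }
    #endif

    /*
     * schedule_timeout_uninterruptible() (kernel/time/timer.c), by
     * contrast, sets the task state before scheduling, so the caller
     * really leaves the run queue for at least one jiffy, whatever
     * its priority:
     */
    signed long __sched schedule_timeout_uninterruptible(signed long timeout)
    {
            __set_current_state(TASK_UNINTERRUPTIBLE);
            return schedule_timeout(timeout);
    }

That is why the one-liner breaks the livelock: once the spinning task is
dequeued, even a lower-priority task bound to the same CPU can run and
clear SWAP_HAS_CACHE.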
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Michal Hocko @ 2022-03-02  9:46 UTC
To: Andrew Morton
Cc: cgel.zte, naoya.horiguchi, minchan, hannes, rogerq, linux-mm,
    linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai, Jiang Xuexin,
    Yang Yang

On Mon 28-02-22 07:33:15, Andrew Morton wrote:
> On Mon, 28 Feb 2022 08:57:49 +0100 Michal Hocko <mhocko@suse.com> wrote:
> > On Mon 21-02-22 11:17:49, cgel.zte@gmail.com wrote:
> > > From: Guo Ziliang <guo.ziliang@zte.com.cn>
> > >
> > > In our testing, a deadloop task was found. Through sysrq printing, the
> > > same stack was found every time, as follows:
> > > [...]
> >
> > I am sorry, but I really do not see how this case is any different from
> > any other kernel code path being hogged by an RT task. We surely
> > shouldn't put sleeps into all random paths which are doing cond_resched
> > at the moment.
>
> But this cond_resched() is different from most. This one is attempting
> to yield the CPU so this task can make progress. And cond_resched()
> simply isn't an appropriate way of doing this because, under this fairly
> common situation, it's a no-op.

I might be really missing something, but I really do not see how this is
any different from the page allocator path, which only does cond_resched
as well (well, except for throttling, but that might just not trigger).
Or other paths which just do cond_resched while waiting for progress
somewhere else.

Not that I like this situation, but a !PREEMPT kernel with RT priority
tasks is rather limited and full of potential problems IMHO.

-- 
Michal Hocko
SUSE Labs
* Re: [PATCH linux-next] mm: swap: get rid of deadloop in swapin readahead
From: Hugh Dickins @ 2022-03-02 20:38 UTC
To: Michal Hocko
Cc: Andrew Morton, cgel.zte, naoya.horiguchi, minchan, hannes, rogerq,
    linux-mm, linux-kernel, guo.ziliang, Zeal Robot, Ran Xiaokai,
    Jiang Xuexin, Yang Yang

On Wed, 2 Mar 2022, Michal Hocko wrote:
>
> I might be really missing something, but I really do not see how this is
> any different from the page allocator path, which only does cond_resched
> as well (well, except for throttling, but that might just not trigger).
> Or other paths which just do cond_resched while waiting for progress
> somewhere else.
>
> Not that I like this situation, but a !PREEMPT kernel with RT priority
> tasks is rather limited and full of potential problems IMHO.

As I said in previous mail, I have really not given this as much thought
this time as I did in the 2018 mail thread linked there; but I have seen
that it behaves more badly than I had imagined, in any preemptive kernel -
no need for RT. We just don't have the stats to show when this code here
spins waiting on code elsewhere that is sleeping.

I think the difference from most cond_resched() places is that swapin is
trying to collect together several factors with minimal locking, and we
should have added preempt_disable()s when preemption was invented. But
it's only swap, so we didn't notice.

Hugh
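Hugh's scenario, made concrete as a timeline sketch. This is
illustrative only; the exact preemption point varies with kernel version
and configuration:

    /*
     * Task B (lower priority)          Task A (highest priority on CPU)
     * -----------------------          --------------------------------
     * swapcache_prepare(entry) == 0
     *   SWAP_HAS_CACHE is now set
     * <preempted here, before its
     *  page reaches the swap cache>    do_swap_page()
     *                                    __read_swap_cache_async()
     *                                      swapcache_prepare() == -EEXIST
     *                                      cond_resched()  <- no-op here
     *                                      loops forever: B never runs
     *
     * With schedule_timeout_uninterruptible(1), task A sleeps for a
     * jiffy, task B runs to completion (adding its page to the swap
     * cache, or clearing SWAP_HAS_CACHE on failure), and task A's next
     * retry succeeds.
     */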