[PATCH] mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress

linux-mm.kvack.org archive mirror

* [PATCH] mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
@ 2016-01-29 10:49 Tetsuo Handa
From: Tetsuo Handa @ 2016-01-29 10:49 UTC (permalink / raw)
  To: linux-mm, akpm, jstancek, torvalds
  Cc: Tetsuo Handa, Michal Hocko, Tejun Heo, Cristopher Lameter,
	Joonsoo Kim, Arkadiusz Miskiewicz, stable

Jan Stancek has reported that the system occasionally hangs after the
"oom01" test case from LTP triggers OOM. Judging from the fact that a
kworker thread is doing memory allocation and that the values reported
by "Node 0 Normal free:" and "Node 0 Normal:" differ while the system is
hanging, vmstat appears not to be up-to-date for some reason.
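One plausible reading of that discrepancy can be sketched as a toy C model. It assumes that vmstat counters are updated through small per-CPU deltas which are folded into the zone-wide counter only when a threshold is crossed or when the periodic vmstat work runs, that "free:" prints the zone-wide NR_FREE_PAGES value, and that "Node 0 Normal:" sums the buddy free lists. All `toy_` names, the CPU count and the threshold are hypothetical; this is not kernel code.

```c
#include <assert.h>

#define TOY_NR_CPUS        4   /* hypothetical CPU count */
#define TOY_STAT_THRESHOLD 32  /* hypothetical per-CPU fold threshold */

struct toy_counter {
	long global;                 /* zone-wide value that gets printed */
	long pcp_diff[TOY_NR_CPUS];  /* per-CPU deltas not yet folded in */
};

/* Record a change on one CPU; fold it into the global value only when
 * the per-CPU delta exceeds the threshold. */
static void toy_mod_state(struct toy_counter *c, int cpu, long delta)
{
	long x;

	assert(cpu >= 0 && cpu < TOY_NR_CPUS);
	x = c->pcp_diff[cpu] + delta;
	if (x > TOY_STAT_THRESHOLD || x < -TOY_STAT_THRESHOLD) {
		c->global += x;
		x = 0;
	}
	c->pcp_diff[cpu] = x;
}

/* Stand-in for the periodic vmstat work: fold every pending delta. */
static void toy_fold_all(struct toy_counter *c)
{
	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		c->global += c->pcp_diff[cpu];
		c->pcp_diff[cpu] = 0;
	}
}

/* Exact value: global plus everything still sitting in per-CPU deltas. */
static long toy_exact(const struct toy_counter *c)
{
	long sum = c->global;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++)
		sum += c->pcp_diff[cpu];
	return sum;
}
```

Starting from `{ .global = 1000 }`, taking 10 pages on CPU 0 and 5 on CPU 1 leaves `global` at 1000 while `toy_exact()` is 985; only `toy_fold_all()` brings the printed value back in line. If the work that performs the fold never gets to run, the two views keep diverging.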

Commit 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover
memory reclaim doesn't make any progress") meant to force the kworker
thread to take a short sleep, but it mistakenly used schedule_timeout(1).
We missed that schedule_timeout() called while the task is in state
TASK_RUNNING does not actually put the task to sleep.
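Why the original call was effectively a no-op can be illustrated with a toy userspace model in C. It is only a sketch under simplifying assumptions: the `toy_` names are hypothetical, timers and preemption are not modelled, and the `wq_sleep_notifications` counter merely stands in for the workqueue concurrency-management hook that the scheduler invokes when a worker really goes to sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model, NOT kernel code: only the scheduler behaviour relevant to
 * this patch is modelled. */
enum toy_state { TOY_TASK_RUNNING, TOY_TASK_UNINTERRUPTIBLE };

struct toy_task {
	enum toy_state state;
	bool wq_worker;    /* stands in for current->flags & PF_WQ_WORKER */
	bool on_runqueue;  /* true while the task is runnable */
};

/* How often the modelled workqueue was told that one of its workers went
 * to sleep; that notification is what lets another worker run pending
 * work items. */
static int wq_sleep_notifications;

/* Toy schedule(): a task whose state is still RUNNING stays on the
 * runqueue, so it never counts as sleeping. */
static void toy_schedule(struct toy_task *t)
{
	if (t->state != TOY_TASK_RUNNING) {
		t->on_runqueue = false;
		if (t->wq_worker)
			wq_sleep_notifications++;
	}
}

/* Toy schedule_timeout(): returns whether the task actually left the
 * runqueue, then simulates the timer waking it up again. */
static bool toy_schedule_timeout(struct toy_task *t, long ticks)
{
	bool slept;

	(void)ticks;                  /* timer duration is not modelled */
	assert(t->on_runqueue);       /* the caller must be running */
	toy_schedule(t);
	slept = !t->on_runqueue;
	t->state = TOY_TASK_RUNNING;  /* timer expiry wakes the task */
	t->on_runqueue = true;
	return slept;
}

/* Toy schedule_timeout_uninterruptible(): set the state first, then sleep. */
static bool toy_schedule_timeout_uninterruptible(struct toy_task *t, long ticks)
{
	t->state = TOY_TASK_UNINTERRUPTIBLE;
	return toy_schedule_timeout(t, ticks);
}
```

For a worker initialised as `{ TOY_TASK_RUNNING, true, true }`, `toy_schedule_timeout(&w, 1)` returns false and leaves `wq_sleep_notifications` at 0, whereas `toy_schedule_timeout_uninterruptible(&w, 1)` returns true and raises it to 1.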

Fix it by using schedule_timeout_uninterruptible(1), which sets the task
state to TASK_UNINTERRUPTIBLE before calling schedule_timeout() and thus
forces the kworker thread to take a short sleep, so that vmstat can be
brought up-to-date.

Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
Reported-by: Jan Stancek <jstancek@redhat.com>
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Cristopher Lameter <clameter@sgi.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
Cc: <stable@vger.kernel.org>
---
 mm/backing-dev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index 7340353..cbe6f0b 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -989,7 +989,7 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
 		 * here rather than calling cond_resched().
 		 */
 		if (current->flags & PF_WQ_WORKER)
-			schedule_timeout(1);
+			schedule_timeout_uninterruptible(1);
 		else
 			cond_resched();

-- 
1.8.3.1

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Re: [PATCH] mm, vmstat: fix wrong WQ sleep when memory reclaim doesn't make any progress
@ 2016-01-29 13:40 ` Michal Hocko
From: Michal Hocko @ 2016-01-29 13:40 UTC (permalink / raw)
  To: Tetsuo Handa
  Cc: linux-mm, akpm, jstancek, torvalds, Tejun Heo,
	Cristopher Lameter, Joonsoo Kim, Arkadiusz Miskiewicz, stable

On Fri 29-01-16 19:49:12, Tetsuo Handa wrote:
> Jan Stancek has reported that the system occasionally hangs after the
> "oom01" test case from LTP triggers OOM. Judging from the fact that a
> kworker thread is doing memory allocation and that the values reported
> by "Node 0 Normal free:" and "Node 0 Normal:" differ while the system is
> hanging, vmstat appears not to be up-to-date for some reason.
> 
> Commit 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover
> memory reclaim doesn't make any progress") meant to force the kworker
> thread to take a short sleep, but it mistakenly used schedule_timeout(1).
> We missed that schedule_timeout() called while the task is in state
> TASK_RUNNING does not actually put the task to sleep.

Dang... You are right of course. I've made the same mistake during
oom_reaper development but didn't realize that the same pattern had been
used for the WQ thingy. My bad!

I am not sure this really fixes the issue mentioned above, because I
didn't get to look at the report yet, but we definitely have to change
the task state before calling schedule_timeout(), so this is obviously
correct. I would just argue that an interruptible sleep or TASK_IDLE
would be a little bit better. But it shouldn't really matter much with
such a short timeout, I guess.
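The trade-off between the sleep states mentioned here can be summarised with a small C sketch. It assumes the usual semantics: TASK_UNINTERRUPTIBLE sleepers are counted in the load average, TASK_IDLE is TASK_UNINTERRUPTIBLE plus a no-load flag (TASK_NOLOAD) that excludes the task from that count, and, ignoring TASK_KILLABLE, only TASK_INTERRUPTIBLE sleepers are woken by signals. The `TOY_` flag values are illustrative and not copied from kernel headers.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values, not copied from kernel headers. */
#define TOY_TASK_INTERRUPTIBLE   0x1u
#define TOY_TASK_UNINTERRUPTIBLE 0x2u
#define TOY_TASK_NOLOAD          0x4u
/* TASK_IDLE: uninterruptible, but excluded from the load average. */
#define TOY_TASK_IDLE (TOY_TASK_UNINTERRUPTIBLE | TOY_TASK_NOLOAD)

/* Simplified analogue of the kernel's task_contributes_to_load():
 * uninterruptible sleepers count toward the load average unless the
 * no-load bit is set (the frozen-task check is omitted here). */
static bool toy_contributes_to_load(unsigned int state)
{
	return (state & TOY_TASK_UNINTERRUPTIBLE) != 0 &&
	       (state & TOY_TASK_NOLOAD) == 0;
}

/* Ignoring TASK_KILLABLE, only interruptible sleepers are woken early
 * by a pending signal. */
static bool toy_wakeable_by_signal(unsigned int state)
{
	return (state & TOY_TASK_INTERRUPTIBLE) != 0;
}
```

In this model only TOY_TASK_UNINTERRUPTIBLE is counted in the load average, and only TOY_TASK_INTERRUPTIBLE can be cut short by a signal. For a one-tick sleep either difference is marginal, which is consistent with the remark that the choice shouldn't matter much.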

> Fix it by using schedule_timeout_uninterruptible(1), which sets the task
> state to TASK_UNINTERRUPTIBLE before calling schedule_timeout() and thus
> forces the kworker thread to take a short sleep, so that vmstat can be
> brought up-to-date.
> 
> Fixes: 373ccbe59270 ("mm, vmstat: allow WQ concurrency to discover memory reclaim doesn't make any progress")
> Reported-by: Jan Stancek <jstancek@redhat.com>
> Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
> Cc: Michal Hocko <mhocko@suse.com>
> Cc: Tejun Heo <tj@kernel.org>
> Cc: Cristopher Lameter <clameter@sgi.com>
> Cc: Joonsoo Kim <js1304@gmail.com>
> Cc: Arkadiusz Miskiewicz <arekm@maven.pl>
> Cc: <stable@vger.kernel.org>

Acked-by: Michal Hocko <mhocko@suse.com>

Thanks!
> ---
>  mm/backing-dev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/backing-dev.c b/mm/backing-dev.c
> index 7340353..cbe6f0b 100644
> --- a/mm/backing-dev.c
> +++ b/mm/backing-dev.c
> @@ -989,7 +989,7 @@ long wait_iff_congested(struct zone *zone, int sync, long timeout)
>  		 * here rather than calling cond_resched().
>  		 */
>  		if (current->flags & PF_WQ_WORKER)
> -			schedule_timeout(1);
> +			schedule_timeout_uninterruptible(1);
>  		else
>  			cond_resched();
>  
> -- 
> 1.8.3.1
> 

-- 
Michal Hocko
SUSE Labs
