[PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled

linux-mm.kvack.org archive mirror

* [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
@ 2015-02-24 18:19 Michal Hocko
  2015-02-24 18:22 ` Michal Hocko
  2015-02-24 19:11 ` Johannes Weiner
  0 siblings, 2 replies; 10+ messages in thread
From: Michal Hocko @ 2015-02-24 18:19 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Johannes Weiner, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

Tetsuo Handa has pointed out that __GFP_NOFAIL allocations might fail
after the OOM killer is disabled if the allocation is performed by a
kernel thread. This behavior was introduced from the very beginning by
7f33d49a2ed5 (mm, PM/Freezer: Disable OOM killer when tasks are frozen).
This means that the basic contract for the allocation request is broken
and the context requesting such an allocation might blow up unexpectedly.

There are basically two ways forward.
1) move oom_killer_disable after kernel threads are frozen. This has a
   risk that the OOM victim wouldn't be able to finish because it would
   depend on an already frozen kernel thread. This would be really
   tricky to debug.
2) do not fail __GFP_NOFAIL allocations no matter what, and accept the
   risk that freezable kernel threads will loop and fail the suspend.
   Incidental allocations after kernel threads are frozen will at least
   dump a warning - if we are lucky and the serial console is still
   active, of course...

This patch implements the latter option because it is safer. We would see
warnings rather than allocation failures for the kernel threads which
would blow up otherwise, and we would have a higher chance of identifying
__GFP_NOFAIL users from deeper pm code.

Signed-off-by: Michal Hocko <mhocko@suse.cz>
---

We haven't seen any bug reports 

 mm/oom_kill.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 642f38cb175a..ea8b443cd871 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -772,6 +772,10 @@ out:
 		schedule_timeout_killable(1);
 }

+static DEFINE_RATELIMIT_STATE(oom_disabled_rs,
+		DEFAULT_RATELIMIT_INTERVAL,
+		DEFAULT_RATELIMIT_BURST);
+
 /**
  * out_of_memory -  tries to invoke OOM killer.
  * @zonelist: zonelist pointer
@@ -792,6 +796,10 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
 	if (!oom_killer_disabled) {
 		__out_of_memory(zonelist, gfp_mask, order, nodemask, force_kill);
 		ret = true;
+	} else if (gfp_mask & __GFP_NOFAIL) {
+		if (__ratelimit(&oom_disabled_rs))
+			WARN(1, "Unable to make forward progress for __GFP_NOFAIL because OOM killer is disbaled\n");
+		ret = true;
 	}
 	up_read(&oom_sem);

-- 
2.1.4
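
The effect of the second hunk can be sketched in userspace. This is only a
minimal model, not kernel code: MODEL_GFP_NOFAIL, struct oom_model and both
model_out_of_memory_*() helpers are hypothetical names invented for the
illustration, and the rate limiting of the warning is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __GFP_NOFAIL bit. */
#define MODEL_GFP_NOFAIL 0x1u

/* Hypothetical model state; only oom_killer_disabled mirrors a kernel name. */
struct oom_model {
	bool oom_killer_disabled;
	unsigned int kills;    /* times the modeled OOM killer ran */
	unsigned int warnings; /* times the new branch would have warned */
};

/* Before the patch: no progress is reported once the OOM killer is disabled. */
bool model_out_of_memory_old(struct oom_model *m, unsigned int gfp_mask)
{
	assert(m);
	(void)gfp_mask; /* the old code ignores the flags on this path */
	if (!m->oom_killer_disabled) {
		m->kills++;
		return true;
	}
	return false;
}

/* After the patch: __GFP_NOFAIL requests report progress so the caller retries. */
bool model_out_of_memory_new(struct oom_model *m, unsigned int gfp_mask)
{
	assert(m);
	if (!m->oom_killer_disabled) {
		m->kills++;
		return true;
	}
	if (gfp_mask & MODEL_GFP_NOFAIL) {
		m->warnings++; /* the real patch rate-limits this warning */
		return true;
	}
	return false;
}
```

With oom_killer_disabled set, the old model returns false for a
MODEL_GFP_NOFAIL request (the allocation would fail), while the new model
returns true and counts a warning.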

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-24 18:19 [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled Michal Hocko
@ 2015-02-24 18:22 ` Michal Hocko
  2015-02-24 19:11 ` Johannes Weiner
  1 sibling, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2015-02-24 18:22 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Rientjes, Johannes Weiner, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Tue 24-02-15 19:19:24, Michal Hocko wrote:
[...]
> We haven't seen any bug reports 

Oops, forgot to save the file before sending. The full text is:
"
We haven't seen any bug reports since 2009, so I haven't marked the patch
for stable. I have no problem backporting it to stable trees, though, if
people think it is a good precaution.
"


-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-24 18:19 [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled Michal Hocko
  2015-02-24 18:22 ` Michal Hocko
@ 2015-02-24 19:11 ` Johannes Weiner
  2015-02-24 20:23   ` David Rientjes
                     ` (2 more replies)
  1 sibling, 3 replies; 10+ messages in thread
From: Johannes Weiner @ 2015-02-24 19:11 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, David Rientjes, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Tue, Feb 24, 2015 at 07:19:24PM +0100, Michal Hocko wrote:
[...]
> @@ -792,6 +796,10 @@ bool out_of_memory(struct zonelist *zonelist, gfp_t gfp_mask,
>  	if (!oom_killer_disabled) {
>  		__out_of_memory(zonelist, gfp_mask, order, nodemask, force_kill);
>  		ret = true;
> +	} else if (gfp_mask & __GFP_NOFAIL) {
> +		if (__ratelimit(&oom_disabled_rs))
> +			WARN(1, "Unable to make forward progress for __GFP_NOFAIL because OOM killer is disbaled\n");
> +		ret = true;

I'm fine with keeping the allocation looping, but is that message
helpful?  It seems completely useless to the user encountering it.  Is
it going to help kernel developers when we get a bug report with it?

WARN_ON_ONCE()?
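
The difference between the two reporting styles can be sketched in
userspace. This is a simplified model under stated assumptions, not the
kernel's implementation: struct warn_once_site, struct ratelimit_model and
both helpers are hypothetical names, and time is an abstract tick counter
rather than jiffies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical analogue of one WARN_ON_ONCE() call site with its own latch. */
struct warn_once_site {
	bool fired;
	unsigned int reports;
};

/* Returns the condition like the kernel macro, but reports only the first hit. */
bool model_warn_on_once(struct warn_once_site *site, bool cond)
{
	assert(site);
	if (cond && !site->fired) {
		site->fired = true;
		site->reports++;
	}
	return cond;
}

/* Hypothetical analogue of a rate-limited WARN(): at most `burst` reports
 * in every window of `interval` ticks. */
struct ratelimit_model {
	unsigned int interval;
	unsigned int burst;
	unsigned int begin;   /* tick at which the current window started */
	unsigned int printed; /* reports emitted in the current window */
	unsigned int reports; /* total reports emitted */
};

bool model_warn_ratelimited(struct ratelimit_model *rs, unsigned int now, bool cond)
{
	assert(rs);
	if (!cond)
		return false;
	if (now - rs->begin >= rs->interval) {
		/* A new window starts, so the burst budget is refilled. */
		rs->begin = now;
		rs->printed = 0;
	}
	if (rs->printed < rs->burst) {
		rs->printed++;
		rs->reports++;
	}
	return true;
}
```

For a condition that stays true on every tick, the once-only site reports a
single time, while the rate-limited variant keeps reporting a bounded number
of times per window for as long as the loop runs.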


* Re: [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-24 19:11 ` Johannes Weiner
@ 2015-02-24 20:23   ` David Rientjes
  2015-02-25 14:08     ` [PATCH -v2] " Michal Hocko
  2015-02-24 22:09   ` [PATCH] " Konstantin Khlebnikov
  2015-02-25 14:02   ` Michal Hocko
  2 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2015-02-24 20:23 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Andrew Morton, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Tue, 24 Feb 2015, Johannes Weiner wrote:

[...]
> I'm fine with keeping the allocation looping, but is that message
> helpful?  It seems completely useless to the user encountering it.  Is
> it going to help kernel developers when we get a bug report with it?
> 
> WARN_ON_ONCE()?
> 

Yeah, I'm not sure that the warning is helpful (and it needs 
s/disbaled/disabled/ if it is to be kept).  I also think this check should 
be moved out of out_of_memory() since gfp/retry logic should be in the 
page allocator itself and not in the oom killer: just make 
__alloc_pages_may_oom() also set *did_some_progress = 1 for __GFP_NOFAIL.
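
A rough userspace sketch of that suggestion, assuming a much simplified
allocator tail: SKETCH_GFP_NOFAIL, sketch_out_of_memory() and
sketch_may_oom() are hypothetical stand-ins, and the real
__alloc_pages_may_oom() does considerably more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __GFP_NOFAIL bit. */
#define SKETCH_GFP_NOFAIL 0x1u

/* Stand-in for out_of_memory(): it reports progress only if the OOM killer
 * was allowed to run, and knows nothing about gfp flags. */
bool sketch_out_of_memory(bool oom_killer_disabled)
{
	return !oom_killer_disabled;
}

/* Models the tail of __alloc_pages_may_oom(): the page allocator, not the
 * OOM killer, decides that a __GFP_NOFAIL request always counts as progress. */
void sketch_may_oom(unsigned int gfp_mask, bool oom_killer_disabled,
		    unsigned long *did_some_progress)
{
	assert(did_some_progress);
	*did_some_progress = 0;
	if (sketch_out_of_memory(oom_killer_disabled) ||
	    (gfp_mask & SKETCH_GFP_NOFAIL))
		*did_some_progress = 1;
}
```

The point of the split is that the OOM killer model stays unaware of retry
policy, while the allocator model owns the __GFP_NOFAIL decision.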


* [PATCH -v2] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-24 20:23   ` David Rientjes
@ 2015-02-25 14:08     ` Michal Hocko
  2015-02-25 20:41       ` David Rientjes
  0 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2015-02-25 14:08 UTC (permalink / raw)
  To: David Rientjes
  Cc: Johannes Weiner, Andrew Morton, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Tue 24-02-15 12:23:55, David Rientjes wrote:
> On Tue, 24 Feb 2015, Johannes Weiner wrote:
[...]
> > I'm fine with keeping the allocation looping, but is that message
> > helpful?  It seems completely useless to the user encountering it.  Is
> > it going to help kernel developers when we get a bug report with it?
> > 
> > WARN_ON_ONCE()?
> > 
> 
> Yeah, I'm not sure that the warning is helpful (and it needs 
> s/disbaled/disabled/ if it is to be kept).  I also think this check should 
> be moved out of out_of_memory() since gfp/retry logic should be in the 
> page allocator itself and not in the oom killer: just make 
> __alloc_pages_may_oom() also set *did_some_progress = 1 for __GFP_NOFAIL.

OK, this is a good point. Updated patch is below:
---

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH -v2] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-25 14:08     ` [PATCH -v2] " Michal Hocko
@ 2015-02-25 20:41       ` David Rientjes
  2015-02-26 17:34         ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: David Rientjes @ 2015-02-25 20:41 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Johannes Weiner, Andrew Morton, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Wed, 25 Feb 2015, Michal Hocko wrote:

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 2d224bbdf8e8..c2ff40a30003 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2363,7 +2363,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
>  			goto out;
>  	}
>  	/* Exhausted what can be done so it's blamo time */
> -	if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false))
> +	if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false)
> +			|| WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
>  		*did_some_progress = 1;
>  out:
>  	oom_zonelist_unlock(ac->zonelist, gfp_mask);

Eek, not sure we actually need to play any games with did_some_progress, 
it might be clearer just to do this

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2760,7 +2760,7 @@ retry:
 							&did_some_progress);
 			if (page)
 				goto got_pg;
-			if (!did_some_progress)
+			if (!did_some_progress && !(gfp_mask & __GFP_NOFAIL))
 				goto nopage;
 		}
 		/* Wait for some write requests to complete then retry */

Either way you decide, feel free to add my

Acked-by: David Rientjes <rientjes@google.com>


* Re: [PATCH -v2] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-25 20:41       ` David Rientjes
@ 2015-02-26 17:34         ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2015-02-26 17:34 UTC (permalink / raw)
  To: David Rientjes, Andrew Morton
  Cc: Johannes Weiner, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Wed 25-02-15 12:41:07, David Rientjes wrote:
> On Wed, 25 Feb 2015, Michal Hocko wrote:
> 
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 2d224bbdf8e8..c2ff40a30003 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2363,7 +2363,8 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> >  			goto out;
> >  	}
> >  	/* Exhausted what can be done so it's blamo time */
> > -	if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false))
> > +	if (out_of_memory(ac->zonelist, gfp_mask, order, ac->nodemask, false)
> > +			|| WARN_ON_ONCE(gfp_mask & __GFP_NOFAIL))
> >  		*did_some_progress = 1;
> >  out:
> >  	oom_zonelist_unlock(ac->zonelist, gfp_mask);
> 
> Eek, not sure we actually need to play any games with did_some_progress, 
> it might be clearer just to do this

We would lose the warning, which _might_ be helpful. I also find this
place better because it is close to the out_of_memory() call, and that one
has only one failure mode. So I would prefer to stick with this unless
there are big objections.
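
The trade-off between the two placements can be sketched as follows. This
is a userspace model with hypothetical names (DEMO_GFP_NOFAIL, struct
demo_result, demo_v2(), demo_slowpath_check()); it only captures the retry
decision and whether a warning is emitted, nothing else from the slowpath.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's __GFP_NOFAIL bit. */
#define DEMO_GFP_NOFAIL 0x1u

struct demo_result {
	bool retries;          /* true if the allocator loops instead of failing */
	unsigned int warnings; /* warnings emitted by this call */
};

/* v2 placement: next to the out_of_memory() call, warn once and report
 * progress for __GFP_NOFAIL. The caller owns the once-only latch. */
struct demo_result demo_v2(unsigned int gfp_mask, bool oom_killer_ran,
			   bool *warned_once)
{
	struct demo_result r = { false, 0 };

	assert(warned_once);
	if (oom_killer_ran) {
		r.retries = true;
		return r;
	}
	if (gfp_mask & DEMO_GFP_NOFAIL) {
		if (!*warned_once) {
			*warned_once = true;
			r.warnings = 1;
		}
		r.retries = true;
	}
	return r;
}

/* Alternative placement: the slowpath only skips "goto nopage" for
 * __GFP_NOFAIL, so the loop continues but nothing is reported. */
struct demo_result demo_slowpath_check(unsigned int gfp_mask, bool oom_killer_ran)
{
	struct demo_result r = { false, 0 };
	bool did_some_progress = oom_killer_ran;

	r.retries = did_some_progress || (gfp_mask & DEMO_GFP_NOFAIL);
	return r;
}
```

Both variants make the same retry decision in this model; they differ only
in whether the no-progress __GFP_NOFAIL case leaves a trace.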

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2760,7 +2760,7 @@ retry:
>  							&did_some_progress);
>  			if (page)
>  				goto got_pg;
> -			if (!did_some_progress)
> +			if (!did_some_progress && !(gfp_mask & __GFP_NOFAIL))
>  				goto nopage;
>  		}
>  		/* Wait for some write requests to complete then retry */
> 
> Either way you decide, feel free to add my
> 
> Acked-by: David Rientjes <rientjes@google.com>

Thanks!

Andrew, should I repost, or can you pick it up from this thread? Assuming
you and others do not have objections, of course.

-- 
Michal Hocko
SUSE Labs


* Re: [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-24 19:11 ` Johannes Weiner
  2015-02-24 20:23   ` David Rientjes
@ 2015-02-24 22:09   ` Konstantin Khlebnikov
  2015-02-24 22:16     ` Konstantin Khlebnikov
  2015-02-25 14:02   ` Michal Hocko
  2 siblings, 1 reply; 10+ messages in thread
From: Konstantin Khlebnikov @ 2015-02-24 22:09 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Andrew Morton, David Rientjes, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Tue, Feb 24, 2015 at 10:11 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
[...]
> I'm fine with keeping the allocation looping, but is that message
> helpful?  It seems completely useless to the user encountering it.  Is
> it going to help kernel developers when we get a bug report with it?
>
> WARN_ON_ONCE()?

maybe panic() ?

If somebody turns off oom-killer it seems he's pretty sure that he has
enough memory.


* Re: [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-24 22:09   ` [PATCH] " Konstantin Khlebnikov
@ 2015-02-24 22:16     ` Konstantin Khlebnikov
  0 siblings, 0 replies; 10+ messages in thread
From: Konstantin Khlebnikov @ 2015-02-24 22:16 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Michal Hocko, Andrew Morton, David Rientjes, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Wed, Feb 25, 2015 at 1:09 AM, Konstantin Khlebnikov <koct9i@gmail.com> wrote:
> On Tue, Feb 24, 2015 at 10:11 PM, Johannes Weiner <hannes@cmpxchg.org> wrote:
[...]
>> I'm fine with keeping the allocation looping, but is that message
>> helpful?  It seems completely useless to the user encountering it.  Is
>> it going to help kernel developers when we get a bug report with it?
>>
>> WARN_ON_ONCE()?
>
> maybe panic() ?
>
> If somebody turns off oom-killer it seems he's pretty sure that he has
> enough memory.

Ah, that's used in the freeze/suspend code. I thought that it was some
kind of sysctl for brave sysadmins.


* Re: [PATCH] mm, oom: do not fail __GFP_NOFAIL allocation if oom killer is disbaled
  2015-02-24 19:11 ` Johannes Weiner
  2015-02-24 20:23   ` David Rientjes
  2015-02-24 22:09   ` [PATCH] " Konstantin Khlebnikov
@ 2015-02-25 14:02   ` Michal Hocko
  2 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2015-02-25 14:02 UTC (permalink / raw)
  To: Johannes Weiner
  Cc: Andrew Morton, David Rientjes, "Rafael J. Wysocki",
	Tetsuo Handa, linux-mm, LKML

On Tue 24-02-15 14:11:27, Johannes Weiner wrote:
[...]
> I'm fine with keeping the allocation looping, but is that message
> helpful?  It seems completely useless to the user encountering it.  Is
> it going to help kernel developers when we get a bug report with it?

It is better than a silent endless loop, and we get a trace which points
to the place which is doing the allocation. We haven't seen any weird
crashes during suspend throughout the last 6 years, so this would be
extremely unlikely and hard to reproduce. Having the trace therefore
sounds useful to me.

> WARN_ON_ONCE()?

I do not expect this will spew a lot of messages. But I can live with
WARN_ON_ONCE as well.
-- 
Michal Hocko
SUSE Labs

