* Stalled MM patches for review
@ 2014-12-15 23:02 Andrew Morton
2014-12-15 23:56 ` Yasuaki Ishimatsu
0 siblings, 1 reply; 9+ messages in thread
From: Andrew Morton @ 2014-12-15 23:02 UTC (permalink / raw)
To: linux-mm
I'm sitting on a bunch of patches which have question marks over them.
I'll send them out now. Can people please dig in and see if we can get
them finished off one way or the other?
My notes (which may be out of date):
mm-page_isolation-check-pfn-validity-before-access.patch:
- Might be unneeded. mhocko has issues.
mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask.patch:
- Needs review and checking
mm-page_alloc-embed-oom-killing-naturally-into-allocation-slowpath.patch:
- mhocko wanted a changelog update
mm-fix-invalid-use-of-pfn_valid_within-in-test_pages_in_a_zone.patch:
- Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> has issues with it
mm-introduce-do_shared_fault-and-drop-do_fault-fix-fix.patch:
- Adds a comment which might not be true?
fs-mpagec-forgotten-write_sync-in-case-of-data-integrity-write.patch:
- Unsure whether or not this helps.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Stalled MM patches for review
2014-12-15 23:02 Stalled MM patches for review Andrew Morton
@ 2014-12-15 23:56 ` Yasuaki Ishimatsu
2014-12-16 3:06 ` Johannes Weiner
0 siblings, 1 reply; 9+ messages in thread
From: Yasuaki Ishimatsu @ 2014-12-15 23:56 UTC (permalink / raw)
To: linux-mm; +Cc: Andrew Morton
(2014/12/16 8:02), Andrew Morton wrote:
>
> I'm sitting on a bunch of patches which have question marks over them.
> I'll send them out now. Can people please dig in and see if we can get
> them finished off one way or the other?
>
> My notes (which may be out of date):
>
Here are the threads for each discussion; please use them as references.
> mm-page_isolation-check-pfn-validity-before-access.patch:
> - Might be unneeded. mhocko has issues.
https://lkml.org/lkml/2014/11/6/79
>
> mm-page_allocc-__alloc_pages_nodemask-dont-alter-arg-gfp_mask.patch:
> - Needs review and checking
> mm-page_alloc-embed-oom-killing-naturally-into-allocation-slowpath.patch:
> - mhocko wanted a changelog update
https://lkml.org/lkml/2014/12/4/697
>
> mm-fix-invalid-use-of-pfn_valid_within-in-test_pages_in_a_zone.patch:
> - Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> has issues with it
https://lkml.org/lkml/2014/12/9/482
>
> mm-introduce-do_shared_fault-and-drop-do_fault-fix-fix.patch:
> - Adds a comment which might not be true?
>
> fs-mpagec-forgotten-write_sync-in-case-of-data-integrity-write.patch:
> - Unsure whether or not this helps.
https://lkml.org/lkml/2014/2/15/245
Thanks,
Yasuaki Ishimatsu
* Re: Stalled MM patches for review
2014-12-15 23:56 ` Yasuaki Ishimatsu
@ 2014-12-16 3:06 ` Johannes Weiner
2014-12-17 1:07 ` David Rientjes
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Weiner @ 2014-12-16 3:06 UTC (permalink / raw)
To: Yasuaki Ishimatsu; +Cc: linux-mm, Andrew Morton
On Tue, Dec 16, 2014 at 08:56:49AM +0900, Yasuaki Ishimatsu wrote:
> (2014/12/16 8:02), Andrew Morton wrote:
> >
> >I'm sitting on a bunch of patches which have question marks over them.
> >I'll send them out now. Can people please dig in and see if we can get
> >them finished off one way or the other?
> >
> >My notes (which may be out of date):
>
> >mm-page_alloc-embed-oom-killing-naturally-into-allocation-slowpath.patch:
> > - mhocko wanted a changelog update
>
> https://lkml.org/lkml/2014/12/4/697
Thanks Yasuaki-san!
Andrew, here is an updated version of this patch with a more detailed
changelog. Must have fallen through the cracks:
---
From: Johannes Weiner <hannes@cmpxchg.org>
Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation slowpath
Date: Fri, 5 Dec 2014 09:48:13 -0500
Message-Id: <1417790893-32010-1-git-send-email-hannes@cmpxchg.org>
The OOM killing invocation does a lot of duplicative checks against
the task's allocation context. Rework it to take advantage of the
existing checks in the allocator slowpath.
The OOM killer is invoked when the allocator is unable to reclaim any
pages but the allocation has to keep looping. Instead of having a
check for __GFP_NORETRY hidden in oom_gfp_allowed(), just move the OOM
invocation to the true branch of should_alloc_retry(). The __GFP_FS
check from oom_gfp_allowed() can then be moved into the OOM avoidance
branch in __alloc_pages_may_oom(), along with the PF_DUMPCORE test.
__alloc_pages_may_oom() can then signal to the caller whether the OOM
killer was invoked, instead of requiring it to duplicate the order and
high_zoneidx checks to guess this when deciding whether to continue.
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: Michal Hocko <mhocko@suse.cz>
---
include/linux/oom.h | 5 ----
mm/page_alloc.c | 78 +++++++++++++++++++++++------------------------------
2 files changed, 33 insertions(+), 50 deletions(-)
diff --git a/include/linux/oom.h b/include/linux/oom.h
index e8d6e1058723..4971874f54db 100644
--- a/include/linux/oom.h
+++ b/include/linux/oom.h
@@ -85,11 +85,6 @@ static inline void oom_killer_enable(void)
oom_killer_disabled = false;
}
-static inline bool oom_gfp_allowed(gfp_t gfp_mask)
-{
- return (gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY);
-}
-
extern struct task_struct *find_lock_task_mm(struct task_struct *p);
/* sysctls */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 616a2c956b4b..88b64c09a8c0 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2232,12 +2232,21 @@ static inline struct page *
__alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
struct zonelist *zonelist, enum zone_type high_zoneidx,
nodemask_t *nodemask, struct zone *preferred_zone,
- int classzone_idx, int migratetype)
+ int classzone_idx, int migratetype, unsigned long *did_some_progress)
{
struct page *page;
- /* Acquire the per-zone oom lock for each zone */
+ *did_some_progress = 0;
+
+ if (oom_killer_disabled)
+ return NULL;
+
+ /*
+ * Acquire the per-zone oom lock for each zone. If that
+ * fails, somebody else is making progress for us.
+ */
if (!oom_zonelist_trylock(zonelist, gfp_mask)) {
+ *did_some_progress = 1;
schedule_timeout_uninterruptible(1);
return NULL;
}
@@ -2263,12 +2272,18 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
goto out;
if (!(gfp_mask & __GFP_NOFAIL)) {
+ /* Coredumps can quickly deplete all memory reserves */
+ if (current->flags & PF_DUMPCORE)
+ goto out;
/* The OOM killer will not help higher order allocs */
if (order > PAGE_ALLOC_COSTLY_ORDER)
goto out;
/* The OOM killer does not needlessly kill tasks for lowmem */
if (high_zoneidx < ZONE_NORMAL)
goto out;
+ /* The OOM killer does not compensate for light reclaim */
+ if (!(gfp_mask & __GFP_FS))
+ goto out;
/*
* GFP_THISNODE contains __GFP_NORETRY and we never hit this.
* Sanity check for bare calls of __GFP_THISNODE, not real OOM.
@@ -2281,7 +2296,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
}
/* Exhausted what can be done so it's blamo time */
out_of_memory(zonelist, gfp_mask, order, nodemask, false);
-
+ *did_some_progress = 1;
out:
oom_zonelist_unlock(zonelist, gfp_mask);
return page;
@@ -2571,7 +2586,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
(gfp_mask & GFP_THISNODE) == GFP_THISNODE)
goto nopage;
-restart:
if (!(gfp_mask & __GFP_NO_KSWAPD))
wake_all_kswapds(order, zonelist, high_zoneidx,
preferred_zone, nodemask);
@@ -2701,51 +2715,25 @@ rebalance:
if (page)
goto got_pg;
- /*
- * If we failed to make any progress reclaiming, then we are
- * running out of options and have to consider going OOM
- */
- if (!did_some_progress) {
- if (oom_gfp_allowed(gfp_mask)) {
- if (oom_killer_disabled)
- goto nopage;
- /* Coredumps can quickly deplete all memory reserves */
- if ((current->flags & PF_DUMPCORE) &&
- !(gfp_mask & __GFP_NOFAIL))
- goto nopage;
- page = __alloc_pages_may_oom(gfp_mask, order,
- zonelist, high_zoneidx,
- nodemask, preferred_zone,
- classzone_idx, migratetype);
- if (page)
- goto got_pg;
-
- if (!(gfp_mask & __GFP_NOFAIL)) {
- /*
- * The oom killer is not called for high-order
- * allocations that may fail, so if no progress
- * is being made, there are no other options and
- * retrying is unlikely to help.
- */
- if (order > PAGE_ALLOC_COSTLY_ORDER)
- goto nopage;
- /*
- * The oom killer is not called for lowmem
- * allocations to prevent needlessly killing
- * innocent tasks.
- */
- if (high_zoneidx < ZONE_NORMAL)
- goto nopage;
- }
-
- goto restart;
- }
- }
-
/* Check if we should retry the allocation */
pages_reclaimed += did_some_progress;
if (should_alloc_retry(gfp_mask, order, did_some_progress,
pages_reclaimed)) {
+ /*
+ * If we fail to make progress by freeing individual
+ * pages, but the allocation wants us to keep going,
+ * start OOM killing tasks.
+ */
+ if (!did_some_progress) {
+ page = __alloc_pages_may_oom(gfp_mask, order, zonelist,
+ high_zoneidx, nodemask,
+ preferred_zone, classzone_idx,
+ migratetype,&did_some_progress);
+ if (page)
+ goto got_pg;
+ if (!did_some_progress)
+ goto nopage;
+ }
/* Wait for some write requests to complete then retry */
wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
goto rebalance;
--
2.1.3
* Re: Stalled MM patches for review
2014-12-16 3:06 ` Johannes Weiner
@ 2014-12-17 1:07 ` David Rientjes
2014-12-17 2:13 ` Johannes Weiner
0 siblings, 1 reply; 9+ messages in thread
From: David Rientjes @ 2014-12-17 1:07 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Yasuaki Ishimatsu, linux-mm, Andrew Morton
On Mon, 15 Dec 2014, Johannes Weiner wrote:
> diff --git a/include/linux/oom.h b/include/linux/oom.h
> index e8d6e1058723..4971874f54db 100644
> --- a/include/linux/oom.h
> +++ b/include/linux/oom.h
> @@ -85,11 +85,6 @@ static inline void oom_killer_enable(void)
> oom_killer_disabled = false;
> }
>
> -static inline bool oom_gfp_allowed(gfp_t gfp_mask)
> -{
> - return (gfp_mask & __GFP_FS) && !(gfp_mask & __GFP_NORETRY);
> -}
> -
Hmm, my patch that removed this seems to have already been yanked from
-mm because this one is seen as an alternative. I'm not sure why it
couldn't have been rebased on top of it.
> extern struct task_struct *find_lock_task_mm(struct task_struct *p);
>
> /* sysctls */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 616a2c956b4b..88b64c09a8c0 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2232,12 +2232,21 @@ static inline struct page *
> __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> struct zonelist *zonelist, enum zone_type high_zoneidx,
> nodemask_t *nodemask, struct zone *preferred_zone,
> - int classzone_idx, int migratetype)
> + int classzone_idx, int migratetype, unsigned long *did_some_progress)
> {
> struct page *page;
>
> - /* Acquire the per-zone oom lock for each zone */
> + *did_some_progress = 0;
I initially didn't care much for using did_some_progress as a boolean
value for __alloc_pages_may_oom() to indicate whether the oom killer
(either already running or just invoked) should lead to future memory
freeing. But its use mirrors reclaim's, except that we don't need to
know how much memory was actually freed, since that check now happens
under should_alloc_retry(). Pretty creative.
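That signaling contract can be sketched as a standalone userspace model
(the function and variable names below are illustrative stand-ins, not
the kernel's actual code):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct page;  /* opaque: this toy model never actually produces one */

static bool oom_killer_disabled;
static bool oom_lock_contended;  /* another context holds the per-zone oom lock */

/*
 * Hypothetical stand-in for __alloc_pages_may_oom(): the return value
 * means "here is a page" (rare), while *did_some_progress tells the
 * slowpath whether a retry is worthwhile, without reporting how much
 * memory was freed, since should_alloc_retry() no longer needs that.
 */
static struct page *may_oom(unsigned long *did_some_progress)
{
    *did_some_progress = 0;

    if (oom_killer_disabled)
        return NULL;             /* no progress at all: caller gives up */

    if (oom_lock_contended) {
        *did_some_progress = 1;  /* somebody else is killing a task for us */
        return NULL;
    }

    /* out_of_memory() ran: a victim was chosen, memory should appear soon */
    *did_some_progress = 1;
    return NULL;
}
```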
> +
> + if (oom_killer_disabled)
> + return NULL;
> +
> + /*
> + * Acquire the per-zone oom lock for each zone. If that
> + * fails, somebody else is making progress for us.
> + */
> if (!oom_zonelist_trylock(zonelist, gfp_mask)) {
> + *did_some_progress = 1;
> schedule_timeout_uninterruptible(1);
Aside, outside the scope of this particular patch: I think this should
probably be schedule_timeout_killable(1) instead in the case where current
has already been killed as a result of the pending oom kill. It's
probably not a huge deal since we'll drop the oom zonelist locks almost
immediately after that and we just have to wait to be scheduled again, but
it is probably more correct to do schedule_timeout_killable(1).
> return NULL;
> }
> @@ -2263,12 +2272,18 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> goto out;
>
> if (!(gfp_mask & __GFP_NOFAIL)) {
> + /* Coredumps can quickly deplete all memory reserves */
> + if (current->flags & PF_DUMPCORE)
> + goto out;
> /* The OOM killer will not help higher order allocs */
> if (order > PAGE_ALLOC_COSTLY_ORDER)
> goto out;
> /* The OOM killer does not needlessly kill tasks for lowmem */
> if (high_zoneidx < ZONE_NORMAL)
> goto out;
> + /* The OOM killer does not compensate for light reclaim */
> + if (!(gfp_mask & __GFP_FS))
> + goto out;
> /*
> * GFP_THISNODE contains __GFP_NORETRY and we never hit this.
> * Sanity check for bare calls of __GFP_THISNODE, not real OOM.
> @@ -2281,7 +2296,7 @@ __alloc_pages_may_oom(gfp_t gfp_mask, unsigned int order,
> }
> /* Exhausted what can be done so it's blamo time */
> out_of_memory(zonelist, gfp_mask, order, nodemask, false);
> -
> + *did_some_progress = 1;
> out:
> oom_zonelist_unlock(zonelist, gfp_mask);
> return page;
> @@ -2571,7 +2586,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
> goto nopage;
>
> -restart:
> if (!(gfp_mask & __GFP_NO_KSWAPD))
> wake_all_kswapds(order, zonelist, high_zoneidx,
> preferred_zone, nodemask);
> @@ -2701,51 +2715,25 @@ rebalance:
> if (page)
> goto got_pg;
>
> - /*
> - * If we failed to make any progress reclaiming, then we are
> - * running out of options and have to consider going OOM
> - */
> - if (!did_some_progress) {
> - if (oom_gfp_allowed(gfp_mask)) {
> - if (oom_killer_disabled)
> - goto nopage;
> - /* Coredumps can quickly deplete all memory reserves */
> - if ((current->flags & PF_DUMPCORE) &&
> - !(gfp_mask & __GFP_NOFAIL))
> - goto nopage;
> - page = __alloc_pages_may_oom(gfp_mask, order,
> - zonelist, high_zoneidx,
> - nodemask, preferred_zone,
> - classzone_idx, migratetype);
> - if (page)
> - goto got_pg;
> -
> - if (!(gfp_mask & __GFP_NOFAIL)) {
> - /*
> - * The oom killer is not called for high-order
> - * allocations that may fail, so if no progress
> - * is being made, there are no other options and
> - * retrying is unlikely to help.
> - */
> - if (order > PAGE_ALLOC_COSTLY_ORDER)
> - goto nopage;
> - /*
> - * The oom killer is not called for lowmem
> - * allocations to prevent needlessly killing
> - * innocent tasks.
> - */
> - if (high_zoneidx < ZONE_NORMAL)
> - goto nopage;
> - }
> -
> - goto restart;
> - }
> - }
> -
> /* Check if we should retry the allocation */
> pages_reclaimed += did_some_progress;
> if (should_alloc_retry(gfp_mask, order, did_some_progress,
> pages_reclaimed)) {
> + /*
> + * If we fail to make progress by freeing individual
> + * pages, but the allocation wants us to keep going,
> + * start OOM killing tasks.
> + */
> + if (!did_some_progress) {
> + page = __alloc_pages_may_oom(gfp_mask, order, zonelist,
> + high_zoneidx, nodemask,
> + preferred_zone, classzone_idx,
> + migratetype,&did_some_progress);
Missing a space.
> + if (page)
> + goto got_pg;
> + if (!did_some_progress)
> + goto nopage;
> + }
> /* Wait for some write requests to complete then retry */
> wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
> goto rebalance;
This is broken because it does not recall gfp_to_alloc_flags(). If
current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set
properly and the slowpath will end up looping forever. The "restart"
label which was removed in this patch needs to be reintroduced, and it can
probably be moved to directly before gfp_to_alloc_flags().
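The failure mode described above can be demonstrated with a minimal
userspace model (the flag values, the iteration cap, and all names are
artificial; only the control flow mirrors the slowpath):

```c
#include <assert.h>
#include <stdbool.h>

#define ALLOC_NO_WATERMARKS 0x04
static bool tif_memdie;

/* Simplified: the alloc flags depend on the current task's state. */
static int gfp_to_alloc_flags(void)
{
    int flags = 0x01;
    if (tif_memdie)
        flags |= ALLOC_NO_WATERMARKS;
    return flags;
}

/*
 * Returns the number of iterations needed to obtain a page, or -1 for
 * "livelock". A page only becomes grantable once ALLOC_NO_WATERMARKS
 * is set (access to memory reserves).
 */
static int slowpath(bool recompute_each_retry)
{
    int alloc_flags = gfp_to_alloc_flags();  /* computed once, up front */

    for (int i = 0; i < 1000; i++) {
        if (alloc_flags & ALLOC_NO_WATERMARKS)
            return i;
        tif_memdie = true;  /* the OOM killer picks current as its victim */
        if (recompute_each_retry)
            alloc_flags = gfp_to_alloc_flags();  /* the 'restart' path */
    }
    return -1;  /* stale flags never gain ALLOC_NO_WATERMARKS */
}
```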
* Re: Stalled MM patches for review
2014-12-17 1:07 ` David Rientjes
@ 2014-12-17 2:13 ` Johannes Weiner
2014-12-17 15:51 ` Michal Hocko
2014-12-17 22:28 ` David Rientjes
0 siblings, 2 replies; 9+ messages in thread
From: Johannes Weiner @ 2014-12-17 2:13 UTC (permalink / raw)
To: David Rientjes; +Cc: Yasuaki Ishimatsu, linux-mm, Andrew Morton
On Tue, Dec 16, 2014 at 05:07:16PM -0800, David Rientjes wrote:
> On Mon, 15 Dec 2014, Johannes Weiner wrote:
> > /* Check if we should retry the allocation */
> > pages_reclaimed += did_some_progress;
> > if (should_alloc_retry(gfp_mask, order, did_some_progress,
> > pages_reclaimed)) {
> > + /*
> > + * If we fail to make progress by freeing individual
> > + * pages, but the allocation wants us to keep going,
> > + * start OOM killing tasks.
> > + */
> > + if (!did_some_progress) {
> > + page = __alloc_pages_may_oom(gfp_mask, order, zonelist,
> > + high_zoneidx, nodemask,
> > + preferred_zone, classzone_idx,
> > + migratetype,&did_some_progress);
>
> Missing a space.
That was because of the 80-character limit; it seemed preferable to a
line wrap.
> > + if (page)
> > + goto got_pg;
> > + if (!did_some_progress)
> > + goto nopage;
> > + }
> > /* Wait for some write requests to complete then retry */
> > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
> > goto rebalance;
>
> This is broken because it does not recall gfp_to_alloc_flags(). If
> current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set
> properly and the slowpath will end up looping forever. The "restart"
> label which was removed in this patch needs to be reintroduced, and it can
> probably be moved to directly before gfp_to_alloc_flags().
Thanks for catching this. gfp_to_alloc_flags()'s name doesn't exactly
imply such side effects... Here is a fixlet on top:
---
* Re: Stalled MM patches for review
2014-12-17 2:13 ` Johannes Weiner
@ 2014-12-17 15:51 ` Michal Hocko
2014-12-17 22:28 ` David Rientjes
1 sibling, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-12-17 15:51 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Rientjes, Yasuaki Ishimatsu, linux-mm, Andrew Morton
On Tue 16-12-14 21:13:02, Johannes Weiner wrote:
[...]
> From 45362d1920340716ef58bf1024d9674b5dfa809e Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Tue, 16 Dec 2014 21:04:24 -0500
> Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation
> slowpath fix
>
> When retrying the allocation after potentially invoking OOM, make sure
> the alloc flags are recalculated, as they have to consider TIF_MEMDIE.
>
> Restore the original restart label, but rename it to 'retry' to match
> the should_alloc_retry() it depends on.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Missed that too. Well spotted!
Acked-by: Michal Hocko <mhocko@suse.cz>
> ---
> mm/page_alloc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 83ec725aec36..e8f5997c557c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
> goto nopage;
>
> +retry:
> if (!(gfp_mask & __GFP_NO_KSWAPD))
> wake_all_kswapds(order, zonelist, high_zoneidx,
> preferred_zone, nodemask);
> @@ -2695,7 +2696,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> classzone_idx = zonelist_zone_idx(preferred_zoneref);
> }
>
> -rebalance:
> /* This is the last chance, in general, before the goto nopage. */
> page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
> high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
> @@ -2823,7 +2823,7 @@ rebalance:
> }
> /* Wait for some write requests to complete then retry */
> wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
> - goto rebalance;
> + goto retry;
> } else {
> /*
> * High-order allocations do not necessarily loop after
> --
> 2.1.3
>
--
Michal Hocko
SUSE Labs
* Re: Stalled MM patches for review
2014-12-17 2:13 ` Johannes Weiner
2014-12-17 15:51 ` Michal Hocko
@ 2014-12-17 22:28 ` David Rientjes
2014-12-18 2:20 ` Johannes Weiner
1 sibling, 1 reply; 9+ messages in thread
From: David Rientjes @ 2014-12-17 22:28 UTC (permalink / raw)
To: Johannes Weiner; +Cc: Yasuaki Ishimatsu, linux-mm, Andrew Morton
On Tue, 16 Dec 2014, Johannes Weiner wrote:
> > This is broken because it does not recall gfp_to_alloc_flags(). If
> > current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set
> > properly and the slowpath will end up looping forever. The "restart"
> > label which was removed in this patch needs to be reintroduced, and it can
> > probably be moved to directly before gfp_to_alloc_flags().
>
> Thanks for catching this. gfp_to_alloc_flags()'s name doesn't exactly
> imply such side effects... Here is a fixlet on top:
>
It would have livelocked the machine on an oom kill.
> ---
> From 45362d1920340716ef58bf1024d9674b5dfa809e Mon Sep 17 00:00:00 2001
> From: Johannes Weiner <hannes@cmpxchg.org>
> Date: Tue, 16 Dec 2014 21:04:24 -0500
> Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation
> slowpath fix
>
> When retrying the allocation after potentially invoking OOM, make sure
> the alloc flags are recalculated, as they have to consider TIF_MEMDIE.
>
> Restore the original restart label, but rename it to 'retry' to match
> the should_alloc_retry() it depends on.
>
> Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> ---
> mm/page_alloc.c | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 83ec725aec36..e8f5997c557c 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
> goto nopage;
>
> +retry:
> if (!(gfp_mask & __GFP_NO_KSWAPD))
> wake_all_kswapds(order, zonelist, high_zoneidx,
> preferred_zone, nodemask);
> @@ -2695,7 +2696,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> classzone_idx = zonelist_zone_idx(preferred_zoneref);
> }
>
> -rebalance:
> /* This is the last chance, in general, before the goto nopage. */
> page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
> high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
> @@ -2823,7 +2823,7 @@ rebalance:
> }
> /* Wait for some write requests to complete then retry */
> wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
> - goto rebalance;
> + goto retry;
> } else {
> /*
> * High-order allocations do not necessarily loop after
Why remove 'rebalance'? In the situation where direct reclaim does free
memory and we're waiting on writeback (no call to the oom killer is made),
it doesn't seem necessary to recalculate classzone_idx.
Additionally, we never called wait_iff_congested() before when the oom
killer freed memory. This is a no-op if the preferred_zone isn't waiting
on writeback, but seems pointless if we just freed memory by calling the
oom killer.
In other words, I'm not sure why your fixlet isn't this:
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
(gfp_mask & GFP_THISNODE) == GFP_THISNODE)
goto nopage;
+retry:
if (!(gfp_mask & __GFP_NO_KSWAPD))
wake_all_kswapds(order, zonelist, high_zoneidx,
preferred_zone, nodemask);
@@ -2822,6 +2823,7 @@ rebalance:
BUG_ON(gfp_mask & __GFP_NOFAIL);
goto nopage;
}
+ goto retry;
}
/* Wait for some write requests to complete then retry */
wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
* Re: Stalled MM patches for review
2014-12-17 22:28 ` David Rientjes
@ 2014-12-18 2:20 ` Johannes Weiner
2014-12-18 16:55 ` Michal Hocko
0 siblings, 1 reply; 9+ messages in thread
From: Johannes Weiner @ 2014-12-18 2:20 UTC (permalink / raw)
To: David Rientjes; +Cc: Yasuaki Ishimatsu, linux-mm, Andrew Morton
On Wed, Dec 17, 2014 at 02:28:37PM -0800, David Rientjes wrote:
> On Tue, 16 Dec 2014, Johannes Weiner wrote:
>
> > > This is broken because it does not recall gfp_to_alloc_flags(). If
> > > current is the oom kill victim, then ALLOC_NO_WATERMARKS never gets set
> > > properly and the slowpath will end up looping forever. The "restart"
> > > label which was removed in this patch needs to be reintroduced, and it can
> > > probably be moved to directly before gfp_to_alloc_flags().
> >
> > Thanks for catching this. gfp_to_alloc_flags()'s name doesn't exactly
> > imply such side effects... Here is a fixlet on top:
> >
>
> It would have livelocked the machine on an oom kill.
Very unlikely. The allocator will loop trying to reclaim, and there is
usually other activity on the system that makes progress. There must
be, because the allocating task may itself be holding locks that the
victim needs in order to exit.
> > From 45362d1920340716ef58bf1024d9674b5dfa809e Mon Sep 17 00:00:00 2001
> > From: Johannes Weiner <hannes@cmpxchg.org>
> > Date: Tue, 16 Dec 2014 21:04:24 -0500
> > Subject: [patch] mm: page_alloc: embed OOM killing naturally into allocation
> > slowpath fix
> >
> > When retrying the allocation after potentially invoking OOM, make sure
> > the alloc flags are recalculated, as they have to consider TIF_MEMDIE.
> >
> > Restore the original restart label, but rename it to 'retry' to match
> > the should_alloc_retry() it depends on.
> >
> > Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
> > ---
> > mm/page_alloc.c | 4 ++--
> > 1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 83ec725aec36..e8f5997c557c 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -2673,6 +2673,7 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > (gfp_mask & GFP_THISNODE) == GFP_THISNODE)
> > goto nopage;
> >
> > +retry:
> > if (!(gfp_mask & __GFP_NO_KSWAPD))
> > wake_all_kswapds(order, zonelist, high_zoneidx,
> > preferred_zone, nodemask);
> > @@ -2695,7 +2696,6 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
> > classzone_idx = zonelist_zone_idx(preferred_zoneref);
> > }
> >
> > -rebalance:
> > /* This is the last chance, in general, before the goto nopage. */
> > page = get_page_from_freelist(gfp_mask, nodemask, order, zonelist,
> > high_zoneidx, alloc_flags & ~ALLOC_NO_WATERMARKS,
> > @@ -2823,7 +2823,7 @@ rebalance:
> > }
> > /* Wait for some write requests to complete then retry */
> > wait_iff_congested(preferred_zone, BLK_RW_ASYNC, HZ/50);
> > - goto rebalance;
> > + goto retry;
> > } else {
> > /*
> > * High-order allocations do not necessarily loop after
>
> Why remove 'rebalance'? In the situation where direct reclaim does free
> memory and we're waiting on writeback (no call to the oom killer is made),
> it doesn't seem necessary to recalculate classzone_idx.
>
> Additionally, we never called wait_iff_congested() before when the oom
> killer freed memory. This is a no-op if the preferred_zone isn't waiting
> on writeback, but seems pointless if we just freed memory by calling the
> oom killer.
Why keep all these undocumented assumptions in the code? It's really
simple: if we retry freeing memory (LRU reclaim or OOM kills), we wait
for congestion, kick kswapd, re-evaluate the current task state,
regardless of which reclaim method did what or anything at all. It's
a slowpath, so there is no reason to not keep this simple and robust.
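The simplification argued for here, one label and one uniform retry
sequence, can be sketched as a toy loop (the stub behaviors below are
invented purely for illustration):

```c
#include <assert.h>
#include <stdbool.h>

static int kswapd_kicks, congestion_waits;

/* Invented stubs: the freelist succeeds on the third pass, LRU reclaim
 * stalls, and the OOM killer makes progress. */
static bool try_freelist(int attempt) { return attempt >= 2; }
static bool direct_reclaim(void)      { return false; }
static bool oom_kill(void)            { return true; }

/*
 * One label, one sequence: every retry kicks kswapd, retries the
 * freelist, reclaims, falls back to OOM killing, then waits for
 * congestion, regardless of which mechanism made progress last time.
 */
static bool slowpath(void)
{
    for (int attempt = 0; attempt < 10; attempt++) { /* 'retry:' */
        kswapd_kicks++;                              /* wake_all_kswapds() */
        if (try_freelist(attempt))
            return true;                             /* got_pg */

        bool progress = direct_reclaim();
        if (!progress)
            progress = oom_kill();                   /* __alloc_pages_may_oom() */
        if (!progress)
            return false;                            /* nopage */

        congestion_waits++;                          /* wait_iff_congested() */
    }
    return false;
}
```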
* Re: Stalled MM patches for review
2014-12-18 2:20 ` Johannes Weiner
@ 2014-12-18 16:55 ` Michal Hocko
0 siblings, 0 replies; 9+ messages in thread
From: Michal Hocko @ 2014-12-18 16:55 UTC (permalink / raw)
To: Johannes Weiner
Cc: David Rientjes, Yasuaki Ishimatsu, linux-mm, Andrew Morton
On Wed 17-12-14 21:20:19, Johannes Weiner wrote:
> On Wed, Dec 17, 2014 at 02:28:37PM -0800, David Rientjes wrote:
[...]
> > Why remove 'rebalance'? In the situation where direct reclaim does free
> > memory and we're waiting on writeback (no call to the oom killer is made),
> > it doesn't seem necessary to recalculate classzone_idx.
> >
> > Additionally, we never called wait_iff_congested() before when the oom
> > killer freed memory. This is a no-op if the preferred_zone isn't waiting
> > on writeback, but seems pointless if we just freed memory by calling the
> > oom killer.
>
> Why keep all these undocumented assumptions in the code? It's really
> simple: if we retry freeing memory (LRU reclaim or OOM kills), we wait
> for congestion, kick kswapd, re-evaluate the current task state,
> regardless of which reclaim method did what or anything at all. It's
> a slowpath, so there is no reason to not keep this simple and robust.
Agreed, the fewer subtle loops via labels we have, the better.
--
Michal Hocko
SUSE Labs