* [PATCH] mm: stop kswapd's infinite loop at high order allocation @ 2008-12-30 10:55 KOSAKI Motohiro 2008-12-30 11:10 ` Nick Piggin 2008-12-30 18:59 ` Mel Gorman 0 siblings, 2 replies; 20+ messages in thread From: KOSAKI Motohiro @ 2008-12-30 10:55 UTC (permalink / raw) To: LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash; +Cc: kosaki.motohiro ok, wassim confirmed this patch works well. == From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation Wassim Dagash reported following kswapd infinite loop problem. kswapd runs in some infinite loop trying to swap until order 10 of zone highmem is OK, While zone higmem (as I understand) has nothing to do with contiguous memory (cause there is no 1-1 mapping) which means kswapd will continue to try to balance order 10 of zone highmem forever (or until someone release a very large chunk of highmem). He proposed remove contenious checking on highmem at all. However hugepage on highmem need contenious highmem page. To add infinite loop stopper is simple and good. Reported-by: wassim dagash <wassim.dagash@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> --- mm/vmscan.c | 11 +++++++++++ 1 file changed, 11 insertions(+) Index: b/mm/vmscan.c =================================================================== --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1872,6 +1872,17 @@ out: try_to_freeze(); + /* + * When highmem is very fragmented, + * alloc_pages(GFP_KERNEL, very-high-order) can cause + * infinite loop because zone_watermark_ok(highmem) failed. + * However, alloc_pages(GFP_KERNEL..) indicate highmem memory + * continuousness isn't necessary. + * Therefore we don't want contenious check at 2nd loop. + */ + if (nr_reclaimed < SWAP_CLUSTER_MAX) + order = sc.order = 0; + goto loop_again; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-30 10:55 [PATCH] mm: stop kswapd's infinite loop at high order allocation KOSAKI Motohiro @ 2008-12-30 11:10 ` Nick Piggin 2008-12-30 18:59 ` Mel Gorman 1 sibling, 0 replies; 20+ messages in thread From: Nick Piggin @ 2008-12-30 11:10 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, wassim dagash On Tue, Dec 30, 2008 at 07:55:47PM +0900, KOSAKI Motohiro wrote: > > ok, wassim confirmed this patch works well. > > > == > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > > Wassim Dagash reported following kswapd infinite loop problem. > > kswapd runs in some infinite loop trying to swap until order 10 of zone > highmem is OK, While zone higmem (as I understand) has nothing to do > with contiguous memory (cause there is no 1-1 mapping) which means > kswapd will continue to try to balance order 10 of zone highmem > forever (or until someone release a very large chunk of highmem). > > He proposed remove contenious checking on highmem at all. > However hugepage on highmem need contenious highmem page. > > To add infinite loop stopper is simple and good. > > > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Thanks, Reviewed-by: Nick Piggin <npiggin@suse.de> > --- > mm/vmscan.c | 11 +++++++++++ > 1 file changed, 11 insertions(+) > > Index: b/mm/vmscan.c > =================================================================== > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1872,6 +1872,17 @@ out: > > try_to_freeze(); > > + /* > + * When highmem is very fragmented, > + * alloc_pages(GFP_KERNEL, very-high-order) can cause > + * infinite loop because zone_watermark_ok(highmem) failed. > + * However, alloc_pages(GFP_KERNEL..) indicate highmem memory > + * continuousness isn't necessary. > + * Therefore we don't want contenious check at 2nd loop. > + */ > + if (nr_reclaimed < SWAP_CLUSTER_MAX) > + order = sc.order = 0; > + > goto loop_again; > } > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-30 10:55 [PATCH] mm: stop kswapd's infinite loop at high order allocation KOSAKI Motohiro 2008-12-30 11:10 ` Nick Piggin @ 2008-12-30 18:59 ` Mel Gorman 2008-12-31 1:32 ` Nick Piggin 2008-12-31 4:54 ` KOSAKI Motohiro 1 sibling, 2 replies; 20+ messages in thread From: Mel Gorman @ 2008-12-30 18:59 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash On Tue, Dec 30, 2008 at 07:55:47PM +0900, KOSAKI Motohiro wrote: > > ok, wassim confirmed this patch works well. > > > == > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > > Wassim Dagash reported following kswapd infinite loop problem. > > kswapd runs in some infinite loop trying to swap until order 10 of zone > highmem is OK, While zone higmem (as I understand) has nothing to do > with contiguous memory (cause there is no 1-1 mapping) which means > kswapd will continue to try to balance order 10 of zone highmem > forever (or until someone release a very large chunk of highmem). > > He proposed remove contenious checking on highmem at all. > However hugepage on highmem need contenious highmem page. > I'm lacking the original problem report, but contiguous order-10 pages are indeed required for hugepages in highmem and reclaiming for them should not be totally disabled at any point. While no 1-1 mapping exists for the kernel, contiguity is still required. kswapd gets a sc.order when it is known there is a process trying to get high-order pages so it can reclaim at that order in an attempt to prevent future direct reclaim at a high-order. Your patch does not appear to depend on GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to loop again at order-0 means it may scan and reclaim more memory unnecessarily seeing as all_zones_ok was calculated based on a high-order value, not order-0. While constantly looping trying to balance for high-orders is indeed bad, I'm unconvinced this is the correct change. As we have already gone through a priorities and scanned everything at the high-order, would it not make more sense to do just give up with something like the following? /* * If zones are still not balanced, loop again and continue attempting * to rebalance the system. For high-order allocations, fragmentation * can prevent the zones being rebalanced no matter how hard kswapd * works, particularly on systems with little or no swap. For costly * orders, just give up and assume interested processes will either * direct reclaim or wake up kswapd as necessary. */ if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { cond_resched(); try_to_freeze(); goto loop_again; } I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are expected to support allocations up to that order in a fairly reliable fashion. > To add infinite loop stopper is simple and good. > > > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> For the moment NAKed-by: Mel Gorman <mel@csn.ul.ie> How about the following (compile-tested-only) patch? ============= From: Mel Gorman <mel@csn.ul.ie> Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation kswapd runs in some infinite loop trying to swap until order 10 of zone highmem is OK.... kswapd will continue to try to balance order 10 of zone highmem forever (or until someone release a very large chunk of highmem). For costly high-order allocations, the system may never be balanced due to fragmentation but kswapd should not infinitely loop as a result. The following patch lets kswapd stop reclaiming in the event it cannot balance zones and the order is high-order. Reported-by: wassim dagash <wassim.dagash@gmail.com> Signed-off-by: Mel Gorman <mel@csn.ul.ie> diff --git a/mm/vmscan.c b/mm/vmscan.c index 62e7f62..03ed9a0 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1867,7 +1867,16 @@ out: zone->prev_priority = temp_priority[i]; } - if (!all_zones_ok) { + + /* + * If zones are still not balanced, loop again and continue attempting + * to rebalance the system. For high-order allocations, fragmentation + * can prevent the zones being rebalanced no matter how hard kswapd + * works, particularly on systems with little or no swap. For costly + * orders, just give up and assume interested processes will either + * direct reclaim or wake up kswapd as necessary. + */ + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { cond_resched(); try_to_freeze(); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-30 18:59 ` Mel Gorman @ 2008-12-31 1:32 ` Nick Piggin 2008-12-31 11:06 ` Mel Gorman 2008-12-31 4:54 ` KOSAKI Motohiro 1 sibling, 1 reply; 20+ messages in thread From: Nick Piggin @ 2008-12-31 1:32 UTC (permalink / raw) To: Mel Gorman; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, wassim dagash On Tue, Dec 30, 2008 at 06:59:19PM +0000, Mel Gorman wrote: > On Tue, Dec 30, 2008 at 07:55:47PM +0900, KOSAKI Motohiro wrote: > > > > ok, wassim confirmed this patch works well. > > > > > > == > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > > Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > > > > Wassim Dagash reported following kswapd infinite loop problem. > > > > kswapd runs in some infinite loop trying to swap until order 10 of zone > > highmem is OK, While zone higmem (as I understand) has nothing to do > > with contiguous memory (cause there is no 1-1 mapping) which means > > kswapd will continue to try to balance order 10 of zone highmem > > forever (or until someone release a very large chunk of highmem). > > > > He proposed remove contenious checking on highmem at all. > > However hugepage on highmem need contenious highmem page. > > > > I'm lacking the original problem report, but contiguous order-10 pages are > indeed required for hugepages in highmem and reclaiming for them should not > be totally disabled at any point. While no 1-1 mapping exists for the kernel, > contiguity is still required. This doesn't totally disable them. It disables asynchronous reclaim for them until the next time kswapd kicks is kicked by a higher order allocator. The guy who kicked us off this time should go into direct reclaim. > kswapd gets a sc.order when it is known there is a process trying to get > high-order pages so it can reclaim at that order in an attempt to prevent > future direct reclaim at a high-order. Your patch does not appear to depend on > GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > loop again at order-0 means it may scan and reclaim more memory unnecessarily > seeing as all_zones_ok was calculated based on a high-order value, not order-0. It shouldn't, because it should check all that. > While constantly looping trying to balance for high-orders is indeed bad, > I'm unconvinced this is the correct change. As we have already gone through > a priorities and scanned everything at the high-order, would it not make > more sense to do just give up with something like the following? > > /* > * If zones are still not balanced, loop again and continue attempting > * to rebalance the system. For high-order allocations, fragmentation > * can prevent the zones being rebalanced no matter how hard kswapd > * works, particularly on systems with little or no swap. For costly > * orders, just give up and assume interested processes will either > * direct reclaim or wake up kswapd as necessary. > */ > if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > cond_resched(); > > try_to_freeze(); > > goto loop_again; > } > > I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are > expected to support allocations up to that order in a fairly reliable fashion. I actually think it's better to do it for all orders, because that constant is more or less arbitrary. It is possible a zone might become too fragmented to support this, but the allocating process has been OOMed or had their allocation satisfied from another zone. kswapd would have no way out of the loop even if the system no longer requires higher order allocations. IOW, I don't see a big downside, and there is a real upside. I think the patch is good. > > > To add infinite loop stopper is simple and good. > > > > > > > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > > For the moment > > NAKed-by: Mel Gorman <mel@csn.ul.ie> > > How about the following (compile-tested-only) patch? > > ============= > From: Mel Gorman <mel@csn.ul.ie> > Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation > > kswapd runs in some infinite loop trying to swap until order 10 of zone > highmem is OK.... kswapd will continue to try to balance order 10 of zone > highmem forever (or until someone release a very large chunk of highmem). > > For costly high-order allocations, the system may never be balanced due to > fragmentation but kswapd should not infinitely loop as a result. The > following patch lets kswapd stop reclaiming in the event it cannot > balance zones and the order is high-order. > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 62e7f62..03ed9a0 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1867,7 +1867,16 @@ out: > > zone->prev_priority = temp_priority[i]; > } > - if (!all_zones_ok) { > + > + /* > + * If zones are still not balanced, loop again and continue attempting > + * to rebalance the system. For high-order allocations, fragmentation > + * can prevent the zones being rebalanced no matter how hard kswapd > + * works, particularly on systems with little or no swap. For costly > + * orders, just give up and assume interested processes will either > + * direct reclaim or wake up kswapd as necessary. > + */ > + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > cond_resched(); > > try_to_freeze(); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 1:32 ` Nick Piggin @ 2008-12-31 11:06 ` Mel Gorman 2008-12-31 11:16 ` Nick Piggin 0 siblings, 1 reply; 20+ messages in thread From: Mel Gorman @ 2008-12-31 11:06 UTC (permalink / raw) To: Nick Piggin; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, wassim dagash On Wed, Dec 31, 2008 at 02:32:33AM +0100, Nick Piggin wrote: > On Tue, Dec 30, 2008 at 06:59:19PM +0000, Mel Gorman wrote: > > On Tue, Dec 30, 2008 at 07:55:47PM +0900, KOSAKI Motohiro wrote: > > > > > > ok, wassim confirmed this patch works well. > > > > > > > > > == > > > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > > > Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > > > > > > Wassim Dagash reported following kswapd infinite loop problem. > > > > > > kswapd runs in some infinite loop trying to swap until order 10 of zone > > > highmem is OK, While zone higmem (as I understand) has nothing to do > > > with contiguous memory (cause there is no 1-1 mapping) which means > > > kswapd will continue to try to balance order 10 of zone highmem > > > forever (or until someone release a very large chunk of highmem). > > > > > > He proposed remove contenious checking on highmem at all. > > > However hugepage on highmem need contenious highmem page. > > > > > > > I'm lacking the original problem report, but contiguous order-10 pages are > > indeed required for hugepages in highmem and reclaiming for them should not > > be totally disabled at any point. While no 1-1 mapping exists for the kernel, > > contiguity is still required. > > This doesn't totally disable them. It disables asynchronous reclaim for them > until the next time kswapd kicks is kicked by a higher order allocator. The > guy who kicked us off this time should go into direct reclaim. > I get that. The check to bail out is made after we've already done the scanning. I wanted to be clear that contiguity in highmem is required for hugepages. > > > kswapd gets a sc.order when it is known there is a process trying to get > > high-order pages so it can reclaim at that order in an attempt to prevent > > future direct reclaim at a high-order. Your patch does not appear to depend on > > GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > > loop again at order-0 means it may scan and reclaim more memory unnecessarily > > seeing as all_zones_ok was calculated based on a high-order value, not order-0. > > It shouldn't, because it should check all that. > Ok, with KOSAKI's patch we 1. Set order to 0 (and stop kswapd doing what it was asked to do) 2. goto loop_again 3. nr_reclaimed gets set to 0 (meaning we lose that value, but no biggie as it doesn't get used by the caller anyway) 4. Reset all priorities 5. Do something like the following for (priority = DEF_PRIORITY; priority >= 0; priority--) { ... all_zones_ok = 1; for (i = pgdat->nr_zones - 1; i >= 0; i--) { ... if (inactive_anon_is_low(zone)) { shrink_active_list(SWAP_CLUSTER_MAX, zone, &sc, priority, 0); } if (!zone_watermark_ok(zone, order, zone->pages_high, 0, 0)) { end_zone = i; break; } } } So, by looping around, we could end up shrinking the active list again before we recheck the zone watermarks depending on the size of the inactive lists. If the size of the lists is ok, I agree that we'll go through the lists, do no reclaiming and exit out, albeit returning 0 when we have reclaimed pages. > > > While constantly looping trying to balance for high-orders is indeed bad, > > I'm unconvinced this is the correct change. As we have already gone through > > a priorities and scanned everything at the high-order, would it not make > > more sense to do just give up with something like the following? > > > > /* > > * If zones are still not balanced, loop again and continue attempting > > * to rebalance the system. For high-order allocations, fragmentation > > * can prevent the zones being rebalanced no matter how hard kswapd > > * works, particularly on systems with little or no swap. For costly > > * orders, just give up and assume interested processes will either > > * direct reclaim or wake up kswapd as necessary. > > */ > > if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > > cond_resched(); > > > > try_to_freeze(); > > > > goto loop_again; > > } > > > > I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are > > expected to support allocations up to that order in a fairly reliable fashion. > > I actually think it's better to do it for all orders, because that > constant is more or less arbitrary. i.e. if (!all_zones_ok && sc.order == 0) { ? or something else What I did miss was that we have if (nr_reclaimed >= SWAP_CLUSTER_MAX) break; so with my patch, kswapd is bailing out early without trying to reclaim for high-orders that hard. That was not what I intended as it means we only ever really rebalance the full system for order-0 pages and for everything else we do relatively light scanning. The impact is that high-order users will direct reclaim rather than depending on kswapd scanning very heavily. Arguably, this is a good thing. However, it also means that KOSAKI's and my patches only differs in that mine bails early and KOSAKI rechecks everything at order-0, possibly reclaiming more. If the comment was not so misleading, I'd have been a lot happier. > It is possible a zone might become > too fragmented to support this, but the allocating process has been OOMed or > had their allocation satisfied from another zone. kswapd would have no way > out of the loop even if the system no longer requires higher order allocations. > > IOW, I don't see a big downside, and there is a real upside. > > I think the patch is good. > Which one, KOSAKI's or my one? Here is my one again which bails out for any high-order allocation after just light scanning. ==== ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 11:06 ` Mel Gorman @ 2008-12-31 11:16 ` Nick Piggin 2008-12-31 12:11 ` Mel Gorman 0 siblings, 1 reply; 20+ messages in thread From: Nick Piggin @ 2008-12-31 11:16 UTC (permalink / raw) To: Mel Gorman; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, wassim dagash On Wed, Dec 31, 2008 at 11:06:19AM +0000, Mel Gorman wrote: > On Wed, Dec 31, 2008 at 02:32:33AM +0100, Nick Piggin wrote: > > On Tue, Dec 30, 2008 at 06:59:19PM +0000, Mel Gorman wrote: > > > On Tue, Dec 30, 2008 at 07:55:47PM +0900, KOSAKI Motohiro wrote: > > > kswapd gets a sc.order when it is known there is a process trying to get > > > high-order pages so it can reclaim at that order in an attempt to prevent > > > future direct reclaim at a high-order. Your patch does not appear to depend on > > > GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > > > loop again at order-0 means it may scan and reclaim more memory unnecessarily > > > seeing as all_zones_ok was calculated based on a high-order value, not order-0. > > > > It shouldn't, because it should check all that. > > > > Ok, with KOSAKI's patch we > > 1. Set order to 0 (and stop kswapd doing what it was asked to do) > 2. goto loop_again > 3. nr_reclaimed gets set to 0 (meaning we lose that value, but no biggie > as it doesn't get used by the caller anyway) > 4. Reset all priorities > 5. Do something like the following > > for (priority = DEF_PRIORITY; priority >= 0; priority--) { > ... > all_zones_ok = 1; > for (i = pgdat->nr_zones - 1; i >= 0; i--) { > ... > if (inactive_anon_is_low(zone)) { > shrink_active_list(SWAP_CLUSTER_MAX, zone, > &sc, priority, 0); > } > > if (!zone_watermark_ok(zone, order, zone->pages_high, > 0, 0)) { > end_zone = i; > break; > } > } > } > > So, by looping around, we could end up shrinking the active list again > before we recheck the zone watermarks depending on the size of the > inactive lists. If this is a problem, it is a problem with that code, because kswapd can be woken up for any zone at any time anyway. > > > cond_resched(); > > > > > > try_to_freeze(); > > > > > > goto loop_again; > > > } > > > > > > I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are > > > expected to support allocations up to that order in a fairly reliable fashion. > > > > I actually think it's better to do it for all orders, because that > > constant is more or less arbitrary. > > i.e. > > if (!all_zones_ok && sc.order == 0) { > > ? or something else Well, I jus tdon't see what's wrong with the original patch. > What I did miss was that we have > > if (nr_reclaimed >= SWAP_CLUSTER_MAX) > break; > > so with my patch, kswapd is bailing out early without trying to reclaim for > high-orders that hard. That was not what I intended as it means we only ever > really rebalance the full system for order-0 pages and for everything else we > do relatively light scanning. The impact is that high-order users will direct > reclaim rather than depending on kswapd scanning very heavily. Arguably, > this is a good thing. > > However, it also means that KOSAKI's and my patches only differs in that mine > bails early and KOSAKI rechecks everything at order-0, possibly reclaiming > more. If the comment was not so misleading, I'd have been a lot happier. Rechecking everything is fine by me; order-0 is going to be the most common and most important. If higher order allocations sometimes have to enter direct reclaim or kick off kswapd again, it isn't a big deal. > > IOW, I don't see a big downside, and there is a real upside. > > > > I think the patch is good. > > > > Which one, KOSAKI's or my one? > > Here is my one again which bails out for any high-order allocation after > just light scanning. > > ==== > > >From 0e09fe002d8e9956de227b880ef8458842b71ca9 Mon Sep 17 00:00:00 2001 > From: Mel Gorman <mel@csn.ul.ie> > Date: Tue, 30 Dec 2008 18:53:23 +0000 > Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation > > Wassim Dagash reported the following (editted) kswapd infinite loop problem. > > kswapd runs in some infinite loop trying to swap until order 10 of zone > highmem is OK.... kswapd will continue to try to balance order 10 of zone > highmem forever (or until someone release a very large chunk of highmem). > > For costly high-order allocations, the system may never be balanced due to > fragmentation but kswapd should not infinitely loop as a result. The > following patch lets kswapd stop reclaiming in the event it cannot > balance zones and the order is high-order. This one bails out if it was a higher order reclaim, but there is still an order-0 shortage. I prefer to run the loop again at order==0 to avoid that condition. A higher kswapd reclaim order shouldn't weaken kswapd postcondition for order-0 memory. > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > --- > mm/vmscan.c | 11 ++++++++++- > 1 files changed, 10 insertions(+), 1 deletions(-) > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 62e7f62..7b0f412 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1867,7 +1867,16 @@ out: > > zone->prev_priority = temp_priority[i]; > } > - if (!all_zones_ok) { > + > + /* > + * If zones are still not balanced, loop again and continue attempting > + * to rebalance the system. For high-order allocations, fragmentation > + * can prevent the zones being rebalanced no matter how hard kswapd > + * works, particularly on systems with little or no swap. For > + * high-orders, just give up and assume interested processes will > + * either direct reclaim or wake up kswapd again as necessary. > + */ > + if (!all_zones_ok && sc.order == 0) { > cond_resched(); > > try_to_freeze(); -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 11:16 ` Nick Piggin @ 2008-12-31 12:11 ` Mel Gorman 0 siblings, 0 replies; 20+ messages in thread From: Mel Gorman @ 2008-12-31 12:11 UTC (permalink / raw) To: Nick Piggin; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, wassim dagash On Wed, Dec 31, 2008 at 12:16:47PM +0100, Nick Piggin wrote: > On Wed, Dec 31, 2008 at 11:06:19AM +0000, Mel Gorman wrote: > > On Wed, Dec 31, 2008 at 02:32:33AM +0100, Nick Piggin wrote: > > > On Tue, Dec 30, 2008 at 06:59:19PM +0000, Mel Gorman wrote: > > > > On Tue, Dec 30, 2008 at 07:55:47PM +0900, KOSAKI Motohiro wrote: > > > > kswapd gets a sc.order when it is known there is a process trying to get > > > > high-order pages so it can reclaim at that order in an attempt to prevent > > > > future direct reclaim at a high-order. Your patch does not appear to depend on > > > > GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > > > > loop again at order-0 means it may scan and reclaim more memory unnecessarily > > > > seeing as all_zones_ok was calculated based on a high-order value, not order-0. > > > > > > It shouldn't, because it should check all that. > > > > > > > Ok, with KOSAKI's patch we > > > > 1. Set order to 0 (and stop kswapd doing what it was asked to do) > > 2. goto loop_again > > 3. nr_reclaimed gets set to 0 (meaning we lose that value, but no biggie > > as it doesn't get used by the caller anyway) > > 4. Reset all priorities > > 5. Do something like the following > > > > for (priority = DEF_PRIORITY; priority >= 0; priority--) { > > ... > > all_zones_ok = 1; > > for (i = pgdat->nr_zones - 1; i >= 0; i--) { > > ... > > if (inactive_anon_is_low(zone)) { > > shrink_active_list(SWAP_CLUSTER_MAX, zone, > > &sc, priority, 0); > > } > > > > if (!zone_watermark_ok(zone, order, zone->pages_high, > > 0, 0)) { > > end_zone = i; > > break; > > } > > } > > } > > > > So, by looping around, we could end up shrinking the active list again > > before we recheck the zone watermarks depending on the size of the > > inactive lists. > > If this is a problem, it is a problem with that code, because kswapd > can be woken up for any zone at any time anyway. > > > > > > cond_resched(); > > > > > > > > try_to_freeze(); > > > > > > > > goto loop_again; > > > > } > > > > > > > > I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are > > > > expected to support allocations up to that order in a fairly reliable fashion. > > > > > > I actually think it's better to do it for all orders, because that > > > constant is more or less arbitrary. > > > > i.e. > > > > if (!all_zones_ok && sc.order == 0) { > > > > ? or something else > > Well, I jus tdon't see what's wrong with the original patch. > I've more or less convinced myself it's ok as any anomolies I spotted have either been described as intentional behaviour or is arguably correct. A fixed up (or deleted - misleading comments suck) comment and I'm happy. > > > What I did miss was that we have > > > > if (nr_reclaimed >= SWAP_CLUSTER_MAX) > > break; > > > > so with my patch, kswapd is bailing out early without trying to reclaim for > > high-orders that hard. That was not what I intended as it means we only ever > > really rebalance the full system for order-0 pages and for everything else we > > do relatively light scanning. The impact is that high-order users will direct > > reclaim rather than depending on kswapd scanning very heavily. Arguably, > > this is a good thing. > > > > However, it also means that KOSAKI's and my patches only differs in that mine > > bails early and KOSAKI rechecks everything at order-0, possibly reclaiming > > more. If the comment was not so misleading, I'd have been a lot happier. > > Rechecking everything is fine by me; order-0 is going to be the most > common and most important. If higher order allocations sometimes have > to enter direct reclaim or kick off kswapd again, it isn't a big deal. > Grand so. Initially it looked like accidental rather than intentional behaviour but after thinking about it some more, it should be ok. > > > > IOW, I don't see a big downside, and there is a real upside. > > > > > > I think the patch is good. > > > > > > > Which one, KOSAKI's or my one? > > > > Here is my one again which bails out for any high-order allocation after > > just light scanning. > > > > ==== > > > > >From 0e09fe002d8e9956de227b880ef8458842b71ca9 Mon Sep 17 00:00:00 2001 > > From: Mel Gorman <mel@csn.ul.ie> > > Date: Tue, 30 Dec 2008 18:53:23 +0000 > > Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation > > > > Wassim Dagash reported the following (editted) kswapd infinite loop problem. > > > > kswapd runs in some infinite loop trying to swap until order 10 of zone > > highmem is OK.... kswapd will continue to try to balance order 10 of zone > > highmem forever (or until someone release a very large chunk of highmem). > > > > For costly high-order allocations, the system may never be balanced due to > > fragmentation but kswapd should not infinitely loop as a result. The > > following patch lets kswapd stop reclaiming in the event it cannot > > balance zones and the order is high-order. > > This one bails out if it was a higher order reclaim, but there is still > an order-0 shortage. I prefer to run the loop again at order==0 to avoid > that condition. A higher kswapd reclaim order shouldn't weaken kswapd > postcondition for order-0 memory. > > > > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > > > --- > > mm/vmscan.c | 11 ++++++++++- > > 1 files changed, 10 insertions(+), 1 deletions(-) > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 62e7f62..7b0f412 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1867,7 +1867,16 @@ out: > > > > zone->prev_priority = temp_priority[i]; > > } > > - if (!all_zones_ok) { > > + > > + /* > > + * If zones are still not balanced, loop again and continue attempting > > + * to rebalance the system. For high-order allocations, fragmentation > > + * can prevent the zones being rebalanced no matter how hard kswapd > > + * works, particularly on systems with little or no swap. For > > + * high-orders, just give up and assume interested processes will > > + * either direct reclaim or wake up kswapd again as necessary. > > + */ > > + if (!all_zones_ok && sc.order == 0) { > > cond_resched(); > > > > try_to_freeze(); > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-30 18:59 ` Mel Gorman 2008-12-31 1:32 ` Nick Piggin @ 2008-12-31 4:54 ` KOSAKI Motohiro 2008-12-31 8:59 ` wassim dagash 2008-12-31 11:53 ` Mel Gorman 1 sibling, 2 replies; 20+ messages in thread From: KOSAKI Motohiro @ 2008-12-31 4:54 UTC (permalink / raw) To: Mel Gorman; +Cc: LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash Hi thank you for reviewing. >> == >> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> >> Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation >> >> Wassim Dagash reported following kswapd infinite loop problem. >> >> kswapd runs in some infinite loop trying to swap until order 10 of zone >> highmem is OK, While zone higmem (as I understand) has nothing to do >> with contiguous memory (cause there is no 1-1 mapping) which means >> kswapd will continue to try to balance order 10 of zone highmem >> forever (or until someone release a very large chunk of highmem). >> >> He proposed remove contenious checking on highmem at all. >> However hugepage on highmem need contenious highmem page. >> > > I'm lacking the original problem report, but contiguous order-10 pages are > indeed required for hugepages in highmem and reclaiming for them should not > be totally disabled at any point. While no 1-1 mapping exists for the kernel, > contiguity is still required. correct. but that's ok. my patch only change corner case bahavior and only disable high-order when priority==0. typical hugepage reclaim don't need and don't reach priority==0. and sorry. I agree with my "2nd loop" word of the patch comment is a bit misleading. > kswapd gets a sc.order when it is known there is a process trying to get > high-order pages so it can reclaim at that order in an attempt to prevent > future direct reclaim at a high-order. Your patch does not appear to depend on > GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > loop again at order-0 means it may scan and reclaim more memory unnecessarily > seeing as all_zones_ok was calculated based on a high-order value, not order-0. Yup. my patch doesn't depend on GFP_KERNEL. but, Why order-0 means it may scan more memory unnecessary? all_zones_ok() is calculated by zone_watermark_ok() and zone_watermark_ok() depend on order argument. and my patch set order variable to 0 too. > While constantly looping trying to balance for high-orders is indeed bad, > I'm unconvinced this is the correct change. As we have already gone through > a priorities and scanned everything at the high-order, would it not make > more sense to do just give up with something like the following? > > /* > * If zones are still not balanced, loop again and continue attempting > * to rebalance the system. For high-order allocations, fragmentation > * can prevent the zones being rebalanced no matter how hard kswapd > * works, particularly on systems with little or no swap. For costly > * orders, just give up and assume interested processes will either > * direct reclaim or wake up kswapd as necessary. > */ > if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > cond_resched(); > > try_to_freeze(); > > goto loop_again; > } > > I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are > expected to support allocations up to that order in a fairly reliable fashion. my comment is bellow. > ============= > From: Mel Gorman <mel@csn.ul.ie> > Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation > > kswapd runs in some infinite loop trying to swap until order 10 of zone > highmem is OK.... kswapd will continue to try to balance order 10 of zone > highmem forever (or until someone release a very large chunk of highmem). > > For costly high-order allocations, the system may never be balanced due to > fragmentation but kswapd should not infinitely loop as a result. The > following patch lets kswapd stop reclaiming in the event it cannot > balance zones and the order is high-order. > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 62e7f62..03ed9a0 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -1867,7 +1867,16 @@ out: > > zone->prev_priority = temp_priority[i]; > } > - if (!all_zones_ok) { > + > + /* > + * If zones are still not balanced, loop again and continue attempting > + * to rebalance the system. For high-order allocations, fragmentation > + * can prevent the zones being rebalanced no matter how hard kswapd > + * works, particularly on systems with little or no swap. For costly > + * orders, just give up and assume interested processes will either > + * direct reclaim or wake up kswapd as necessary. > + */ > + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > cond_resched(); > > try_to_freeze(); this patch seems no good. kswapd come this point every SWAP_CLUSTER_MAX reclaimed because to avoid unnecessary priority variable decreasing. then "nr_reclaimed >= SWAP_CLUSTER_MAX" indicate kswapd need reclaim more. kswapd purpose is "reclaim until pages_high", not reclaim SWAP_CLUSTER_MAX pages. if your patch applied and kswapd start to reclaim for hugepage, kswapd exit balance_pgdat() function after to reclaim only 32 pages (SWAP_CLUSTER_MAX). In the other hand, "nr_reclaimed < SWAP_CLUSTER_MAX" mean kswapd can't reclaim enough page although priority == 0. in this case, retry is worthless. sorting out again. "goto loop_again" reaching happend by two case. 1. kswapd reclaimed SWAP_CLUSTER_MAX pages. at that time, kswapd reset priority variable to prevent unnecessary priority decreasing. I don't hope this behavior change. 2. kswapd scanned until priority==0. this case is debatable. my patch reset any order to 0. but following code is also considerable to me. (sorry for tab corrupted, current my mail environment is very poor) code-A: if (!all_zones_ok) { if ((nr_reclaimed >= SWAP_CLUSTER_MAX) || (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { cond_resched(); try_to_freeze(); goto loop_again; } } or code-B: if (!all_zones_ok) { cond_resched(); try_to_freeze(); if (nr_reclaimed >= SWAP_CLUSTER_MAX) goto loop_again; if (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { order = sc.order = 0; goto loop_again; } } However, I still like my original proposal because .. - code-A forget to order-1 (for stack) allocation also can cause infinite loop. - code-B doesn't simpler than my original proposal. What do you think it? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 4:54 ` KOSAKI Motohiro @ 2008-12-31 8:59 ` wassim dagash 2008-12-31 12:05 ` Mel Gorman 2008-12-31 11:53 ` Mel Gorman 1 sibling, 1 reply; 20+ messages in thread From: wassim dagash @ 2008-12-31 8:59 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Nick Piggin [-- Attachment #1: Type: text/plain, Size: 7293 bytes --] Hi , Thank you all for reviewing. Why don't we implement a solution where the order is defined per zone? I implemented such a solution for my kernel (2.6.18) and tested it, it worked fine for me. Attached a patch with a solution for 2.6.28 (compile tested only). On Wed, Dec 31, 2008 at 6:54 AM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > Hi > > thank you for reviewing. > >>> == >>> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> >>> Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation >>> >>> Wassim Dagash reported following kswapd infinite loop problem. >>> >>> kswapd runs in some infinite loop trying to swap until order 10 of zone >>> highmem is OK, While zone higmem (as I understand) has nothing to do >>> with contiguous memory (cause there is no 1-1 mapping) which means >>> kswapd will continue to try to balance order 10 of zone highmem >>> forever (or until someone release a very large chunk of highmem). >>> >>> He proposed remove contenious checking on highmem at all. >>> However hugepage on highmem need contenious highmem page. >>> >> >> I'm lacking the original problem report, but contiguous order-10 pages are >> indeed required for hugepages in highmem and reclaiming for them should not >> be totally disabled at any point. While no 1-1 mapping exists for the kernel, >> contiguity is still required. > > correct. > but that's ok. > > my patch only change corner case bahavior and only disable high-order > when priority==0. typical hugepage reclaim don't need and don't reach > priority==0. > > and sorry. I agree with my "2nd loop" word of the patch comment is a > bit misleading. > > >> kswapd gets a sc.order when it is known there is a process trying to get >> high-order pages so it can reclaim at that order in an attempt to prevent >> future direct reclaim at a high-order. Your patch does not appear to depend on >> GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to >> loop again at order-0 means it may scan and reclaim more memory unnecessarily >> seeing as all_zones_ok was calculated based on a high-order value, not order-0. > > Yup. my patch doesn't depend on GFP_KERNEL. > > but, Why order-0 means it may scan more memory unnecessary? > all_zones_ok() is calculated by zone_watermark_ok() and zone_watermark_ok() > depend on order argument. and my patch set order variable to 0 too. > > >> While constantly looping trying to balance for high-orders is indeed bad, >> I'm unconvinced this is the correct change. As we have already gone through >> a priorities and scanned everything at the high-order, would it not make >> more sense to do just give up with something like the following? >> >> /* >> * If zones are still not balanced, loop again and continue attempting >> * to rebalance the system. For high-order allocations, fragmentation >> * can prevent the zones being rebalanced no matter how hard kswapd >> * works, particularly on systems with little or no swap. For costly >> * orders, just give up and assume interested processes will either >> * direct reclaim or wake up kswapd as necessary. >> */ >> if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { >> cond_resched(); >> >> try_to_freeze(); >> >> goto loop_again; >> } >> >> I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are >> expected to support allocations up to that order in a fairly reliable fashion. > > my comment is bellow. > > >> ============= >> From: Mel Gorman <mel@csn.ul.ie> >> Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation >> >> kswapd runs in some infinite loop trying to swap until order 10 of zone >> highmem is OK.... kswapd will continue to try to balance order 10 of zone >> highmem forever (or until someone release a very large chunk of highmem). >> >> For costly high-order allocations, the system may never be balanced due to >> fragmentation but kswapd should not infinitely loop as a result. The >> following patch lets kswapd stop reclaiming in the event it cannot >> balance zones and the order is high-order. >> >> Reported-by: wassim dagash <wassim.dagash@gmail.com> >> Signed-off-by: Mel Gorman <mel@csn.ul.ie> >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> index 62e7f62..03ed9a0 100644 >> --- a/mm/vmscan.c >> +++ b/mm/vmscan.c >> @@ -1867,7 +1867,16 @@ out: >> >> zone->prev_priority = temp_priority[i]; >> } >> - if (!all_zones_ok) { >> + >> + /* >> + * If zones are still not balanced, loop again and continue attempting >> + * to rebalance the system. For high-order allocations, fragmentation >> + * can prevent the zones being rebalanced no matter how hard kswapd >> + * works, particularly on systems with little or no swap. For costly >> + * orders, just give up and assume interested processes will either >> + * direct reclaim or wake up kswapd as necessary. >> + */ >> + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { >> cond_resched(); >> >> try_to_freeze(); > > this patch seems no good. > kswapd come this point every SWAP_CLUSTER_MAX reclaimed because to avoid > unnecessary priority variable decreasing. > then "nr_reclaimed >= SWAP_CLUSTER_MAX" indicate kswapd need reclaim more. > > kswapd purpose is "reclaim until pages_high", not reclaim > SWAP_CLUSTER_MAX pages. > > if your patch applied and kswapd start to reclaim for hugepage, kswapd > exit balance_pgdat() function after to reclaim only 32 pages > (SWAP_CLUSTER_MAX). > > In the other hand, "nr_reclaimed < SWAP_CLUSTER_MAX" mean kswapd can't > reclaim enough > page although priority == 0. > in this case, retry is worthless. > > sorting out again. > "goto loop_again" reaching happend by two case. > > 1. kswapd reclaimed SWAP_CLUSTER_MAX pages. > at that time, kswapd reset priority variable to prevent > unnecessary priority decreasing. > I don't hope this behavior change. > 2. kswapd scanned until priority==0. > this case is debatable. my patch reset any order to 0. but > following code is also considerable to me. (sorry for tab corrupted, > current my mail environment is very poor) > > > code-A: > if (!all_zones_ok) { > if ((nr_reclaimed >= SWAP_CLUSTER_MAX) || > (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { > cond_resched(); > try_to_freeze(); > goto loop_again; > } > } > > or > > code-B: > if (!all_zones_ok) { > cond_resched(); > try_to_freeze(); > > if (nr_reclaimed >= SWAP_CLUSTER_MAX) > goto loop_again; > > if (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { > order = sc.order = 0; > goto loop_again; > } > } > > > However, I still like my original proposal because .. > - code-A forget to order-1 (for stack) allocation also can cause > infinite loop. > - code-B doesn't simpler than my original proposal. > > What do you think it? > -- too much is never enough!!!!! [-- Attachment #2: kswapd.patch --] [-- Type: application/octet-stream, Size: 6454 bytes --] Reported-by: wassim dagash <wassim.dagash@gmail.com> Signed-off-by: wassim dagash <wassim.dagash@gmail.com> diff -Nuar linux-2.6.28.orig/include/linux/mmzone.h linux-2.6.28/include/linux/mmzone.h --- linux-2.6.28.orig/include/linux/mmzone.h 2008-12-25 12:20:10.000000000 +0200 +++ linux-2.6.28/include/linux/mmzone.h 2008-12-31 10:16:03.000000000 +0200 @@ -409,6 +409,7 @@ * rarely used fields: */ const char *name; + unsigned int kswapd_max_order; } ____cacheline_internodealigned_in_smp; typedef enum { @@ -625,7 +626,6 @@ int node_id; wait_queue_head_t kswapd_wait; struct task_struct *kswapd; - int kswapd_max_order; } pg_data_t; #define node_present_pages(nid) (NODE_DATA(nid)->node_present_pages) @@ -642,8 +642,8 @@ void get_zone_counts(unsigned long *active, unsigned long *inactive, unsigned long *free); void build_all_zonelists(void); -void wakeup_kswapd(struct zone *zone, int order); -int zone_watermark_ok(struct zone *z, int order, unsigned long mark, +void wakeup_kswapd(struct zone *zone); +int zone_watermark_ok(struct zone *z, unsigned long mark, int classzone_idx, int alloc_flags); enum memmap_context { MEMMAP_EARLY, diff -Nuar linux-2.6.28.orig/Makefile linux-2.6.28/Makefile --- linux-2.6.28.orig/Makefile 2008-12-25 12:19:10.000000000 +0200 +++ linux-2.6.28/Makefile 2008-12-31 10:51:18.000000000 +0200 @@ -2,7 +2,7 @@ PATCHLEVEL = 6 SUBLEVEL = 28 EXTRAVERSION = -NAME = Erotic Pickled Herring +NAME = kswapd_sol # *DOCUMENTATION* # To see a list of typical targets execute "make help" diff -Nuar linux-2.6.28.orig/mm/page_alloc.c linux-2.6.28/mm/page_alloc.c --- linux-2.6.28.orig/mm/page_alloc.c 2008-12-25 12:20:14.000000000 +0200 +++ linux-2.6.28/mm/page_alloc.c 2008-12-31 10:26:32.000000000 +0200 @@ -1224,10 +1224,11 @@ * Return 1 if free pages are above 'mark'. This takes into account the order * of the allocation. */ -int zone_watermark_ok(struct zone *z, int order, unsigned long mark, +int zone_watermark_ok(struct zone *z, unsigned long mark, int classzone_idx, int alloc_flags) { /* free_pages my go negative - that's OK */ + unsigned int order = z->kswapd_max_order; long min = mark; long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1; int o; @@ -1417,7 +1418,7 @@ mark = zone->pages_low; else mark = zone->pages_high; - if (!zone_watermark_ok(zone, order, mark, + if (!zone_watermark_ok(zone, mark, classzone_idx, alloc_flags)) { if (!zone_reclaim_mode || !zone_reclaim(zone, gfp_mask, order)) @@ -1485,9 +1486,16 @@ page = get_page_from_freelist(gfp_mask|__GFP_HARDWALL, nodemask, order, zonelist, high_zoneidx, ALLOC_WMARK_LOW|ALLOC_CPUSET); + if (page) goto got_pg; + /* + First item in list of zones suitable for gfp_mask is the zone the request intended to, + the other items are fallback + */ + z->zone->kswapd_max_order = order; + /* * GFP_THISNODE (meaning __GFP_THISNODE, __GFP_NORETRY and * __GFP_NOWARN set) should not cause reclaim since the subsystem @@ -1500,7 +1508,7 @@ goto nopage; for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) - wakeup_kswapd(zone, order); + wakeup_kswapd(zone); /* * OK, we're below the kswapd watermark and have kicked background @@ -3448,7 +3456,6 @@ pgdat_resize_init(pgdat); pgdat->nr_zones = 0; init_waitqueue_head(&pgdat->kswapd_wait); - pgdat->kswapd_max_order = 0; pgdat_page_cgroup_init(pgdat); for (j = 0; j < MAX_NR_ZONES; j++) { @@ -3488,6 +3495,8 @@ nr_kernel_pages += realsize; nr_all_pages += realsize; + /* Initializing kswapd_max_order to zero */ + zone->kswapd_max_order = 0; zone->spanned_pages = size; zone->present_pages = realsize; #ifdef CONFIG_NUMA diff -Nuar linux-2.6.28.orig/mm/vmscan.c linux-2.6.28/mm/vmscan.c --- linux-2.6.28.orig/mm/vmscan.c 2008-12-25 12:20:14.000000000 +0200 +++ linux-2.6.28/mm/vmscan.c 2008-12-31 10:29:27.000000000 +0200 @@ -1770,7 +1770,7 @@ shrink_active_list(SWAP_CLUSTER_MAX, zone, &sc, priority, 0); - if (!zone_watermark_ok(zone, order, zone->pages_high, + if (!zone_watermark_ok(zone, zone->pages_high, 0, 0)) { end_zone = i; break; @@ -1805,7 +1805,7 @@ priority != DEF_PRIORITY) continue; - if (!zone_watermark_ok(zone, order, zone->pages_high, + if (!zone_watermark_ok(zone, zone->pages_high, end_zone, 0)) all_zones_ok = 0; temp_priority[i] = priority; @@ -1815,7 +1815,7 @@ * We put equal pressure on every zone, unless one * zone has way too many pages free already. */ - if (!zone_watermark_ok(zone, order, 8*zone->pages_high, + if (!zone_watermark_ok(zone, 8*zone->pages_high, end_zone, 0)) nr_reclaimed += shrink_zone(priority, zone, &sc); reclaim_state->reclaimed_slab = 0; @@ -1924,22 +1924,30 @@ order = 0; for ( ; ; ) { unsigned long new_order; - + int i,max_order; prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE); - new_order = pgdat->kswapd_max_order; - pgdat->kswapd_max_order = 0; - if (order < new_order) { - /* - * Don't sleep if someone wants a larger 'order' - * allocation - */ - order = new_order; - } else { - if (!freezing(current)) - schedule(); - order = pgdat->kswapd_max_order; + max_order = 0; + for (i = pgdat->nr_zones - 1; i >= 0; i--) + { + struct zone *zone = pgdat->node_zones + i; + new_order = zone->kswapd_max_order; + zone->kswapd_max_order = 0; + if (max_order < new_order){ + max_order = new_order; + } + } + + if(order < max_order) + { + order = max_order; } + else + { + if (!freezing(current)) + schedule(); + } + finish_wait(&pgdat->kswapd_wait, &wait); if (!try_to_freeze()) { @@ -1955,7 +1963,7 @@ /* * A zone is low on free memory, so wake its kswapd task to service it. */ -void wakeup_kswapd(struct zone *zone, int order) +void wakeup_kswapd(struct zone *zone) { pg_data_t *pgdat; @@ -1963,10 +1971,8 @@ return; pgdat = zone->zone_pgdat; - if (zone_watermark_ok(zone, order, zone->pages_low, 0, 0)) + if (zone_watermark_ok(zone, zone->pages_low, 0, 0)) return; - if (pgdat->kswapd_max_order < order) - pgdat->kswapd_max_order = order; if (!cpuset_zone_allowed_hardwall(zone, GFP_KERNEL)) return; if (!waitqueue_active(&pgdat->kswapd_wait)) ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 8:59 ` wassim dagash @ 2008-12-31 12:05 ` Mel Gorman 2008-12-31 12:24 ` wassim dagash 0 siblings, 1 reply; 20+ messages in thread From: Mel Gorman @ 2008-12-31 12:05 UTC (permalink / raw) To: wassim dagash; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Nick Piggin On Wed, Dec 31, 2008 at 10:59:58AM +0200, wassim dagash wrote: > Hi , > Thank you all for reviewing. > Why don't we implement a solution where the order is defined per zone? kswapd is scanning trying to balance the whole system, not just one of the zones. You don't know which of the zones would be the best to rebalance and glancing through your path, the zone that would be balanced is the last zone in the zonelist. i.e. kswapd will try to rebalance the least-preferred zone to be used by the calling process. > I implemented such a solution for my kernel (2.6.18) and tested it, it > worked fine for me. Attached a patch with a solution for 2.6.28 > (compile tested only). > One big one; zone_watermark_ok() now always calculates order based on zone->kswapd_max_order. This means that a process that wants an order-0 page may check watermarks at order-10 because some other process overwrote the zone value. That is unfair. One less obvious one; You add a field that is written often beside a field that is read-mostly in struct zone. This results in poorer cache behaviour. It's easily fixed though. > On Wed, Dec 31, 2008 at 6:54 AM, KOSAKI Motohiro > <kosaki.motohiro@jp.fujitsu.com> wrote: > > Hi > > > > thank you for reviewing. > > > >>> == > >>> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > >>> Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > >>> > >>> Wassim Dagash reported following kswapd infinite loop problem. > >>> > >>> kswapd runs in some infinite loop trying to swap until order 10 of zone > >>> highmem is OK, While zone higmem (as I understand) has nothing to do > >>> with contiguous memory (cause there is no 1-1 mapping) which means > >>> kswapd will continue to try to balance order 10 of zone highmem > >>> forever (or until someone release a very large chunk of highmem). > >>> > >>> He proposed remove contenious checking on highmem at all. > >>> However hugepage on highmem need contenious highmem page. > >>> > >> > >> I'm lacking the original problem report, but contiguous order-10 pages are > >> indeed required for hugepages in highmem and reclaiming for them should not > >> be totally disabled at any point. While no 1-1 mapping exists for the kernel, > >> contiguity is still required. > > > > correct. > > but that's ok. > > > > my patch only change corner case bahavior and only disable high-order > > when priority==0. typical hugepage reclaim don't need and don't reach > > priority==0. > > > > and sorry. I agree with my "2nd loop" word of the patch comment is a > > bit misleading. > > > > > >> kswapd gets a sc.order when it is known there is a process trying to get > >> high-order pages so it can reclaim at that order in an attempt to prevent > >> future direct reclaim at a high-order. Your patch does not appear to depend on > >> GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > >> loop again at order-0 means it may scan and reclaim more memory unnecessarily > >> seeing as all_zones_ok was calculated based on a high-order value, not order-0. > > > > Yup. my patch doesn't depend on GFP_KERNEL. > > > > but, Why order-0 means it may scan more memory unnecessary? > > all_zones_ok() is calculated by zone_watermark_ok() and zone_watermark_ok() > > depend on order argument. and my patch set order variable to 0 too. > > > > > >> While constantly looping trying to balance for high-orders is indeed bad, > >> I'm unconvinced this is the correct change. As we have already gone through > >> a priorities and scanned everything at the high-order, would it not make > >> more sense to do just give up with something like the following? > >> > >> /* > >> * If zones are still not balanced, loop again and continue attempting > >> * to rebalance the system. For high-order allocations, fragmentation > >> * can prevent the zones being rebalanced no matter how hard kswapd > >> * works, particularly on systems with little or no swap. For costly > >> * orders, just give up and assume interested processes will either > >> * direct reclaim or wake up kswapd as necessary. > >> */ > >> if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > >> cond_resched(); > >> > >> try_to_freeze(); > >> > >> goto loop_again; > >> } > >> > >> I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are > >> expected to support allocations up to that order in a fairly reliable fashion. > > > > my comment is bellow. > > > > > >> ============= > >> From: Mel Gorman <mel@csn.ul.ie> > >> Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation > >> > >> kswapd runs in some infinite loop trying to swap until order 10 of zone > >> highmem is OK.... kswapd will continue to try to balance order 10 of zone > >> highmem forever (or until someone release a very large chunk of highmem). > >> > >> For costly high-order allocations, the system may never be balanced due to > >> fragmentation but kswapd should not infinitely loop as a result. The > >> following patch lets kswapd stop reclaiming in the event it cannot > >> balance zones and the order is high-order. > >> > >> Reported-by: wassim dagash <wassim.dagash@gmail.com> > >> Signed-off-by: Mel Gorman <mel@csn.ul.ie> > >> > >> diff --git a/mm/vmscan.c b/mm/vmscan.c > >> index 62e7f62..03ed9a0 100644 > >> --- a/mm/vmscan.c > >> +++ b/mm/vmscan.c > >> @@ -1867,7 +1867,16 @@ out: > >> > >> zone->prev_priority = temp_priority[i]; > >> } > >> - if (!all_zones_ok) { > >> + > >> + /* > >> + * If zones are still not balanced, loop again and continue attempting > >> + * to rebalance the system. For high-order allocations, fragmentation > >> + * can prevent the zones being rebalanced no matter how hard kswapd > >> + * works, particularly on systems with little or no swap. For costly > >> + * orders, just give up and assume interested processes will either > >> + * direct reclaim or wake up kswapd as necessary. > >> + */ > >> + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > >> cond_resched(); > >> > >> try_to_freeze(); > > > > this patch seems no good. > > kswapd come this point every SWAP_CLUSTER_MAX reclaimed because to avoid > > unnecessary priority variable decreasing. > > then "nr_reclaimed >= SWAP_CLUSTER_MAX" indicate kswapd need reclaim more. > > > > kswapd purpose is "reclaim until pages_high", not reclaim > > SWAP_CLUSTER_MAX pages. > > > > if your patch applied and kswapd start to reclaim for hugepage, kswapd > > exit balance_pgdat() function after to reclaim only 32 pages > > (SWAP_CLUSTER_MAX). > > > > In the other hand, "nr_reclaimed < SWAP_CLUSTER_MAX" mean kswapd can't > > reclaim enough > > page although priority == 0. > > in this case, retry is worthless. > > > > sorting out again. > > "goto loop_again" reaching happend by two case. > > > > 1. kswapd reclaimed SWAP_CLUSTER_MAX pages. > > at that time, kswapd reset priority variable to prevent > > unnecessary priority decreasing. > > I don't hope this behavior change. > > 2. kswapd scanned until priority==0. > > this case is debatable. my patch reset any order to 0. but > > following code is also considerable to me. (sorry for tab corrupted, > > current my mail environment is very poor) > > > > > > code-A: > > if (!all_zones_ok) { > > if ((nr_reclaimed >= SWAP_CLUSTER_MAX) || > > (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { > > cond_resched(); > > try_to_freeze(); > > goto loop_again; > > } > > } > > > > or > > > > code-B: > > if (!all_zones_ok) { > > cond_resched(); > > try_to_freeze(); > > > > if (nr_reclaimed >= SWAP_CLUSTER_MAX) > > goto loop_again; > > > > if (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { > > order = sc.order = 0; > > goto loop_again; > > } > > } > > > > > > However, I still like my original proposal because .. > > - code-A forget to order-1 (for stack) allocation also can cause > > infinite loop. > > - code-B doesn't simpler than my original proposal. > > > > What do you think it? > > > > > > -- > too much is never enough!!!!! -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 12:05 ` Mel Gorman @ 2008-12-31 12:24 ` wassim dagash 0 siblings, 0 replies; 20+ messages in thread From: wassim dagash @ 2008-12-31 12:24 UTC (permalink / raw) To: Mel Gorman; +Cc: KOSAKI Motohiro, LKML, linux-mm, Andrew Morton, Nick Piggin On Wed, Dec 31, 2008 at 2:05 PM, Mel Gorman <mel@csn.ul.ie> wrote: > On Wed, Dec 31, 2008 at 10:59:58AM +0200, wassim dagash wrote: >> Hi , >> Thank you all for reviewing. >> Why don't we implement a solution where the order is defined per zone? > > kswapd is scanning trying to balance the whole system, not just one of > the zones. You don't know which of the zones would be the best to > rebalance and glancing through your path, the zone that would be > balanced is the last zone in the zonelist. i.e. kswapd will try to > rebalance the least-preferred zone to be used by the calling process. > You are right, but what is meant by 'order per zone' is that we will balance all zones to kswapd_max_order which is the highest order from which allocation was done on that zone. What I tried to re-balance is the zone for which the allocation was made, the other zones are fallback to that allocation ( correct me if I'm wrong). As I understood from documentation, the first zone on the list is the zone for which the allocation was made. >> I implemented such a solution for my kernel (2.6.18) and tested it, it >> worked fine for me. Attached a patch with a solution for 2.6.28 >> (compile tested only). >> > > One big one; > > zone_watermark_ok() now always calculates order based on > zone->kswapd_max_order. This means that a process that wants an order-0 page > may check watermarks at order-10 because some other process overwrote the > zone value. That is unfair. > This can be fixed by differing re-balance and allocation. > One less obvious one; > > You add a field that is written often beside a field that is read-mostly > in struct zone. This results in poorer cache behaviour. It's easily > fixed though. > Sorry, but I'm very new to kernel programming, I read your book (and other books) in order to figure out why KSWAPD loops in an endless loop :) I just wanted to understand why 'kswapd_max_order' was implemented 'per node' and not 'per zone' ? Is there a reason? Thanks, >> On Wed, Dec 31, 2008 at 6:54 AM, KOSAKI Motohiro >> <kosaki.motohiro@jp.fujitsu.com> wrote: >> > Hi >> > >> > thank you for reviewing. >> > >> >>> == >> >>> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> >> >>> Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation >> >>> >> >>> Wassim Dagash reported following kswapd infinite loop problem. >> >>> >> >>> kswapd runs in some infinite loop trying to swap until order 10 of zone >> >>> highmem is OK, While zone higmem (as I understand) has nothing to do >> >>> with contiguous memory (cause there is no 1-1 mapping) which means >> >>> kswapd will continue to try to balance order 10 of zone highmem >> >>> forever (or until someone release a very large chunk of highmem). >> >>> >> >>> He proposed remove contenious checking on highmem at all. >> >>> However hugepage on highmem need contenious highmem page. >> >>> >> >> >> >> I'm lacking the original problem report, but contiguous order-10 pages are >> >> indeed required for hugepages in highmem and reclaiming for them should not >> >> be totally disabled at any point. While no 1-1 mapping exists for the kernel, >> >> contiguity is still required. >> > >> > correct. >> > but that's ok. >> > >> > my patch only change corner case bahavior and only disable high-order >> > when priority==0. typical hugepage reclaim don't need and don't reach >> > priority==0. >> > >> > and sorry. I agree with my "2nd loop" word of the patch comment is a >> > bit misleading. >> > >> > >> >> kswapd gets a sc.order when it is known there is a process trying to get >> >> high-order pages so it can reclaim at that order in an attempt to prevent >> >> future direct reclaim at a high-order. Your patch does not appear to depend on >> >> GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to >> >> loop again at order-0 means it may scan and reclaim more memory unnecessarily >> >> seeing as all_zones_ok was calculated based on a high-order value, not order-0. >> > >> > Yup. my patch doesn't depend on GFP_KERNEL. >> > >> > but, Why order-0 means it may scan more memory unnecessary? >> > all_zones_ok() is calculated by zone_watermark_ok() and zone_watermark_ok() >> > depend on order argument. and my patch set order variable to 0 too. >> > >> > >> >> While constantly looping trying to balance for high-orders is indeed bad, >> >> I'm unconvinced this is the correct change. As we have already gone through >> >> a priorities and scanned everything at the high-order, would it not make >> >> more sense to do just give up with something like the following? >> >> >> >> /* >> >> * If zones are still not balanced, loop again and continue attempting >> >> * to rebalance the system. For high-order allocations, fragmentation >> >> * can prevent the zones being rebalanced no matter how hard kswapd >> >> * works, particularly on systems with little or no swap. For costly >> >> * orders, just give up and assume interested processes will either >> >> * direct reclaim or wake up kswapd as necessary. >> >> */ >> >> if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { >> >> cond_resched(); >> >> >> >> try_to_freeze(); >> >> >> >> goto loop_again; >> >> } >> >> >> >> I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are >> >> expected to support allocations up to that order in a fairly reliable fashion. >> > >> > my comment is bellow. >> > >> > >> >> ============= >> >> From: Mel Gorman <mel@csn.ul.ie> >> >> Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation >> >> >> >> kswapd runs in some infinite loop trying to swap until order 10 of zone >> >> highmem is OK.... kswapd will continue to try to balance order 10 of zone >> >> highmem forever (or until someone release a very large chunk of highmem). >> >> >> >> For costly high-order allocations, the system may never be balanced due to >> >> fragmentation but kswapd should not infinitely loop as a result. The >> >> following patch lets kswapd stop reclaiming in the event it cannot >> >> balance zones and the order is high-order. >> >> >> >> Reported-by: wassim dagash <wassim.dagash@gmail.com> >> >> Signed-off-by: Mel Gorman <mel@csn.ul.ie> >> >> >> >> diff --git a/mm/vmscan.c b/mm/vmscan.c >> >> index 62e7f62..03ed9a0 100644 >> >> --- a/mm/vmscan.c >> >> +++ b/mm/vmscan.c >> >> @@ -1867,7 +1867,16 @@ out: >> >> >> >> zone->prev_priority = temp_priority[i]; >> >> } >> >> - if (!all_zones_ok) { >> >> + >> >> + /* >> >> + * If zones are still not balanced, loop again and continue attempting >> >> + * to rebalance the system. For high-order allocations, fragmentation >> >> + * can prevent the zones being rebalanced no matter how hard kswapd >> >> + * works, particularly on systems with little or no swap. For costly >> >> + * orders, just give up and assume interested processes will either >> >> + * direct reclaim or wake up kswapd as necessary. >> >> + */ >> >> + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { >> >> cond_resched(); >> >> >> >> try_to_freeze(); >> > >> > this patch seems no good. >> > kswapd come this point every SWAP_CLUSTER_MAX reclaimed because to avoid >> > unnecessary priority variable decreasing. >> > then "nr_reclaimed >= SWAP_CLUSTER_MAX" indicate kswapd need reclaim more. >> > >> > kswapd purpose is "reclaim until pages_high", not reclaim >> > SWAP_CLUSTER_MAX pages. >> > >> > if your patch applied and kswapd start to reclaim for hugepage, kswapd >> > exit balance_pgdat() function after to reclaim only 32 pages >> > (SWAP_CLUSTER_MAX). >> > >> > In the other hand, "nr_reclaimed < SWAP_CLUSTER_MAX" mean kswapd can't >> > reclaim enough >> > page although priority == 0. >> > in this case, retry is worthless. >> > >> > sorting out again. >> > "goto loop_again" reaching happend by two case. >> > >> > 1. kswapd reclaimed SWAP_CLUSTER_MAX pages. >> > at that time, kswapd reset priority variable to prevent >> > unnecessary priority decreasing. >> > I don't hope this behavior change. >> > 2. kswapd scanned until priority==0. >> > this case is debatable. my patch reset any order to 0. but >> > following code is also considerable to me. (sorry for tab corrupted, >> > current my mail environment is very poor) >> > >> > >> > code-A: >> > if (!all_zones_ok) { >> > if ((nr_reclaimed >= SWAP_CLUSTER_MAX) || >> > (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { >> > cond_resched(); >> > try_to_freeze(); >> > goto loop_again; >> > } >> > } >> > >> > or >> > >> > code-B: >> > if (!all_zones_ok) { >> > cond_resched(); >> > try_to_freeze(); >> > >> > if (nr_reclaimed >= SWAP_CLUSTER_MAX) >> > goto loop_again; >> > >> > if (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { >> > order = sc.order = 0; >> > goto loop_again; >> > } >> > } >> > >> > >> > However, I still like my original proposal because .. >> > - code-A forget to order-1 (for stack) allocation also can cause >> > infinite loop. >> > - code-B doesn't simpler than my original proposal. >> > >> > What do you think it? >> > >> >> >> >> -- >> too much is never enough!!!!! > > > > -- > Mel Gorman > Part-time Phd Student Linux Technology Center > University of Limerick IBM Dublin Software Lab > -- too much is never enough!!!!! -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 4:54 ` KOSAKI Motohiro 2008-12-31 8:59 ` wassim dagash @ 2008-12-31 11:53 ` Mel Gorman 2008-12-31 13:34 ` KOSAKI Motohiro 1 sibling, 1 reply; 20+ messages in thread From: Mel Gorman @ 2008-12-31 11:53 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash On Wed, Dec 31, 2008 at 01:54:17PM +0900, KOSAKI Motohiro wrote: > Hi > > thank you for reviewing. > > >> == > >> From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > >> Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > >> > >> Wassim Dagash reported following kswapd infinite loop problem. > >> > >> kswapd runs in some infinite loop trying to swap until order 10 of zone > >> highmem is OK, While zone higmem (as I understand) has nothing to do > >> with contiguous memory (cause there is no 1-1 mapping) which means > >> kswapd will continue to try to balance order 10 of zone highmem > >> forever (or until someone release a very large chunk of highmem). > >> > >> He proposed remove contenious checking on highmem at all. > >> However hugepage on highmem need contenious highmem page. > >> > > > > I'm lacking the original problem report, but contiguous order-10 pages are > > indeed required for hugepages in highmem and reclaiming for them should not > > be totally disabled at any point. While no 1-1 mapping exists for the kernel, > > contiguity is still required. > > correct. > but that's ok. > > my patch only change corner case bahavior and only disable high-order > when priority==0. typical hugepage reclaim don't need and don't reach > priority==0. > > and sorry. I agree with my "2nd loop" word of the patch comment is a > bit misleading. > As I mentioned in the last mail, if it wasn't so misleading, I probably would have said nothing at all :) > > > kswapd gets a sc.order when it is known there is a process trying to get > > high-order pages so it can reclaim at that order in an attempt to prevent > > future direct reclaim at a high-order. Your patch does not appear to depend on > > GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > > loop again at order-0 means it may scan and reclaim more memory unnecessarily > > seeing as all_zones_ok was calculated based on a high-order value, not order-0. > > Yup. my patch doesn't depend on GFP_KERNEL. > > but, Why order-0 means it may scan more memory unnecessary? Because we can enter shrink_active_list() depending on the size of the LRU lists. Maybe it doesn't matter but it's what I was concerned with as well as the fact we are changing kswapd to do work other than what it was asked for. > all_zones_ok() is calculated by zone_watermark_ok() and zone_watermark_ok() > depend on order argument. and my patch set order variable to 0 too. > > > > While constantly looping trying to balance for high-orders is indeed bad, > > I'm unconvinced this is the correct change. As we have already gone through > > a priorities and scanned everything at the high-order, would it not make > > more sense to do just give up with something like the following? > > > > /* > > * If zones are still not balanced, loop again and continue attempting > > * to rebalance the system. For high-order allocations, fragmentation > > * can prevent the zones being rebalanced no matter how hard kswapd > > * works, particularly on systems with little or no swap. For costly > > * orders, just give up and assume interested processes will either > > * direct reclaim or wake up kswapd as necessary. > > */ > > if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > > cond_resched(); > > > > try_to_freeze(); > > > > goto loop_again; > > } > > > > I used PAGE_ALLOC_COSTLY_ORDER instead of sc.order == 0 because we are > > expected to support allocations up to that order in a fairly reliable fashion. > > my comment is bellow. > > > > ============= > > From: Mel Gorman <mel@csn.ul.ie> > > Subject: [PATCH] mm: stop kswapd's infinite loop at high order allocation > > > > kswapd runs in some infinite loop trying to swap until order 10 of zone > > highmem is OK.... kswapd will continue to try to balance order 10 of zone > > highmem forever (or until someone release a very large chunk of highmem). > > > > For costly high-order allocations, the system may never be balanced due to > > fragmentation but kswapd should not infinitely loop as a result. The > > following patch lets kswapd stop reclaiming in the event it cannot > > balance zones and the order is high-order. > > > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > > Signed-off-by: Mel Gorman <mel@csn.ul.ie> > > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index 62e7f62..03ed9a0 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -1867,7 +1867,16 @@ out: > > > > zone->prev_priority = temp_priority[i]; > > } > > - if (!all_zones_ok) { > > + > > + /* > > + * If zones are still not balanced, loop again and continue attempting > > + * to rebalance the system. For high-order allocations, fragmentation > > + * can prevent the zones being rebalanced no matter how hard kswapd > > + * works, particularly on systems with little or no swap. For costly > > + * orders, just give up and assume interested processes will either > > + * direct reclaim or wake up kswapd as necessary. > > + */ > > + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > > cond_resched(); > > > > try_to_freeze(); > > this patch seems no good. > kswapd come this point every SWAP_CLUSTER_MAX reclaimed because to avoid > unnecessary priority variable decreasing. > then "nr_reclaimed >= SWAP_CLUSTER_MAX" indicate kswapd need reclaim more. > > kswapd purpose is "reclaim until pages_high", not reclaim > SWAP_CLUSTER_MAX pages. > > if your patch applied and kswapd start to reclaim for hugepage, kswapd > exit balance_pgdat() function after to reclaim only 32 pages > (SWAP_CLUSTER_MAX). > It probably will have reclaimed more. Lumpy reclaim will have isolated more pages in down in isolate_lru_pages() and reclaimed pages within a high-order blocks of pages even if that is more than SWAP_CLUSTER_MAX pages (right?). The bailing out does mean that kswapd no longer works as hard for high-order pages but as I said in the other mail, this is not necessarily a bad thing as processes will still direct reclaim if they have to. > In the other hand, "nr_reclaimed < SWAP_CLUSTER_MAX" mean kswapd can't > reclaim enough > page although priority == 0. > in this case, retry is worthless. > Good point. With my patch, we would just give up in the event SWAP_CLUSTER_MAX pages were not even reclaimed. With your patch, we rescan at order-0 to ensure the system is actually balanced without waiting to be woken up again. It's not what kswapd was asked to do, but arguably it's the smart thing to do. > sorting out again. > "goto loop_again" reaching happend by two case. > > 1. kswapd reclaimed SWAP_CLUSTER_MAX pages. > at that time, kswapd reset priority variable to prevent > unnecessary priority decreasing. > I don't hope this behavior change. > 2. kswapd scanned until priority==0. > this case is debatable. my patch reset any order to 0. but > following code is also considerable to me. (sorry for tab corrupted, > current my mail environment is very poor) > > > code-A: > if (!all_zones_ok) { > if ((nr_reclaimed >= SWAP_CLUSTER_MAX) || > (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { > cond_resched(); > try_to_freeze(); > goto loop_again; > } > } > > or > > code-B: > if (!all_zones_ok) { > cond_resched(); > try_to_freeze(); > > if (nr_reclaimed >= SWAP_CLUSTER_MAX) > goto loop_again; > > if (sc.order <= PAGE_ALLOC_COSTLY_ORDER)) { > order = sc.order = 0; > goto loop_again; > } > } > > > However, I still like my original proposal because .. > - code-A forget to order-1 (for stack) allocation also can cause > infinite loop. Conceivably it would cause an infinite loop although we are meant to be able to grant allocations of that order. The point stands though, it is not guaranteed which is why I changed it to sc.order == 0 in the second revision. > - code-B doesn't simpler than my original proposal. > Indeed. > What do you think it? > AFter looking at this for long enough, our patches are functionally similar except you loop a second time at order-0 without waiting for kswapd to be woken up. It may reclaim more but if people are ok with that, I'll stay quiet. Fix the comment and I'll be happy (or even delete it, I prefer no comments to misleading ones :/). Maybe something like /* * Fragmentation may mean that the system cannot be * rebalanced for high-order allocations in all zones. * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, * it means the zones have been fully scanned and are still * not balanced. For high-order allocations, there is * little point trying all over again as kswapd may * infinite loop. * * Instead, recheck all watermarks at order-0 as they * are the most important. If watermarks are ok, kswapd will go * back to sleep. High-order users can still direct reclaim * if they wish. */ ? -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation 2008-12-31 11:53 ` Mel Gorman @ 2008-12-31 13:34 ` KOSAKI Motohiro 2009-01-01 14:52 ` [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 KOSAKI Motohiro 0 siblings, 1 reply; 20+ messages in thread From: KOSAKI Motohiro @ 2008-12-31 13:34 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash Hi > > > I'm lacking the original problem report, but contiguous order-10 pages are > > > indeed required for hugepages in highmem and reclaiming for them should not > > > be totally disabled at any point. While no 1-1 mapping exists for the kernel, > > > contiguity is still required. > > > > correct. > > but that's ok. > > > > my patch only change corner case bahavior and only disable high-order > > when priority==0. typical hugepage reclaim don't need and don't reach > > priority==0. > > > > and sorry. I agree with my "2nd loop" word of the patch comment is a > > bit misleading. > > > > As I mentioned in the last mail, if it wasn't so misleading, I probably > would have said nothing at all :) very sorry. > > > kswapd gets a sc.order when it is known there is a process trying to get > > > high-order pages so it can reclaim at that order in an attempt to prevent > > > future direct reclaim at a high-order. Your patch does not appear to depend on > > > GFP_KERNEL at all so I found the comment misleading. Furthermore, asking it to > > > loop again at order-0 means it may scan and reclaim more memory unnecessarily > > > seeing as all_zones_ok was calculated based on a high-order value, not order-0. > > > > Yup. my patch doesn't depend on GFP_KERNEL. > > > > but, Why order-0 means it may scan more memory unnecessary? > > Because we can enter shrink_active_list() depending on the size of the LRU > lists. Maybe it doesn't matter but it's what I was concerned with as well > as the fact we are changing kswapd to do work other than what it was asked for. I think it isn't matter. if (inactive_anon_is_low(zone)) { shrink_active_list(SWAP_CLUSTER_MAX, zone, &sc, priority, 0); } this code isn't reclaim, it adjustfor number of pages in inactive list. if the number of inactive anon pages are already enough, inactive_anon_is_low() return 0. then above code doesn't have bad side effect. > > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > > index 62e7f62..03ed9a0 100644 > > > --- a/mm/vmscan.c > > > +++ b/mm/vmscan.c > > > @@ -1867,7 +1867,16 @@ out: > > > > > > zone->prev_priority = temp_priority[i]; > > > } > > > - if (!all_zones_ok) { > > > + > > > + /* > > > + * If zones are still not balanced, loop again and continue attempting > > > + * to rebalance the system. For high-order allocations, fragmentation > > > + * can prevent the zones being rebalanced no matter how hard kswapd > > > + * works, particularly on systems with little or no swap. For costly > > > + * orders, just give up and assume interested processes will either > > > + * direct reclaim or wake up kswapd as necessary. > > > + */ > > > + if (!all_zones_ok && sc.order <= PAGE_ALLOC_COSTLY_ORDER) { > > > cond_resched(); > > > > > > try_to_freeze(); > > > > this patch seems no good. > > kswapd come this point every SWAP_CLUSTER_MAX reclaimed because to avoid > > unnecessary priority variable decreasing. > > then "nr_reclaimed >= SWAP_CLUSTER_MAX" indicate kswapd need reclaim more. > > > > kswapd purpose is "reclaim until pages_high", not reclaim > > SWAP_CLUSTER_MAX pages. > > > > if your patch applied and kswapd start to reclaim for hugepage, kswapd > > exit balance_pgdat() function after to reclaim only 32 pages > > (SWAP_CLUSTER_MAX). > > > > It probably will have reclaimed more. Lumpy reclaim will have isolated > more pages in down in isolate_lru_pages() and reclaimed pages within a > high-order blocks of pages even if that is more than SWAP_CLUSTER_MAX pages > (right?). correct. but please recall, lumpy reclaim try to get contenious pages, not guarantee get contenious pages. Then, although nr_reclaimed >= SWAP_CLUSTER_MAX, no contenious memory can happend. > The bailing out does mean that kswapd no longer works as hard for > high-order pages but as I said in the other mail, this is not necessarily > a bad thing as processes will still direct reclaim if they have to. > > > In the other hand, "nr_reclaimed < SWAP_CLUSTER_MAX" mean kswapd can't > > reclaim enough > > page although priority == 0. > > in this case, retry is worthless. > > > > Good point. With my patch, we would just give up in the event SWAP_CLUSTER_MAX > pages were not even reclaimed. With your patch, we rescan at order-0 to ensure > the system is actually balanced without waiting to be woken up again. It's > not what kswapd was asked to do, but arguably it's the smart thing to do. Agreed. > AFter looking at this for long enough, our patches are functionally similar > except you loop a second time at order-0 without waiting for kswapd to be > woken up. It may reclaim more but if people are ok with that, I'll stay > quiet. Fix the comment and I'll be happy (or even delete it, I prefer no > comments to misleading ones :/). Maybe something like > > /* > * Fragmentation may mean that the system cannot be > * rebalanced for high-order allocations in all zones. > * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, > * it means the zones have been fully scanned and are still > * not balanced. For high-order allocations, there is > * little point trying all over again as kswapd may > * infinite loop. > * > * Instead, recheck all watermarks at order-0 as they > * are the most important. If watermarks are ok, kswapd will go > * back to sleep. High-order users can still direct reclaim > * if they wish. > */ > > ? Excellent. I strongly like this and I hope merge it to my patch. I'll resend new patch. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 2008-12-31 13:34 ` KOSAKI Motohiro @ 2009-01-01 14:52 ` KOSAKI Motohiro 2009-01-02 9:55 ` MinChan Kim 2009-01-02 11:14 ` Mel Gorman 0 siblings, 2 replies; 20+ messages in thread From: KOSAKI Motohiro @ 2009-01-01 14:52 UTC (permalink / raw) To: Mel Gorman Cc: kosaki.motohiro, LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash > > /* > > * Fragmentation may mean that the system cannot be > > * rebalanced for high-order allocations in all zones. > > * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, > > * it means the zones have been fully scanned and are still > > * not balanced. For high-order allocations, there is > > * little point trying all over again as kswapd may > > * infinite loop. > > * > > * Instead, recheck all watermarks at order-0 as they > > * are the most important. If watermarks are ok, kswapd will go > > * back to sleep. High-order users can still direct reclaim > > * if they wish. > > */ > > > > ? > > Excellent. I strongly like this and I hope merge it to my patch. > I'll resend new patch. Done. == From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation Wassim Dagash reported following kswapd infinite loop problem. kswapd runs in some infinite loop trying to swap until order 10 of zone highmem is OK.... kswapd will continue to try to balance order 10 of zone highmem forever (or until someone release a very large chunk of highmem). For non order-0 allocations, the system may never be balanced due to fragmentation but kswapd should not infinitely loop as a result. Instead, recheck all watermarks at order-0 as they are the most important. If watermarks are ok, kswapd will go back to sleep. Reported-by: wassim dagash <wassim.dagash@gmail.com> Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Reviewed-by: Nick Piggin <npiggin@suse.de> Signed-off-by: Mel Gorman <mel@csn.ul.ie>, --- mm/vmscan.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+) Index: b/mm/vmscan.c =================================================================== --- a/mm/vmscan.c 2008-12-25 08:26:37.000000000 +0900 +++ b/mm/vmscan.c 2009-01-01 01:56:02.000000000 +0900 @@ -1872,6 +1872,23 @@ out: try_to_freeze(); + /* + * Fragmentation may mean that the system cannot be + * rebalanced for high-order allocations in all zones. + * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, + * it means the zones have been fully scanned and are still + * not balanced. For high-order allocations, there is + * little point trying all over again as kswapd may + * infinite loop. + * + * Instead, recheck all watermarks at order-0 as they + * are the most important. If watermarks are ok, kswapd will go + * back to sleep. High-order users can still direct reclaim + * if they wish. + */ + if (nr_reclaimed < SWAP_CLUSTER_MAX) + order = sc.order = 0; + goto loop_again; } -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 2009-01-01 14:52 ` [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 KOSAKI Motohiro @ 2009-01-02 9:55 ` MinChan Kim 2009-01-02 10:00 ` KOSAKI Motohiro 2009-01-02 11:14 ` Mel Gorman 1 sibling, 1 reply; 20+ messages in thread From: MinChan Kim @ 2009-01-02 9:55 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash Hi, kosaki-san. I read the previous threads now. It's rather late :(. I think it's rather awkward that sudden big change of order from 10 to 0. This problem causes zone_water_mark's fail. It mean now this zone's proportional free page per order size is not good. Although order-0 page is very important, Shouldn't we consider other order allocations ? So I want to balance zone's proportional free page. How about following ? if (nr_reclaimed < SWAP_CLUSTER_MAX) { if (order != 0) { order -=1; sc.order -=1; } } It prevents infinite loop and do best effort to make zone's proportional free page per order size good. It's just my opinion within my knowledge. If it have a problem, pz, explain me :) On Thu, Jan 1, 2009 at 11:52 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: > >> > /* >> > * Fragmentation may mean that the system cannot be >> > * rebalanced for high-order allocations in all zones. >> > * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, >> > * it means the zones have been fully scanned and are still >> > * not balanced. For high-order allocations, there is >> > * little point trying all over again as kswapd may >> > * infinite loop. >> > * >> > * Instead, recheck all watermarks at order-0 as they >> > * are the most important. If watermarks are ok, kswapd will go >> > * back to sleep. High-order users can still direct reclaim >> > * if they wish. >> > */ >> > >> > ? >> >> Excellent. I strongly like this and I hope merge it to my patch. >> I'll resend new patch. > > Done. > > > > == > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > > Wassim Dagash reported following kswapd infinite loop problem. > > kswapd runs in some infinite loop trying to swap until order 10 of zone > highmem is OK.... kswapd will continue to try to balance order 10 of zone > highmem forever (or until someone release a very large chunk of highmem). > > For non order-0 allocations, the system may never be balanced due to > fragmentation but kswapd should not infinitely loop as a result. > > Instead, recheck all watermarks at order-0 as they are the most important. > If watermarks are ok, kswapd will go back to sleep. > > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Reviewed-by: Nick Piggin <npiggin@suse.de> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>, > --- > mm/vmscan.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > Index: b/mm/vmscan.c > =================================================================== > --- a/mm/vmscan.c 2008-12-25 08:26:37.000000000 +0900 > +++ b/mm/vmscan.c 2009-01-01 01:56:02.000000000 +0900 > @@ -1872,6 +1872,23 @@ out: > > try_to_freeze(); > > + /* > + * Fragmentation may mean that the system cannot be > + * rebalanced for high-order allocations in all zones. > + * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, > + * it means the zones have been fully scanned and are still > + * not balanced. For high-order allocations, there is > + * little point trying all over again as kswapd may > + * infinite loop. > + * > + * Instead, recheck all watermarks at order-0 as they > + * are the most important. If watermarks are ok, kswapd will go > + * back to sleep. High-order users can still direct reclaim > + * if they wish. > + */ > + if (nr_reclaimed < SWAP_CLUSTER_MAX) > + order = sc.order = 0; > + > goto loop_again; > } > > > > > -- > To unsubscribe, send a message with 'unsubscribe linux-mm' in > the body to majordomo@kvack.org. For more info on Linux MM, > see: http://www.linux-mm.org/ . > Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> > -- Kinds regards, MinChan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 2009-01-02 9:55 ` MinChan Kim @ 2009-01-02 10:00 ` KOSAKI Motohiro 2009-01-02 10:29 ` MinChan Kim 0 siblings, 1 reply; 20+ messages in thread From: KOSAKI Motohiro @ 2009-01-02 10:00 UTC (permalink / raw) To: MinChan Kim Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash > Hi, kosaki-san. > > I read the previous threads now. It's rather late :(. > > I think it's rather awkward that sudden big change of order from 10 to 0. > > This problem causes zone_water_mark's fail. > It mean now this zone's proportional free page per order size is not good. > Although order-0 page is very important, Shouldn't we consider other > order allocations ? > > So I want to balance zone's proportional free page. > How about following ? > > if (nr_reclaimed < SWAP_CLUSTER_MAX) { > if (order != 0) { > order -=1; > sc.order -=1; > } > } > > It prevents infinite loop and do best effort to make zone's > proportional free page per order size good. > > It's just my opinion within my knowledge. > If it have a problem, pz, explain me :) Please read Nick's expalin. it explain very kindly :) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 2009-01-02 10:00 ` KOSAKI Motohiro @ 2009-01-02 10:29 ` MinChan Kim 2009-01-02 10:54 ` KOSAKI Motohiro 0 siblings, 1 reply; 20+ messages in thread From: MinChan Kim @ 2009-01-02 10:29 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash On Fri, Jan 2, 2009 at 7:00 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >> Hi, kosaki-san. >> >> I read the previous threads now. It's rather late :(. >> >> I think it's rather awkward that sudden big change of order from 10 to 0. >> >> This problem causes zone_water_mark's fail. >> It mean now this zone's proportional free page per order size is not good. >> Although order-0 page is very important, Shouldn't we consider other >> order allocations ? >> >> So I want to balance zone's proportional free page. >> How about following ? >> >> if (nr_reclaimed < SWAP_CLUSTER_MAX) { >> if (order != 0) { >> order -=1; >> sc.order -=1; >> } >> } >> >> It prevents infinite loop and do best effort to make zone's >> proportional free page per order size good. >> >> It's just my opinion within my knowledge. >> If it have a problem, pz, explain me :) > > Please read Nick's expalin. it explain very kindly :) Hm. I read Nick's explain. I understand his point. Nick said, "A higher kswapd reclaim order shouldn't weaken kswapd postcondition for order-0 memory." My patch don't prevent order-0 memory reclaim. After all, it will do it. It also can do best effort to reclaim other order size. In this case, others order size reclaim is needless ? -- Kinds regards, MinChan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 2009-01-02 10:29 ` MinChan Kim @ 2009-01-02 10:54 ` KOSAKI Motohiro 2009-01-02 11:18 ` MinChan Kim 0 siblings, 1 reply; 20+ messages in thread From: KOSAKI Motohiro @ 2009-01-02 10:54 UTC (permalink / raw) To: MinChan Kim Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash >>> So I want to balance zone's proportional free page. >>> How about following ? >>> >>> if (nr_reclaimed < SWAP_CLUSTER_MAX) { >>> if (order != 0) { >>> order -=1; >>> sc.order -=1; >>> } >>> } >>> >>> It prevents infinite loop and do best effort to make zone's >>> proportional free page per order size good. >>> >>> It's just my opinion within my knowledge. >>> If it have a problem, pz, explain me :) >> >> Please read Nick's expalin. it explain very kindly :) > > Hm. I read Nick's explain. > I understand his point. > > Nick said, > "A higher kswapd reclaim order shouldn't weaken kswapd > postcondition for order-0 memory." > > My patch don't prevent order-0 memory reclaim. After all, it will do it. > It also can do best effort to reclaim other order size. > > In this case, others order size reclaim is needless ? Yes, needless. wakeup_kswapd() function mean - please make free memory until pages_high - and, I want to "order argument" conteniously pages. then, shorter conteniously pages than "order argumet" pages aren't needed by caller. Unfortunately, your patch has more bad side effect. high order shrink_zone() cause lumpy reclaim. lumpy reclaim cause reclaim neighbor pages although it is active page. needlessly active page reclaiming decrease system performance. -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 2009-01-02 10:54 ` KOSAKI Motohiro @ 2009-01-02 11:18 ` MinChan Kim 0 siblings, 0 replies; 20+ messages in thread From: MinChan Kim @ 2009-01-02 11:18 UTC (permalink / raw) To: KOSAKI Motohiro Cc: Mel Gorman, LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash On Fri, Jan 2, 2009 at 7:54 PM, KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> wrote: >>>> So I want to balance zone's proportional free page. >>>> How about following ? >>>> >>>> if (nr_reclaimed < SWAP_CLUSTER_MAX) { >>>> if (order != 0) { >>>> order -=1; >>>> sc.order -=1; >>>> } >>>> } >>>> >>>> It prevents infinite loop and do best effort to make zone's >>>> proportional free page per order size good. >>>> >>>> It's just my opinion within my knowledge. >>>> If it have a problem, pz, explain me :) >>> >>> Please read Nick's expalin. it explain very kindly :) >> >> Hm. I read Nick's explain. >> I understand his point. >> >> Nick said, >> "A higher kswapd reclaim order shouldn't weaken kswapd >> postcondition for order-0 memory." >> >> My patch don't prevent order-0 memory reclaim. After all, it will do it. >> It also can do best effort to reclaim other order size. >> >> In this case, others order size reclaim is needless ? > > Yes, needless. > > wakeup_kswapd() function mean > - please make free memory until pages_high > - and, I want to "order argument" conteniously pages. > > then, shorter conteniously pages than "order argumet" pages aren't needed > by caller. > > Unfortunately, your patch has more bad side effect. > high order shrink_zone() cause lumpy reclaim. > lumpy reclaim cause reclaim neighbor pages although it is active page. > > needlessly active page reclaiming decrease system performance. > I agree. It can reclaim active pages. Thanks for kind explain. :) -- Kinds regards, MinChan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 2009-01-01 14:52 ` [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 KOSAKI Motohiro 2009-01-02 9:55 ` MinChan Kim @ 2009-01-02 11:14 ` Mel Gorman 1 sibling, 0 replies; 20+ messages in thread From: Mel Gorman @ 2009-01-02 11:14 UTC (permalink / raw) To: KOSAKI Motohiro; +Cc: LKML, linux-mm, Andrew Morton, Nick Piggin, wassim dagash On Thu, Jan 01, 2009 at 11:52:02PM +0900, KOSAKI Motohiro wrote: > > > > /* > > > * Fragmentation may mean that the system cannot be > > > * rebalanced for high-order allocations in all zones. > > > * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, > > > * it means the zones have been fully scanned and are still > > > * not balanced. For high-order allocations, there is > > > * little point trying all over again as kswapd may > > > * infinite loop. > > > * > > > * Instead, recheck all watermarks at order-0 as they > > > * are the most important. If watermarks are ok, kswapd will go > > > * back to sleep. High-order users can still direct reclaim > > > * if they wish. > > > */ > > > > > > ? > > > > Excellent. I strongly like this and I hope merge it to my patch. > > I'll resend new patch. > > Done. > Looks good, thanks. > > > == > From: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Subject: [PATCH] mm: kswapd stop infinite loop at high order allocation > > Wassim Dagash reported following kswapd infinite loop problem. > > kswapd runs in some infinite loop trying to swap until order 10 of zone > highmem is OK.... kswapd will continue to try to balance order 10 of zone > highmem forever (or until someone release a very large chunk of highmem). > > For non order-0 allocations, the system may never be balanced due to > fragmentation but kswapd should not infinitely loop as a result. > > Instead, recheck all watermarks at order-0 as they are the most important. > If watermarks are ok, kswapd will go back to sleep. > > > Reported-by: wassim dagash <wassim.dagash@gmail.com> > Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> > Reviewed-by: Nick Piggin <npiggin@suse.de> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>, > --- > mm/vmscan.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > Index: b/mm/vmscan.c > =================================================================== > --- a/mm/vmscan.c 2008-12-25 08:26:37.000000000 +0900 > +++ b/mm/vmscan.c 2009-01-01 01:56:02.000000000 +0900 > @@ -1872,6 +1872,23 @@ out: > > try_to_freeze(); > > + /* > + * Fragmentation may mean that the system cannot be > + * rebalanced for high-order allocations in all zones. > + * At this point, if nr_reclaimed < SWAP_CLUSTER_MAX, > + * it means the zones have been fully scanned and are still > + * not balanced. For high-order allocations, there is > + * little point trying all over again as kswapd may > + * infinite loop. > + * > + * Instead, recheck all watermarks at order-0 as they > + * are the most important. If watermarks are ok, kswapd will go > + * back to sleep. High-order users can still direct reclaim > + * if they wish. > + */ > + if (nr_reclaimed < SWAP_CLUSTER_MAX) > + order = sc.order = 0; > + > goto loop_again; > } > > > > -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2009-01-02 11:18 UTC | newest] Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-12-30 10:55 [PATCH] mm: stop kswapd's infinite loop at high order allocation KOSAKI Motohiro 2008-12-30 11:10 ` Nick Piggin 2008-12-30 18:59 ` Mel Gorman 2008-12-31 1:32 ` Nick Piggin 2008-12-31 11:06 ` Mel Gorman 2008-12-31 11:16 ` Nick Piggin 2008-12-31 12:11 ` Mel Gorman 2008-12-31 4:54 ` KOSAKI Motohiro 2008-12-31 8:59 ` wassim dagash 2008-12-31 12:05 ` Mel Gorman 2008-12-31 12:24 ` wassim dagash 2008-12-31 11:53 ` Mel Gorman 2008-12-31 13:34 ` KOSAKI Motohiro 2009-01-01 14:52 ` [PATCH] mm: stop kswapd's infinite loop at high order allocation take2 KOSAKI Motohiro 2009-01-02 9:55 ` MinChan Kim 2009-01-02 10:00 ` KOSAKI Motohiro 2009-01-02 10:29 ` MinChan Kim 2009-01-02 10:54 ` KOSAKI Motohiro 2009-01-02 11:18 ` MinChan Kim 2009-01-02 11:14 ` Mel Gorman
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox