* [PATCH]vmscan: fix a livelock in kswapd
@ 2011-07-19 7:09 Shaohua Li
2011-07-19 8:39 ` Mel Gorman
` (2 more replies)
0 siblings, 3 replies; 10+ messages in thread
From: Shaohua Li @ 2011-07-19 7:09 UTC (permalink / raw)
To: Andrew Morton; +Cc: mgorman, linux-mm, lkml
I'm running a workload which triggers a lot of swap in a machine with 4 nodes.
After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or
kswapd2 are keeping running and I can't access filesystem, but most memory is
free. This looks like a regression since commit 08951e545918c159.
Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for
classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if
all zones have watermark ok, end_zone will keep 0.
Later sleeping_prematurely() always returns true. Because this is an order 3
wakeup, and if classzone_idx is 0, both balanced_pages and present_pages
in pgdat_balanced() are 0.
We add a special case here. If a zone has no page, we think it's balanced. This
fixes the livelock.
Signed-off-by: Shaohua Li <shaohua.li@intel.com>
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 5ed24b9..ad4056f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2310,7 +2310,8 @@ static bool pgdat_balanced(pg_data_t *pgdat, unsigned long balanced_pages,
for (i = 0; i <= classzone_idx; i++)
present_pages += pgdat->node_zones[i].present_pages;
- return balanced_pages > (present_pages >> 2);
+ /* A special case here: if zone has no page, we think it's balanced */
+ return balanced_pages >= (present_pages >> 2);
}
/* is kswapd sleeping prematurely? */
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-19 7:09 [PATCH]vmscan: fix a livelock in kswapd Shaohua Li @ 2011-07-19 8:39 ` Mel Gorman 2011-07-19 8:45 ` Minchan Kim 2011-07-20 5:44 ` Minchan Kim 2 siblings, 0 replies; 10+ messages in thread From: Mel Gorman @ 2011-07-19 8:39 UTC (permalink / raw) To: Shaohua Li; +Cc: Andrew Morton, linux-mm, lkml On Tue, Jul 19, 2011 at 03:09:27PM +0800, Shaohua Li wrote: > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > kswapd2 are keeping running and I can't access filesystem, but most memory is > free. This looks like a regression since commit 08951e545918c159. > Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for > classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if > all zones have watermark ok, end_zone will keep 0. > Later sleeping_prematurely() always returns true. Because this is an order 3 > wakeup, and if classzone_idx is 0, both balanced_pages and present_pages > in pgdat_balanced() are 0. > We add a special case here. If a zone has no page, we think it's balanced. This > fixes the livelock. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> > Acked-by: Mel Gorman <mgorman@suse.de> It's also needed for 3.0 and 2.6.39-stable I believe. -- Mel Gorman SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-19 7:09 [PATCH]vmscan: fix a livelock in kswapd Shaohua Li 2011-07-19 8:39 ` Mel Gorman @ 2011-07-19 8:45 ` Minchan Kim 2011-07-19 8:53 ` Shaohua Li 2011-07-20 5:44 ` Minchan Kim 2 siblings, 1 reply; 10+ messages in thread From: Minchan Kim @ 2011-07-19 8:45 UTC (permalink / raw) To: Shaohua Li; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > kswapd2 are keeping running and I can't access filesystem, but most memory is > free. This looks like a regression since commit 08951e545918c159. Could you tell me what is 08951e545918c159? You mean [ebd64e21ec5a, mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully] ? -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-19 8:45 ` Minchan Kim @ 2011-07-19 8:53 ` Shaohua Li 2011-07-19 16:51 ` Minchan Kim 0 siblings, 1 reply; 10+ messages in thread From: Shaohua Li @ 2011-07-19 8:53 UTC (permalink / raw) To: Minchan Kim; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Tue, 2011-07-19 at 16:45 +0800, Minchan Kim wrote: > On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: > > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > > kswapd2 are keeping running and I can't access filesystem, but most memory is > > free. This looks like a regression since commit 08951e545918c159. > > Could you tell me what is 08951e545918c159? > You mean [ebd64e21ec5a, > mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully] > ? ha, sorry, I should copy the commit title. 08951e545918c159(mm: vmscan: correct check for kswapd sleeping in sleeping_prematurely) -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-19 8:53 ` Shaohua Li @ 2011-07-19 16:51 ` Minchan Kim 2011-07-20 0:43 ` Shaohua Li 0 siblings, 1 reply; 10+ messages in thread From: Minchan Kim @ 2011-07-19 16:51 UTC (permalink / raw) To: Shaohua Li; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Tue, Jul 19, 2011 at 04:53:04PM +0800, Shaohua Li wrote: > On Tue, 2011-07-19 at 16:45 +0800, Minchan Kim wrote: > > On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: > > > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > > > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > > > kswapd2 are keeping running and I can't access filesystem, but most memory is > > > free. This looks like a regression since commit 08951e545918c159. > > > > Could you tell me what is 08951e545918c159? > > You mean [ebd64e21ec5a, > > mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully] > > ? > ha, sorry, I should copy the commit title. > 08951e545918c159(mm: vmscan: correct check for kswapd sleeping in > sleeping_prematurely) > I don't mean it. In my bogus git tree, I can't find it but I can look at it in repaired git tree. :) Anyway, I have a comment. Please look at below. On Tue, Jul 19, 2011 at 03:09:27PM +0800, Shaohua Li wrote: > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > kswapd2 are keeping running and I can't access filesystem, but most memory is > free. This looks like a regression since commit 08951e545918c159. > Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for > classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if > all zones have watermark ok, end_zone will keep 0. > Later sleeping_prematurely() always returns true. Because this is an order 3 > wakeup, and if classzone_idx is 0, both balanced_pages and present_pages > in pgdat_balanced() are 0. Sigh. Yes. > We add a special case here. If a zone has no page, we think it's balanced. This > fixes the livelock. Yes. Your patch can fix it but I don't like that it adds handling special case. (Although Andrew merged quickly). The problem is to return 0-classzone_idx if all zones was okay. So how about this? This can change old behavior slightly. For example, if balance_pgdat calls with order-3 and all zones are okay about order-3, it will recheck order-0 as end_zone isn't 0 any more. But I think it's desriable side effect we have missed. diff --git a/mm/vmscan.c b/mm/vmscan.c index 5ed24b9..cfef52b 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2389,7 +2389,7 @@ static unsigned long balance_pgdat(pg_data_t *pgdat, int order, unsigned long balanced; int priority; int i; - int end_zone = 0; /* Inclusive. 0 = ZONE_DMA */ + int end_zone = *classzone_idx; unsigned long total_scanned; struct reclaim_state *reclaim_state = current->reclaim_state; unsigned long nr_soft_reclaimed; -- Kinds regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-19 16:51 ` Minchan Kim @ 2011-07-20 0:43 ` Shaohua Li 2011-07-20 4:09 ` Minchan Kim 0 siblings, 1 reply; 10+ messages in thread From: Shaohua Li @ 2011-07-20 0:43 UTC (permalink / raw) To: Minchan Kim; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Wed, 2011-07-20 at 00:51 +0800, Minchan Kim wrote: > On Tue, Jul 19, 2011 at 04:53:04PM +0800, Shaohua Li wrote: > > On Tue, 2011-07-19 at 16:45 +0800, Minchan Kim wrote: > > > On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: > > > > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > > > > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > > > > kswapd2 are keeping running and I can't access filesystem, but most memory is > > > > free. This looks like a regression since commit 08951e545918c159. > > > > > > Could you tell me what is 08951e545918c159? > > > You mean [ebd64e21ec5a, > > > mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully] > > > ? > > ha, sorry, I should copy the commit title. > > 08951e545918c159(mm: vmscan: correct check for kswapd sleeping in > > sleeping_prematurely) > > > > I don't mean it. In my bogus git tree, I can't find it but I can look at it in repaired git tree. :) > Anyway, I have a comment. Please look at below. > > On Tue, Jul 19, 2011 at 03:09:27PM +0800, Shaohua Li wrote: > > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > > kswapd2 are keeping running and I can't access filesystem, but most memory is > > free. This looks like a regression since commit 08951e545918c159. > > Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for > > classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if > > all zones have watermark ok, end_zone will keep 0. > > Later sleeping_prematurely() always returns true. Because this is an order 3 > > wakeup, and if classzone_idx is 0, both balanced_pages and present_pages > > in pgdat_balanced() are 0. > > Sigh. Yes. > > > We add a special case here. If a zone has no page, we think it's balanced. This > > fixes the livelock. > > Yes. Your patch can fix it but I don't like that it adds handling special case. > (Although Andrew merged quickly). The special case is reasonable, because if a zone has no page, it should be considered balanced. > The problem is to return 0-classzone_idx if all zones was okay. > So how about this? My original implementation is like this (I return a populated zone with minimum zone index). But I changed my mind later. the end_zone is zone we work, so return 0 is reasonable, because all zones are ok. Maybe we should return -1 if all zones are ok, but this is another story. > This can change old behavior slightly. > For example, if balance_pgdat calls with order-3 and all zones are okay about order-3, > it will recheck order-0 as end_zone isn't 0 any more. > But I think it's desriable side effect we have missed. if order-3 is ok, order-0 is ok too I think, so the check is unnecessary. Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-20 0:43 ` Shaohua Li @ 2011-07-20 4:09 ` Minchan Kim 2011-07-20 5:18 ` Shaohua Li 0 siblings, 1 reply; 10+ messages in thread From: Minchan Kim @ 2011-07-20 4:09 UTC (permalink / raw) To: Shaohua Li; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Wed, Jul 20, 2011 at 9:43 AM, Shaohua Li <shaohua.li@intel.com> wrote: > On Wed, 2011-07-20 at 00:51 +0800, Minchan Kim wrote: >> On Tue, Jul 19, 2011 at 04:53:04PM +0800, Shaohua Li wrote: >> > On Tue, 2011-07-19 at 16:45 +0800, Minchan Kim wrote: >> > > On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: >> > > > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. >> > > > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or >> > > > kswapd2 are keeping running and I can't access filesystem, but most memory is >> > > > free. This looks like a regression since commit 08951e545918c159. >> > > >> > > Could you tell me what is 08951e545918c159? >> > > You mean [ebd64e21ec5a, >> > > mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully] >> > > ? >> > ha, sorry, I should copy the commit title. >> > 08951e545918c159(mm: vmscan: correct check for kswapd sleeping in >> > sleeping_prematurely) >> > >> >> I don't mean it. In my bogus git tree, I can't find it but I can look at it in repaired git tree. :) >> Anyway, I have a comment. Please look at below. >> >> On Tue, Jul 19, 2011 at 03:09:27PM +0800, Shaohua Li wrote: >> > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. >> > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or >> > kswapd2 are keeping running and I can't access filesystem, but most memory is >> > free. This looks like a regression since commit 08951e545918c159. >> > Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for >> > classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if >> > all zones have watermark ok, end_zone will keep 0. >> > Later sleeping_prematurely() always returns true. Because this is an order 3 >> > wakeup, and if classzone_idx is 0, both balanced_pages and present_pages >> > in pgdat_balanced() are 0. >> >> Sigh. Yes. >> >> > We add a special case here. If a zone has no page, we think it's balanced. This >> > fixes the livelock. >> >> Yes. Your patch can fix it but I don't like that it adds handling special case. >> (Although Andrew merged quickly). > The special case is reasonable, because if a zone has no page, it should > be considered balanced. Yes. It's not bad and even simple but my concern is that at the moment kswapd code is very complicated and it's not hot path so I would like to add more readable code. > >> The problem is to return 0-classzone_idx if all zones was okay. >> So how about this? > My original implementation is like this (I return a populated zone with > minimum zone index). But I changed my mind later. the end_zone is zone > we work, so return 0 is reasonable, because all zones are ok. Maybe we If it is reasonable, did you work on ZONE_DMA(zone index: 0)? > should return -1 if all zones are ok, but this is another story. I think that return classzone_id(-1) and handle such case is more readable. > >> This can change old behavior slightly. >> For example, if balance_pgdat calls with order-3 and all zones are okay about order-3, >> it will recheck order-0 as end_zone isn't 0 any more. >> But I think it's desriable side effect we have missed. > if order-3 is ok, order-0 is ok too I think, so the check is > unnecessary. No. It's not for the zone but *zones. In case of reclaiming higher order zone, it can sleep without all zones being balanced so that precious order-0 of some zone would be not balanced. Even we can lost chance of clearing congestion flag of the zone. It would be a another patch. In conclusion, I would like to avoid complicated thing but I am going to be not against you strongly if other doesn't agree on me. I might need a time to clean kswapd's spagetti up. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-20 4:09 ` Minchan Kim @ 2011-07-20 5:18 ` Shaohua Li 2011-07-20 5:43 ` Minchan Kim 0 siblings, 1 reply; 10+ messages in thread From: Shaohua Li @ 2011-07-20 5:18 UTC (permalink / raw) To: Minchan Kim; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Wed, 2011-07-20 at 12:09 +0800, Minchan Kim wrote: > On Wed, Jul 20, 2011 at 9:43 AM, Shaohua Li <shaohua.li@intel.com> wrote: > > On Wed, 2011-07-20 at 00:51 +0800, Minchan Kim wrote: > >> On Tue, Jul 19, 2011 at 04:53:04PM +0800, Shaohua Li wrote: > >> > On Tue, 2011-07-19 at 16:45 +0800, Minchan Kim wrote: > >> > > On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: > >> > > > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > >> > > > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > >> > > > kswapd2 are keeping running and I can't access filesystem, but most memory is > >> > > > free. This looks like a regression since commit 08951e545918c159. > >> > > > >> > > Could you tell me what is 08951e545918c159? > >> > > You mean [ebd64e21ec5a, > >> > > mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully] > >> > > ? > >> > ha, sorry, I should copy the commit title. > >> > 08951e545918c159(mm: vmscan: correct check for kswapd sleeping in > >> > sleeping_prematurely) > >> > > >> > >> I don't mean it. In my bogus git tree, I can't find it but I can look at it in repaired git tree. :) > >> Anyway, I have a comment. Please look at below. > >> > >> On Tue, Jul 19, 2011 at 03:09:27PM +0800, Shaohua Li wrote: > >> > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > >> > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > >> > kswapd2 are keeping running and I can't access filesystem, but most memory is > >> > free. This looks like a regression since commit 08951e545918c159. > >> > Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for > >> > classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if > >> > all zones have watermark ok, end_zone will keep 0. > >> > Later sleeping_prematurely() always returns true. Because this is an order 3 > >> > wakeup, and if classzone_idx is 0, both balanced_pages and present_pages > >> > in pgdat_balanced() are 0. > >> > >> Sigh. Yes. > >> > >> > We add a special case here. If a zone has no page, we think it's balanced. This > >> > fixes the livelock. > >> > >> Yes. Your patch can fix it but I don't like that it adds handling special case. > >> (Although Andrew merged quickly). > > The special case is reasonable, because if a zone has no page, it should > > be considered balanced. > > Yes. It's not bad and even simple but my concern is that at the moment > kswapd code is very complicated and it's not hot path so I would like > to add more readable code. > > > > >> The problem is to return 0-classzone_idx if all zones was okay. > >> So how about this? > > My original implementation is like this (I return a populated zone with > > minimum zone index). But I changed my mind later. the end_zone is zone > > we work, so return 0 is reasonable, because all zones are ok. Maybe we > > If it is reasonable, did you work on ZONE_DMA(zone index: 0)? return -1 can help. > > should return -1 if all zones are ok, but this is another story. > > I think that return classzone_id(-1) and handle such case is more readable. sure, we need another patch to clean up it. > > > >> This can change old behavior slightly. > >> For example, if balance_pgdat calls with order-3 and all zones are okay about order-3, > >> it will recheck order-0 as end_zone isn't 0 any more. > >> But I think it's desriable side effect we have missed. > > if order-3 is ok, order-0 is ok too I think, so the check is > > unnecessary. > > No. It's not for the zone but *zones. > In case of reclaiming higher order zone, it can sleep without all > zones being balanced so that precious order-0 of some zone would be > not balanced. when balance_pgdat() skips the loop for higher order zone, it already sets end_zone, so I thought this isn't a problem. > Even we can lost chance of clearing congestion flag of the zone. > It would be a another patch. yep, the congestion flag clearing is a bit confusing. I don't even know why just do it in high order allocation. If all zones are ok, we should clear the flag regardless the order. > In conclusion, I would like to avoid complicated thing but I am going > to be not against you strongly if other doesn't agree on me. > I might need a time to clean kswapd's spagetti up. ok, understand it. I have similar concerns actually. I thought my patch is simple enough to solve the livelock. But we do have space to cleanup balance_pgdat(). Thanks, Shaohua -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-20 5:18 ` Shaohua Li @ 2011-07-20 5:43 ` Minchan Kim 0 siblings, 0 replies; 10+ messages in thread From: Minchan Kim @ 2011-07-20 5:43 UTC (permalink / raw) To: Shaohua Li; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Wed, Jul 20, 2011 at 2:18 PM, Shaohua Li <shaohua.li@intel.com> wrote: > On Wed, 2011-07-20 at 12:09 +0800, Minchan Kim wrote: >> On Wed, Jul 20, 2011 at 9:43 AM, Shaohua Li <shaohua.li@intel.com> wrote: >> > On Wed, 2011-07-20 at 00:51 +0800, Minchan Kim wrote: >> >> On Tue, Jul 19, 2011 at 04:53:04PM +0800, Shaohua Li wrote: >> >> > On Tue, 2011-07-19 at 16:45 +0800, Minchan Kim wrote: >> >> > > On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: >> >> > > > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. >> >> > > > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or >> >> > > > kswapd2 are keeping running and I can't access filesystem, but most memory is >> >> > > > free. This looks like a regression since commit 08951e545918c159. >> >> > > >> >> > > Could you tell me what is 08951e545918c159? >> >> > > You mean [ebd64e21ec5a, >> >> > > mm-vmscan-only-read-new_classzone_idx-from-pgdat-when-reclaiming-successfully] >> >> > > ? >> >> > ha, sorry, I should copy the commit title. >> >> > 08951e545918c159(mm: vmscan: correct check for kswapd sleeping in >> >> > sleeping_prematurely) >> >> > >> >> >> >> I don't mean it. In my bogus git tree, I can't find it but I can look at it in repaired git tree. :) >> >> Anyway, I have a comment. Please look at below. >> >> >> >> On Tue, Jul 19, 2011 at 03:09:27PM +0800, Shaohua Li wrote: >> >> > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. >> >> > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or >> >> > kswapd2 are keeping running and I can't access filesystem, but most memory is >> >> > free. This looks like a regression since commit 08951e545918c159. >> >> > Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for >> >> > classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if >> >> > all zones have watermark ok, end_zone will keep 0. >> >> > Later sleeping_prematurely() always returns true. Because this is an order 3 >> >> > wakeup, and if classzone_idx is 0, both balanced_pages and present_pages >> >> > in pgdat_balanced() are 0. >> >> >> >> Sigh. Yes. >> >> >> >> > We add a special case here. If a zone has no page, we think it's balanced. This >> >> > fixes the livelock. >> >> >> >> Yes. Your patch can fix it but I don't like that it adds handling special case. >> >> (Although Andrew merged quickly). >> > The special case is reasonable, because if a zone has no page, it should >> > be considered balanced. >> >> Yes. It's not bad and even simple but my concern is that at the moment >> kswapd code is very complicated and it's not hot path so I would like >> to add more readable code. >> >> > >> >> The problem is to return 0-classzone_idx if all zones was okay. >> >> So how about this? >> > My original implementation is like this (I return a populated zone with >> > minimum zone index). But I changed my mind later. the end_zone is zone >> > we work, so return 0 is reasonable, because all zones are ok. Maybe we >> >> If it is reasonable, did you work on ZONE_DMA(zone index: 0)? > return -1 can help. > >> > should return -1 if all zones are ok, but this is another story. >> >> I think that return classzone_id(-1) and handle such case is more readable. > sure, we need another patch to clean up it. > >> > >> >> This can change old behavior slightly. >> >> For example, if balance_pgdat calls with order-3 and all zones are okay about order-3, >> >> it will recheck order-0 as end_zone isn't 0 any more. >> >> But I think it's desriable side effect we have missed. >> > if order-3 is ok, order-0 is ok too I think, so the check is >> > unnecessary. >> >> No. It's not for the zone but *zones. >> In case of reclaiming higher order zone, it can sleep without all >> zones being balanced so that precious order-0 of some zone would be >> not balanced. > when balance_pgdat() skips the loop for higher order zone, it already > sets end_zone, so I thought this isn't a problem. In case of higher order zone reclaiming, we just make sure 25% of zones are balanced but not for order-0 on *all* zones. > >> Even we can lost chance of clearing congestion flag of the zone. >> It would be a another patch. > yep, the congestion flag clearing is a bit confusing. I don't even know > why just do it in high order allocation. If all zones are ok, we should > clear the flag regardless the order. We do in not-high-order allocation as well as high order allocation if the zone is watermark_okay. > >> In conclusion, I would like to avoid complicated thing but I am going >> to be not against you strongly if other doesn't agree on me. >> I might need a time to clean kswapd's spagetti up. > ok, understand it. I have similar concerns actually. I thought my patch > is simple enough to solve the livelock. But we do have space to cleanup > balance_pgdat(). Okay. I will put it in my TODO. Thanks, Shaohua. -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH]vmscan: fix a livelock in kswapd 2011-07-19 7:09 [PATCH]vmscan: fix a livelock in kswapd Shaohua Li 2011-07-19 8:39 ` Mel Gorman 2011-07-19 8:45 ` Minchan Kim @ 2011-07-20 5:44 ` Minchan Kim 2 siblings, 0 replies; 10+ messages in thread From: Minchan Kim @ 2011-07-20 5:44 UTC (permalink / raw) To: Shaohua Li; +Cc: Andrew Morton, mgorman, linux-mm, lkml On Tue, Jul 19, 2011 at 4:09 PM, Shaohua Li <shaohua.li@intel.com> wrote: > I'm running a workload which triggers a lot of swap in a machine with 4 nodes. > After I kill the workload, I found a kswapd livelock. Sometimes kswapd3 or > kswapd2 are keeping running and I can't access filesystem, but most memory is > free. This looks like a regression since commit 08951e545918c159. > Node 2 and 3 have only ZONE_NORMAL, but balance_pgdat() will return 0 for > classzone_idx. The reason is end_zone in balance_pgdat() is 0 by default, if > all zones have watermark ok, end_zone will keep 0. > Later sleeping_prematurely() always returns true. Because this is an order 3 > wakeup, and if classzone_idx is 0, both balanced_pages and present_pages > in pgdat_balanced() are 0. > We add a special case here. If a zone has no page, we think it's balanced. This > fixes the livelock. > > Signed-off-by: Shaohua Li <shaohua.li@intel.com> Reviewed-by: Minchan Kim <minchan.kim@gmail.com> -- Kind regards, Minchan Kim -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Fight unfair telecom internet charges in Canada: sign http://stopthemeter.ca/ Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2011-07-20 5:44 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2011-07-19 7:09 [PATCH]vmscan: fix a livelock in kswapd Shaohua Li 2011-07-19 8:39 ` Mel Gorman 2011-07-19 8:45 ` Minchan Kim 2011-07-19 8:53 ` Shaohua Li 2011-07-19 16:51 ` Minchan Kim 2011-07-20 0:43 ` Shaohua Li 2011-07-20 4:09 ` Minchan Kim 2011-07-20 5:18 ` Shaohua Li 2011-07-20 5:43 ` Minchan Kim 2011-07-20 5:44 ` Minchan Kim
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox