* [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-06 1:10 UTC
To: Andrew Morton; +Cc: ck, Nick Piggin, linux list, linux-mm

It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok always fail if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:09:17.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 				zone->lowmem_reserve[j] = 0;

 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;

+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

* Respin: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-06 1:29 UTC
To: linux-kernel; +Cc: Andrew Morton, ck, Nick Piggin, linux-mm

Err zone needs to have some pages too sorry.

Respin
---
It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok fail repeatedly if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:28:11.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 				zone->lowmem_reserve[j] = 0;

 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;

+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (max_reserve && reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}

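For readers following the respin, the clamping logic can be modelled outside the kernel. The following is a minimal user-space sketch, not the kernel code: `clamp_reserve` and `struct reserve_result` are hypothetical names, the types are widened to unsigned long for simplicity, and the page counts in the note below are round figures (a 4096-page ZONE_DMA with roughly 3GB of memory above it).

```c
#include <assert.h>

/* Hypothetical result type: the reserve (in pages) and the ratio
 * left in force after clamping. */
struct reserve_result {
	unsigned long reserve;
	unsigned long ratio;
};

/*
 * User-space model of the loop body in the respin.
 * upper_pages: pages in the zones above the lower zone (present_pages)
 * lower_pages: pages in the lower zone (lower_zone->present_pages)
 * ratio:       stands in for sysctl_lowmem_reserve_ratio[idx]
 */
struct reserve_result clamp_reserve(unsigned long upper_pages,
				    unsigned long lower_pages,
				    unsigned long ratio)
{
	struct reserve_result r;
	unsigned long max_reserve = lower_pages / 4;

	if (ratio < 1)
		ratio = 1;	/* same floor as the existing code */

	r.reserve = upper_pages / ratio;
	r.ratio = ratio;

	/* max_reserve == 0 (empty or tiny zone) skips the clamp, which
	 * also avoids dividing by zero in the ratio update. */
	if (max_reserve && r.reserve > max_reserve) {
		r.reserve = max_reserve;
		r.ratio = upper_pages / max_reserve;
	}

	/* Postcondition: a non-empty quarter-zone limit is never exceeded. */
	assert(max_reserve == 0 || r.reserve <= max_reserve);
	return r;
}
```

With 786432 upper pages, a 4096-page lower zone and a ratio of 256, this model yields a reserve of 1024 pages and rewrites the ratio to 768. An empty lower zone leaves both the reserve and the ratio untouched, which is the case the respin adds over the first posting.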
* Re: Respin: [PATCH] mm: limit lowmem_reserve
From: Andrew Morton @ 2006-04-06 2:43 UTC
To: Con Kolivas; +Cc: linux-kernel, ck, nickpiggin, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> It is possible with a low enough lowmem_reserve ratio to make
> zone_watermark_ok fail repeatedly if the lower_zone is small enough.

Is that actually a problem?

* Re: Respin: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-06 2:55 UTC
To: Andrew Morton; +Cc: linux-kernel, ck, nickpiggin, linux-mm

On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > It is possible with a low enough lowmem_reserve ratio to make
> > zone_watermark_ok fail repeatedly if the lower_zone is small enough.
>
> Is that actually a problem?

Every single call to get_page_from_freelist will call on zone reclaim. It
seems a problem to me if every call to __alloc_pages will do that?

Cheers,
Con

* Re: Respin: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-06 2:58 UTC
To: ck; +Cc: Andrew Morton, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > It is possible with a low enough lowmem_reserve ratio to make
> > > zone_watermark_ok fail repeatedly if the lower_zone is small enough.
> >
> > Is that actually a problem?
>
> Every single call to get_page_from_freelist will call on zone reclaim. It
> seems a problem to me if every call to __alloc_pages will do that?

every call to __alloc_pages of that zone I mean

Cheers,
Con

* Re: Respin: [PATCH] mm: limit lowmem_reserve
From: Andrew Morton @ 2006-04-06 3:40 UTC
To: Con Kolivas; +Cc: ck, nickpiggin, linux-kernel, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > It is possible with a low enough lowmem_reserve ratio to make
> > > > zone_watermark_ok fail repeatedly if the lower_zone is small enough.
> > >
> > > Is that actually a problem?
> >
> > Every single call to get_page_from_freelist will call on zone reclaim. It
> > seems a problem to me if every call to __alloc_pages will do that?
>
> every call to __alloc_pages of that zone I mean
>

One would need to check with the NUMA guys. zone_reclaim() has a
(lame-looking) timer in there to prevent it from doing too much work.

That, or I'm missing something. This problem wasn't particularly well
described, sorry.

* Re: Respin: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-06 4:36 UTC
To: Andrew Morton; +Cc: ck, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 13:40, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > It is possible with a low enough lowmem_reserve ratio to make
> > > > > zone_watermark_ok fail repeatedly if the lower_zone is small
> > > > > enough.
> > > >
> > > > Is that actually a problem?
> > >
> > > Every single call to get_page_from_freelist will call on zone reclaim.
> > > It seems a problem to me if every call to __alloc_pages will do that?
> >
> > every call to __alloc_pages of that zone I mean
>
> One would need to check with the NUMA guys. zone_reclaim() has a
> (lame-looking) timer in there to prevent it from doing too much work.
>
> That, or I'm missing something. This problem wasn't particularly well
> described, sorry.

Ah ok. This all came about because I'm trying to honour the lowmem_reserve
better in swap_prefetch at Nick's request. It's hard to honour a watermark
that on some configurations is never reached.

Cheers,
Con

* Re: Respin: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-06 4:52 UTC
To: ck; +Cc: Andrew Morton, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 14:36, Con Kolivas wrote:
> On Thursday 06 April 2006 13:40, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > > > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > > It is possible with a low enough lowmem_reserve ratio to make
> > > > > > zone_watermark_ok fail repeatedly if the lower_zone is small
> > > > > > enough.
> > > > >
> > > > > Is that actually a problem?
> > > >
> > > > Every single call to get_page_from_freelist will call on zone
> > > > reclaim. It seems a problem to me if every call to __alloc_pages will
> > > > do that?
> > >
> > > every call to __alloc_pages of that zone I mean
> >
> > One would need to check with the NUMA guys. zone_reclaim() has a
> > (lame-looking) timer in there to prevent it from doing too much work.
> >
> > That, or I'm missing something. This problem wasn't particularly well
> > described, sorry.
>
> Ah ok. This all came about because I'm trying to honour the lowmem_reserve
> better in swap_prefetch at Nick's request. It's hard to honour a watermark
> that on some configurations is never reached.

Forget that. If the numa people don't care about it I shouldn't touch it. I
thought I was doing something helpful at the source but got no response from
Nick or the other numa_ids out there so they obviously don't care. I'll
tackle it differently in swap prefetch.

Cheers,
Con

* Re: [PATCH] mm: limit lowmem_reserve
From: Nick Piggin @ 2006-04-07 6:25 UTC
To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> It is possible with a low enough lowmem_reserve ratio to make
> zone_watermark_ok always fail if the lower_zone is small enough.

I don't see how this would happen?

--
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com

* Re: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-07 9:02 UTC
To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Friday 07 April 2006 16:25, Nick Piggin wrote:
> Con Kolivas wrote:
> > It is possible with a low enough lowmem_reserve ratio to make
> > zone_watermark_ok always fail if the lower_zone is small enough.
>
> I don't see how this would happen?

3GB lowmem and a reserve ratio of 180 is enough to do it.

Cheers,
Con

* Re: [PATCH] mm: limit lowmem_reserve
From: Nick Piggin @ 2006-04-07 12:40 UTC
To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Friday 07 April 2006 16:25, Nick Piggin wrote:
>
>>Con Kolivas wrote:
>>
>>>It is possible with a low enough lowmem_reserve ratio to make
>>>zone_watermark_ok always fail if the lower_zone is small enough.
>>
>>I don't see how this would happen?
>
>
> 3GB lowmem and a reserve ratio of 180 is enough to do it.
>

How would zone_watermark_ok always fail though?

--
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-08 0:15 UTC
To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Friday 07 April 2006 22:40, Nick Piggin wrote:
> Con Kolivas wrote:
> > On Friday 07 April 2006 16:25, Nick Piggin wrote:
> >>Con Kolivas wrote:
> >>>It is possible with a low enough lowmem_reserve ratio to make
> >>>zone_watermark_ok always fail if the lower_zone is small enough.
> >>
> >>I don't see how this would happen?
> >
> > 3GB lowmem and a reserve ratio of 180 is enough to do it.
>
> How would zone_watermark_ok always fail though?

Withdrew this patch a while back; ignore

--
-ck

* Re: [PATCH] mm: limit lowmem_reserve
From: Nick Piggin @ 2006-04-08 0:55 UTC
To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Friday 07 April 2006 22:40, Nick Piggin wrote:
>
>>How would zone_watermark_ok always fail though?
>
>
> Withdrew this patch a while back; ignore
>

Well, whether or not that particular patch is a good idea, it
is definitely a bug if zone_watermark_ok could ever always
fail due to lowmem reserve and we should fix it.

--
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-04-08 1:01 UTC
To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Saturday 08 April 2006 10:55, Nick Piggin wrote:
> Con Kolivas wrote:
> > On Friday 07 April 2006 22:40, Nick Piggin wrote:
> >>How would zone_watermark_ok always fail though?
> >
> > Withdrew this patch a while back; ignore
>
> Well, whether or not that particular patch is a good idea, it
> is definitely a bug if zone_watermark_ok could ever always
> fail due to lowmem reserve and we should fix it.

Ok. I think I presented enough information for why I thought zone_watermark_ok
would fail (for ZONE_DMA). With 16MB ZONE_DMA and a vmsplit of 3GB we have a
lowmem_reserve of 12MB. It's pretty hard to keep that much ZONE_DMA free, I
don't think I've ever seen that much free on my ZONE_DMA on an ordinary
desktop without any particular ZONE_DMA users. Changing the tunable can make
the lowmem_reserve larger than ZONE_DMA is on any vmsplit too as far as I
understand the ratio.

--
-ck

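The figures in this message can be checked with a few lines of arithmetic. The sketch below is a stand-alone illustration under stated assumptions: 4KB pages, a round 3GB of memory above ZONE_DMA, and a ratio of 256 for ZONE_DMA (believed to be the default in this kernel series, but not stated in the thread). `pages_from_mb` and `reserve_pages` are hypothetical helpers, not kernel functions.

```c
#include <assert.h>

#define PAGE_SIZE_KB 4UL	/* assumption: 4KB pages, as on i386 */

/* Convert a size in megabytes to a page count. */
unsigned long pages_from_mb(unsigned long mb)
{
	return mb * 1024UL / PAGE_SIZE_KB;
}

/* Reserve placed on a lower zone: pages above it divided by the ratio. */
unsigned long reserve_pages(unsigned long upper_pages, unsigned long ratio)
{
	assert(ratio >= 1);	/* the kernel floors the ratio at 1 */
	return upper_pages / ratio;
}
```

Under those assumptions, reserve_pages(pages_from_mb(3072), 256) comes to 3072 pages, which is 12MB, against a 16MB ZONE_DMA of 4096 pages. With the ratio of 180 mentioned earlier in the thread the same call comes to 4369 pages, more than the whole zone holds.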
* Re: [PATCH] mm: limit lowmem_reserve
From: Nick Piggin @ 2006-04-08 1:25 UTC
To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Saturday 08 April 2006 10:55, Nick Piggin wrote:
>
>>Con Kolivas wrote:
>>
>>>On Friday 07 April 2006 22:40, Nick Piggin wrote:
>>>
>>>>How would zone_watermark_ok always fail though?
>>>
>>>Withdrew this patch a while back; ignore
>>
>>Well, whether or not that particular patch is a good idea, it
>>is definitely a bug if zone_watermark_ok could ever always
>>fail due to lowmem reserve and we should fix it.
>
>
> Ok. I think I presented enough information for why I thought zone_watermark_ok
> would fail (for ZONE_DMA). With 16MB ZONE_DMA and a vmsplit of 3GB we have a
> lowmem_reserve of 12MB. It's pretty hard to keep that much ZONE_DMA free, I
> don't think I've ever seen that much free on my ZONE_DMA on an ordinary
> desktop without any particular ZONE_DMA users. Changing the tunable can make
> the lowmem_reserve larger than ZONE_DMA is on any vmsplit too as far as I
> understand the ratio.
>

Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
12MB protection should never come into it (unless it is buggy?).

--
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-05-17 14:11 UTC
To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

I hate to resuscitate this old thread, sorry but I'm still not sure we
resolved it and I want to make sure this issue isn't here as I see it.

On Saturday 08 April 2006 11:25, Nick Piggin wrote:
> Con Kolivas wrote:
> > Ok. I think I presented enough information for why I thought
> > zone_watermark_ok would fail (for ZONE_DMA). With 16MB ZONE_DMA and a
> > vmsplit of 3GB we have a lowmem_reserve of 12MB. It's pretty hard to keep
> > that much ZONE_DMA free, I don't think I've ever seen that much free on
> > my ZONE_DMA on an ordinary desktop without any particular ZONE_DMA users.
> > Changing the tunable can make the lowmem_reserve larger than ZONE_DMA is
> > on any vmsplit too as far as I understand the ratio.
>
> Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
> 12MB protection should never come into it (unless it is buggy?).

An i386 pc with a 3GB split will have approx

4000 pages ZONE_DMA

and lowmem reserve will set lowmem reserve to approx

0 0 3000 3000

So if we call zone_watermark_ok with zone of ZONE_DMA and a classzone_idx of a
ZONE_NORMAL we will fail a zone_watermark_ok test almost always since it's
almost impossible to have 3000 free ZONE_DMA pages. I believe it can happen
like this:

In balance_pgdat (vmscan.c:1116) if we end up with end_zone being a
ZONE_NORMAL zone, then during the scan below we (vmscan.c:1137) iterate over
all zones from 0 to end_zone and (vmscan.c:1147) we end up calling

if (!zone_watermark_ok(zone, order, zone->pages_high, end_zone, 0))

which would now call zone_watermark_ok with zone being a ZONE_DMA, and
end_zone being the idx of a ZONE_NORMAL.

So in summary if I'm not mistaken (and I'm good at being mistaken), if we
balance pgdat and find that ZONE_NORMAL or higher needs scanning, we'll end
up trying to flush the crap out of ZONE_DMA.

On my test case this indeed happens and my ZONE_DMA never goes below 3000
pages free. If I lower the reserve even further my pages free gets stuck at
3208 and can't free any more, and doesn't ever drop below that either.

Here is the patch I was proposing
---
It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok fail repeatedly if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:28:11.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 				zone->lowmem_reserve[j] = 0;

 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;

+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (max_reserve && reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}

--
-ck

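The check described in this message can be modelled in user space. What follows is a simplified, order-0 sketch of the comparison zone_watermark_ok makes, leaving out the allocation flags and the higher-order loop. `struct zone_model` and `watermark_ok_order0` are hypothetical names, the zone ordering is assumed to be DMA, DMA32, NORMAL, HIGHMEM, and the mark of 32 pages used in the note below is an arbitrary illustrative value.

```c
#include <assert.h>

#define NR_ZONES 4	/* assumed order: DMA, DMA32, NORMAL, HIGHMEM */

enum { Z_DMA, Z_DMA32, Z_NORMAL, Z_HIGHMEM };

/* Hypothetical cut-down stand-in for struct zone. */
struct zone_model {
	long free_pages;
	long lowmem_reserve[NR_ZONES];
};

/*
 * Order-0 model: the zone passes only if its free pages exceed the
 * watermark plus the reserve it holds against classzone_idx.
 */
int watermark_ok_order0(const struct zone_model *z, long mark,
			int classzone_idx)
{
	assert(classzone_idx >= 0 && classzone_idx < NR_ZONES);
	return z->free_pages > mark + z->lowmem_reserve[classzone_idx];
}
```

For a ZONE_DMA model with lowmem_reserve of 0 0 3000 3000 and 2900 free pages, the check passes when classzone_idx is Z_DMA but fails when it is Z_NORMAL. The second case corresponds to the call reached from balance_pgdat when end_zone is a ZONE_NORMAL.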
* Re: [PATCH] mm: limit lowmem_reserve
From: Nick Piggin @ 2006-05-18 7:11 UTC
To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> I hate to resuscitate this old thread, sorry but I'm still not sure we
> resolved it and I want to make sure this issue isn't here as I see it.
>

OK, reclaim is slightly different.

> On Saturday 08 April 2006 11:25, Nick Piggin wrote:
>
>>Con Kolivas wrote:
>>
>>>Ok. I think I presented enough information for why I thought
>>>zone_watermark_ok would fail (for ZONE_DMA). With 16MB ZONE_DMA and a
>>>vmsplit of 3GB we have a lowmem_reserve of 12MB. It's pretty hard to keep
>>>that much ZONE_DMA free, I don't think I've ever seen that much free on
>>>my ZONE_DMA on an ordinary desktop without any particular ZONE_DMA users.
>>>Changing the tunable can make the lowmem_reserve larger than ZONE_DMA is
>>>on any vmsplit too as far as I understand the ratio.
>>
>>Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
>>12MB protection should never come into it (unless it is buggy?).
>
>
> An i386 pc with a 3GB split will have approx
>
> 4000 pages ZONE_DMA
>
> and lowmem reserve will set lowmem reserve to approx
>
> 0 0 3000 3000
>
> So if we call zone_watermark_ok with zone of ZONE_DMA and a classzone_idx of a
> ZONE_NORMAL we will fail a zone_watermark_ok test almost always since it's
> almost impossible to have 3000 free ZONE_DMA pages. I believe it can happen
> like this:
>
> In balance_pgdat (vmscan.c:1116) if we end up with end_zone being a
> ZONE_NORMAL zone, then during the scan below we (vmscan.c:1137) iterate over
> all zones from 0 to end_zone and (vmscan.c:1147) we end up calling
>
> if (!zone_watermark_ok(zone, order, zone->pages_high, end_zone, 0))
>
> which would now call zone_watermark_ok with zone being a ZONE_DMA, and
> end_zone being the idx of a ZONE_NORMAL.
>
> So in summary if I'm not mistaken (and I'm good at being mistaken), if we
> balance pgdat and find that ZONE_NORMAL or higher needs scanning, we'll end
> up trying to flush the crap out of ZONE_DMA.

If we're under memory pressure, kswapd will try to free up any candidate
zone, yes.

>
> On my test case this indeed happens and my ZONE_DMA never goes below 3000
> pages free. If I lower the reserve even further my pages free gets stuck at
> 3208 and can't free any more, and doesn't ever drop below that either.
>
> Here is the patch I was proposing

What problem does that fix though?

--
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
From: Con Kolivas @ 2006-05-18 7:21 UTC
To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Thursday 18 May 2006 17:11, Nick Piggin wrote:
> If we're under memory pressure, kswapd will try to free up any candidate
> zone, yes.
>
> > On my test case this indeed happens and my ZONE_DMA never goes below 3000
> > pages free. If I lower the reserve even further my pages free gets stuck
> > at 3208 and can't free any more, and doesn't ever drop below that either.
> >
> > Here is the patch I was proposing
>
> What problem does that fix though?

It's a generic concern and I honestly don't know how significant it is which
is why I'm asking if it needs attention. That concern being that any time
we're under any sort of memory pressure, ZONE_DMA will undergo intense
reclaim even though there may not really be anything specifically going on in
ZONE_DMA. It just seems a waste of cycles doing that.

--
-ck

* Re: [PATCH] mm: limit lowmem_reserve
From: Nick Piggin @ 2006-05-18 7:26 UTC
To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Thursday 18 May 2006 17:11, Nick Piggin wrote:
>
>>If we're under memory pressure, kswapd will try to free up any candidate
>>zone, yes.
>>
>>
>>>On my test case this indeed happens and my ZONE_DMA never goes below 3000
>>>pages free. If I lower the reserve even further my pages free gets stuck
>>>at 3208 and can't free any more, and doesn't ever drop below that either.
>>>
>>>Here is the patch I was proposing
>>
>>What problem does that fix though?
>
>
> It's a generic concern and I honestly don't know how significant it is which
> is why I'm asking if it needs attention. That concern being that any time
> we're under any sort of memory pressure, ZONE_DMA will undergo intense
> reclaim even though there may not really be anything specifically going on in
> ZONE_DMA. It just seems a waste of cycles doing that.
>

If it doesn't have any/much pagecache or slab cache in it, there won't be
intense reclaim; if it does then it can be reclaimed and the memory used.

reclaim / allocation could be slightly smarter about scaling watermarks,
however I don't think it is much of an issue at the moment.

--
SUSE Labs, Novell Inc.