linux-mm.kvack.org archive mirror
* [PATCH] mm: limit lowmem_reserve
       [not found]   ` <200604041235.59876.kernel@kolivas.org>
@ 2006-04-06  1:10     ` Con Kolivas
  2006-04-06  1:29       ` Respin: " Con Kolivas
  2006-04-07  6:25       ` Nick Piggin
  0 siblings, 2 replies; 19+ messages in thread
From: Con Kolivas @ 2006-04-06  1:10 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ck, Nick Piggin, linux list, linux-mm

It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok always fail if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:09:17.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 			zone->lowmem_reserve[j] = 0;
 
 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;
 
+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  1:10     ` [PATCH] mm: limit lowmem_reserve Con Kolivas
@ 2006-04-06  1:29       ` Con Kolivas
  2006-04-06  2:43         ` Andrew Morton
  2006-04-07  6:25       ` Nick Piggin
  1 sibling, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-04-06  1:29 UTC (permalink / raw)
  To: linux-kernel; +Cc: Andrew Morton, ck, Nick Piggin, linux-mm

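Err, the lower zone needs to have some pages too, sorry: an empty lower zone
gives a max_reserve of zero, which the first version then divided by.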

Respin
---
It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok fail repeatedly if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:28:11.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 			zone->lowmem_reserve[j] = 0;
 
 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;
 
+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (max_reserve && reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}
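
For readers following the arithmetic, the clamp in the respin can be modelled
outside the kernel. This is only a minimal user-space sketch, not the kernel
code: clamp_lowmem_reserve() and struct reserve_result are hypothetical names
introduced for illustration, and the numbers in the note below assume 4 KiB
pages and a reserve ratio of 256.

```c
/* Hypothetical container for the two values the patch updates. */
struct reserve_result {
	unsigned long reserve;	/* pages held back in the lower zone */
	unsigned long ratio;	/* reserve ratio, possibly raised by the clamp */
};

/*
 * User-space model of the respun loop body (names are assumptions).
 * higher_pages:     pages in the zones above the lower zone
 * lower_zone_pages: present pages of the lower zone
 * ratio:            reserve ratio configured for the lower zone
 */
static struct reserve_result
clamp_lowmem_reserve(unsigned long higher_pages,
		     unsigned long lower_zone_pages,
		     unsigned long ratio)
{
	struct reserve_result r;
	unsigned long max_reserve = lower_zone_pages / 4;

	if (ratio < 1)
		ratio = 1;
	r.reserve = higher_pages / ratio;
	r.ratio = ratio;

	/*
	 * max_reserve is 0 when the lower zone has fewer than 4 pages;
	 * skip the clamp in that case rather than divide by zero.
	 */
	if (max_reserve && r.reserve > max_reserve) {
		r.reserve = max_reserve;
		r.ratio = higher_pages / max_reserve;
	}
	return r;
}
```

With 782336 pages above a 4096-page ZONE_DMA and a ratio of 256, the
unclamped reserve would be 3056 pages; the model caps it at 1024 pages and
raises the ratio to 764.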


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  1:29       ` Respin: " Con Kolivas
@ 2006-04-06  2:43         ` Andrew Morton
  2006-04-06  2:55           ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2006-04-06  2:43 UTC (permalink / raw)
  To: Con Kolivas; +Cc: linux-kernel, ck, nickpiggin, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> It is possible with a low enough lowmem_reserve ratio to make
>  zone_watermark_ok fail repeatedly if the lower_zone is small enough.

Is that actually a problem?


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  2:43         ` Andrew Morton
@ 2006-04-06  2:55           ` Con Kolivas
  2006-04-06  2:58             ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-04-06  2:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, ck, nickpiggin, linux-mm

On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > It is possible with a low enough lowmem_reserve ratio to make
> >  zone_watermark_ok fail repeatedly if the lower_zone is small enough.
>
> Is that actually a problem?

Every single call to get_page_from_freelist will call on zone reclaim. It 
seems a problem to me if every call to __alloc_pages will do that?

Cheers,
Con


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  2:55           ` Con Kolivas
@ 2006-04-06  2:58             ` Con Kolivas
  2006-04-06  3:40               ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-04-06  2:58 UTC (permalink / raw)
  To: ck; +Cc: Andrew Morton, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > It is possible with a low enough lowmem_reserve ratio to make
> > >  zone_watermark_ok fail repeatedly if the lower_zone is small enough.
> >
> > Is that actually a problem?
>
> Every single call to get_page_from_freelist will call on zone reclaim. It
> seems a problem to me if every call to __alloc_pages will do that?

every call to __alloc_pages of that zone I mean

Cheers,
Con


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  2:58             ` Con Kolivas
@ 2006-04-06  3:40               ` Andrew Morton
  2006-04-06  4:36                 ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2006-04-06  3:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: ck, nickpiggin, linux-kernel, linux-mm

Con Kolivas <kernel@kolivas.org> wrote:
>
> On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > It is possible with a low enough lowmem_reserve ratio to make
> > > >  zone_watermark_ok fail repeatedly if the lower_zone is small enough.
> > >
> > > Is that actually a problem?
> >
> > Every single call to get_page_from_freelist will call on zone reclaim. It
> > seems a problem to me if every call to __alloc_pages will do that?
> 
> every call to __alloc_pages of that zone I mean
> 

One would need to check with the NUMA guys.  zone_reclaim() has a
(lame-looking) timer in there to prevent it from doing too much work.

That, or I'm missing something.  This problem wasn't particularly well
described, sorry.
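
The timer mentioned above is not shown anywhere in this thread. As a generic
illustration only, the sketch below models timer-based back-off under the
assumption that the idea is "after an unsuccessful attempt, skip further
attempts for a fixed interval". Every name and the interval value are
hypothetical and are not taken from the kernel.

```c
#include <stdbool.h>

/* Hypothetical per-zone state; not the kernel's struct zone. */
struct reclaim_state {
	unsigned long last_failed_at;	/* tick of the last unsuccessful attempt */
	bool has_failed;		/* false until a failure is recorded */
};

/* Hypothetical back-off interval, in ticks. */
#define RECLAIM_BACKOFF_TICKS 30UL

/* Return true if a reclaim attempt may run at tick 'now'. */
static bool reclaim_allowed(const struct reclaim_state *s, unsigned long now)
{
	if (!s->has_failed)
		return true;
	/* Unsigned subtraction stays correct if the tick counter wraps. */
	return now - s->last_failed_at >= RECLAIM_BACKOFF_TICKS;
}

/* Record that an attempt at tick 'now' freed nothing useful. */
static void reclaim_failed(struct reclaim_state *s, unsigned long now)
{
	s->last_failed_at = now;
	s->has_failed = true;
}
```

In this model a caller would check reclaim_allowed() before each attempt and
call reclaim_failed() when an attempt makes no progress, so repeated
watermark failures cost one cheap comparison instead of a full scan.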


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  3:40               ` Andrew Morton
@ 2006-04-06  4:36                 ` Con Kolivas
  2006-04-06  4:52                   ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-04-06  4:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ck, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 13:40, Andrew Morton wrote:
> Con Kolivas <kernel@kolivas.org> wrote:
> > On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > It is possible with a low enough lowmem_reserve ratio to make
> > > > >  zone_watermark_ok fail repeatedly if the lower_zone is small
> > > > > enough.
> > > >
> > > > Is that actually a problem?
> > >
> > > Every single call to get_page_from_freelist will call on zone reclaim.
> > > It seems a problem to me if every call to __alloc_pages will do that?
> >
> > every call to __alloc_pages of that zone I mean
>
> One would need to check with the NUMA guys.  zone_reclaim() has a
> (lame-looking) timer in there to prevent it from doing too much work.
>
> That, or I'm missing something.  This problem wasn't particularly well
> described, sorry.

Ah ok. This all came about because I'm trying to honour the lowmem_reserve 
better in swap_prefetch at Nick's request. It's hard to honour a watermark 
that on some configurations is never reached.

Cheers,
Con


* Re: Respin: [PATCH] mm: limit lowmem_reserve
  2006-04-06  4:36                 ` Con Kolivas
@ 2006-04-06  4:52                   ` Con Kolivas
  0 siblings, 0 replies; 19+ messages in thread
From: Con Kolivas @ 2006-04-06  4:52 UTC (permalink / raw)
  To: ck; +Cc: Andrew Morton, nickpiggin, linux-kernel, linux-mm

On Thursday 06 April 2006 14:36, Con Kolivas wrote:
> On Thursday 06 April 2006 13:40, Andrew Morton wrote:
> > Con Kolivas <kernel@kolivas.org> wrote:
> > > On Thursday 06 April 2006 12:55, Con Kolivas wrote:
> > > > On Thursday 06 April 2006 12:43, Andrew Morton wrote:
> > > > > Con Kolivas <kernel@kolivas.org> wrote:
> > > > > > It is possible with a low enough lowmem_reserve ratio to make
> > > > > >  zone_watermark_ok fail repeatedly if the lower_zone is small
> > > > > > enough.
> > > > >
> > > > > Is that actually a problem?
> > > >
> > > > Every single call to get_page_from_freelist will call on zone
> > > > reclaim. It seems a problem to me if every call to __alloc_pages will
> > > > do that?
> > >
> > > every call to __alloc_pages of that zone I mean
> >
> > One would need to check with the NUMA guys.  zone_reclaim() has a
> > (lame-looking) timer in there to prevent it from doing too much work.
> >
> > That, or I'm missing something.  This problem wasn't particularly well
> > described, sorry.
>
> Ah ok. This all came about because I'm trying to honour the lowmem_reserve
> better in swap_prefetch at Nick's request. It's hard to honour a watermark
> that on some configurations is never reached.

Forget that. If the numa people don't care about it I shouldn't touch it. I 
thought I was doing something helpful at the source but got no response from 
Nick or the other numa_ids out there so they obviously don't care. I'll 
tackle it differently in swap prefetch.

Cheers,
Con


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-06  1:10     ` [PATCH] mm: limit lowmem_reserve Con Kolivas
  2006-04-06  1:29       ` Respin: " Con Kolivas
@ 2006-04-07  6:25       ` Nick Piggin
  2006-04-07  9:02         ` Con Kolivas
  1 sibling, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-04-07  6:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> It is possible with a low enough lowmem_reserve ratio to make
> zone_watermark_ok always fail if the lower_zone is small enough.

I don't see how this would happen?

-- 
SUSE Labs, Novell Inc.
Send instant messages to your online friends http://au.messenger.yahoo.com 


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-07  6:25       ` Nick Piggin
@ 2006-04-07  9:02         ` Con Kolivas
  2006-04-07 12:40           ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-04-07  9:02 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Friday 07 April 2006 16:25, Nick Piggin wrote:
> Con Kolivas wrote:
> > It is possible with a low enough lowmem_reserve ratio to make
> > zone_watermark_ok always fail if the lower_zone is small enough.
>
> I don't see how this would happen?

3GB lowmem and a reserve ratio of 180 is enough to do it.

Cheers,
Con


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-07  9:02         ` Con Kolivas
@ 2006-04-07 12:40           ` Nick Piggin
  2006-04-08  0:15             ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-04-07 12:40 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Friday 07 April 2006 16:25, Nick Piggin wrote:
> 
>>Con Kolivas wrote:
>>
>>>It is possible with a low enough lowmem_reserve ratio to make
>>>zone_watermark_ok always fail if the lower_zone is small enough.
>>
>>I don't see how this would happen?
> 
> 
> 3GB lowmem and a reserve ratio of 180 is enough to do it.
> 

How would zone_watermark_ok always fail though?

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-07 12:40           ` Nick Piggin
@ 2006-04-08  0:15             ` Con Kolivas
  2006-04-08  0:55               ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-04-08  0:15 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Friday 07 April 2006 22:40, Nick Piggin wrote:
> Con Kolivas wrote:
> > On Friday 07 April 2006 16:25, Nick Piggin wrote:
> >>Con Kolivas wrote:
> >>>It is possible with a low enough lowmem_reserve ratio to make
> >>>zone_watermark_ok always fail if the lower_zone is small enough.
> >>
> >>I don't see how this would happen?
> >
> > 3GB lowmem and a reserve ratio of 180 is enough to do it.
>
> How would zone_watermark_ok always fail though?

Withdrew this patch a while back; ignore

-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  0:15             ` Con Kolivas
@ 2006-04-08  0:55               ` Nick Piggin
  2006-04-08  1:01                 ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-04-08  0:55 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Friday 07 April 2006 22:40, Nick Piggin wrote:
> 

>>How would zone_watermark_ok always fail though?
> 
> 
> Withdrew this patch a while back; ignore
> 

Well, whether or not that particular patch is a good idea, it
is definitely a bug if zone_watermark_ok could ever always
fail due to lowmem reserve and we should fix it.

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  0:55               ` Nick Piggin
@ 2006-04-08  1:01                 ` Con Kolivas
  2006-04-08  1:25                   ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-04-08  1:01 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Saturday 08 April 2006 10:55, Nick Piggin wrote:
> Con Kolivas wrote:
> > On Friday 07 April 2006 22:40, Nick Piggin wrote:
> >>How would zone_watermark_ok always fail though?
> >
> > Withdrew this patch a while back; ignore
>
> Well, whether or not that particular patch is a good idea, it
> is definitely a bug if zone_watermark_ok could ever always
> fail due to lowmem reserve and we should fix it.

Ok. I think I presented enough information for why I thought zone_watermark_ok 
would fail (for ZONE_DMA). With 16MB ZONE_DMA and a vmsplit of 3GB we have a 
lowmem_reserve of 12MB. It's pretty hard to keep that much ZONE_DMA free, I 
don't think I've ever seen that much free on my ZONE_DMA on an ordinary 
desktop without any particular ZONE_DMA users. Changing the tunable can make 
the lowmem_reserve larger than ZONE_DMA is on any vmsplit too as far as I 
understand the ratio.
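
The figures above can be reproduced with a few lines of arithmetic. This is a
sketch under stated assumptions rather than a statement about any particular
machine: 4 KiB pages, a ZONE_DMA reserve ratio of 256, and exactly 3 GiB of
lowmem (the real figure is somewhat lower). The helper names are hypothetical.

```c
/* Assumptions for illustration: 4 KiB pages, DMA reserve ratio of 256. */
#define PAGE_SIZE_KIB		4UL
#define DMA_RESERVE_RATIO	256UL

/* Convert a size in MiB to a page count. */
static unsigned long mib_to_pages(unsigned long mib)
{
	return mib * 1024UL / PAGE_SIZE_KIB;
}

/* Convert a page count to a size in KiB. */
static unsigned long pages_to_kib(unsigned long pages)
{
	return pages * PAGE_SIZE_KIB;
}

/* Lower-zone pages held back from allocations aimed at higher zones. */
static unsigned long unclamped_reserve(unsigned long higher_pages,
					unsigned long ratio)
{
	return higher_pages / ratio;
}
```

Under those assumptions ZONE_DMA is mib_to_pages(16) = 4096 pages, the memory
above it is mib_to_pages(3072) - 4096 = 782336 pages, and the reserve is
782336 / 256 = 3056 pages, or 12224 KiB, which is the "12MB" quoted above.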

-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  1:01                 ` Con Kolivas
@ 2006-04-08  1:25                   ` Nick Piggin
  2006-05-17 14:11                     ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-04-08  1:25 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Saturday 08 April 2006 10:55, Nick Piggin wrote:
> 
>>Con Kolivas wrote:
>>
>>>On Friday 07 April 2006 22:40, Nick Piggin wrote:
>>>
>>>>How would zone_watermark_ok always fail though?
>>>
>>>Withdrew this patch a while back; ignore
>>
>>Well, whether or not that particular patch is a good idea, it
>>is definitely a bug if zone_watermark_ok could ever always
>>fail due to lowmem reserve and we should fix it.
> 
> 
> Ok. I think I presented enough information for why I thought zone_watermark_ok 
> would fail (for ZONE_DMA). With 16MB ZONE_DMA and a vmsplit of 3GB we have a 
> lowmem_reserve of 12MB. It's pretty hard to keep that much ZONE_DMA free, I 
> don't think I've ever seen that much free on my ZONE_DMA on an ordinary 
> desktop without any particular ZONE_DMA users. Changing the tunable can make 
> the lowmem_reserve larger than ZONE_DMA is on any vmsplit too as far as I 
> understand the ratio.
> 

Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
12MB protection should never come into it (unless it is buggy?).

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-04-08  1:25                   ` Nick Piggin
@ 2006-05-17 14:11                     ` Con Kolivas
  2006-05-18  7:11                       ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-05-17 14:11 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

I hate to resuscitate this old thread, sorry, but I'm still not sure we
resolved it, and I want to make sure the issue I think I see isn't really there.

On Saturday 08 April 2006 11:25, Nick Piggin wrote:
> Con Kolivas wrote:
> > Ok. I think I presented enough information for why I thought
> > zone_watermark_ok would fail (for ZONE_DMA). With 16MB ZONE_DMA and a
> > vmsplit of 3GB we have a lowmem_reserve of 12MB. It's pretty hard to keep
> > that much ZONE_DMA free, I don't think I've ever seen that much free on
> > my ZONE_DMA on an ordinary desktop without any particular ZONE_DMA users.
> > Changing the tunable can make the lowmem_reserve larger than ZONE_DMA is
> > on any vmsplit too as far as I understand the ratio.
>
> Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
> 12MB protection should never come into it (unless it is buggy?).

An i386 pc with a 3GB split will have approx

4000 pages ZONE_DMA

and the lowmem reserve setup will set its lowmem_reserve to approx

0 0 3000 3000

So if we call zone_watermark_ok with zone of ZONE_DMA and a classzone_idx of a 
ZONE_NORMAL we will fail a zone_watermark_ok test almost always since it's 
almost impossible to have 3000 free ZONE_DMA pages. I believe it can happen 
like this:

In balance_pgdat (vmscan.c:1116) if we end up with end_zone being a 
ZONE_NORMAL zone, then during the scan below we (vmscan.c:1137) iterate over 
all zones from 0 to end_zone and (vmscan.c:1147) we end up calling

if (!zone_watermark_ok(zone, order, zone->pages_high, end_zone, 0))

which would now call zone_watermark_ok with zone being a ZONE_DMA, and 
end_zone being the idx of a ZONE_NORMAL.

So in summary if I'm not mistaken (and I'm good at being mistaken), if we 
balance pgdat and find that ZONE_NORMAL or higher needs scanning, we'll end 
up trying to flush the crap out of ZONE_DMA.

On my test case this indeed happens and my ZONE_DMA never goes below 3000
pages free. If I lower the reserve even further my pages free gets stuck at
3208 and can't free any more, and doesn't ever drop below that either.
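
The scenario described above can be modelled in user space. The sketch below
is a simplification under stated assumptions, not the kernel code: it covers
order-0 checks only, uses three zones instead of the four implied by the
reserve array above, and struct zone_model, watermark_ok() and
zones_below_high() are hypothetical names.

```c
#define NR_ZONES 3	/* simplification: DMA, NORMAL, HIGHMEM */

enum { ZONE_DMA, ZONE_NORMAL, ZONE_HIGHMEM };

/* Hypothetical, trimmed-down stand-in for struct zone. */
struct zone_model {
	long free_pages;
	long pages_high;
	long lowmem_reserve[NR_ZONES];
};

/*
 * Order-0 model of the watermark test: succeed only if the free pages
 * exceed the mark plus the reserve held against classzone_idx.
 */
static int watermark_ok(const struct zone_model *z, long mark,
			int classzone_idx)
{
	return z->free_pages > mark + z->lowmem_reserve[classzone_idx];
}

/*
 * Model of the scan: every zone from 0 to end_zone is tested against
 * its own pages_high with end_zone as the class zone. Returns a
 * bitmask with bit i set for each zone that fails the test.
 */
static unsigned int zones_below_high(const struct zone_model *zones,
				     int end_zone)
{
	unsigned int mask = 0;
	int i;

	for (i = 0; i <= end_zone; i++)
		if (!watermark_ok(&zones[i], zones[i].pages_high, end_zone))
			mask |= 1u << i;
	return mask;
}
```

With a ZONE_DMA holding 3000 free pages, a made-up pages_high of 32 and a
reserve of 3056 pages against ZONE_NORMAL, the test passes when the class
zone is ZONE_DMA but fails when it is ZONE_NORMAL, so in this model ZONE_DMA
is flagged for reclaim whenever the scan ends at ZONE_NORMAL.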

Here is the patch I was proposing

---
It is possible with a low enough lowmem_reserve ratio to make
zone_watermark_ok fail repeatedly if the lower_zone is small enough.
Impose a lower limit on the ratio to only allow 1/4 of the lower_zone
size to be set as lowmem_reserve. This limit is hit in ZONE_DMA by changing
the default vmsplit on i386 even without changing the default sysctl values.

Signed-off-by: Con Kolivas <kernel@kolivas.org>

---
 mm/page_alloc.c |   24 +++++++++++++++++++++---
 1 files changed, 21 insertions(+), 3 deletions(-)

Index: linux-2.6.17-rc1-mm1/mm/page_alloc.c
===================================================================
--- linux-2.6.17-rc1-mm1.orig/mm/page_alloc.c	2006-04-06 10:32:31.000000000 +1000
+++ linux-2.6.17-rc1-mm1/mm/page_alloc.c	2006-04-06 11:28:11.000000000 +1000
@@ -2566,14 +2566,32 @@ static void setup_per_zone_lowmem_reserv
 			zone->lowmem_reserve[j] = 0;
 
 			for (idx = j-1; idx >= 0; idx--) {
+				unsigned long max_reserve;
+				unsigned long reserve;
 				struct zone *lower_zone;
 
+				lower_zone = pgdat->node_zones + idx;
+				/*
+				 * Put an upper limit on the reserve at 1/4
+				 * the lower_zone size. This prevents large
+				 * zone size differences such as 3G VMSPLIT
+				 * or low sysctl values from making
+				 * zone_watermark_ok always fail. This
+				 * enforces a lower limit on the reserve_ratio
+				 */
+				max_reserve = lower_zone->present_pages / 4;
+
 				if (sysctl_lowmem_reserve_ratio[idx] < 1)
 					sysctl_lowmem_reserve_ratio[idx] = 1;
-
-				lower_zone = pgdat->node_zones + idx;
-				lower_zone->lowmem_reserve[j] = present_pages /
+				reserve = present_pages /
 					sysctl_lowmem_reserve_ratio[idx];
+				if (max_reserve && reserve > max_reserve) {
+					reserve = max_reserve;
+					sysctl_lowmem_reserve_ratio[idx] =
+						present_pages / max_reserve;
+				}
+
+				lower_zone->lowmem_reserve[j] = reserve;
 				present_pages += lower_zone->present_pages;
 			}
 		}


-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-05-17 14:11                     ` Con Kolivas
@ 2006-05-18  7:11                       ` Nick Piggin
  2006-05-18  7:21                         ` Con Kolivas
  0 siblings, 1 reply; 19+ messages in thread
From: Nick Piggin @ 2006-05-18  7:11 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> I hate to resuscitate this old thread, sorry, but I'm still not sure we
> resolved it, and I want to make sure the issue I think I see isn't really there.
> 

OK, reclaim is slightly different.

> On Saturday 08 April 2006 11:25, Nick Piggin wrote:
> 
>>Con Kolivas wrote:
>>
>>>Ok. I think I presented enough information for why I thought
>>>zone_watermark_ok would fail (for ZONE_DMA). With 16MB ZONE_DMA and a
>>>vmsplit of 3GB we have a lowmem_reserve of 12MB. It's pretty hard to keep
>>>that much ZONE_DMA free, I don't think I've ever seen that much free on
>>>my ZONE_DMA on an ordinary desktop without any particular ZONE_DMA users.
>>>Changing the tunable can make the lowmem_reserve larger than ZONE_DMA is
>>>on any vmsplit too as far as I understand the ratio.
>>
>>Umm, for ZONE_DMA allocations, ZONE_DMA isn't a lower zone. So that
>>12MB protection should never come into it (unless it is buggy?).
> 
> 
> An i386 pc with a 3GB split will have approx
> 
> 4000 pages ZONE_DMA
> 
> and the lowmem reserve setup will set its lowmem_reserve to approx
> 
> 0 0 3000 3000
> 
> So if we call zone_watermark_ok with zone of ZONE_DMA and a classzone_idx of a 
> ZONE_NORMAL we will fail a zone_watermark_ok test almost always since it's 
> almost impossible to have 3000 free ZONE_DMA pages. I believe it can happen 
> like this:
> 
> In balance_pgdat (vmscan.c:1116) if we end up with end_zone being a 
> ZONE_NORMAL zone, then during the scan below we (vmscan.c:1137) iterate over 
> all zones from 0 to end_zone and (vmscan.c:1147) we end up calling
> 
> if (!zone_watermark_ok(zone, order, zone->pages_high, end_zone, 0))
> 
> which would now call zone_watermark_ok with zone being a ZONE_DMA, and 
> end_zone being the idx of a ZONE_NORMAL.
> 
> So in summary if I'm not mistaken (and I'm good at being mistaken), if we 
> balance pgdat and find that ZONE_NORMAL or higher needs scanning, we'll end 
> up trying to flush the crap out of ZONE_DMA.

If we're under memory pressure, kswapd will try to free up any candidate
zone, yes.

> 
> On my test case this indeed happens and my ZONE_DMA never goes below 3000
> pages free. If I lower the reserve even further my pages free gets stuck at
> 3208 and can't free any more, and doesn't ever drop below that either.
> 
> Here is the patch I was proposing

What problem does that fix though?

-- 
SUSE Labs, Novell Inc.

* Re: [PATCH] mm: limit lowmem_reserve
  2006-05-18  7:11                       ` Nick Piggin
@ 2006-05-18  7:21                         ` Con Kolivas
  2006-05-18  7:26                           ` Nick Piggin
  0 siblings, 1 reply; 19+ messages in thread
From: Con Kolivas @ 2006-05-18  7:21 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Andrew Morton, ck, linux list, linux-mm

On Thursday 18 May 2006 17:11, Nick Piggin wrote:
> If we're under memory pressure, kswapd will try to free up any candidate
> zone, yes.
>
> > On my test case this indeed happens and my ZONE_DMA never goes below 3000
> > pages free. If I lower the reserve even further my pages free gets stuck
> > at 3208 and can't free any more, and doesn't ever drop below that either.
> >
> > Here is the patch I was proposing
>
> What problem does that fix though?

It's a generic concern and I honestly don't know how significant it is which 
is why I'm asking if it needs attention. That concern being that any time 
we're under any sort of memory pressure, ZONE_DMA will undergo intense 
reclaim even though there may not really be anything specifically going on in 
ZONE_DMA. It just seems a waste of cycles doing that.

-- 
-ck


* Re: [PATCH] mm: limit lowmem_reserve
  2006-05-18  7:21                         ` Con Kolivas
@ 2006-05-18  7:26                           ` Nick Piggin
  0 siblings, 0 replies; 19+ messages in thread
From: Nick Piggin @ 2006-05-18  7:26 UTC (permalink / raw)
  To: Con Kolivas; +Cc: Andrew Morton, ck, linux list, linux-mm

Con Kolivas wrote:
> On Thursday 18 May 2006 17:11, Nick Piggin wrote:
> 
>>If we're under memory pressure, kswapd will try to free up any candidate
>>zone, yes.
>>
>>
>>>On my test case this indeed happens and my ZONE_DMA never goes below 3000
>>>pages free. If I lower the reserve even further my pages free gets stuck
>>>at 3208 and can't free any more, and doesn't ever drop below that either.
>>>
>>>Here is the patch I was proposing
>>
>>What problem does that fix though?
> 
> 
> It's a generic concern and I honestly don't know how significant it is which 
> is why I'm asking if it needs attention. That concern being that any time 
> we're under any sort of memory pressure, ZONE_DMA will undergo intense 
> reclaim even though there may not really be anything specifically going on in 
> ZONE_DMA. It just seems a waste of cycles doing that.
> 

If it doesn't have any/much pagecache or slab cache in it, there won't be
intense reclaim; if it does then it can be reclaimed and the memory used.

reclaim / allocation could be slightly smarter about scaling watermarks,
however I don't think it is much of an issue at the moment.

-- 
SUSE Labs, Novell Inc.
