* [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-11-28 11:08 UTC
To: linux-mm; +Cc: linux-kernel, KOSAKI Motohiro, akpm
Skip freeing memory from zones that already have lots of free memory.
If one memory zone has memory that is harder to free, we want to avoid
freeing excessive amounts of memory from the other zones, if only because
pageout IO from those zones can slow down page freeing from the problem zone.
This is similar to the check already done by kswapd in balance_pgdat().
Signed-off-by: Rik van Riel <riel@redhat.com>
---
Kosaki-san, this should address point (3) from your list.
mm/vmscan.c | 3 +++
1 file changed, 3 insertions(+)
Index: linux-2.6.28-rc5/mm/vmscan.c
===================================================================
--- linux-2.6.28-rc5.orig/mm/vmscan.c 2008-11-28 05:53:56.000000000 -0500
+++ linux-2.6.28-rc5/mm/vmscan.c 2008-11-28 06:05:29.000000000 -0500
@@ -1510,6 +1510,9 @@ static unsigned long shrink_zones(int pr
if (zone_is_all_unreclaimable(zone) &&
priority != DEF_PRIORITY)
continue; /* Let kswapd poll it */
+ if (zone_watermark_ok(zone, sc->order,
+ 4*zone->pages_high, high_zoneidx, 0))
+ continue; /* Lots free already */
sc->all_unreclaimable = 0;
} else {
/*
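For orientation, here is a condensed paraphrase of the zone walk this hunk
lands in (sketched from the 2.6.28-era shrink_zones(); the memcg branch and
some bookkeeping are trimmed, so treat it as a sketch rather than the
verbatim code):

static unsigned long shrink_zones(int priority, struct zonelist *zonelist,
				  struct scan_control *sc)
{
	enum zone_type high_zoneidx = gfp_zone(sc->gfp_mask);
	unsigned long nr_reclaimed = 0;
	struct zoneref *z;
	struct zone *zone;

	sc->all_unreclaimable = 1;
	for_each_zone_zonelist(zone, z, zonelist, high_zoneidx) {
		if (!populated_zone(zone))
			continue;
		if (zone_is_all_unreclaimable(zone) &&
				priority != DEF_PRIORITY)
			continue;	/* Let kswapd poll it */
		if (zone_watermark_ok(zone, sc->order,
				4*zone->pages_high, high_zoneidx, 0))
			continue;	/* Lots free already (this patch) */
		sc->all_unreclaimable = 0;
		nr_reclaimed += shrink_zone(priority, zone, sc);
	}
	return nr_reclaimed;
}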
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Peter Zijlstra @ 2008-11-28 11:30 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm
On Fri, 2008-11-28 at 06:08 -0500, Rik van Riel wrote:
> Skip freeing memory from zones that already have lots of free memory.
> If one memory zone has memory that is harder to free, we want to avoid
> freeing excessive amounts of memory from the other zones, if only because
> pageout IO from those zones can slow down page freeing from the problem zone.
>
> This is similar to the check already done by kswapd in balance_pgdat().
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
Makes sense,
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> ---
> Kosaki-san, this should address point (3) from your list.
>
> mm/vmscan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: linux-2.6.28-rc5/mm/vmscan.c
> ===================================================================
> --- linux-2.6.28-rc5.orig/mm/vmscan.c 2008-11-28 05:53:56.000000000 -0500
> +++ linux-2.6.28-rc5/mm/vmscan.c 2008-11-28 06:05:29.000000000 -0500
> @@ -1510,6 +1510,9 @@ static unsigned long shrink_zones(int pr
> if (zone_is_all_unreclaimable(zone) &&
> priority != DEF_PRIORITY)
> continue; /* Let kswapd poll it */
> + if (zone_watermark_ok(zone, sc->order,
> + 4*zone->pages_high, high_zoneidx, 0))
> + continue; /* Lots free already */
> sc->all_unreclaimable = 0;
> } else {
> /*
>
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Johannes Weiner @ 2008-11-28 22:43 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro, akpm
On Fri, Nov 28, 2008 at 06:08:03AM -0500, Rik van Riel wrote:
> Skip freeing memory from zones that already have lots of free memory.
> If one memory zone has memory that is harder to free, we want to avoid
> freeing excessive amounts of memory from the other zones, if only because
> pageout IO from those zones can slow down page freeing from the problem zone.
>
> This is similar to the check already done by kswapd in balance_pgdat().
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
Acked-by: Johannes Weiner <hannes@saeurebad.de>
> ---
> Kosaki-san, this should address point (3) from your list.
>
> mm/vmscan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: linux-2.6.28-rc5/mm/vmscan.c
> ===================================================================
> --- linux-2.6.28-rc5.orig/mm/vmscan.c 2008-11-28 05:53:56.000000000 -0500
> +++ linux-2.6.28-rc5/mm/vmscan.c 2008-11-28 06:05:29.000000000 -0500
> @@ -1510,6 +1510,9 @@ static unsigned long shrink_zones(int pr
> if (zone_is_all_unreclaimable(zone) &&
> priority != DEF_PRIORITY)
> continue; /* Let kswapd poll it */
> + if (zone_watermark_ok(zone, sc->order,
> + 4*zone->pages_high, high_zoneidx, 0))
> + continue; /* Lots free already */
> sc->all_unreclaimable = 0;
> } else {
> /*
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Andrew Morton @ 2008-11-29 7:19 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
On Fri, 28 Nov 2008 06:08:03 -0500 Rik van Riel <riel@redhat.com> wrote:
> Skip freeing memory from zones that already have lots of free memory.
> If one memory zone has memory that is harder to free, we want to avoid
> freeing excessive amounts of memory from the other zones, if only because
> pageout IO from those zones can slow down page freeing from the problem zone.
>
> This is similar to the check already done by kswapd in balance_pgdat().
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> ---
> Kosaki-san, this should address point (3) from your list.
>
> mm/vmscan.c | 3 +++
> 1 file changed, 3 insertions(+)
>
> Index: linux-2.6.28-rc5/mm/vmscan.c
> ===================================================================
> --- linux-2.6.28-rc5.orig/mm/vmscan.c 2008-11-28 05:53:56.000000000 -0500
> +++ linux-2.6.28-rc5/mm/vmscan.c 2008-11-28 06:05:29.000000000 -0500
> @@ -1510,6 +1510,9 @@ static unsigned long shrink_zones(int pr
> if (zone_is_all_unreclaimable(zone) &&
> priority != DEF_PRIORITY)
> continue; /* Let kswapd poll it */
> + if (zone_watermark_ok(zone, sc->order,
> + 4*zone->pages_high, high_zoneidx, 0))
> + continue; /* Lots free already */
> sc->all_unreclaimable = 0;
> } else {
> /*
We already tried this, or something very similar in effect, I think...
commit 26e4931632352e3c95a61edac22d12ebb72038fe
Author: akpm <akpm>
Date: Sun Sep 8 19:21:55 2002 +0000
[PATCH] refill the inactive list more quickly
Fix a problem noticed by Ed Tomlinson: under shifting workloads the
shrink_zone() logic will refill the inactive load too slowly.
Bale out of the zone scan when we've reclaimed enough pages. Fixes a
rarely-occurring problem wherein refill_inactive_zone() ends up
shuffling 100,000 pages and generally goes silly.
This needs to be revisited - we should go on and rebalance the lower
zones even if we reclaimed enough pages from highmem.
Then it was reverted a year or two later:
commit 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3
Author: akpm <akpm>
Date: Fri Mar 12 16:23:50 2004 +0000
[PATCH] vmscan: zone balancing fix
We currently have a problem with the balancing of reclaim between zones: much
more reclaim happens against highmem than against lowmem.
This patch partially fixes this by changing the direct reclaim path so it
does not bale out of the zone walk after having reclaimed sufficient pages
from highmem: go on to reclaim from lowmem regardless of how many pages we
reclaimed from highmem.
My changelog does not adequately explain the reasons.
But we don't want to rediscover these reasons in early 2010 :( Some trolling
of the linux-mm and lkml archives around those dates might help us avoid
a mistake here.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: KOSAKI Motohiro @ 2008-11-29 10:55 UTC
To: Andrew Morton; +Cc: kosaki.motohiro, Rik van Riel, linux-mm, linux-kernel
> We already tried this, or something very similar in effect, I think...
>
>
> commit 26e4931632352e3c95a61edac22d12ebb72038fe
> Author: akpm <akpm>
> Date: Sun Sep 8 19:21:55 2002 +0000
>
> [PATCH] refill the inactive list more quickly
>
> Fix a problem noticed by Ed Tomlinson: under shifting workloads the
> shrink_zone() logic will refill the inactive load too slowly.
>
> Bale out of the zone scan when we've reclaimed enough pages. Fixes a
> rarely-occurring problem wherein refill_inactive_zone() ends up
> shuffling 100,000 pages and generally goes silly.
>
> This needs to be revisited - we should go on and rebalance the lower
> zones even if we reclaimed enough pages from highmem.
>
>
>
> Then it was reverted a year or two later:
>
>
> commit 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3
> Author: akpm <akpm>
> Date: Fri Mar 12 16:23:50 2004 +0000
>
> [PATCH] vmscan: zone balancing fix
>
> We currently have a problem with the balancing of reclaim between zones: much
> more reclaim happens against highmem than against lowmem.
>
> This patch partially fixes this by changing the direct reclaim path so it
> does not bale out of the zone walk after having reclaimed sufficient pages
> from highmem: go on to reclaim from lowmem regardless of how many pages we
> reclaimed from highmem.
>
>
> My changelog does not adequately explain the reasons.
>
> But we don't want to rediscover these reasons in early 2010 :( Some trolling
> of the linux-mm and lkml archives around those dates might help us avoid
> a mistake here.
I will dig through the past discussion archives.
Andrew, please hold off on merging this patch for a while.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-11-29 16:47 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
Andrew Morton wrote:
>> Index: linux-2.6.28-rc5/mm/vmscan.c
>> ===================================================================
>> --- linux-2.6.28-rc5.orig/mm/vmscan.c 2008-11-28 05:53:56.000000000 -0500
>> +++ linux-2.6.28-rc5/mm/vmscan.c 2008-11-28 06:05:29.000000000 -0500
>> @@ -1510,6 +1510,9 @@ static unsigned long shrink_zones(int pr
>> if (zone_is_all_unreclaimable(zone) &&
>> priority != DEF_PRIORITY)
>> continue; /* Let kswapd poll it */
>> + if (zone_watermark_ok(zone, sc->order,
>> + 4*zone->pages_high, high_zoneidx, 0))
>> + continue; /* Lots free already */
>> sc->all_unreclaimable = 0;
>> } else {
>> /*
>
> We already tried this, or something very similar in effect, I think...
Yes, we have a check just like this in balance_pgdat().
It's been there forever with no ill effect.
> commit 26e4931632352e3c95a61edac22d12ebb72038fe
> Author: akpm <akpm>
> Date: Sun Sep 8 19:21:55 2002 +0000
>
> [PATCH] refill the inactive list more quickly
>
> Fix a problem noticed by Ed Tomlinson: under shifting workloads the
> shrink_zone() logic will refill the inactive load too slowly.
>
> Bale out of the zone scan when we've reclaimed enough pages. Fixes a
> rarely-occurring problem wherein refill_inactive_zone() ends up
> shuffling 100,000 pages and generally goes silly.
This is not a bale out, this is a "skip zones that have way
too many free pages already".
Kswapd has been doing this for years already.
--
All rights reversed.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Andrew Morton @ 2008-11-29 17:45 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
On Sat, 29 Nov 2008 11:47:25 -0500 Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
>
> >> Index: linux-2.6.28-rc5/mm/vmscan.c
> >> ===================================================================
> >> --- linux-2.6.28-rc5.orig/mm/vmscan.c 2008-11-28 05:53:56.000000000 -0500
> >> +++ linux-2.6.28-rc5/mm/vmscan.c 2008-11-28 06:05:29.000000000 -0500
> >> @@ -1510,6 +1510,9 @@ static unsigned long shrink_zones(int pr
> >> if (zone_is_all_unreclaimable(zone) &&
> >> priority != DEF_PRIORITY)
> >> continue; /* Let kswapd poll it */
> >> + if (zone_watermark_ok(zone, sc->order,
> >> + 4*zone->pages_high, high_zoneidx, 0))
> >> + continue; /* Lots free already */
> >> sc->all_unreclaimable = 0;
> >> } else {
> >> /*
> >
> > We already tried this, or something very similar in effect, I think...
>
> Yes, we have a check just like this in balance_pgdat().
>
> It's been there forever with no ill effect.
This patch affects direct reclaim as well as kswapd.
> > commit 26e4931632352e3c95a61edac22d12ebb72038fe
> > Author: akpm <akpm>
> > Date: Sun Sep 8 19:21:55 2002 +0000
> >
> > [PATCH] refill the inactive list more quickly
> >
> > Fix a problem noticed by Ed Tomlinson: under shifting workloads the
> > shrink_zone() logic will refill the inactive load too slowly.
> >
> > Bale out of the zone scan when we've reclaimed enough pages. Fixes a
> > rarely-occurring problem wherein refill_inactive_zone() ends up
> > shuffling 100,000 pages and generally goes silly.
>
> This is not a bale out, this is a "skip zones that have way
> too many free pages already".
It is similar in effect.
Will this new patch reintroduce the problem which
26e4931632352e3c95a61edac22d12ebb72038fe fixed?
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-11-29 17:58 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
Andrew Morton wrote:
> On Sat, 29 Nov 2008 11:47:25 -0500 Rik van Riel <riel@redhat.com> wrote:
>
>> Andrew Morton wrote:
>>
>>>> Index: linux-2.6.28-rc5/mm/vmscan.c
>>>> ===================================================================
>>>> --- linux-2.6.28-rc5.orig/mm/vmscan.c 2008-11-28 05:53:56.000000000 -0500
>>>> +++ linux-2.6.28-rc5/mm/vmscan.c 2008-11-28 06:05:29.000000000 -0500
>>>> @@ -1510,6 +1510,9 @@ static unsigned long shrink_zones(int pr
>>>> if (zone_is_all_unreclaimable(zone) &&
>>>> priority != DEF_PRIORITY)
>>>> continue; /* Let kswapd poll it */
>>>> + if (zone_watermark_ok(zone, sc->order,
>>>> + 4*zone->pages_high, high_zoneidx, 0))
>>>> + continue; /* Lots free already */
>>>> sc->all_unreclaimable = 0;
>>>> } else {
>>>> /*
>>> We already tried this, or something very similar in effect, I think...
>> Yes, we have a check just like this in balance_pgdat().
>>
>> It's been there forever with no ill effect.
>
> This patch affects direct reclaim as well as kswapd.
No, kswapd calls shrink_zone directly from balance_pgdat,
it does not go through shrink_zones.
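Roughly, the two call paths look like this (simplified sketch of the
2.6.28 call chain):

  direct reclaim: try_to_free_pages() -> do_try_to_free_pages()
                                          -> shrink_zones() -> shrink_zone()
  kswapd:         kswapd() -> balance_pgdat() -> shrink_zone()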
>>> commit 26e4931632352e3c95a61edac22d12ebb72038fe
>>> Author: akpm <akpm>
>>> Date: Sun Sep 8 19:21:55 2002 +0000
>>>
>>> [PATCH] refill the inactive list more quickly
>>>
>>> Fix a problem noticed by Ed Tomlinson: under shifting workloads the
>>> shrink_zone() logic will refill the inactive load too slowly.
>>>
>>> Bale out of the zone scan when we've reclaimed enough pages. Fixes a
>>> rarely-occurring problem wherein refill_inactive_zone() ends up
>>> shuffling 100,000 pages and generally goes silly.
>> This is not a bale out, this is a "skip zones that have way
>> too many free pages already".
>
> It is similar in effect.
>
> Will this new patch reintroduce the problem which
> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
Googling on 26e4931632352e3c95a61edac22d12ebb72038fe only finds
your emails with that commit id in it - which git tree do I
need to search to get that changeset?
--
All rights reversed.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Andrew Morton @ 2008-11-29 18:26 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@redhat.com> wrote:
> > Will this new patch reintroduce the problem which
> > 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
>
> Googling on 26e4931632352e3c95a61edac22d12ebb72038fe only finds
> your emails with that commit id in it - which git tree do I
> need to search to get that changeset?
It's the historical git tree: all the pre-2.6.12 history, which was
migrated from BitKeeper.
git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git
Spending a couple of fun hours reading `git-log mm/vmscan.c' is pretty
instructive. For some reason that command generates rather a lot of
unrelated changelog info which needs to be manually skipped over.
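Something like this should get you to the changesets (assuming the
historical tree above):

  $ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git
  $ cd old-2.6-bkcvs
  $ git log mm/vmscan.c
  $ git show 26e4931632352e3c95a61edac22d12ebb72038fe	# the 2002 bale-out
  $ git show 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3	# the 2004 revert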
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-11-29 18:41 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
Andrew Morton wrote:
> On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@redhat.com> wrote:
>
>>> Will this new patch reintroduce the problem which
>>> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
No, that problem is already taken care of by the fact that
active pages always get deactivated in the current VM,
regardless of whether or not they were referenced.
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/old-2.6-bkcvs.git
>
> Spending a couple of fun hours reading `git-log mm/vmscan.c' is pretty
> instructive. For some reason that command generates rather a lot of
> unrelated changelog info which needs to be manually skipped over.
Will do. Thank you for the pointer.
(and not sure why google wouldn't find it - it finds other
git changesets...)
--
All rights reversed.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Andrew Morton @ 2008-11-29 18:51 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
On Sat, 29 Nov 2008 13:41:34 -0500 Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
> > On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@redhat.com> wrote:
> >
> >>> Will this new patch reintroduce the problem which
> >>> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
>
> No, that problem is already taken care of by the fact that
> active pages always get deactivated in the current VM,
> regardless of whether or not they were referenced.
err, sorry, that was the wrong commit.
26e4931632352e3c95a61edac22d12ebb72038fe _introduced_ the problem, as
predicted in the changelog.
265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 later fixed it up.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-11-29 18:59 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
Andrew Morton wrote:
> On Sat, 29 Nov 2008 13:41:34 -0500 Rik van Riel <riel@redhat.com> wrote:
>
>> Andrew Morton wrote:
>>> On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@redhat.com> wrote:
>>>
>>>>> Will this new patch reintroduce the problem which
>>>>> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
>> No, that problem is already taken care of by the fact that
>> active pages always get deactivated in the current VM,
>> regardless of whether or not they were referenced.
>
> err, sorry, that was the wrong commit.
> 26e4931632352e3c95a61edac22d12ebb72038fe _introduced_ the problem, as
> predicted in the changelog.
>
> 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 later fixed it up.
The patch I sent in this thread does not do any baling out,
it only skips zones where the number of free pages is more
than 4 times zone->pages_high.
Equal pressure is still applied to the other zones.
This should not be a problem since we do not enter direct
reclaim unless the free pages in every zone in our zonelist
are below zone->pages_low.
Zone skipping is only done by tasks that have been in the
direct reclaim code for a long time.
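For reference, the per-zone watermark ladder works roughly like this:

  pages_high : kswapd stops reclaiming a zone once it is back above this
  pages_low  : kswapd is woken when a zone falls below this
  pages_min  : only atomic/"harder" allocations may dip below this;
               failing even here is what pushes a task into direct reclaim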
--
All rights reversed.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Andrew Morton @ 2008-11-29 20:29 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
On Sat, 29 Nov 2008 13:59:21 -0500 Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
> > On Sat, 29 Nov 2008 13:41:34 -0500 Rik van Riel <riel@redhat.com> wrote:
> >
> >> Andrew Morton wrote:
> >>> On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@redhat.com> wrote:
> >>>
> >>>>> Will this new patch reintroduce the problem which
> >>>>> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
> >> No, that problem is already taken care of by the fact that
> >> active pages always get deactivated in the current VM,
> >> regardless of whether or not they were referenced.
> >
> > err, sorry, that was the wrong commit.
> > 26e4931632352e3c95a61edac22d12ebb72038fe _introduced_ the problem, as
> > predicted in the changelog.
> >
> > 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 later fixed it up.
>
> The patch I sent in this thread does not do any baling out,
> it only skips zones where the number of free pages is more
> than 4 times zone->pages_high.
But that will have the same effect as baling out. Moreso, in fact.
> Equal pressure is still applied to the other zones.
>
> This should not be a problem since we do not enter direct
> reclaim unless the free pages in every zone in our zonelist
> are below zone->pages_low.
>
> Zone skipping is only done by tasks that have been in the
> direct reclaim code for a long time.
From 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3:
We currently have a problem with the balancing of reclaim
between zones: much more reclaim happens against highmem than
against lowmem.
This problem will be reintroduced, will it not?
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-11-29 21:35 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
Andrew Morton wrote:
> On Sat, 29 Nov 2008 13:59:21 -0500 Rik van Riel <riel@redhat.com> wrote:
>
>> Andrew Morton wrote:
>>> On Sat, 29 Nov 2008 13:41:34 -0500 Rik van Riel <riel@redhat.com> wrote:
>>>
>>>> Andrew Morton wrote:
>>>>> On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@redhat.com> wrote:
>>>>>
>>>>>>> Will this new patch reintroduce the problem which
>>>>>>> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
>>>> No, that problem is already taken care of by the fact that
>>>> active pages always get deactivated in the current VM,
>>>> regardless of whether or not they were referenced.
>>> err, sorry, that was the wrong commit.
>>> 26e4931632352e3c95a61edac22d12ebb72038fe _introduced_ the problem, as
>>> predicted in the changelog.
>>>
>>> 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 later fixed it up.
>> The patch I sent in this thread does not do any baling out,
>> it only skips zones where the number of free pages is more
>> than 4 times zone->pages_high.
>
> But that will have the same effect as baling out. Moreso, in fact.
Kswapd already does the same in balance_pgdat.
Unequal pressure is sometimes desired, because allocation
pressure is not equal between zones. Having lots of
lowmem allocations should not lead to gigabytes of swapped
out highmem. A numactl pinned application should not cause
memory on other NUMA nodes to be swapped out.
Equal pressure between the zones makes sense when allocation
pressure is similar.
When allocation pressure is different, we have a choice
between evicting potentially useful data from memory or
applying uneven pressure on zones.
>> Equal pressure is still applied to the other zones.
>>
>> This should not be a problem since we do not enter direct
>> reclaim unless the free pages in every zone in our zonelist
>> are below zone->pages_low.
>>
>> Zone skipping is only done by tasks that have been in the
>> direct reclaim code for a long time.
>
> From 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3:
>
> We currently have a problem with the balancing of reclaim
> between zones: much more reclaim happens against highmem than
> against lowmem.
>
> This problem will be reintroduced, will it not?
We already have that behaviour in balance_pgdat().
We do not do any reclaim on zones higher than the first
zone where the zone_watermark_ok call returns true:
if (!zone_watermark_ok(zone, order, zone->pages_high,
0, 0)) {
end_zone = i;
break;
}
Further down in balance_pgdat(), we skip reclaiming from zones
that have way too much memory free.
/*
* We put equal pressure on every zone, unless one
* zone has way too many pages free already.
*/
if (!zone_watermark_ok(zone, order, 8*zone->pages_high,
end_zone, 0))
shrink_zone(priority, zone, &sc);
All my patch does is add one of these sanity checks to the
direct reclaim path.
--
All rights reversed.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Andrew Morton @ 2008-11-29 21:57 UTC
To: Rik van Riel; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
On Sat, 29 Nov 2008 16:35:45 -0500 Rik van Riel <riel@redhat.com> wrote:
> Andrew Morton wrote:
> > On Sat, 29 Nov 2008 13:59:21 -0500 Rik van Riel <riel@redhat.com> wrote:
> >
> >> Andrew Morton wrote:
> >>> On Sat, 29 Nov 2008 13:41:34 -0500 Rik van Riel <riel@redhat.com> wrote:
> >>>
> >>>> Andrew Morton wrote:
> >>>>> On Sat, 29 Nov 2008 12:58:32 -0500 Rik van Riel <riel@redhat.com> wrote:
> >>>>>
> >>>>>>> Will this new patch reintroduce the problem which
> >>>>>>> 26e4931632352e3c95a61edac22d12ebb72038fe fixed?
> >>>> No, that problem is already taken care of by the fact that
> >>>> active pages always get deactivated in the current VM,
> >>>> regardless of whether or not they were referenced.
> >>> err, sorry, that was the wrong commit.
> >>> 26e4931632352e3c95a61edac22d12ebb72038fe _introduced_ the problem, as
> >>> predicted in the changelog.
> >>>
> >>> 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 later fixed it up.
> >> The patch I sent in this thread does not do any baling out,
> >> it only skips zones where the number of free pages is more
> >> than 4 times zone->pages_high.
> >
> > But that will have the same effect as baling out. Moreso, in fact.
>
> Kswapd already does the same in balance_pgdat.
>
> Unequal pressure is sometimes desired, because allocation
> pressure is not equal between zones. Having lots of
> lowmem allocations should not lead to gigabytes of swapped
> out highmem. A numactl pinned application should not cause
> memory on other NUMA nodes to be swapped out.
>
> Equal pressure between the zones makes sense when allocation
> pressure is similar.
>
> When allocation pressure is different, we have a choice
> between evicting potentially useful data from memory or
> applying uneven pressure on zones.
>
> >> Equal pressure is still applied to the other zones.
> >>
> >> This should not be a problem since we do not enter direct
> >> reclaim unless the free pages in every zone in our zonelist
> >> are below zone->pages_low.
> >>
> >> Zone skipping is only done by tasks that have been in the
> >> direct reclaim code for a long time.
> >
> > From 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3:
> >
> > We currently have a problem with the balancing of reclaim
> > between zones: much more reclaim happens against highmem than
> > against lowmem.
> >
> > This problem will be reintroduced, will it not?
>
> We already have that behaviour in balance_pgdat().
I expect that was the case back in March 2004.
265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 removed the bale-out only for
the direct reclaim path.
> We do not do any reclaim on zones higher than the first
> zone where the zone_watermark_ok call returns true:
>
> if (!zone_watermark_ok(zone, order, zone->pages_high,
> 0, 0)) {
> end_zone = i;
> break;
> }
>
> Further down in balance_pgdat(), we skip reclaiming from zones
> that have way too much memory free.
>
> /*
> * We put equal pressure on every zone, unless one
> * zone has way too many pages free already.
> */
> if (!zone_watermark_ok(zone, order, 8*zone->pages_high,
> end_zone, 0))
> shrink_zone(priority, zone, &sc);
>
> All my patch does is add one of these sanity checks to the
> direct reclaim path.
It's a change in behaviour, not a "sanity check"!
The bottom line here is that we don't fully understand the problem
which 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 fixed, hence we cannot
say whether this proposed change will reintroduce it.
Why did it matter that "much more reclaim happens against highmem than
against lowmem"? What were the observeable effects of this?
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-11-29 22:07 UTC
To: Andrew Morton; +Cc: linux-mm, linux-kernel, KOSAKI Motohiro
Andrew Morton wrote:
> The bottom line here is that we don't fully understand the problem
> which 265b2b8cac1774f5f30c88e0ab8d0bcf794ef7b3 fixed, hence we cannot
> say whether this proposed change will reintroduce it.
>
> Why did it matter that "much more reclaim happens against highmem than
> against lowmem"? What were the observable effects of this?
On a 1GB system, with 892MB lowmem and 128MB highmem, it could
lead to the page cache coming mostly from highmem. This in turn
would mean that lowmem could have hundreds of megabytes of unused
memory, while large files would not get cached in memory.
Baling out early and not putting any memory pressure on a zone
can lead to problems.
It is important that zones with easily freeable memory get some
extra memory freed, so more allocations go to that zone.
However, we also do not want to go overboard. Kicking potentially
useful data out of memory or causing unnecessary pageout IO is
harmful too.
Doing some amount of extra reclaim in zones with easily freeable
memory means more memory will get allocated from those zones.
Over time this equalizes pressure between zones.
The patch I sent in limits that extra reclaim (extra allocation
space) in easily freeable zones to 4 * zone->pages_high. That
gives the zone extra free space for alloc_pages, while limiting
unnecessary pageout IO and eviction of useful data.
I am pretty sure that we do understand the differences between
that 2004 patch and the code we have today.
--
All rights reversed.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: KOSAKI Motohiro @ 2008-12-08 13:00 UTC
To: Andrew Morton, Rik van Riel
Cc: kosaki.motohiro, linux-mm, linux-kernel, Peter Zijlstra, Johannes Weiner
Hi
Very sorry for the late response.
> > My changelog does not adequately explain the reasons.
> >
> > But we don't want to rediscover these reasons in early 2010 :( Some trolling
> > of the linux-mm and lkml archives around those dates might help us avoid
> > a mistake here.
>
> I will dig through the past discussion archives.
> Andrew, please hold off on merging this patch for a while.
I searched the past archives patiently,
but unfortunately I could not find the reason at all.
The reverting fix appeared suddenly in 2.6.3-mm3:
http://marc.info/?l=linux-kernel&m=107749956707874&w=2
but I could not find any related discussion in the months around it.
So I guess akpm found some problem by himself.
Therefore, instead, I'd like to show by measurement that Rik's patch is safe.
1. Does this patch break reclaim balancing?
I ran the FFSB benchmark with the following configuration.
---------------------------------------------------
directio=0
time=300
[filesystem0]
location=/mnt/sdb1/kosaki/ffsb
num_files=20
num_dirs=10
max_filesize=91534338
min_filesize=65535
[end0]
[threadgroup0]
num_threads=10
write_size=2816
write_blocksize=4096
read_size=2816
read_blocksize=4096
create_weight=100
write_weight=30
read_weight=100
[end0]
--------------------------------------------------------
<without patch>
pgscan_kswapd_dma 10624
pgscan_kswapd_normal 20640
-> normal/dma ratio 20640 / 10624 = 1.9
pgscan_direct_dma 576
pgscan_direct_normal 2528
-> normal/dma ratio 2528 / 576 = 4.38
<with patch>
pgscan_kswapd_dma 21824
pgscan_kswapd_normal 47424
-> normal/dma ratio 47424 / 21824 = 2.17
pgscan_direct_dma 1632
pgscan_direct_normal 6912
-> normal/dma ratio 6912 / 1632 = 4.23
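These numbers are the pgscan_* counters from /proc/vmstat, sampled before
and after each run, e.g.:

  $ egrep 'pgscan_(kswapd|direct)_(dma|normal)' /proc/vmstat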
The reason is simple.
This patch only takes effect in the following two cases:
1) Another process freed a large amount of memory while we were in direct reclaim.
2) Another process reclaimed a large amount of memory while we were in direct reclaim.
IOW, its logic doesn't trigger on typical workloads at all.
2. Measured the most beneficial case (IOW, many threads swapping out concurrently).
$ ./hackbench 140 process 300 (measured ten times)
2.6.28-rc6 +this patch
+bail-out
--------------------------
62.514 29.270
225.698 30.209
114.694 20.881
179.108 19.795
111.080 19.563
189.796 19.226
114.124 13.330
112.999 10.280
227.842 9.669
81.869 10.113
avg 141.972 18.234
std 55.937 7.099
min 62.514 9.669
max 227.842 30.209
-> about a 10x improvement
3. Measured the worst case (many threads, no swap).
Measured the following three cases (ten times each):
$ ./hackbench 125 process 3000
$ ./hackbench 130 process 3000
$ ./hackbench 135 process 3000
2.6.28-rc6
+ evict streaming first + skip freeing memory
+ rvr bail out
+ kosaki bail out improve
nr_group 125 130 135 125 130 135
----------------------------------------------------------
67.302 68.269 77.161 89.450 75.328 173.437
72.616 72.712 79.060 69.843 74.145 76.217
72.475 75.712 77.735 73.531 76.426 85.527
69.229 73.062 78.814 72.472 74.891 75.129
71.551 74.392 78.564 69.423 73.517 75.544
69.227 74.310 78.837 72.543 75.347 79.237
70.759 75.256 76.600 70.477 77.848 90.981
69.966 76.001 78.464 71.792 78.722 92.048
69.068 75.218 80.321 71.313 74.958 78.113
72.057 77.151 79.068 72.306 75.644 79.888
avg 70.425 74.208 78.462 73.315 75.683 90.612
std 1.665 2.348 1.007 5.516 1.514 28.218
min 67.302 68.269 76.600 69.423 73.517 75.129
max 72.616 77.151 80.321 89.450 78.722 173.437
-> 1-10% slowdown, because zone_watermark_ok() is a somewhat slow function.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: KOSAKI Motohiro @ 2008-12-08 13:03 UTC
To: Andrew Morton, Rik van Riel
Cc: kosaki.motohiro, linux-mm, linux-kernel, Peter Zijlstra,
Johannes Weiner, Christoph Lameter, Nick Piggin
> 2.6.28-rc6
> + evict streaming first + skip freeing memory
> + rvr bail out
> + kosaki bail out improve
>
> nr_group 125 130 135 125 130 135
> ----------------------------------------------------------
> 67.302 68.269 77.161 89.450 75.328 173.437
> 72.616 72.712 79.060 69.843 74.145 76.217
> 72.475 75.712 77.735 73.531 76.426 85.527
> 69.229 73.062 78.814 72.472 74.891 75.129
> 71.551 74.392 78.564 69.423 73.517 75.544
> 69.227 74.310 78.837 72.543 75.347 79.237
> 70.759 75.256 76.600 70.477 77.848 90.981
> 69.966 76.001 78.464 71.792 78.722 92.048
> 69.068 75.218 80.321 71.313 74.958 78.113
> 72.057 77.151 79.068 72.306 75.644 79.888
>
> avg 70.425 74.208 78.462 73.315 75.683 90.612
> std 1.665 2.348 1.007 5.516 1.514 28.218
> min 67.302 68.269 76.600 69.423 73.517 75.129
> max 72.616 77.151 80.321 89.450 78.722 173.437
>
>
> -> 1-10% slowdown, because zone_watermark_ok() is a somewhat slow function.
>
Next, I'd like to explain why I think the cause is zone_watermark_ok().
I have a zone_watermark_ok() improvement patch. The following patch was
developed for another issue, but I observed that it also resolves the
performance regression of Rik's patch.
<with following patch>
2.6.28-rc6
+ evict streaming first + skip freeing memory
+ rvr bail out + this patch
+ kosaki bail out improve
nr_group 125 130 135 125 130 135
----------------------------------------------------------
67.302 68.269 77.161 68.534 75.733 79.416
72.616 72.712 79.060 70.868 74.264 76.858
72.475 75.712 77.735 73.215 80.278 81.033
69.229 73.062 78.814 70.780 72.518 75.764
71.551 74.392 78.564 69.631 77.252 77.131
69.227 74.310 78.837 72.325 72.723 79.274
70.759 75.256 76.600 70.328 74.046 75.783
69.966 76.001 78.464 69.014 72.566 77.236
69.068 75.218 80.321 68.373 76.447 76.015
72.057 77.151 79.068 74.403 72.794 75.872
avg 70.425 74.208 78.462 70.747 74.862 77.438
std 1.665 2.348 1.007 1.921 2.428 1.752
min 67.302 68.269 76.600 68.373 72.518 75.764
max 72.616 77.151 80.321 74.403 80.278 81.033
-> OK, the performance regression disappeared.
===========================
Subject: [PATCH] mm: zone_watermark_ok() doesn't require small fragment block
Currently, zone_watermark_ok() has somewhat unfair logic.
For example:
Called zone_watermark_ok(zone, 2, pages_min, 0, 0);
pages_min = 64
free pages = 80
case A.
order nr_pages
--------------------
2 5
1 10
0 30
-> zone_watermark_ok() return 1
case B.
order nr_pages
--------------------
3 10
2 0
1 0
0 0
-> zone_watermark_ok() return 0
IOW, the current zone_watermark_ok() tends to prefer small fragmented blocks.
If splitting a large block into small blocks in the buddy allocator were slow,
the above logic would be reasonable. However, that assumption does not hold
at all: the Linux buddy allocator can handle large blocks efficiently.
Also, zone_watermark_ok() is called from get_page_from_freelist() every time,
and get_page_from_freelist() is one of the kernel's fast paths.
In general, a fast path should:
- work as fast as possible when the system has plenty of memory;
- when the system doesn't have enough memory, it doesn't need to be fast,
but it should avoid OOM as far as possible.
Unfortunately, the following loop has the reverse performance tendency.
for (o = 0; o < order; o++) {
free_pages -= z->free_area[o].nr_free << o;
min >>= 1;
if (free_pages <= min)
return 0;
}
If the system doesn't have enough memory, the above loop bails out quickly.
But when the system has enough memory, the loop runs a full 'order' iterations.
This patch changes the zone_watermark_ok() logic to prefer large contiguous blocks.
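To make the cost asymmetry concrete, here is a small userspace model of the
two loops (fake_zone and all the numbers are made up for illustration; this
is not kernel code):

#include <stdio.h>

#define MAX_ORDER 11

struct fake_zone { unsigned long nr_free[MAX_ORDER]; };

/* Old-style check: with plenty of memory (the common fast-path case),
 * it always walks every order below the request. */
static int old_check(const struct fake_zone *z, int order,
		     long min, long free_pages, int *iters)
{
	for (int o = 0; o < order; o++) {
		(*iters)++;
		free_pages -= z->nr_free[o] << o;
		min >>= 1;
		if (free_pages <= min)
			return 0;
	}
	return 1;
}

/* New-style check: succeeds on the first populated order >= the request. */
static int new_check(const struct fake_zone *z, int order, int *iters)
{
	for (int o = order; o < MAX_ORDER; o++) {
		(*iters)++;
		if (z->nr_free[o])
			return 1;
	}
	return 0;
}

int main(void)
{
	struct fake_zone z = { .nr_free = { [3] = 1000 } };	/* plenty of order-3 blocks */
	int old_iters = 0, new_iters = 0;
	int old_ok = old_check(&z, 3, 64, 8000, &old_iters);
	int new_ok = new_check(&z, 3, &new_iters);

	printf("old: ok=%d after %d iterations\n", old_ok, old_iters);	/* ok=1 after 3 */
	printf("new: ok=%d after %d iterations\n", new_ok, new_iters);	/* ok=1 after 1 */
	return 0;
}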
Result:
test machine:
CPU: ia64 x 8
MEM: 8GB
benchmark:
$ tbench 8 (measured three times)
tbench runs for about 600 seconds.
alloc_pages() and zone_watermark_ok() are called about 15,000,000 times.
2.6.28-rc6 this patch
throughput max-latency throughput max-latency
---------------------------------------------------------
1480.92 20.896 1,490.27 19.606
1483.94 19.202 1,482.86 21.082
1478.93 22.215 1,490.57 23.493
avg 1,481.26 20.771 1,487.90 21.394
std 2.06 1.233 3.56 1.602
min 1,478.93 19.202 1,482.86 19.606
max 1,483.94 22.215 1,490.57 23.493
Throughput improves by about 5 MB/sec, which is beyond the measurement noise.
Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
CC: Nick Piggin <npiggin@suse.de>
CC: Christoph Lameter <cl@linux-foundation.org>
---
mm/page_alloc.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
Index: b/mm/page_alloc.c
===================================================================
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1227,7 +1227,7 @@ static inline int should_fail_alloc_page
int zone_watermark_ok(struct zone *z, int order, unsigned long mark,
int classzone_idx, int alloc_flags)
{
- /* free_pages my go negative - that's OK */
+ /* free_pages may go negative - that's OK */
long min = mark;
long free_pages = zone_page_state(z, NR_FREE_PAGES) - (1 << order) + 1;
int o;
@@ -1239,17 +1239,13 @@ int zone_watermark_ok(struct zone *z, in
if (free_pages <= min + z->lowmem_reserve[classzone_idx])
return 0;
- for (o = 0; o < order; o++) {
- /* At the next order, this order's pages become unavailable */
- free_pages -= z->free_area[o].nr_free << o;
- /* Require fewer higher order pages to be free */
- min >>= 1;
-
- if (free_pages <= min)
- return 0;
+ for (o = order; o < MAX_ORDER; o++) {
+ if (z->free_area[o].nr_free)
+ return 1;
}
- return 1;
+
+ return 0;
}
#ifdef CONFIG_NUMA
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: KOSAKI Motohiro @ 2008-12-08 17:48 UTC
To: Andrew Morton, Rik van Riel
Cc: kosaki.motohiro, linux-mm, linux-kernel, Peter Zijlstra,
Johannes Weiner, Christoph Lameter, Nick Piggin
> example,
>
> Called zone_watermark_ok(zone, 2, pages_min, 0, 0);
> pages_min = 64
> free pages = 80
>
> case A.
>
> order nr_pages
> --------------------
> 2 5
> 1 10
> 0 30
>
> -> zone_watermark_ok() return 1
>
> case B.
>
> order nr_pages
> --------------------
> 3 10
> 2 0
> 1 0
> 0 0
>
> -> zone_watermark_ok() return 0
Doh!
This example is obviously buggy.
I guess Mr. KOSAKI is very silly, or an idiot.
I recommend he get a feathery blanket and some good sleep instead of
black, black coffee ;-)
...but the measurement results below still hold.
> This patch changes the zone_watermark_ok() logic to prefer large contiguous blocks.
>
>
> Result:
>
> test machine:
> CPU: ia64 x 8
> MEM: 8GB
>
> benchmark:
> $ tbench 8 (measured three times)
>
> tbench runs for about 600 seconds.
> alloc_pages() and zone_watermark_ok() are called about 15,000,000 times.
>
>
> 2.6.28-rc6 this patch
>
> throughput max-latency throughput max-latency
> ---------------------------------------------------------
> 1480.92 20.896 1,490.27 19.606
> 1483.94 19.202 1,482.86 21.082
> 1478.93 22.215 1,490.57 23.493
>
> avg 1,481.26 20.771 1,487.90 21.394
> std 2.06 1.233 3.56 1.602
> min 1,478.93 19.202 1,482.86 19.606
> max 1,483.94 22.215 1,490.57 23.493
>
>
> Throughput improves by about 5 MB/sec, which is beyond the measurement noise.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Rik van Riel @ 2008-12-08 20:25 UTC
To: KOSAKI Motohiro
Cc: Andrew Morton, linux-mm, linux-kernel, Peter Zijlstra,
Johannes Weiner, Christoph Lameter, Nick Piggin
KOSAKI Motohiro wrote:
> + for (o = order; o < MAX_ORDER; o++) {
> + if (z->free_area[o].nr_free)
> + return 1;
Since page breakup and coalescing always manipulates .nr_free,
I wonder if it would make sense to pack the nr_free variables
in their own cache line(s), so we have fewer cache misses when
going through zone_watermark_ok() ?
That would end up looking something like this:
(whitespace mangled because it doesn't make sense to apply
just this thing, anyway)
Index: linux-2.6.28-rc7/include/linux/mmzone.h
===================================================================
--- linux-2.6.28-rc7.orig/include/linux/mmzone.h	2008-12-02 15:04:33.000000000 -0500
+++ linux-2.6.28-rc7/include/linux/mmzone.h	2008-12-08 15:24:25.000000000 -0500
@@ -58,7 +58,6 @@ static inline int get_pageblock_migratet
struct free_area {
struct list_head free_list[MIGRATE_TYPES];
- unsigned long nr_free;
};
struct pglist_data;
@@ -296,6 +295,7 @@ struct zone {
seqlock_t span_seqlock;
#endif
struct free_area free_area[MAX_ORDER];
+ unsigned long nr_free[MAX_ORDER];
#ifndef CONFIG_SPARSEMEM
/*
--
All rights reversed.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Nick Piggin @ 2008-12-10 5:07 UTC
To: KOSAKI Motohiro
Cc: Andrew Morton, Rik van Riel, linux-mm, linux-kernel,
Peter Zijlstra, Johannes Weiner, Christoph Lameter
On Tue, Dec 09, 2008 at 02:48:40AM +0900, KOSAKI Motohiro wrote:
> > example,
> >
> > Called zone_watermark_ok(zone, 2, pages_min, 0, 0);
> > pages_min = 64
> > free pages = 80
> >
> > case A.
> >
> > order nr_pages
> > --------------------
> > 2 5
> > 1 10
> > 0 30
> >
> > -> zone_watermark_ok() return 1
> >
> > case B.
> >
> > order nr_pages
> > --------------------
> > 3 10
> > 2 0
> > 1 0
> > 0 0
> >
> > -> zone_watermark_ok() return 0
>
> Doh!
> This example is obviously buggy.
>
> I guess Mr. KOSAKI is very silly, or an idiot.
> I recommend he get a feathery blanket and some good sleep instead of
> black, black coffee ;-)
:) No, actually it is always good to have people reviewing existing
code, so thank you for that.
> ...but the measurement results below still hold.
And it is an interesting result. As far as I can see, your patch changes
zone_watermark_ok so that it avoids some watermark checking for higher
order page blocks? I am surprised it makes a noticeable difference in
performance, however such a change would be slightly detrimental to
atomic and "emergency" allocations of higher order pages, wouldn't it?
It would be interesting to know where the higher order allocations are
coming from. Do packets over the loopback device still do higher order
allocations? If so, I suspect this is a bit artificial.
>
> > This patch changes the zone_watermark_ok() logic to prefer large contiguous blocks.
> >
> >
> > Result:
> >
> > test machine:
> > CPU: ia64 x 8
> > MEM: 8GB
> >
> > benchmark:
> > $ tbench 8 (measured three times)
> >
> > tbench runs for about 600 seconds.
> > alloc_pages() and zone_watermark_ok() are called about 15,000,000 times.
> >
> >
> > 2.6.28-rc6 this patch
> >
> > throughput max-latency throughput max-latency
> > ---------------------------------------------------------
> > 1480.92 20.896 1,490.27 19.606
> > 1483.94 19.202 1,482.86 21.082
> > 1478.93 22.215 1,490.57 23.493
> >
> > avg 1,481.26 20.771 1,487.90 21.394
> > std 2.06 1.233 3.56 1.602
> > min 1,478.93 19.202 1,482.86 19.606
> > max 1,483.94 22.215 1,490.57 23.493
> >
> >
> > Throughput improves by about 5 MB/sec, which is beyond the measurement noise.
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: Nick Piggin @ 2008-12-10 5:09 UTC
To: Rik van Riel
Cc: KOSAKI Motohiro, Andrew Morton, linux-mm, linux-kernel,
Peter Zijlstra, Johannes Weiner, Christoph Lameter
On Mon, Dec 08, 2008 at 03:25:10PM -0500, Rik van Riel wrote:
> KOSAKI Motohiro wrote:
>
> >+ for (o = order; o < MAX_ORDER; o++) {
> >+ if (z->free_area[o].nr_free)
> >+ return 1;
>
> Since page breakup and coalescing always manipulates .nr_free,
> I wonder if it would make sense to pack the nr_free variables
> in their own cache line(s), so we have fewer cache misses when
> going through zone_watermark_ok() ?
For order-0 allocations, they should not be touched at all. For
higher order allocations in performance critical paths, we should
try to fix those to use order-0 ;)
* Re: [PATCH] vmscan: skip freeing memory from zones with lots free
From: KOSAKI Motohiro @ 2008-12-12 5:50 UTC
To: Rik van Riel
Cc: kosaki.motohiro, Andrew Morton, linux-mm, linux-kernel,
Peter Zijlstra, Johannes Weiner, Christoph Lameter, Nick Piggin
> Index: linux-2.6.28-rc7/include/linux/mmzone.h
> ===================================================================
> --- linux-2.6.28-rc7.orig/include/linux/mmzone.h	2008-12-02 15:04:33.000000000 -0500
> +++ linux-2.6.28-rc7/include/linux/mmzone.h	2008-12-08 15:24:25.000000000 -0500
> @@ -58,7 +58,6 @@ static inline int get_pageblock_migratet
>
> struct free_area {
> struct list_head free_list[MIGRATE_TYPES];
> - unsigned long nr_free;
> };
>
> struct pglist_data;
> @@ -296,6 +295,7 @@ struct zone {
> seqlock_t span_seqlock;
> #endif
> struct free_area free_area[MAX_ORDER];
> + unsigned long nr_free[MAX_ORDER];
>
> #ifndef CONFIG_SPARSEMEM
> /*
Measurement result:
% tbench 8
2.6.28-rc6 +rvr free area restructure
throughput max latency throughput max latency
------------------------------------------------------------
1480.920 20.896 742.470 30.401
1483.940 19.202 791.648 635.623
1478.930 22.215 733.433 92.515
avg 1481.263 20.771 755.850 252.846
std 2.060 1.233 25.580 271.849
min 1478.930 19.202 733.433 30.401
max 1483.940 22.215 791.648 635.623
I think Nick is right. I'll drop this idea.
Thread overview:
2008-11-28 11:08 [PATCH] vmscan: skip freeing memory from zones with lots free Rik van Riel
2008-11-28 11:30 ` Peter Zijlstra
2008-11-28 22:43 ` Johannes Weiner
2008-11-29 7:19 ` Andrew Morton
2008-11-29 10:55 ` KOSAKI Motohiro
2008-12-08 13:00 ` KOSAKI Motohiro
2008-12-08 13:03 ` KOSAKI Motohiro
2008-12-08 17:48 ` KOSAKI Motohiro
2008-12-10 5:07 ` Nick Piggin
2008-12-08 20:25 ` Rik van Riel
2008-12-10 5:09 ` Nick Piggin
2008-12-12 5:50 ` KOSAKI Motohiro
2008-11-29 16:47 ` Rik van Riel
2008-11-29 17:45 ` Andrew Morton
2008-11-29 17:58 ` Rik van Riel
2008-11-29 18:26 ` Andrew Morton
2008-11-29 18:41 ` Rik van Riel
2008-11-29 18:51 ` Andrew Morton
2008-11-29 18:59 ` Rik van Riel
2008-11-29 20:29 ` Andrew Morton
2008-11-29 21:35 ` Rik van Riel
2008-11-29 21:57 ` Andrew Morton
2008-11-29 22:07 ` Rik van Riel