From: Mel Gorman <mel@csn.ul.ie>
To: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
Frans Pop <elendil@planet.nl>, Jiri Kosina <jkosina@suse.cz>,
Sven Geggus <lists@fuchsschwanzdomain.de>,
Karol Lewandowski <karol.k.lewandowski@gmail.com>,
Tobias Oetiker <tobi@oetiker.ch>,
linux-kernel@vger.kernel.org,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
Pekka Enberg <penberg@cs.helsinki.fi>,
Rik van Riel <riel@redhat.com>,
Christoph Lameter <cl@linux-foundation.org>,
Stephan von Krawczynski <skraw@ithnet.com>,
"Rafael J. Wysocki" <rjw@sisk.pl>,
Kernel Testers List <kernel-testers@vger.kernel.org>
Subject: Re: [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep
Date: Fri, 13 Nov 2009 14:13:03 +0000 [thread overview]
Message-ID: <20091113141303.GI29804@csn.ul.ie> (raw)
In-Reply-To: <20091113142558.33B6.A69D9226@jp.fujitsu.com>
On Fri, Nov 13, 2009 at 07:43:09PM +0900, KOSAKI Motohiro wrote:
> > After kswapd balances all zones in a pgdat, it goes to sleep. In the event
> > of no IO congestion, kswapd can go to sleep very shortly after the high
> > watermark was reached. If there are a constant stream of allocations from
> > parallel processes, it can mean that kswapd went to sleep too quickly and
> > the high watermark is not being maintained for sufficient length time.
> >
> > This patch makes kswapd go to sleep as a two-stage process. It first
> > tries to sleep for HZ/10. If it is woken up by another process or the
> > high watermark is no longer met, it's considered a premature sleep and
> > kswapd continues work. Otherwise it goes fully to sleep.
> >
> > This adds more counters to distinguish between fast and slow breaches of
> > watermarks. A "fast" premature sleep is one where the low watermark was
> > hit in a very short time after kswapd going to sleep. A "slow" premature
> > sleep indicates that the high watermark was breached after a very short
> > interval.
> >
> > Signed-off-by: Mel Gorman <mel@csn.ul.ie>
>
> Why do you submit this patch to mainline? this is debugging patch
> no more and no less.
>
Do you mean the stats part? The stats are included until such time as the page
allocator failure reports stop or are significantly reduced. In the event a
report is received, the value of the counters help determine if kswapd was
struggling or not. They should be removed once this mess is ironed out.
If there is a preference, I can split out the stats part and send it to
people with page allocator failure reports for retesting.
>
> > ---
> > include/linux/vmstat.h | 1 +
> > mm/vmscan.c | 44 ++++++++++++++++++++++++++++++++++++++++++--
> > mm/vmstat.c | 2 ++
> > 3 files changed, 45 insertions(+), 2 deletions(-)
> >
> > diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
> > index 2d0f222..9716003 100644
> > --- a/include/linux/vmstat.h
> > +++ b/include/linux/vmstat.h
> > @@ -40,6 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> > PGSCAN_ZONE_RECLAIM_FAILED,
> > #endif
> > PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
> > + KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW,
> > PAGEOUTRUN, ALLOCSTALL, PGROTATED,
>
> Please don't use the word of "premature" and "fast". it is too hard to understand the meanings.
How about KSWAPD_LOW_WMARK_HIT_QUICKLY and
KSWAPD_HIGH_WMARK_HIT_QUICKLY?
> Plus, please use per-zone stastics (like NUMA_HIT).
>
Is that not overkill? The intention is just to track in general terms if
kswapd is struggling or not. I hadn't been considering the problem on a
per-zone basis to date.
> >
> > #ifdef CONFIG_HUGETLB_PAGE
> > HTLB_BUDDY_PGALLOC, HTLB_BUDDY_PGALLOC_FAIL,
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 190bae1..ffa1766 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -1904,6 +1904,24 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
> > }
> > #endif
> >
> > +/* is kswapd sleeping prematurely? */
> > +static int sleeping_prematurely(int order, long remaining)
> > +{
> > + struct zone *zone;
> > +
> > + /* If a direct reclaimer woke kswapd within HZ/10, it's premature */
> > + if (remaining)
> > + return 1;
> > +
> > + /* If after HZ/10, a zone is below the high mark, it's premature */
> > + for_each_populated_zone(zone)
> > + if (!zone_watermark_ok(zone, order, high_wmark_pages(zone),
> > + 0, 0))
> > + return 1;
>
> for_each_populated_zone() iterate all populated zone. but kswapd shuld't see another node.
>
Good spot.
How about the following?
==== CUT HERE ====
vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1
This patch is a fix and a claritifacation to the patch "vmscan: Have
kswapd sleep for a short interval and double check it should be asleep".
The fix is for kswapd to only check zones in the node it is responsible
for. The clarification is to rename two counters to better explain what is
being counted.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
include/linux/vmstat.h | 2 +-
mm/vmscan.c | 20 +++++++++++++-------
mm/vmstat.c | 4 ++--
3 files changed, 16 insertions(+), 10 deletions(-)
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 7d66695..0591a48 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -40,7 +40,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
PGSCAN_ZONE_RECLAIM_FAILED,
#endif
PGINODESTEAL, SLABS_SCANNED, KSWAPD_STEAL, KSWAPD_INODESTEAL,
- KSWAPD_PREMATURE_FAST, KSWAPD_PREMATURE_SLOW,
+ KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
KSWAPD_NO_CONGESTION_WAIT,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
#ifdef CONFIG_HUGETLB_PAGE
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 70967e1..5557555 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1905,19 +1905,25 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *mem_cont,
#endif
/* is kswapd sleeping prematurely? */
-static int sleeping_prematurely(int order, long remaining)
+static int sleeping_prematurely(pg_data_t *pgdat, int order, long remaining)
{
- struct zone *zone;
+ int i;
/* If a direct reclaimer woke kswapd within HZ/10, it's premature */
if (remaining)
return 1;
/* If after HZ/10, a zone is below the high mark, it's premature */
- for_each_populated_zone(zone)
+ for (i = 0; i < pgdat->nr_zones; i++) {
+ struct zone *zone = pgdat->node_zones + i;
+
+ if (!populated_zone(zone))
+ continue;
+
if (!zone_watermark_ok(zone, order, high_wmark_pages(zone),
0, 0))
return 1;
+ }
return 0;
}
@@ -2221,7 +2227,7 @@ static int kswapd(void *p)
long remaining = 0;
/* Try to sleep for a short interval */
- if (!sleeping_prematurely(order, remaining)) {
+ if (!sleeping_prematurely(pgdat, order, remaining)) {
remaining = schedule_timeout(HZ/10);
finish_wait(&pgdat->kswapd_wait, &wait);
prepare_to_wait(&pgdat->kswapd_wait, &wait, TASK_INTERRUPTIBLE);
@@ -2232,13 +2238,13 @@ static int kswapd(void *p)
* premature sleep. If not, then go fully
* to sleep until explicitly woken up
*/
- if (!sleeping_prematurely(order, remaining))
+ if (!sleeping_prematurely(pgdat, order, remaining))
schedule();
else {
if (remaining)
- count_vm_event(KSWAPD_PREMATURE_FAST);
+ count_vm_event(KSWAPD_LOW_WMARK_HIT_QUICKLY);
else
- count_vm_event(KSWAPD_PREMATURE_SLOW);
+ count_vm_event(KSWAPD_HIGH_WMARK_HIT_QUICKLY);
}
}
diff --git a/mm/vmstat.c b/mm/vmstat.c
index bc09547..6cc8dc6 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -683,8 +683,8 @@ static const char * const vmstat_text[] = {
"slabs_scanned",
"kswapd_steal",
"kswapd_inodesteal",
- "kswapd_slept_prematurely_fast",
- "kswapd_slept_prematurely_slow",
+ "kswapd_low_wmark_hit_quickly",
+ "kswapd_high_wmark_hit_quickly",
"kswapd_no_congestion_wait",
"pageoutrun",
"allocstall",
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2009-11-13 14:13 UTC|newest]
Thread overview: 57+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-11-12 19:30 [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Mel Gorman
2009-11-12 19:30 ` [PATCH 1/5] page allocator: Always wake kswapd when restarting an allocation attempt after direct reclaim failed Mel Gorman
2009-11-13 5:23 ` KOSAKI Motohiro
2009-11-13 13:55 ` Mel Gorman
2009-11-12 19:30 ` [PATCH 2/5] page allocator: Do not allow interrupts to use ALLOC_HARDER Mel Gorman
2009-11-13 5:24 ` KOSAKI Motohiro
2009-11-13 13:56 ` Mel Gorman
2009-11-12 19:30 ` [PATCH 3/5] page allocator: Wait on both sync and async congestion after direct reclaim Mel Gorman
2009-11-13 11:20 ` KOSAKI Motohiro
2009-11-13 11:55 ` Jens Axboe
2009-11-13 12:28 ` Mel Gorman
2009-11-13 13:32 ` Jens Axboe
2009-11-13 13:41 ` Pekka Enberg
2009-11-13 15:22 ` Chris Mason
2009-11-13 14:16 ` Mel Gorman
2009-11-20 14:56 ` Mel Gorman
2009-11-12 19:30 ` [PATCH 4/5] vmscan: Have kswapd sleep for a short interval and double check it should be asleep Mel Gorman
2009-11-13 10:43 ` KOSAKI Motohiro
2009-11-13 14:13 ` Mel Gorman [this message]
2009-11-13 18:00 ` KOSAKI Motohiro
2009-11-13 18:17 ` Mel Gorman
2009-11-14 9:34 ` KOSAKI Motohiro
2009-11-14 15:46 ` Mel Gorman
2009-11-17 11:03 ` KOSAKI Motohiro
2009-11-17 11:44 ` Mel Gorman
2009-11-17 12:18 ` KOSAKI Motohiro
2009-11-17 12:25 ` Mel Gorman
2009-11-18 5:20 ` KOSAKI Motohiro
2009-11-17 10:34 ` [PATCH] vmscan: Have kswapd sleep for a short interval and double check it should be asleep fix 1 Mel Gorman
2009-11-18 5:27 ` KOSAKI Motohiro
2009-11-12 19:30 ` [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble Mel Gorman
2009-11-13 9:54 ` KOSAKI Motohiro
2009-11-13 13:54 ` Mel Gorman
2009-11-13 14:48 ` Minchan Kim
2009-11-13 18:00 ` KOSAKI Motohiro
2009-11-13 18:15 ` [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met Mel Gorman
2009-11-13 18:26 ` Frans Pop
2009-11-13 18:33 ` KOSAKI Motohiro
2009-11-13 20:03 ` [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2 Mel Gorman
2009-11-26 14:45 ` Tobias Oetiker
2009-11-29 7:42 ` still getting allocation failures (was Re: [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met V2) Tobi Oetiker
2009-12-02 11:32 ` Mel Gorman
2009-12-02 21:30 ` Tobias Oetiker
2009-12-03 20:26 ` Corrado Zoccolo
2009-12-14 5:59 ` Tobias Oetiker
2009-12-14 8:49 ` Corrado Zoccolo
2009-11-13 18:36 ` [PATCH] vmscan: Stop kswapd waiting on congestion when the min watermark is not being met Rik van Riel
2009-11-13 14:38 ` [PATCH 5/5] vmscan: Take order into consideration when deciding if kswapd is in trouble Minchan Kim
2009-11-13 12:41 ` Minchan Kim
2009-11-13 9:04 ` [PATCH 0/5] Reduce GFP_ATOMIC allocation failures, candidate fix V3 Frans Pop
2009-11-16 17:57 ` Mel Gorman
2009-11-13 12:47 ` Tobias Oetiker
2009-11-13 13:37 ` Mel Gorman
2009-11-15 12:07 ` Karol Lewandowski
2009-11-16 9:52 ` Mel Gorman
2009-11-16 12:08 ` Karol Lewandowski
2009-11-16 14:32 ` Karol Lewandowski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20091113141303.GI29804@csn.ul.ie \
--to=mel@csn.ul.ie \
--cc=akpm@linux-foundation.org \
--cc=cl@linux-foundation.org \
--cc=elendil@planet.nl \
--cc=jkosina@suse.cz \
--cc=karol.k.lewandowski@gmail.com \
--cc=kernel-testers@vger.kernel.org \
--cc=kosaki.motohiro@jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lists@fuchsschwanzdomain.de \
--cc=penberg@cs.helsinki.fi \
--cc=riel@redhat.com \
--cc=rjw@sisk.pl \
--cc=skraw@ithnet.com \
--cc=tobi@oetiker.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox