Date: Tue, 8 Sep 2009 15:39:59 -0700 (PDT)
From: Vincent Li
Subject: Re: [RESEND][PATCH V1] mm/vsmcan: check shrink_active_list() sc->isolate_pages() return value.
In-Reply-To: <20090908132100.GA17446@csn.ul.ie>
To: Mel Gorman
Cc: Andrew Morton, kosaki.motohiro@jp.fujitsu.com, riel@redhat.com, minchan.kim@gmail.com, fengguang.wu@intel.com, linux-mm@kvack.org

On Tue, 8 Sep 2009, Mel Gorman wrote:

> On Fri, Sep 04, 2009 at 04:53:05PM -0700, Andrew Morton wrote:
> > On Fri, 4 Sep 2009 14:39:32 -0700 (PDT)
> > Vincent Li wrote:
> >
> > Well you want to count two things: 1: how many times nr_taken==0 and 2:
> > how many times nr_taken!=0.
> >
> Indeed. I'm not aware of the specifics that led to this patch, but minimally
> one would be interested in the exact value of nr_taken as it can be used to
> answer more than one question.
>
> > > Then I got test results with:
> > >
> > > root@kernelhack:/usr/src/mmotm-0903# perf stat --repeat 5 -e \
> > > 	kmem:mm_vmscan_isolate_pages hackbench 100
> > >
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 52.736
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 64.982
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 56.866
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 37.137
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 48.415
> > >
> > >  Performance counter stats for 'hackbench 100' (5 runs):
> > >
> > >          14189  kmem:mm_vmscan_isolate_pages   ( +-   9.084% )
> > >
> > >   52.680621973  seconds time elapsed   ( +-   0.689% )
> > >
> > > Is the testing patch written right? I don't understand what the number
> > > 14189 means? Does it make any sense?
> >
> Broadly speaking:
> "For each of the 5 runs of hackbench, there were 14189 times the
> kmem:mm_vmscan_isolate_pages event was sampled, +/- 9.084%."
>
> Without knowing how many times nr_taken was zero, it's difficult to tell
> whether 14189 is common or not.
>
> > I don't think you need nr_taken_zeros at all. You'd want something like
> >
> > 	if (nr_taken == 0)
> > 		trace_mm_vmscan_nr_taken_zero();
> > 	else
> > 		trace_mm_vmscan_nr_taken_nonzero();
> >
> > which would pointlessly generate a huge stream of events which would
> > have to be added up downstream, which is dumb.
> >
> Dumb it might be, but perf acts as that aggregator. For the purposes of
> debugging it would be fine, although it would not be a very suitable pair
> of events to merge to mainline. A more sensible tracepoint for mainline
> would record what nr_taken was, so a higher-level tool could answer the
> zero vs non-zero question, or optionally do things like figure out how many
> pages were being taken off the lists and put back.
>
> For this question though, use the two tracepoints with no additional
> parameters and have perf count how many times each event occurred.

Thank you for the explanation.
I am not sure I follow your discussion correctly; does "no additional
parameters" mean something like:

TRACE_EVENT(mm_vmscan_nr_taken_zero,

	TP_PROTO( ),

	TP_ARGS( ),

	TP_STRUCT__entry( ),

	TP_fast_assign( ),

	TP_printk( )
);

? That looks strange to me and does not compile, so I guess that is not what
you mean. I ended up with the following patch:

----PATCH BEGIN---

---
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index eaf46bd..1f9e7bf 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -388,6 +388,42 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 		__entry->alloc_migratetype == __entry->fallback_migratetype)
 );
 
+TRACE_EVENT(mm_vmscan_nr_taken_zero,
+
+	TP_PROTO(unsigned long nr_taken),
+
+	TP_ARGS(nr_taken),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	nr_taken	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_taken = nr_taken;
+	),
+
+	TP_printk("nr_taken=%lu",
+		__entry->nr_taken)
+);
+
+TRACE_EVENT(mm_vmscan_nr_taken_nonzero,
+
+	TP_PROTO(unsigned long nr_taken),
+
+	TP_ARGS(nr_taken),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	nr_taken	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_taken = nr_taken;
+	),
+
+	TP_printk("nr_taken=%lu",
+		__entry->nr_taken)
+);
+
 #endif /* _TRACE_KMEM_H */
 
 /* This part must be outside protection */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ad93096..eec4099 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1322,7 +1323,9 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (nr_taken == 0)
-		goto done;
+		trace_mm_vmscan_nr_taken_zero(nr_taken);
+	else
+		trace_mm_vmscan_nr_taken_nonzero(nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
 	if (file)
@@ -1388,7 +1391,6 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 							nr_rotated);
 	__mod_zone_page_state(zone, NR_INACTIVE_ANON + file * LRU_FILE,
 						nr_deactivated);
-done:
 	spin_unlock_irq(&zone->lru_lock);
 }

----PATCH END---

/usr/src/mmotm-0903# perf stat --repeat 5 -e kmem:mm_vmscan_nr_taken_zero \
	-e kmem:mm_vmscan_nr_taken_nonzero hackbench 100

Running with 100*40 (== 4000) tasks.
Time: 41.599
Running with 100*40 (== 4000) tasks.
Time: 80.192
Running with 100*40 (== 4000) tasks.
Time: 26.451
Running with 100*40 (== 4000) tasks.
Time: 65.428
Running with 100*40 (== 4000) tasks.
Time: 30.054

 Performance counter stats for 'hackbench 100' (5 runs):

          10330  kmem:mm_vmscan_nr_taken_zero     ( +-  11.732% )
           2601  kmem:mm_vmscan_nr_taken_nonzero  ( +-  10.876% )

   49.509328260  seconds time elapsed   ( +-   0.934% )

The nr_taken_zero count is far larger than the nr_taken_nonzero count across
the 5 hackbench runs. I expected the opposite result. Maybe I have it all
wrong :-).
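As an aside, since the two TRACE_EVENT definitions above are identical apart
from their names, the duplication could probably be factored out with the
tracepoint event-class macros, assuming this tree already has
DECLARE_EVENT_CLASS/DEFINE_EVENT (it may not be in mmotm-0903 yet). A rough
sketch of what that might look like, with mm_vmscan_nr_taken_template being
just an illustrative class name:

/* Sketch only: shared class for the two nr_taken events. */
DECLARE_EVENT_CLASS(mm_vmscan_nr_taken_template,

	/* both events record the same single field: the nr_taken value */
	TP_PROTO(unsigned long nr_taken),

	TP_ARGS(nr_taken),

	TP_STRUCT__entry(
		__field(	unsigned long,	nr_taken	)
	),

	TP_fast_assign(
		__entry->nr_taken = nr_taken;
	),

	TP_printk("nr_taken=%lu", __entry->nr_taken)
);

/* Two named events sharing the class, so perf can still count them separately. */
DEFINE_EVENT(mm_vmscan_nr_taken_template, mm_vmscan_nr_taken_zero,
	TP_PROTO(unsigned long nr_taken),
	TP_ARGS(nr_taken)
);

DEFINE_EVENT(mm_vmscan_nr_taken_template, mm_vmscan_nr_taken_nonzero,
	TP_PROTO(unsigned long nr_taken),
	TP_ARGS(nr_taken)
);

The trace_mm_vmscan_nr_taken_zero()/trace_mm_vmscan_nr_taken_nonzero() calls
in shrink_active_list() and the perf stat command line would stay the same;
the class only removes the duplicated boilerplate in kmem.h.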
> > I don't know if the tracing code is capable of maintaining the counters
> > for you. Perhaps you _do_ need nr_taken_zeros. In which case you want
> >
> > 	if (nr_taken == 0) {
> > 		nr_taken_zeros++;
> > 		trace_mm_vmscan_isolate_pages_zero(nr_taken_zeros);
> > 	} else {
> > 		nr_taken_nonzeros++;
> > 		trace_mm_vmscan_isolate_pages_nonzero(nr_taken_nonzeros);
> > 	}
> >
> > which is awkward. Mel will know.
> >
> I am not aware of a way of maintaining counters within tracing. The difficulty
> is figuring out when that event should be recorded. Worse, the count is
> being aggregated for all processes that enter this path, so there is noise
> in the value that cannot be filtered out. An aggregated counter like this
> makes sense when reporting to /proc or for discovering via kgdb, but it is
> less suitable for tracing.
>
> > > > The way I used to do stuff like this is:
> > > >
> > > > int akpm1;
> > > > int akpm2;
> > > >
> > > > ...
> > > > if (nr_taken)
> > > > 	akpm1++;
> > > > else
> > > > 	akpm2++;
> > > >
> > > > then inspect the values of akpm1 and akpm2 in the running kernel
> > > > using kgdb.
> >
> > That's looking more attractive ;)
>
> It's simpler, but it's global in nature, so it's harder to figure out how
> much of the counter is for the target workload and how much of it is
> everything else. The main advantage of using the two tracepoints is that
> you know exactly how many of the events were due to hackbench and not the
> rest of the system.
>
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

Vincent Li
Biomedical Research Center
University of British Columbia