From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail203.messagelabs.com (mail203.messagelabs.com [216.82.254.243]) by kanga.kvack.org (Postfix) with ESMTP id A604D6B007E for ; Tue, 8 Sep 2009 09:20:55 -0400 (EDT) Date: Tue, 8 Sep 2009 14:21:00 +0100 From: Mel Gorman Subject: Re: [RESEND][PATCH V1] mm/vsmcan: check shrink_active_list() sc->isolate_pages() return value. Message-ID: <20090908132100.GA17446@csn.ul.ie> References: <1251935365-7044-1-git-send-email-macli@brc.ubc.ca> <20090903140602.e0169ffc.akpm@linux-foundation.org> <20090903154704.da62dd76.akpm@linux-foundation.org> <20090904165305.c19429ce.akpm@linux-foundation.org> MIME-Version: 1.0 Content-Type: text/plain; charset=iso-8859-15 Content-Disposition: inline In-Reply-To: <20090904165305.c19429ce.akpm@linux-foundation.org> Sender: owner-linux-mm@kvack.org To: Andrew Morton Cc: Vincent Li , kosaki.motohiro@jp.fujitsu.com, riel@redhat.com, minchan.kim@gmail.com, fengguang.wu@intel.com, linux-mm@kvack.org List-ID: On Fri, Sep 04, 2009 at 04:53:05PM -0700, Andrew Morton wrote: > On Fri, 4 Sep 2009 14:39:32 -0700 (PDT) > Vincent Li wrote: > > > > > Ok, I followed the patches above to make following testing code: > > > > --- > > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h > > index eaf46bd..863820a 100644 > > --- a/include/trace/events/kmem.h > > +++ b/include/trace/events/kmem.h > > @@ -388,6 +388,24 @@ TRACE_EVENT(mm_page_alloc_extfrag, > > __entry->alloc_migratetype == __entry->fallback_migratetype) > > ); > > > > +TRACE_EVENT(mm_vmscan_isolate_pages, > > + > > + TP_PROTO(int nr_taken_zeros), > > + > > + TP_ARGS(nr_taken_zeros), > > + > > + TP_STRUCT__entry( > > + __field(int, nr_taken_zeros) > > + ), > > + > > + TP_fast_assign( > > + __entry->nr_taken_zeros = nr_taken_zeros; > > + ), > > + > > + TP_printk("nr_taken_zeros=%d", > > + __entry->nr_taken_zeros) > > +); > > + > > #endif /* _TRACE_KMEM_H */ > > > > /* This part must be outside protection */ > > diff --git a/mm/vmscan.c b/mm/vmscan.c > > index ad93096..c2cf4dd 100644 > > --- a/mm/vmscan.c > > +++ b/mm/vmscan.c > > @@ -40,6 +40,7 @@ > > #include > > #include > > #include > > +#include > > > > #include > > #include > > @@ -1306,6 +1307,7 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone, > > struct zone_reclaim_stat *reclaim_stat = get_reclaim_stat(zone, sc); > > unsigned long nr_rotated = 0; > > unsigned long nr_deactivated = 0; > > + int nr_taken_zeros = 0; > > > > lru_add_drain(); > > spin_lock_irq(&zone->lru_lock); > > @@ -1321,8 +1323,11 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone, > > } > > __count_zone_vm_events(PGREFILL, zone, pgscanned); > > > > - if (nr_taken == 0) > > + if (nr_taken == 0) { > > + nr_taken_zeros++; > > + trace_mm_vmscan_isolate_pages(nr_taken_zeros); > > goto done; > > + } > > > > reclaim_stat->recent_scanned[file] += nr_taken; > > if (file) > > Well you want to count two things: 1: how many times nr_taken==0 and 2: > how many times nr_taken!=0. > Indeed. I'm not aware of the specifics that led to this patch, but minimally one would be interested in the exact value of nr_taken as it can be used to answer more than one question. > > Then I got test result with: > > > > root@kernelhack:/usr/src/mmotm-0903# perf stat --repeat 5 -e \ > > kmem:mm_vmscan_isolate_pages hackbench 100 > > > > Running with 100*40 (== 4000) tasks. > > Time: 52.736 > > Running with 100*40 (== 4000) tasks. > > Time: 64.982 > > Running with 100*40 (== 4000) tasks. > > Time: 56.866 > > Running with 100*40 (== 4000) tasks. > > Time: 37.137 > > Running with 100*40 (== 4000) tasks. > > Time: 48.415 > > > > Performance counter stats for 'hackbench 100' (5 runs): > > > > 14189 kmem:mm_vmscan_isolate_pages ( +- 9.084% ) > > > > 52.680621973 seconds time elapsed ( +- 0.689% ) > > > > Is the testing patch written write? I don't understand what the number > > 14189 means? Does it make any sense? > Broadly speaking "For each of the 5 runs of hackbench, there were 14189 times the kmem:mm_vmscan_isolate_pages was sampled +/- 9.084%" Without knowing how many times nr_taken_zero was positive, it's difficult to tell whether 14189 is common or not. > I don't think you need nr_taken_zeros at all. You'd want something like > > if (nr_taken == 0) > trace_mm_vmscan_nr_taken_zero(); > else > trace_mm_vmscan_nr_taken_nonzero(); > > which would pointlessly generate a huge stream of events which would > have to be added up downstream, which is dumb. > Dumb it might be, but perf acts as that aggregator. For the purposes of debugging, it would be fine although it would not be a very suitable pair of events to merge to mainline. A more sensible trace point for mainline would record what nr_taken was so a higher-level tool could answer the zero vs non-zero question or optionally do things like figure out how many pages were being taken of the lists and being put back. For this question though, use the two tracepoints with no additional parameters and have perf how many times each event occurred. > I don't know if the tracing code is capable of maintaining the counters > for you. Perhaps you _do_ need nr_taken_zeros. In which case you want > > if (nr_taken == 0) { > nr_taken_zeros++; > trace_mm_vmscan_isolate_pages_zero(nr_taken_zeros); > } else { > nr_taken_nonzeros++; > trace_mm_vmscan_isolate_pages_nonzero(nr_taken_nonzeros); > } > > which is awkward. Mel will know. > I am not aware of a way of maintaining counters within tracing. The difficulty is figuring out when that event should be recorded. Worse, the count is being aggregated for all processes that enter this path so there is noise in the value that cannot be filtered out. An aggregated counter like this makes sense when reporting to /proc or for discovering via kgdb, but less suitable for tracing. > > > > > > The way I used to do stuff like this is: > > > > > > int akpm1; > > > int akpm2; > > > > > > ... > > > if (nr_taken) > > > akpm1++; > > > else > > > akpm2++; > > > > > > then inspect the values of akpm1 and akpm2 in the running kernel using kgdb. > > That's looking more attractive ;) > It's simplier but it's global in nature so it's harder to figure out how much of the counter is for the target workload and how much of it is everything else. The main advantage of using the two tracepoints is that you know exactly how many of the events were due to hackbench and not the rest of the system. -- Mel Gorman Part-time Phd Student Linux Technology Center University of Limerick IBM Dublin Software Lab -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org