Date: Tue, 8 Sep 2009 15:39:59 -0700 (PDT)
From: Vincent Li
Subject: Re: [RESEND][PATCH V1] mm/vsmcan: check shrink_active_list() sc->isolate_pages() return value.
In-Reply-To: <20090908132100.GA17446@csn.ul.ie>
To: Mel Gorman
Cc: Andrew Morton, kosaki.motohiro@jp.fujitsu.com, riel@redhat.com, minchan.kim@gmail.com, fengguang.wu@intel.com, linux-mm@kvack.org

On Tue, 8 Sep 2009, Mel Gorman wrote:

> On Fri, Sep 04, 2009 at 04:53:05PM -0700, Andrew Morton wrote:
> > On Fri, 4 Sep 2009 14:39:32 -0700 (PDT)
> > Vincent Li wrote:
> >
> > Well you want to count two things: 1: how many times nr_taken==0 and 2:
> > how many times nr_taken!=0.
> >
> Indeed. I'm not aware of the specifics that led to this patch, but minimally
> one would be interested in the exact value of nr_taken as it can be used to
> answer more than one question.
>
> > > Then I got test results with:
> > >
> > > root@kernelhack:/usr/src/mmotm-0903# perf stat --repeat 5 -e \
> > > 	kmem:mm_vmscan_isolate_pages hackbench 100
> > >
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 52.736
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 64.982
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 56.866
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 37.137
> > > Running with 100*40 (== 4000) tasks.
> > > Time: 48.415
> > >
> > >  Performance counter stats for 'hackbench 100' (5 runs):
> > >
> > >          14189  kmem:mm_vmscan_isolate_pages   ( +-   9.084% )
> > >
> > >   52.680621973  seconds time elapsed   ( +-   0.689% )
> > >
> > > Is the testing patch written right? I don't understand what the number
> > > 14189 means? Does it make any sense?
> >
> Broadly speaking:
> "For each of the 5 runs of hackbench, there were 14189 times the
> kmem:mm_vmscan_isolate_pages event was sampled, +/- 9.084%."
>
> Without knowing how many times nr_taken was zero, it's difficult to tell
> whether 14189 is common or not.
>
> > I don't think you need nr_taken_zeros at all. You'd want something like
> >
> > 	if (nr_taken == 0)
> > 		trace_mm_vmscan_nr_taken_zero();
> > 	else
> > 		trace_mm_vmscan_nr_taken_nonzero();
> >
> > which would pointlessly generate a huge stream of events which would
> > have to be added up downstream, which is dumb.
> >
> Dumb it might be, but perf acts as that aggregator. For the purposes of
> debugging it would be fine, although it would not be a very suitable pair
> of events to merge to mainline. A more sensible tracepoint for mainline
> would record what nr_taken was, so a higher-level tool could answer the
> zero vs non-zero question, or optionally do things like figure out how many
> pages were being taken off the lists and put back.
>
> For this question though, use the two tracepoints with no additional
> parameters and have perf count how many times each event occurred.

Thank you for the explanation.
I am not sure I follow your discussion correctly; does "no additional
parameters" mean something like:

TRACE_EVENT(mm_vmscan_nr_taken_zero,

	TP_PROTO( ),

	TP_ARGS( ),

	TP_STRUCT__entry( ),

	TP_fast_assign( ),

	TP_printk( )
);

? That looks strange to me and does not compile, so I guess that is not what
you mean. I ended up with the following patch:

----PATCH BEGIN---

---
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index eaf46bd..1f9e7bf 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -388,6 +388,42 @@ TRACE_EVENT(mm_page_alloc_extfrag,
 		__entry->alloc_migratetype == __entry->fallback_migratetype)
 );
 
+TRACE_EVENT(mm_vmscan_nr_taken_zero,
+
+	TP_PROTO(unsigned long nr_taken),
+
+	TP_ARGS(nr_taken),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	nr_taken	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_taken = nr_taken;
+	),
+
+	TP_printk("nr_taken=%lu",
+		__entry->nr_taken)
+);
+
+TRACE_EVENT(mm_vmscan_nr_taken_nonzero,
+
+	TP_PROTO(unsigned long nr_taken),
+
+	TP_ARGS(nr_taken),
+
+	TP_STRUCT__entry(
+		__field(	unsigned long,	nr_taken	)
+	),
+
+	TP_fast_assign(
+		__entry->nr_taken = nr_taken;
+	),
+
+	TP_printk("nr_taken=%lu",
+		__entry->nr_taken)
+);
+
 #endif /* _TRACE_KMEM_H */
 
 /* This part must be outside protection */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ad93096..eec4099 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -40,6 +40,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -1322,7 +1323,9 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 	__count_zone_vm_events(PGREFILL, zone, pgscanned);
 	if (nr_taken == 0)
-		goto done;
+		trace_mm_vmscan_nr_taken_zero(nr_taken);
+	else
+		trace_mm_vmscan_nr_taken_nonzero(nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
 	if (file)
@@ -1388,7 +1391,6 @@ static void shrink_active_list(unsigned long nr_pages, struct zone *zone,
 							nr_rotated);
 	__mod_zone_page_state(zone, NR_INACTIVE_ANON + file * LRU_FILE,
 						nr_deactivated);
-done:
 	spin_unlock_irq(&zone->lru_lock);
 }

----PATCH END---

/usr/src/mmotm-0903# perf stat --repeat 5 -e kmem:mm_vmscan_nr_taken_zero \
	-e kmem:mm_vmscan_nr_taken_nonzero hackbench 100

Running with 100*40 (== 4000) tasks.
Time: 41.599
Running with 100*40 (== 4000) tasks.
Time: 80.192
Running with 100*40 (== 4000) tasks.
Time: 26.451
Running with 100*40 (== 4000) tasks.
Time: 65.428
Running with 100*40 (== 4000) tasks.
Time: 30.054

 Performance counter stats for 'hackbench 100' (5 runs):

          10330  kmem:mm_vmscan_nr_taken_zero     ( +-  11.732% )
           2601  kmem:mm_vmscan_nr_taken_nonzero  ( +-  10.876% )

   49.509328260  seconds time elapsed   ( +-   0.934% )

The nr_taken_zero count is far larger than the nr_taken_nonzero count across
the 5 hackbench runs. I expected the opposite result. Maybe I have it all
wrong :-).
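As an aside, since the two TRACE_EVENT definitions above are identical apart
from their names, the duplication could probably be factored out with the
tracepoint event-class macros, assuming this tree already has
DECLARE_EVENT_CLASS/DEFINE_EVENT (it may not be in mmotm-0903 yet). A rough
sketch of what that might look like, with mm_vmscan_nr_taken_template being
just an illustrative class name:

/* Sketch only: shared class for the two nr_taken events. */
DECLARE_EVENT_CLASS(mm_vmscan_nr_taken_template,

	/* both events record the same single field: the nr_taken value */
	TP_PROTO(unsigned long nr_taken),

	TP_ARGS(nr_taken),

	TP_STRUCT__entry(
		__field(	unsigned long,	nr_taken	)
	),

	TP_fast_assign(
		__entry->nr_taken = nr_taken;
	),

	TP_printk("nr_taken=%lu", __entry->nr_taken)
);

/* Two named events sharing the class, so perf can still count them separately. */
DEFINE_EVENT(mm_vmscan_nr_taken_template, mm_vmscan_nr_taken_zero,
	TP_PROTO(unsigned long nr_taken),
	TP_ARGS(nr_taken)
);

DEFINE_EVENT(mm_vmscan_nr_taken_template, mm_vmscan_nr_taken_nonzero,
	TP_PROTO(unsigned long nr_taken),
	TP_ARGS(nr_taken)
);

The trace_mm_vmscan_nr_taken_zero()/trace_mm_vmscan_nr_taken_nonzero() calls
in shrink_active_list() and the perf stat command line would stay the same;
the class only removes the duplicated boilerplate in kmem.h.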
> > I don't know if the tracing code is capable of maintaining the counters
> > for you. Perhaps you _do_ need nr_taken_zeros. In which case you want
> >
> > 	if (nr_taken == 0) {
> > 		nr_taken_zeros++;
> > 		trace_mm_vmscan_isolate_pages_zero(nr_taken_zeros);
> > 	} else {
> > 		nr_taken_nonzeros++;
> > 		trace_mm_vmscan_isolate_pages_nonzero(nr_taken_nonzeros);
> > 	}
> >
> > which is awkward. Mel will know.
> >
> I am not aware of a way of maintaining counters within tracing. The difficulty
> is figuring out when that event should be recorded. Worse, the count is
> being aggregated for all processes that enter this path, so there is noise
> in the value that cannot be filtered out. An aggregated counter like this
> makes sense when reporting to /proc or for discovering via kgdb, but it is
> less suitable for tracing.
>
> > > > The way I used to do stuff like this is:
> > > >
> > > > int akpm1;
> > > > int akpm2;
> > > >
> > > > ...
> > > > if (nr_taken)
> > > > 	akpm1++;
> > > > else
> > > > 	akpm2++;
> > > >
> > > > then inspect the values of akpm1 and akpm2 in the running kernel
> > > > using kgdb.
> >
> > That's looking more attractive ;)
>
> It's simpler, but it's global in nature, so it's harder to figure out how
> much of the counter is for the target workload and how much of it is
> everything else. The main advantage of using the two tracepoints is that
> you know exactly how many of the events were due to hackbench and not the
> rest of the system.
>
> -- 
> Mel Gorman
> Part-time Phd Student                          Linux Technology Center
> University of Limerick                         IBM Dublin Software Lab

Vincent Li
Biomedical Research Center
University of British Columbia