From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail137.messagelabs.com (mail137.messagelabs.com [216.82.249.19]) by kanga.kvack.org (Postfix) with SMTP id 4E2B76B0085 for ; Tue, 17 Feb 2009 09:33:30 -0500 (EST) Received: by bwz28 with SMTP id 28so4438117bwz.14 for ; Tue, 17 Feb 2009 06:33:26 -0800 (PST) Date: Tue, 17 Feb 2009 15:33:22 +0100 From: Frederic Weisbecker Subject: Re: [PATCH] Add tracepoints to track pagecache transition Message-ID: <20090217143321.GB5888@nowhere> References: <1234863220.4744.34.camel@laptop> <499A99BC.2080700@bk.jp.nec.com> <20090217201651.576E.A69D9226@jp.fujitsu.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20090217201651.576E.A69D9226@jp.fujitsu.com> Sender: owner-linux-mm@kvack.org To: KOSAKI Motohiro Cc: Atsushi Tsuji , Peter Zijlstra , linux-kernel@vger.kernel.org, Jason Baron , Ingo Molnar , Mathieu Desnoyers , "Frank Ch. Eigler" , Kazuto Miyoshi , rostedt@goodmis.org, linux-mm , Andrew Morton , Nick Piggin , Hugh Dickins List-ID: On Tue, Feb 17, 2009 at 08:38:20PM +0900, KOSAKI Motohiro wrote: > Hi > > > In my 1st impression, this patch description is a bit strange. > > > The below patch adds instrumentation for pagecache. > > > > I thought it would be useful to trace pagecache behavior for problem > > analysis (performance bottlenecks, behavior differences between stable > > time and trouble time). > > > > By using those tracepoints, we can describe and visualize pagecache > > transition (file-by-file basis) in kernel and pagecache > > consumes most of the memory in running system and pagecache hit rate > > and writeback behavior will influence system load and performance. > > Why do you think this tracepoint describe pagecache hit rate? > and, why describe writeback behavior? > > > > > I attached an example which is visualization of pagecache status using > > SystemTap. > > it seems no attached. and SystemTap isn't used kernel developer at all. > I don't think it's enough explanation. > Can you make seekwatcher liked completed comsumer program? > (if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/) > > > That graph describes pagecache transition of File A and File B > > on a file-by-file basis with the situation where regular I/O to File A > > is delayed because of other I/O to File B. > > If you want to see I/O activity, you need to add tracepoint into block layer. > > > We visually understand > > pagecache for File A is narrowed down due to I/O pressure from File B. > > confused. Can we assume the number of anon pages/files pages ratio don't chage? > > > > Peter Zijlstra wrote: > > > On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote: > > > > > >> The below patch adds instrumentation for pagecache. > > > > > > And somehow you forgot to CC any of the mm people.. ;-) > > > > Hi Peter, > > > > Ah, sorry. > > Thank you for adding to CC list. > > > > >> +DECLARE_TRACE(filemap_add_to_page_cache, > > >> + TPPROTO(struct address_space *mapping, pgoff_t offset), > > >> + TPARGS(mapping, offset)); > > >> +DECLARE_TRACE(filemap_remove_from_page_cache, > > >> + TPPROTO(struct address_space *mapping), > > >> + TPARGS(mapping)); > > > > > > This is rather asymmetric, why don't we care about the offset for the > > > removed page? > > > > > > > Indeed. > > I added the offset to the argument for the removed page and resend fixed patch. > > > > Signed-off-by: Atsushi Tsuji > > --- > > diff --git a/include/trace/filemap.h b/include/trace/filemap.h > > please add diffstat. > > > > new file mode 100644 > > index 0000000..a17dc92 > > --- /dev/null > > +++ b/include/trace/filemap.h > > @@ -0,0 +1,13 @@ > > +#ifndef _TRACE_FILEMAP_H > > +#define _TRACE_FILEMAP_H > > + > > +#include > > + > > +DECLARE_TRACE(filemap_add_to_page_cache, > > + TPPROTO(struct address_space *mapping, pgoff_t offset), > > + TPARGS(mapping, offset)); > > +DECLARE_TRACE(filemap_remove_from_page_cache, > > + TPPROTO(struct address_space *mapping, pgoff_t offset), > > + TPARGS(mapping, offset)); > > + > > +#endif > > diff --git a/mm/filemap.c b/mm/filemap.c > > index 23acefe..23f75d2 100644 > > --- a/mm/filemap.c > > +++ b/mm/filemap.c > > @@ -34,6 +34,7 @@ > > #include /* for BUG_ON(!in_atomic()) only */ > > #include > > #include /* for page_is_file_cache() */ > > +#include > > #include "internal.h" > > > > /* > > @@ -43,6 +44,8 @@ > > > > #include > > > > +DEFINE_TRACE(filemap_add_to_page_cache); > > +DEFINE_TRACE(filemap_remove_from_page_cache); > > > > /* > > * Shared mappings implemented 30.11.1994. It's not fully working yet, > > @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page) > > page->mapping = NULL; > > mapping->nrpages--; > > __dec_zone_page_state(page, NR_FILE_PAGES); > > + trace_filemap_remove_from_page_cache(mapping, page->index); > > __remove_from_page_cache() is passed struct page. > Why don't you use struct page > > and, this mean > - the page have been removed from mapping. > - vmstate have been decremented. > - but, the page haven't been uncharged from memcg. > > Why? > > > > BUG_ON(page_mapped(page)); > > mem_cgroup_uncharge_cache_page(page); > > > > @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping, > > if (likely(!error)) { > > mapping->nrpages++; > > __inc_zone_page_state(page, NR_FILE_PAGES); > > + trace_filemap_add_to_page_cache(mapping, offset); > > Why do you select this line? > In general, trace point calling under spin lock grabbing is a bit problematic. > > > > } else { > > page->mapping = NULL; > > mem_cgroup_uncharge_cache_page(page); > > > > And, both function is freqentlly called one. > I worry about performance issue. can you prove no degression? It would be very hard to prove. Tracepoints are very cheap in that they only add the overhead of a single branch check while off. But are there some plans about writing a tracer or so for pagecache? > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: email@kvack.org