linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] Add tracepoints to track pagecache transition
       [not found] <499A7CAD.9030409@bk.jp.nec.com>
@ 2009-02-17  9:33 ` Peter Zijlstra
  2009-02-17 11:04   ` Atsushi Tsuji
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2009-02-17  9:33 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: linux-kernel, Jason Baron, Ingo Molnar, Mathieu Desnoyers,
	Frank Ch. Eigler, Kazuto Miyoshi, rostedt, linux-mm,
	Andrew Morton, Nick Piggin, Hugh Dickins

On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:

> The below patch adds instrumentation for pagecache.

And somehow you forgot to CC any of the mm people.. ;-)

> I thought it would be useful to trace pagecache behavior for problem
> analysis (performance bottlenecks, behavior differences between stable
> time and trouble time).
> 
> By using those tracepoints, we can describe and visualize pagecache
> transition (file-by-file basis) in kernel and  pagecache
> consumes most of the memory in running system and pagecache hit rate
> and writeback behavior will influence system load and performance.

> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> ---
> diff --git a/include/trace/filemap.h b/include/trace/filemap.h
> new file mode 100644
> index 0000000..196955e
> --- /dev/null
> +++ b/include/trace/filemap.h
> @@ -0,0 +1,13 @@
> +#ifndef _TRACE_FILEMAP_H
> +#define _TRACE_FILEMAP_H
> +
> +#include <linux/tracepoint.h>
> +
> +DECLARE_TRACE(filemap_add_to_page_cache,
> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> +	TPARGS(mapping, offset));
> +DECLARE_TRACE(filemap_remove_from_page_cache,
> +	TPPROTO(struct address_space *mapping),
> +	TPARGS(mapping));

This is rather asymmetric, why don't we care about the offset for the
removed page?

> +#endif
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 23acefe..76a6887 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -34,6 +34,7 @@
>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>  #include <linux/memcontrol.h>
>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> +#include <trace/filemap.h>
>  #include "internal.h"
>  
>  /*
> @@ -43,6 +44,8 @@
>  
>  #include <asm/mman.h>
>  
> +DEFINE_TRACE(filemap_add_to_page_cache);
> +DEFINE_TRACE(filemap_remove_from_page_cache);
>  
>  /*
>   * Shared mappings implemented 30.11.1994. It's not fully working yet,
> @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
>  	page->mapping = NULL;
>  	mapping->nrpages--;
>  	__dec_zone_page_state(page, NR_FILE_PAGES);
> +	trace_filemap_remove_from_page_cache(mapping);
>  	BUG_ON(page_mapped(page));
>  	mem_cgroup_uncharge_cache_page(page);
>  
> @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		if (likely(!error)) {
>  			mapping->nrpages++;
>  			__inc_zone_page_state(page, NR_FILE_PAGES);
> +			trace_filemap_add_to_page_cache(mapping, offset);
>  		} else {
>  			page->mapping = NULL;
>  			mem_cgroup_uncharge_cache_page(page);
> 
> 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17  9:33 ` [PATCH] Add tracepoints to track pagecache transition Peter Zijlstra
@ 2009-02-17 11:04   ` Atsushi Tsuji
  2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 15:24     ` Steven Rostedt
  0 siblings, 2 replies; 13+ messages in thread
From: Atsushi Tsuji @ 2009-02-17 11:04 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Jason Baron, Ingo Molnar, Mathieu Desnoyers,
	Frank Ch. Eigler, Kazuto Miyoshi, rostedt, linux-mm,
	Andrew Morton, Nick Piggin, Hugh Dickins

Peter Zijlstra wrote:
> On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
> 
>> The below patch adds instrumentation for pagecache.
> 
> And somehow you forgot to CC any of the mm people.. ;-)

Hi Peter,

Ah, sorry.
Thank you for adding to CC list.

>> +DECLARE_TRACE(filemap_add_to_page_cache,
>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>> +	TPARGS(mapping, offset));
>> +DECLARE_TRACE(filemap_remove_from_page_cache,
>> +	TPPROTO(struct address_space *mapping),
>> +	TPARGS(mapping));
> 
> This is rather asymmetric, why don't we care about the offset for the
> removed page?
> 

Indeed.
I added the offset to the argument for the removed page and resend fixed patch.

Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
---
diff --git a/include/trace/filemap.h b/include/trace/filemap.h
new file mode 100644
index 0000000..a17dc92
--- /dev/null
+++ b/include/trace/filemap.h
@@ -0,0 +1,13 @@
+#ifndef _TRACE_FILEMAP_H
+#define _TRACE_FILEMAP_H
+
+#include <linux/tracepoint.h>
+
+DECLARE_TRACE(filemap_add_to_page_cache,
+	TPPROTO(struct address_space *mapping, pgoff_t offset),
+	TPARGS(mapping, offset));
+DECLARE_TRACE(filemap_remove_from_page_cache,
+	TPPROTO(struct address_space *mapping, pgoff_t offset),
+	TPARGS(mapping, offset));
+
+#endif
diff --git a/mm/filemap.c b/mm/filemap.c
index 23acefe..23f75d2 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -34,6 +34,7 @@
 #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
 #include <linux/memcontrol.h>
 #include <linux/mm_inline.h> /* for page_is_file_cache() */
+#include <trace/filemap.h>
 #include "internal.h"
 
 /*
@@ -43,6 +44,8 @@
 
 #include <asm/mman.h>
 
+DEFINE_TRACE(filemap_add_to_page_cache);
+DEFINE_TRACE(filemap_remove_from_page_cache);
 
 /*
  * Shared mappings implemented 30.11.1994. It's not fully working yet,
@@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
 	page->mapping = NULL;
 	mapping->nrpages--;
 	__dec_zone_page_state(page, NR_FILE_PAGES);
+	trace_filemap_remove_from_page_cache(mapping, page->index);
 	BUG_ON(page_mapped(page));
 	mem_cgroup_uncharge_cache_page(page);
 
@@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
 		if (likely(!error)) {
 			mapping->nrpages++;
 			__inc_zone_page_state(page, NR_FILE_PAGES);
+			trace_filemap_add_to_page_cache(mapping, offset);
 		} else {
 			page->mapping = NULL;
 			mem_cgroup_uncharge_cache_page(page);

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:04   ` Atsushi Tsuji
@ 2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 11:54       ` Peter Zijlstra
                         ` (2 more replies)
  2009-02-17 15:24     ` Steven Rostedt
  1 sibling, 3 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-17 11:38 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: kosaki.motohiro, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	rostedt, linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

Hi


In my 1st impression, this patch description is a bit strange.

> The below patch adds instrumentation for pagecache.
> 
> I thought it would be useful to trace pagecache behavior for problem
> analysis (performance bottlenecks, behavior differences between stable
> time and trouble time).
> 
> By using those tracepoints, we can describe and visualize pagecache
> transition (file-by-file basis) in kernel and  pagecache
> consumes most of the memory in running system and pagecache hit rate
> and writeback behavior will influence system load and performance.

Why do you think this tracepoint describe pagecache hit rate?
and, why describe writeback behavior?

> 
> I attached an example which is visualization of pagecache status using
> SystemTap. 

it seems no attached. and SystemTap isn't used kernel developer at all.
I don't think it's enough explanation.
Can you make seekwatcher liked completed comsumer program?
(if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)

> That graph describes pagecache transition of File A and File B
> on a file-by-file basis with the situation where regular I/O to File A
> is delayed because of other I/O to File B. 

If you want to see I/O activity, you need to add tracepoint into block layer.

> We visually understand
> pagecache for File A is narrowed down due to I/O pressure from File B.

confused. Can we assume the number of anon pages/files pages ratio don't chage?


> Peter Zijlstra wrote:
> > On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
> > 
> >> The below patch adds instrumentation for pagecache.
> > 
> > And somehow you forgot to CC any of the mm people.. ;-)
> 
> Hi Peter,
> 
> Ah, sorry.
> Thank you for adding to CC list.
> 
> >> +DECLARE_TRACE(filemap_add_to_page_cache,
> >> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> >> +	TPARGS(mapping, offset));
> >> +DECLARE_TRACE(filemap_remove_from_page_cache,
> >> +	TPPROTO(struct address_space *mapping),
> >> +	TPARGS(mapping));
> > 
> > This is rather asymmetric, why don't we care about the offset for the
> > removed page?
> > 
> 
> Indeed.
> I added the offset to the argument for the removed page and resend fixed patch.
> 
> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> ---
> diff --git a/include/trace/filemap.h b/include/trace/filemap.h

please add diffstat.


> new file mode 100644
> index 0000000..a17dc92
> --- /dev/null
> +++ b/include/trace/filemap.h
> @@ -0,0 +1,13 @@
> +#ifndef _TRACE_FILEMAP_H
> +#define _TRACE_FILEMAP_H
> +
> +#include <linux/tracepoint.h>
> +
> +DECLARE_TRACE(filemap_add_to_page_cache,
> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> +	TPARGS(mapping, offset));
> +DECLARE_TRACE(filemap_remove_from_page_cache,
> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> +	TPARGS(mapping, offset));
> +
> +#endif
> diff --git a/mm/filemap.c b/mm/filemap.c
> index 23acefe..23f75d2 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -34,6 +34,7 @@
>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>  #include <linux/memcontrol.h>
>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> +#include <trace/filemap.h>
>  #include "internal.h"
>  
>  /*
> @@ -43,6 +44,8 @@
>  
>  #include <asm/mman.h>
>  
> +DEFINE_TRACE(filemap_add_to_page_cache);
> +DEFINE_TRACE(filemap_remove_from_page_cache);
>  
>  /*
>   * Shared mappings implemented 30.11.1994. It's not fully working yet,
> @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
>  	page->mapping = NULL;
>  	mapping->nrpages--;
>  	__dec_zone_page_state(page, NR_FILE_PAGES);
> +	trace_filemap_remove_from_page_cache(mapping, page->index);

__remove_from_page_cache() is passed struct page.
Why don't you use struct page

and, this mean
  - the page have been removed from mapping.
  - vmstate have been decremented.
  - but, the page haven't been uncharged from memcg.

Why?


>  	BUG_ON(page_mapped(page));
>  	mem_cgroup_uncharge_cache_page(page);
>  
> @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>  		if (likely(!error)) {
>  			mapping->nrpages++;
>  			__inc_zone_page_state(page, NR_FILE_PAGES);
> +			trace_filemap_add_to_page_cache(mapping, offset);

Why do you select this line?
In general, trace point calling under spin lock grabbing is a bit problematic.


>  		} else {
>  			page->mapping = NULL;
>  			mem_cgroup_uncharge_cache_page(page);
> 

And, both function is freqentlly called one.
I worry about performance issue. can you prove no degression?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:38     ` KOSAKI Motohiro
@ 2009-02-17 11:54       ` Peter Zijlstra
  2009-02-17 12:33         ` KOSAKI Motohiro
  2009-02-17 14:33       ` Frederic Weisbecker
  2009-02-19  4:41       ` Atsushi Tsuji
  2 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2009-02-17 11:54 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Atsushi Tsuji, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins,
	Arnaldo Carvalho de Melo

On Tue, 2009-02-17 at 20:38 +0900, KOSAKI Motohiro wrote:
> If you want to see I/O activity, you need to add tracepoint into block
> layer.

We already have that, Arnaldo converted blktrace to use tracepoints and
wrote an ftrace tracer for it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:54       ` Peter Zijlstra
@ 2009-02-17 12:33         ` KOSAKI Motohiro
  0 siblings, 0 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-17 12:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Atsushi Tsuji, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins,
	Arnaldo Carvalho de Melo

>> If you want to see I/O activity, you need to add tracepoint into block
>> layer.
>
> We already have that, Arnaldo converted blktrace to use tracepoints and
> wrote an ftrace tracer for it.

Yup. I agree I selected wrong word.
My point is, if he want to know I/O delaying reason, the number of the
cache pages seems unrelated thing.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 11:54       ` Peter Zijlstra
@ 2009-02-17 14:33       ` Frederic Weisbecker
  2009-02-18  0:29         ` KOSAKI Motohiro
  2009-02-19  4:41       ` Atsushi Tsuji
  2 siblings, 1 reply; 13+ messages in thread
From: Frederic Weisbecker @ 2009-02-17 14:33 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Atsushi Tsuji, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	rostedt, linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

On Tue, Feb 17, 2009 at 08:38:20PM +0900, KOSAKI Motohiro wrote:
> Hi
> 
> 
> In my 1st impression, this patch description is a bit strange.
> 
> > The below patch adds instrumentation for pagecache.
> > 
> > I thought it would be useful to trace pagecache behavior for problem
> > analysis (performance bottlenecks, behavior differences between stable
> > time and trouble time).
> > 
> > By using those tracepoints, we can describe and visualize pagecache
> > transition (file-by-file basis) in kernel and  pagecache
> > consumes most of the memory in running system and pagecache hit rate
> > and writeback behavior will influence system load and performance.
> 
> Why do you think this tracepoint describe pagecache hit rate?
> and, why describe writeback behavior?
> 
> > 
> > I attached an example which is visualization of pagecache status using
> > SystemTap. 
> 
> it seems no attached. and SystemTap isn't used kernel developer at all.
> I don't think it's enough explanation.
> Can you make seekwatcher liked completed comsumer program?
> (if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)
> 
> > That graph describes pagecache transition of File A and File B
> > on a file-by-file basis with the situation where regular I/O to File A
> > is delayed because of other I/O to File B. 
> 
> If you want to see I/O activity, you need to add tracepoint into block layer.
> 
> > We visually understand
> > pagecache for File A is narrowed down due to I/O pressure from File B.
> 
> confused. Can we assume the number of anon pages/files pages ratio don't chage?
> 
> 
> > Peter Zijlstra wrote:
> > > On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
> > > 
> > >> The below patch adds instrumentation for pagecache.
> > > 
> > > And somehow you forgot to CC any of the mm people.. ;-)
> > 
> > Hi Peter,
> > 
> > Ah, sorry.
> > Thank you for adding to CC list.
> > 
> > >> +DECLARE_TRACE(filemap_add_to_page_cache,
> > >> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> > >> +	TPARGS(mapping, offset));
> > >> +DECLARE_TRACE(filemap_remove_from_page_cache,
> > >> +	TPPROTO(struct address_space *mapping),
> > >> +	TPARGS(mapping));
> > > 
> > > This is rather asymmetric, why don't we care about the offset for the
> > > removed page?
> > > 
> > 
> > Indeed.
> > I added the offset to the argument for the removed page and resend fixed patch.
> > 
> > Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> > ---
> > diff --git a/include/trace/filemap.h b/include/trace/filemap.h
> 
> please add diffstat.
> 
> 
> > new file mode 100644
> > index 0000000..a17dc92
> > --- /dev/null
> > +++ b/include/trace/filemap.h
> > @@ -0,0 +1,13 @@
> > +#ifndef _TRACE_FILEMAP_H
> > +#define _TRACE_FILEMAP_H
> > +
> > +#include <linux/tracepoint.h>
> > +
> > +DECLARE_TRACE(filemap_add_to_page_cache,
> > +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> > +	TPARGS(mapping, offset));
> > +DECLARE_TRACE(filemap_remove_from_page_cache,
> > +	TPPROTO(struct address_space *mapping, pgoff_t offset),
> > +	TPARGS(mapping, offset));
> > +
> > +#endif
> > diff --git a/mm/filemap.c b/mm/filemap.c
> > index 23acefe..23f75d2 100644
> > --- a/mm/filemap.c
> > +++ b/mm/filemap.c
> > @@ -34,6 +34,7 @@
> >  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
> >  #include <linux/memcontrol.h>
> >  #include <linux/mm_inline.h> /* for page_is_file_cache() */
> > +#include <trace/filemap.h>
> >  #include "internal.h"
> >  
> >  /*
> > @@ -43,6 +44,8 @@
> >  
> >  #include <asm/mman.h>
> >  
> > +DEFINE_TRACE(filemap_add_to_page_cache);
> > +DEFINE_TRACE(filemap_remove_from_page_cache);
> >  
> >  /*
> >   * Shared mappings implemented 30.11.1994. It's not fully working yet,
> > @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
> >  	page->mapping = NULL;
> >  	mapping->nrpages--;
> >  	__dec_zone_page_state(page, NR_FILE_PAGES);
> > +	trace_filemap_remove_from_page_cache(mapping, page->index);
> 
> __remove_from_page_cache() is passed struct page.
> Why don't you use struct page
> 
> and, this mean
>   - the page have been removed from mapping.
>   - vmstate have been decremented.
>   - but, the page haven't been uncharged from memcg.
> 
> Why?
> 
> 
> >  	BUG_ON(page_mapped(page));
> >  	mem_cgroup_uncharge_cache_page(page);
> >  
> > @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
> >  		if (likely(!error)) {
> >  			mapping->nrpages++;
> >  			__inc_zone_page_state(page, NR_FILE_PAGES);
> > +			trace_filemap_add_to_page_cache(mapping, offset);
> 
> Why do you select this line?
> In general, trace point calling under spin lock grabbing is a bit problematic.
> 
> 
> >  		} else {
> >  			page->mapping = NULL;
> >  			mem_cgroup_uncharge_cache_page(page);
> > 
> 
> And, both function is freqentlly called one.
> I worry about performance issue. can you prove no degression?


It would be very hard to prove. Tracepoints are very cheap in that they only
add the overhead of a single branch check while off.

But are there some plans about writing a tracer or so for pagecache?

 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:04   ` Atsushi Tsuji
  2009-02-17 11:38     ` KOSAKI Motohiro
@ 2009-02-17 15:24     ` Steven Rostedt
  2009-02-19 18:51       ` Christoph Hellwig
  1 sibling, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2009-02-17 15:24 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: Peter Zijlstra, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, linux-mm,
	Andrew Morton, Nick Piggin, Hugh Dickins


On Tue, 17 Feb 2009, Atsushi Tsuji wrote:
> > 
> > This is rather asymmetric, why don't we care about the offset for the
> > removed page?
> > 
> 
> Indeed.
> I added the offset to the argument for the removed page and resend fixed patch.
> 
> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>

Could you package it up in one patch again and resend with [PATCH v2].
Also make sure to Cc the memory folks, and ask for an Acked-by from them.

Thanks,

-- Steve

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 14:33       ` Frederic Weisbecker
@ 2009-02-18  0:29         ` KOSAKI Motohiro
  0 siblings, 0 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-18  0:29 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: kosaki.motohiro, Atsushi Tsuji, Peter Zijlstra, linux-kernel,
	Jason Baron, Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler,
	Kazuto Miyoshi, rostedt, linux-mm, Andrew Morton, Nick Piggin,
	Hugh Dickins

Hi Frederic,

> > And, both function is freqentlly called one.
> > I worry about performance issue. can you prove no degression?
> 
> It would be very hard to prove. Tracepoints are very cheap in that they only
> add the overhead of a single branch check while off.

this is typical reviewing comment.

Memory folks adage says,
	Don't believe theory, you believe benchmark result.

I don't oppose your theorical background opinion :)


> But are there some plans about writing a tracer or so for pagecache?



--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 11:38     ` KOSAKI Motohiro
  2009-02-17 11:54       ` Peter Zijlstra
  2009-02-17 14:33       ` Frederic Weisbecker
@ 2009-02-19  4:41       ` Atsushi Tsuji
  2009-02-19 13:12         ` KOSAKI Motohiro
  2 siblings, 1 reply; 13+ messages in thread
From: Atsushi Tsuji @ 2009-02-19  4:41 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Peter Zijlstra, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

Hi Kosaki-san,

Thank you for your comment.

KOSAKI Motohiro wrote:
> Hi
> 
> 
> In my 1st impression, this patch description is a bit strange.
> 
>> The below patch adds instrumentation for pagecache.
>>
>> I thought it would be useful to trace pagecache behavior for problem
>> analysis (performance bottlenecks, behavior differences between stable
>> time and trouble time).
>>
>> By using those tracepoints, we can describe and visualize pagecache
>> transition (file-by-file basis) in kernel and  pagecache
>> consumes most of the memory in running system and pagecache hit rate
>> and writeback behavior will influence system load and performance.
> 
> Why do you think this tracepoint describe pagecache hit rate?
> and, why describe writeback behavior?

I mean, we can describe file-by-file basis pagecache usage by using 
these tracepoints and it is important for analyzing process I/O behavior. 
Currently, we can understand the amount of pagecache from "Cached"
in /proc/meminfo. So I'd like to understand which files are using pagecache. 

> 
>> I attached an example which is visualization of pagecache status using
>> SystemTap. 
> 
> it seems no attached. and SystemTap isn't used kernel developer at all.
> I don't think it's enough explanation.
> Can you make seekwatcher liked completed comsumer program?
> (if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)

I understand a tracer using these tracepoints need to be implemented.
What I want to do is counting pagecache per file. We can retrieve inode
from mapping and count pagecache per inode in these tracepoints.

>> That graph describes pagecache transition of File A and File B
>> on a file-by-file basis with the situation where regular I/O to File A
>> is delayed because of other I/O to File B. 
> 
> If you want to see I/O activity, you need to add tracepoint into block layer.

I think tracking pagecache is useful for understanding process I/O activity,
because whether process I/O completes by accessing memory or HDD is determined by
accessed files on pagecache or not.

> 
>> We visually understand
>> pagecache for File A is narrowed down due to I/O pressure from File B.
> 
> confused. Can we assume the number of anon pages/files pages ratio don't chage?
> 
> 
>> Peter Zijlstra wrote:
>>> On Tue, 2009-02-17 at 18:00 +0900, Atsushi Tsuji wrote:
>>>
>>>> The below patch adds instrumentation for pagecache.
>>> And somehow you forgot to CC any of the mm people.. ;-)
>> Hi Peter,
>>
>> Ah, sorry.
>> Thank you for adding to CC list.
>>
>>>> +DECLARE_TRACE(filemap_add_to_page_cache,
>>>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>>>> +	TPARGS(mapping, offset));
>>>> +DECLARE_TRACE(filemap_remove_from_page_cache,
>>>> +	TPPROTO(struct address_space *mapping),
>>>> +	TPARGS(mapping));
>>> This is rather asymmetric, why don't we care about the offset for the
>>> removed page?
>>>
>> Indeed.
>> I added the offset to the argument for the removed page and resend fixed patch.
>>
>> Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
>> ---
>> diff --git a/include/trace/filemap.h b/include/trace/filemap.h
> 
> please add diffstat.
> 
> 
>> new file mode 100644
>> index 0000000..a17dc92
>> --- /dev/null
>> +++ b/include/trace/filemap.h
>> @@ -0,0 +1,13 @@
>> +#ifndef _TRACE_FILEMAP_H
>> +#define _TRACE_FILEMAP_H
>> +
>> +#include <linux/tracepoint.h>
>> +
>> +DECLARE_TRACE(filemap_add_to_page_cache,
>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>> +	TPARGS(mapping, offset));
>> +DECLARE_TRACE(filemap_remove_from_page_cache,
>> +	TPPROTO(struct address_space *mapping, pgoff_t offset),
>> +	TPARGS(mapping, offset));
>> +
>> +#endif
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index 23acefe..23f75d2 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -34,6 +34,7 @@
>>  #include <linux/hardirq.h> /* for BUG_ON(!in_atomic()) only */
>>  #include <linux/memcontrol.h>
>>  #include <linux/mm_inline.h> /* for page_is_file_cache() */
>> +#include <trace/filemap.h>
>>  #include "internal.h"
>>  
>>  /*
>> @@ -43,6 +44,8 @@
>>  
>>  #include <asm/mman.h>
>>  
>> +DEFINE_TRACE(filemap_add_to_page_cache);
>> +DEFINE_TRACE(filemap_remove_from_page_cache);
>>  
>>  /*
>>   * Shared mappings implemented 30.11.1994. It's not fully working yet,
>> @@ -120,6 +123,7 @@ void __remove_from_page_cache(struct page *page)
>>  	page->mapping = NULL;
>>  	mapping->nrpages--;
>>  	__dec_zone_page_state(page, NR_FILE_PAGES);
>> +	trace_filemap_remove_from_page_cache(mapping, page->index);
> 
> __remove_from_page_cache() is passed struct page.
> Why don't you use struct page
> 
> and, this mean
>   - the page have been removed from mapping.
>   - vmstate have been decremented.
>   - but, the page haven't been uncharged from memcg.
> 
> Why?
> 
> 
>>  	BUG_ON(page_mapped(page));
>>  	mem_cgroup_uncharge_cache_page(page);
>>  
>> @@ -475,6 +479,7 @@ int add_to_page_cache_locked(struct page *page, struct address_space *mapping,
>>  		if (likely(!error)) {
>>  			mapping->nrpages++;
>>  			__inc_zone_page_state(page, NR_FILE_PAGES);
>> +			trace_filemap_add_to_page_cache(mapping, offset);
> 
> Why do you select this line?
> In general, trace point calling under spin lock grabbing is a bit problematic.
 
I understand my patch is wrong. I will fix and send it later.

> 
>>  		} else {
>>  			page->mapping = NULL;
>>  			mem_cgroup_uncharge_cache_page(page);
>>
> 
> And, both function is freqentlly called one.
> I worry about performance issue. can you prove no degression?

I will try to probe that.

Thanks,
-Atsushi

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-19  4:41       ` Atsushi Tsuji
@ 2009-02-19 13:12         ` KOSAKI Motohiro
  2009-02-19 14:21           ` Lee Schermerhorn
  0 siblings, 1 reply; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-19 13:12 UTC (permalink / raw)
  To: Atsushi Tsuji
  Cc: Peter Zijlstra, linux-kernel, Jason Baron, Ingo Molnar,
	Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi, rostedt,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

> Hi Kosaki-san,
>
> Thank you for your comment.
>
> KOSAKI Motohiro wrote:
>> Hi
>>
>>
>> In my 1st impression, this patch description is a bit strange.
>>
>>> The below patch adds instrumentation for pagecache.
>>>
>>> I thought it would be useful to trace pagecache behavior for problem
>>> analysis (performance bottlenecks, behavior differences between stable
>>> time and trouble time).
>>>
>>> By using those tracepoints, we can describe and visualize pagecache
>>> transition (file-by-file basis) in kernel and  pagecache
>>> consumes most of the memory in running system and pagecache hit rate
>>> and writeback behavior will influence system load and performance.
>>
>> Why do you think this tracepoint describe pagecache hit rate?
>> and, why describe writeback behavior?
>
> I mean, we can describe file-by-file basis pagecache usage by using
> these tracepoints and it is important for analyzing process I/O behavior.

More confusing.
Your page cache tracepoint don't have any per-process information.


> Currently, we can understand the amount of pagecache from "Cached"
> in /proc/meminfo. So I'd like to understand which files are using pagecache.

There is one meta question, Why do you think file-by-file pagecache
infomartion is valueable?


>>> I attached an example which is visualization of pagecache status using
>>> SystemTap.
>>
>> it seems no attached. and SystemTap isn't used kernel developer at all.
>> I don't think it's enough explanation.
>> Can you make seekwatcher liked completed comsumer program?
>> (if you don't know seekwatcher, see http://oss.oracle.com/~mason/seekwatcher/)
>
> I understand a tracer using these tracepoints need to be implemented.
> What I want to do is counting pagecache per file. We can retrieve inode
> from mapping and count pagecache per inode in these tracepoints.
>
>
>>> That graph describes pagecache transition of File A and File B
>>> on a file-by-file basis with the situation where regular I/O to File A
>>> is delayed because of other I/O to File B.
>>
>> If you want to see I/O activity, you need to add tracepoint into block layer.
>
> I think tracking pagecache is useful for understanding process I/O activity,
> because whether process I/O completes by accessing memory or HDD is determined by
> accessed files on pagecache or not.

I don't know your opinion is right or not.
However, your opinion don't get consensus yet.

Perhaps, you need to make demonstrate programs, I think.


>> And, both function is freqentlly called one.
>> I worry about performance issue. can you prove no degression?
>
> I will try to probe that.

Perhaps, this is not needed yet.
Generally, worthless patch is never merged although it's no performance penalty.
So, I think you should explain the patch worth at first :)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-19 13:12         ` KOSAKI Motohiro
@ 2009-02-19 14:21           ` Lee Schermerhorn
  2009-02-20  1:13             ` KOSAKI Motohiro
  0 siblings, 1 reply; 13+ messages in thread
From: Lee Schermerhorn @ 2009-02-19 14:21 UTC (permalink / raw)
  To: KOSAKI Motohiro
  Cc: Atsushi Tsuji, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	rostedt, linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

On Thu, 2009-02-19 at 22:12 +0900, KOSAKI Motohiro wrote:
> > Hi Kosaki-san,
> >
> > Thank you for your comment.
> >
> > KOSAKI Motohiro wrote:
> >> Hi
> >>
> >>
> >> In my 1st impression, this patch description is a bit strange.
> >>
> >>> The below patch adds instrumentation for pagecache.
> >>>
> >>> I thought it would be useful to trace pagecache behavior for problem
> >>> analysis (performance bottlenecks, behavior differences between stable
> >>> time and trouble time).
> >>>
> >>> By using those tracepoints, we can describe and visualize pagecache
> >>> transition (file-by-file basis) in kernel and  pagecache
> >>> consumes most of the memory in running system and pagecache hit rate
> >>> and writeback behavior will influence system load and performance.
> >>
> >> Why do you think this tracepoint describe pagecache hit rate?
> >> and, why describe writeback behavior?
> >
> > I mean, we can describe file-by-file basis pagecache usage by using
> > these tracepoints and it is important for analyzing process I/O behavior.
> 
> More confusing.
> Your page cache tracepoint don't have any per-process information.
> 
> 
> > Currently, we can understand the amount of pagecache from "Cached"
> > in /proc/meminfo. So I'd like to understand which files are using pagecache.
> 
> There is one meta question, Why do you think file-by-file pagecache
> infomartion is valueable?
> 

One might take a look at Marcello Tosatti's old 'vmtrace' patch.  It
contains it's own data store/transport via relayfs, but the trace points
could be ported to the current kernel tracing infrastructure.

Here's a starting point:   http://linux-mm.org/VmTrace

Quoting from that page:

>From the previous email to linux-mm:
>"The sequence of pages which a given process or workload accesses
>during its lifetime, a.k.a. "reference trace", is very important
>information. It has been used in the past for comparison of page
>replacement algorithms and other optimizations..."

Lee

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-17 15:24     ` Steven Rostedt
@ 2009-02-19 18:51       ` Christoph Hellwig
  0 siblings, 0 replies; 13+ messages in thread
From: Christoph Hellwig @ 2009-02-19 18:51 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Atsushi Tsuji, Peter Zijlstra, linux-kernel, Jason Baron,
	Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler, Kazuto Miyoshi,
	linux-mm, Andrew Morton, Nick Piggin, Hugh Dickins

On Tue, Feb 17, 2009 at 10:24:46AM -0500, Steven Rostedt wrote:
> 
> On Tue, 17 Feb 2009, Atsushi Tsuji wrote:
> > > 
> > > This is rather asymmetric, why don't we care about the offset for the
> > > removed page?
> > > 
> > 
> > Indeed.
> > I added the offset to the argument for the removed page and resend fixed patch.
> > 
> > Signed-off-by: Atsushi Tsuji <a-tsuji@bk.jp.nec.com>
> 
> Could you package it up in one patch again and resend with [PATCH v2].
> Also make sure to Cc the memory folks, and ask for an Acked-by from them.

Well, until we actually get a consumer of those tracepoints, e.g. a
ftrace pluging into the tree strong NACK for me.

(p.s. I really don't get it why people keep trying to push dead code
 into the tree)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH] Add tracepoints to track pagecache transition
  2009-02-19 14:21           ` Lee Schermerhorn
@ 2009-02-20  1:13             ` KOSAKI Motohiro
  0 siblings, 0 replies; 13+ messages in thread
From: KOSAKI Motohiro @ 2009-02-20  1:13 UTC (permalink / raw)
  To: Lee Schermerhorn
  Cc: kosaki.motohiro, Atsushi Tsuji, Peter Zijlstra, linux-kernel,
	Jason Baron, Ingo Molnar, Mathieu Desnoyers, Frank Ch. Eigler,
	Kazuto Miyoshi, rostedt, linux-mm, Andrew Morton, Nick Piggin,
	Hugh Dickins

Hi

> > > Currently, we can understand the amount of pagecache from "Cached"
> > > in /proc/meminfo. So I'd like to understand which files are using pagecache.
> > 
> > There is one meta question, Why do you think file-by-file pagecache
> > infomartion is valueable?
> 
> One might take a look at Marcello Tosatti's old 'vmtrace' patch.  It
> contains it's own data store/transport via relayfs, but the trace points
> could be ported to the current kernel tracing infrastructure.
> 
> Here's a starting point:   http://linux-mm.org/VmTrace
> 
> Quoting from that page:
> 
> >From the previous email to linux-mm:
> >"The sequence of pages which a given process or workload accesses
> >during its lifetime, a.k.a. "reference trace", is very important
> >information. It has been used in the past for comparison of page
> >replacement algorithms and other optimizations..."

Sure.
but strong difference exist.

vmtrace
  - can run standalone
  - reviewer can confirm its output result is useful or not.

Christoph also explained reason more kindly.
I think we need useful consumer. 


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2009-02-20  1:17 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <499A7CAD.9030409@bk.jp.nec.com>
2009-02-17  9:33 ` [PATCH] Add tracepoints to track pagecache transition Peter Zijlstra
2009-02-17 11:04   ` Atsushi Tsuji
2009-02-17 11:38     ` KOSAKI Motohiro
2009-02-17 11:54       ` Peter Zijlstra
2009-02-17 12:33         ` KOSAKI Motohiro
2009-02-17 14:33       ` Frederic Weisbecker
2009-02-18  0:29         ` KOSAKI Motohiro
2009-02-19  4:41       ` Atsushi Tsuji
2009-02-19 13:12         ` KOSAKI Motohiro
2009-02-19 14:21           ` Lee Schermerhorn
2009-02-20  1:13             ` KOSAKI Motohiro
2009-02-17 15:24     ` Steven Rostedt
2009-02-19 18:51       ` Christoph Hellwig

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox