linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] mm: vmscan: add tracepoints for node reclaim
@ 2019-02-28  8:14 Yafang Shao
  2019-02-28  8:59 ` Souptick Joarder
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: Yafang Shao @ 2019-02-28  8:14 UTC (permalink / raw)
  To: akpm, mhocko, ktkhai, broonie, hannes; +Cc: linux-mm, shaoyafang, Yafang Shao

In the page alloc fast path, it may do node reclaim, which may cause
latency spike.
We should add tracepoint for this event, and also mesure the latency
it causes.

So bellow two tracepoints are introduced,
	mm_vmscan_node_reclaim_begin
	mm_vmscan_node_reclaim_end

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
---
 include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
 mm/vmscan.c                   | 13 +++++++++++-
 2 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index a1cb913..9310d5b 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -465,6 +465,54 @@
 		__entry->ratio,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
+
+TRACE_EVENT(mm_vmscan_node_reclaim_begin,
+
+	TP_PROTO(int nid, int order, int may_writepage,
+		gfp_t gfp_flags, int zid),
+
+	TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+		__field(int, order)
+		__field(int, may_writepage)
+		__field(gfp_t, gfp_flags)
+		__field(int, zid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+		__entry->order = order;
+		__entry->may_writepage = may_writepage;
+		__entry->gfp_flags = gfp_flags;
+		__entry->zid = zid;
+	),
+
+	TP_printk("nid=%d zid=%d order=%d may_writepage=%d gfp_flags=%s",
+		__entry->nid,
+		__entry->zid,
+		__entry->order,
+		__entry->may_writepage,
+		show_gfp_flags(__entry->gfp_flags))
+);
+
+TRACE_EVENT(mm_vmscan_node_reclaim_end,
+
+	TP_PROTO(int result),
+
+	TP_ARGS(result),
+
+	TP_STRUCT__entry(
+		__field(int, result)
+	),
+
+	TP_fast_assign(
+		__entry->result = result;
+	),
+
+	TP_printk("result=%d", __entry->result)
+);
 #endif /* _TRACE_VMSCAN_H */
 
 /* This part must be outside protection */
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ac4806f..01a0401 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4240,6 +4240,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 		.may_swap = 1,
 		.reclaim_idx = gfp_zone(gfp_mask),
 	};
+	int result;
+
+	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
+					sc.may_writepage,
+					sc.gfp_mask,
+					sc.reclaim_idx);
 
 	cond_resched();
 	fs_reclaim_acquire(sc.gfp_mask);
@@ -4267,7 +4273,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
 	current->flags &= ~PF_SWAPWRITE;
 	memalloc_noreclaim_restore(noreclaim_flag);
 	fs_reclaim_release(sc.gfp_mask);
-	return sc.nr_reclaimed >= nr_pages;
+
+	result = sc.nr_reclaimed >= nr_pages;
+
+	trace_mm_vmscan_node_reclaim_end(result);
+
+	return result;
 }
 
 int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
-- 
1.8.3.1


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28  8:14 [PATCH] mm: vmscan: add tracepoints for node reclaim Yafang Shao
@ 2019-02-28  8:59 ` Souptick Joarder
  2019-02-28  9:35   ` Yafang Shao
  2019-02-28 10:17 ` Michal Hocko
  2019-02-28 10:21 ` Vlastimil Babka
  2 siblings, 1 reply; 10+ messages in thread
From: Souptick Joarder @ 2019-02-28  8:59 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Michal Hocko, ktkhai, broonie, hannes, Linux-MM,
	shaoyafang

On Thu, Feb 28, 2019 at 1:44 PM Yafang Shao <laoar.shao@gmail.com> wrote:
>
> In the page alloc fast path, it may do node reclaim, which may cause
> latency spike.
> We should add tracepoint for this event, and also mesure the latency
> it causes.

Minor typo : mesure ->measure.

>
> So bellow two tracepoints are introduced,
>         mm_vmscan_node_reclaim_begin
>         mm_vmscan_node_reclaim_end
>
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
>  mm/vmscan.c                   | 13 +++++++++++-
>  2 files changed, 60 insertions(+), 1 deletion(-)
>
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index a1cb913..9310d5b 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -465,6 +465,54 @@
>                 __entry->ratio,
>                 show_reclaim_flags(__entry->reclaim_flags))
>  );
> +
> +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
> +
> +       TP_PROTO(int nid, int order, int may_writepage,
> +               gfp_t gfp_flags, int zid),
> +
> +       TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
> +
> +       TP_STRUCT__entry(
> +               __field(int, nid)
> +               __field(int, order)
> +               __field(int, may_writepage)
> +               __field(gfp_t, gfp_flags)
> +               __field(int, zid)
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->nid = nid;
> +               __entry->order = order;
> +               __entry->may_writepage = may_writepage;
> +               __entry->gfp_flags = gfp_flags;
> +               __entry->zid = zid;
> +       ),
> +
> +       TP_printk("nid=%d zid=%d order=%d may_writepage=%d gfp_flags=%s",
> +               __entry->nid,
> +               __entry->zid,
> +               __entry->order,
> +               __entry->may_writepage,
> +               show_gfp_flags(__entry->gfp_flags))
> +);
> +
> +TRACE_EVENT(mm_vmscan_node_reclaim_end,
> +
> +       TP_PROTO(int result),
> +
> +       TP_ARGS(result),
> +
> +       TP_STRUCT__entry(
> +               __field(int, result)
> +       ),
> +
> +       TP_fast_assign(
> +               __entry->result = result;
> +       ),
> +
> +       TP_printk("result=%d", __entry->result)
> +);
>  #endif /* _TRACE_VMSCAN_H */
>
>  /* This part must be outside protection */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ac4806f..01a0401 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4240,6 +4240,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>                 .may_swap = 1,
>                 .reclaim_idx = gfp_zone(gfp_mask),
>         };
> +       int result;

If it goes to v2, then
s/result/ret ?

> +
> +       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> +                                       sc.may_writepage,
> +                                       sc.gfp_mask,
> +                                       sc.reclaim_idx);
>
>         cond_resched();
>         fs_reclaim_acquire(sc.gfp_mask);
> @@ -4267,7 +4273,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>         current->flags &= ~PF_SWAPWRITE;
>         memalloc_noreclaim_restore(noreclaim_flag);
>         fs_reclaim_release(sc.gfp_mask);
> -       return sc.nr_reclaimed >= nr_pages;
> +
> +       result = sc.nr_reclaimed >= nr_pages;
> +
> +       trace_mm_vmscan_node_reclaim_end(result);
> +
> +       return result;
>  }
>
>  int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
> --
> 1.8.3.1
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28  8:59 ` Souptick Joarder
@ 2019-02-28  9:35   ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2019-02-28  9:35 UTC (permalink / raw)
  To: Souptick Joarder
  Cc: Andrew Morton, Michal Hocko, ktkhai, broonie, Johannes Weiner,
	Linux-MM, shaoyafang

On Thu, Feb 28, 2019 at 4:59 PM Souptick Joarder <jrdr.linux@gmail.com> wrote:
>
> On Thu, Feb 28, 2019 at 1:44 PM Yafang Shao <laoar.shao@gmail.com> wrote:
> >
> > In the page alloc fast path, it may do node reclaim, which may cause
> > latency spike.
> > We should add tracepoint for this event, and also mesure the latency
> > it causes.
>
> Minor typo : mesure ->measure.
>

Thanks for your correction.

> >
> > So bellow two tracepoints are introduced,
> >         mm_vmscan_node_reclaim_begin
> >         mm_vmscan_node_reclaim_end
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
> >  mm/vmscan.c                   | 13 +++++++++++-
> >  2 files changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> > index a1cb913..9310d5b 100644
> > --- a/include/trace/events/vmscan.h
> > +++ b/include/trace/events/vmscan.h
> > @@ -465,6 +465,54 @@
> >                 __entry->ratio,
> >                 show_reclaim_flags(__entry->reclaim_flags))
> >  );
> > +
> > +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
> > +
> > +       TP_PROTO(int nid, int order, int may_writepage,
> > +               gfp_t gfp_flags, int zid),
> > +
> > +       TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
> > +
> > +       TP_STRUCT__entry(
> > +               __field(int, nid)
> > +               __field(int, order)
> > +               __field(int, may_writepage)
> > +               __field(gfp_t, gfp_flags)
> > +               __field(int, zid)
> > +       ),
> > +
> > +       TP_fast_assign(
> > +               __entry->nid = nid;
> > +               __entry->order = order;
> > +               __entry->may_writepage = may_writepage;
> > +               __entry->gfp_flags = gfp_flags;
> > +               __entry->zid = zid;
> > +       ),
> > +
> > +       TP_printk("nid=%d zid=%d order=%d may_writepage=%d gfp_flags=%s",
> > +               __entry->nid,
> > +               __entry->zid,
> > +               __entry->order,
> > +               __entry->may_writepage,
> > +               show_gfp_flags(__entry->gfp_flags))
> > +);
> > +
> > +TRACE_EVENT(mm_vmscan_node_reclaim_end,
> > +
> > +       TP_PROTO(int result),
> > +
> > +       TP_ARGS(result),
> > +
> > +       TP_STRUCT__entry(
> > +               __field(int, result)
> > +       ),
> > +
> > +       TP_fast_assign(
> > +               __entry->result = result;
> > +       ),
> > +
> > +       TP_printk("result=%d", __entry->result)
> > +);
> >  #endif /* _TRACE_VMSCAN_H */
> >
> >  /* This part must be outside protection */
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ac4806f..01a0401 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4240,6 +4240,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >                 .may_swap = 1,
> >                 .reclaim_idx = gfp_zone(gfp_mask),
> >         };
> > +       int result;
>
> If it goes to v2, then
> s/result/ret ?
>

Sure. Will change it.

> > +
> > +       trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > +                                       sc.may_writepage,
> > +                                       sc.gfp_mask,
> > +                                       sc.reclaim_idx);
> >
> >         cond_resched();
> >         fs_reclaim_acquire(sc.gfp_mask);
> > @@ -4267,7 +4273,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >         current->flags &= ~PF_SWAPWRITE;
> >         memalloc_noreclaim_restore(noreclaim_flag);
> >         fs_reclaim_release(sc.gfp_mask);
> > -       return sc.nr_reclaimed >= nr_pages;
> > +
> > +       result = sc.nr_reclaimed >= nr_pages;
> > +
> > +       trace_mm_vmscan_node_reclaim_end(result);
> > +
> > +       return result;
> >  }
> >
> >  int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
> > --
> > 1.8.3.1
> >

Thanks
Yafang


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28  8:14 [PATCH] mm: vmscan: add tracepoints for node reclaim Yafang Shao
  2019-02-28  8:59 ` Souptick Joarder
@ 2019-02-28 10:17 ` Michal Hocko
  2019-02-28 10:20   ` Yafang Shao
  2019-02-28 10:21 ` Vlastimil Babka
  2 siblings, 1 reply; 10+ messages in thread
From: Michal Hocko @ 2019-02-28 10:17 UTC (permalink / raw)
  To: Yafang Shao; +Cc: akpm, ktkhai, broonie, hannes, linux-mm, shaoyafang

On Thu 28-02-19 16:14:24, Yafang Shao wrote:
> In the page alloc fast path, it may do node reclaim, which may cause
> latency spike.
> We should add tracepoint for this event, and also mesure the latency
> it causes.
> 
> So bellow two tracepoints are introduced,
> 	mm_vmscan_node_reclaim_begin
> 	mm_vmscan_node_reclaim_end

This makes some sense to me. Regular direct reclaim already does have
similar tracepoints. Is there any reason you haven't used
mm_vmscan_direct_reclaim_{begin,end}_template as all other direct reclaim
paths?

> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
>  mm/vmscan.c                   | 13 +++++++++++-
>  2 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index a1cb913..9310d5b 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -465,6 +465,54 @@
>  		__entry->ratio,
>  		show_reclaim_flags(__entry->reclaim_flags))
>  );
> +
> +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
> +
> +	TP_PROTO(int nid, int order, int may_writepage,
> +		gfp_t gfp_flags, int zid),
> +
> +	TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
> +
> +	TP_STRUCT__entry(
> +		__field(int, nid)
> +		__field(int, order)
> +		__field(int, may_writepage)
> +		__field(gfp_t, gfp_flags)
> +		__field(int, zid)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->nid = nid;
> +		__entry->order = order;
> +		__entry->may_writepage = may_writepage;
> +		__entry->gfp_flags = gfp_flags;
> +		__entry->zid = zid;
> +	),
> +
> +	TP_printk("nid=%d zid=%d order=%d may_writepage=%d gfp_flags=%s",
> +		__entry->nid,
> +		__entry->zid,
> +		__entry->order,
> +		__entry->may_writepage,
> +		show_gfp_flags(__entry->gfp_flags))
> +);
> +
> +TRACE_EVENT(mm_vmscan_node_reclaim_end,
> +
> +	TP_PROTO(int result),
> +
> +	TP_ARGS(result),
> +
> +	TP_STRUCT__entry(
> +		__field(int, result)
> +	),
> +
> +	TP_fast_assign(
> +		__entry->result = result;
> +	),
> +
> +	TP_printk("result=%d", __entry->result)
> +);
>  #endif /* _TRACE_VMSCAN_H */
>  
>  /* This part must be outside protection */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ac4806f..01a0401 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4240,6 +4240,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  		.may_swap = 1,
>  		.reclaim_idx = gfp_zone(gfp_mask),
>  	};
> +	int result;
> +
> +	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> +					sc.may_writepage,
> +					sc.gfp_mask,
> +					sc.reclaim_idx);
>  
>  	cond_resched();
>  	fs_reclaim_acquire(sc.gfp_mask);
> @@ -4267,7 +4273,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  	current->flags &= ~PF_SWAPWRITE;
>  	memalloc_noreclaim_restore(noreclaim_flag);
>  	fs_reclaim_release(sc.gfp_mask);
> -	return sc.nr_reclaimed >= nr_pages;
> +
> +	result = sc.nr_reclaimed >= nr_pages;
> +
> +	trace_mm_vmscan_node_reclaim_end(result);
> +
> +	return result;
>  }
>  
>  int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
> -- 
> 1.8.3.1

-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28 10:17 ` Michal Hocko
@ 2019-02-28 10:20   ` Yafang Shao
  2019-02-28 10:28     ` Michal Hocko
  0 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2019-02-28 10:20 UTC (permalink / raw)
  To: Michal Hocko
  Cc: Andrew Morton, ktkhai, broonie, Johannes Weiner, Linux MM, shaoyafang

On Thu, Feb 28, 2019 at 6:17 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 28-02-19 16:14:24, Yafang Shao wrote:
> > In the page alloc fast path, it may do node reclaim, which may cause
> > latency spike.
> > We should add tracepoint for this event, and also mesure the latency
> > it causes.
> >
> > So bellow two tracepoints are introduced,
> >       mm_vmscan_node_reclaim_begin
> >       mm_vmscan_node_reclaim_end
>
> This makes some sense to me. Regular direct reclaim already does have
> similar tracepoints. Is there any reason you haven't used
> mm_vmscan_direct_reclaim_{begin,end}_template as all other direct reclaim
> paths?
>

Because I also want to know the node id, which is not show in
mm_vmscan_direct_reclaim_{begin,end}_template.

Or should we modify mm_vmscan_direct_reclaim_{begin,end}_template to
show the node id as well ?

Thanks
Yafang

> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
> >  mm/vmscan.c                   | 13 +++++++++++-
> >  2 files changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> > index a1cb913..9310d5b 100644
> > --- a/include/trace/events/vmscan.h
> > +++ b/include/trace/events/vmscan.h
> > @@ -465,6 +465,54 @@
> >               __entry->ratio,
> >               show_reclaim_flags(__entry->reclaim_flags))
> >  );
> > +
> > +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
> > +
> > +     TP_PROTO(int nid, int order, int may_writepage,
> > +             gfp_t gfp_flags, int zid),
> > +
> > +     TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
> > +
> > +     TP_STRUCT__entry(
> > +             __field(int, nid)
> > +             __field(int, order)
> > +             __field(int, may_writepage)
> > +             __field(gfp_t, gfp_flags)
> > +             __field(int, zid)
> > +     ),
> > +
> > +     TP_fast_assign(
> > +             __entry->nid = nid;
> > +             __entry->order = order;
> > +             __entry->may_writepage = may_writepage;
> > +             __entry->gfp_flags = gfp_flags;
> > +             __entry->zid = zid;
> > +     ),
> > +
> > +     TP_printk("nid=%d zid=%d order=%d may_writepage=%d gfp_flags=%s",
> > +             __entry->nid,
> > +             __entry->zid,
> > +             __entry->order,
> > +             __entry->may_writepage,
> > +             show_gfp_flags(__entry->gfp_flags))
> > +);
> > +
> > +TRACE_EVENT(mm_vmscan_node_reclaim_end,
> > +
> > +     TP_PROTO(int result),
> > +
> > +     TP_ARGS(result),
> > +
> > +     TP_STRUCT__entry(
> > +             __field(int, result)
> > +     ),
> > +
> > +     TP_fast_assign(
> > +             __entry->result = result;
> > +     ),
> > +
> > +     TP_printk("result=%d", __entry->result)
> > +);
> >  #endif /* _TRACE_VMSCAN_H */
> >
> >  /* This part must be outside protection */
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ac4806f..01a0401 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4240,6 +4240,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >               .may_swap = 1,
> >               .reclaim_idx = gfp_zone(gfp_mask),
> >       };
> > +     int result;
> > +
> > +     trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > +                                     sc.may_writepage,
> > +                                     sc.gfp_mask,
> > +                                     sc.reclaim_idx);
> >
> >       cond_resched();
> >       fs_reclaim_acquire(sc.gfp_mask);
> > @@ -4267,7 +4273,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >       current->flags &= ~PF_SWAPWRITE;
> >       memalloc_noreclaim_restore(noreclaim_flag);
> >       fs_reclaim_release(sc.gfp_mask);
> > -     return sc.nr_reclaimed >= nr_pages;
> > +
> > +     result = sc.nr_reclaimed >= nr_pages;
> > +
> > +     trace_mm_vmscan_node_reclaim_end(result);
> > +
> > +     return result;
> >  }
> >
> >  int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
> > --
> > 1.8.3.1
>
> --
> Michal Hocko
> SUSE Labs


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28  8:14 [PATCH] mm: vmscan: add tracepoints for node reclaim Yafang Shao
  2019-02-28  8:59 ` Souptick Joarder
  2019-02-28 10:17 ` Michal Hocko
@ 2019-02-28 10:21 ` Vlastimil Babka
  2019-02-28 10:34   ` Yafang Shao
  2 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-02-28 10:21 UTC (permalink / raw)
  To: Yafang Shao, akpm, mhocko, ktkhai, broonie, hannes; +Cc: linux-mm, shaoyafang

On 2/28/19 9:14 AM, Yafang Shao wrote:
> In the page alloc fast path, it may do node reclaim, which may cause
> latency spike.
> We should add tracepoint for this event, and also mesure the latency
> it causes.
> 
> So bellow two tracepoints are introduced,
> 	mm_vmscan_node_reclaim_begin
> 	mm_vmscan_node_reclaim_end
> 
> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> ---
>  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
>  mm/vmscan.c                   | 13 +++++++++++-
>  2 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> index a1cb913..9310d5b 100644
> --- a/include/trace/events/vmscan.h
> +++ b/include/trace/events/vmscan.h
> @@ -465,6 +465,54 @@
>  		__entry->ratio,
>  		show_reclaim_flags(__entry->reclaim_flags))
>  );
> +
> +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
> +
> +	TP_PROTO(int nid, int order, int may_writepage,
> +		gfp_t gfp_flags, int zid),
> +
> +	TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
> +
> +	TP_STRUCT__entry(
> +		__field(int, nid)
> +		__field(int, order)
> +		__field(int, may_writepage)

For node reclaim may_writepage is statically set in node_reclaim_mode,
so I'm not sure it's worth including it.

> +		__field(gfp_t, gfp_flags)
> +		__field(int, zid)

zid seems wasteful and misleading as it's simply derived by
gfp_zone(gfp_mask), so I would drop it.

> +	),
> +
> +	TP_fast_assign(
> +		__entry->nid = nid;
> +		__entry->order = order;
> +		__entry->may_writepage = may_writepage;
> +		__entry->gfp_flags = gfp_flags;
> +		__entry->zid = zid;
> +	),
> +
> +	TP_printk("nid=%d zid=%d order=%d may_writepage=%d gfp_flags=%s",
> +		__entry->nid,
> +		__entry->zid,
> +		__entry->order,
> +		__entry->may_writepage,
> +		show_gfp_flags(__entry->gfp_flags))
> +);
> +
> +TRACE_EVENT(mm_vmscan_node_reclaim_end,
> +
> +	TP_PROTO(int result),
> +
> +	TP_ARGS(result),
> +
> +	TP_STRUCT__entry(
> +		__field(int, result)

Reporting sc.nr_reclaimed sounds more useful and in line with other
reclaim tracepoints. Result (sc.nr_reclaimed >= nr_pages) can then be
derived by postprocessing as the beginning tracepoint contains 'order'
thus we know nr_pages?

> +	),
> +
> +	TP_fast_assign(
> +		__entry->result = result;
> +	),
> +
> +	TP_printk("result=%d", __entry->result)
> +);
>  #endif /* _TRACE_VMSCAN_H */
>  
>  /* This part must be outside protection */
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index ac4806f..01a0401 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4240,6 +4240,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  		.may_swap = 1,
>  		.reclaim_idx = gfp_zone(gfp_mask),
>  	};
> +	int result;
> +
> +	trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> +					sc.may_writepage,
> +					sc.gfp_mask,
> +					sc.reclaim_idx);
>  
>  	cond_resched();
>  	fs_reclaim_acquire(sc.gfp_mask);
> @@ -4267,7 +4273,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
>  	current->flags &= ~PF_SWAPWRITE;
>  	memalloc_noreclaim_restore(noreclaim_flag);
>  	fs_reclaim_release(sc.gfp_mask);
> -	return sc.nr_reclaimed >= nr_pages;
> +
> +	result = sc.nr_reclaimed >= nr_pages;
> +
> +	trace_mm_vmscan_node_reclaim_end(result);
> +
> +	return result;
>  }
>  
>  int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
> 


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28 10:20   ` Yafang Shao
@ 2019-02-28 10:28     ` Michal Hocko
  0 siblings, 0 replies; 10+ messages in thread
From: Michal Hocko @ 2019-02-28 10:28 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, ktkhai, broonie, Johannes Weiner, Linux MM, shaoyafang

On Thu 28-02-19 18:20:16, Yafang Shao wrote:
> On Thu, Feb 28, 2019 at 6:17 PM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > On Thu 28-02-19 16:14:24, Yafang Shao wrote:
> > > In the page alloc fast path, it may do node reclaim, which may cause
> > > latency spike.
> > > We should add tracepoint for this event, and also mesure the latency
> > > it causes.
> > >
> > > So bellow two tracepoints are introduced,
> > >       mm_vmscan_node_reclaim_begin
> > >       mm_vmscan_node_reclaim_end
> >
> > This makes some sense to me. Regular direct reclaim already does have
> > similar tracepoints. Is there any reason you haven't used
> > mm_vmscan_direct_reclaim_{begin,end}_template as all other direct reclaim
> > paths?
> >
> 
> Because I also want to know the node id, which is not show in
> mm_vmscan_direct_reclaim_{begin,end}_template.
> 
> Or should we modify mm_vmscan_direct_reclaim_{begin,end}_template to
> show the node id as well ?

OK, I see. I thought it was there but it would make much less sense than
for the node reclaim for sure. A separate tracepoint makes more sense
then.
-- 
Michal Hocko
SUSE Labs


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28 10:21 ` Vlastimil Babka
@ 2019-02-28 10:34   ` Yafang Shao
  2019-02-28 10:44     ` Vlastimil Babka
  0 siblings, 1 reply; 10+ messages in thread
From: Yafang Shao @ 2019-02-28 10:34 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, ktkhai, broonie, Johannes Weiner,
	Linux MM, shaoyafang

On Thu, Feb 28, 2019 at 6:21 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/28/19 9:14 AM, Yafang Shao wrote:
> > In the page alloc fast path, it may do node reclaim, which may cause
> > latency spike.
> > We should add tracepoint for this event, and also mesure the latency
> > it causes.
> >
> > So bellow two tracepoints are introduced,
> >       mm_vmscan_node_reclaim_begin
> >       mm_vmscan_node_reclaim_end
> >
> > Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> > ---
> >  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
> >  mm/vmscan.c                   | 13 +++++++++++-
> >  2 files changed, 60 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> > index a1cb913..9310d5b 100644
> > --- a/include/trace/events/vmscan.h
> > +++ b/include/trace/events/vmscan.h
> > @@ -465,6 +465,54 @@
> >               __entry->ratio,
> >               show_reclaim_flags(__entry->reclaim_flags))
> >  );
> > +
> > +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
> > +
> > +     TP_PROTO(int nid, int order, int may_writepage,
> > +             gfp_t gfp_flags, int zid),
> > +
> > +     TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
> > +
> > +     TP_STRUCT__entry(
> > +             __field(int, nid)
> > +             __field(int, order)
> > +             __field(int, may_writepage)
>
> For node reclaim may_writepage is statically set in node_reclaim_mode,
> so I'm not sure it's worth including it.
>
> > +             __field(gfp_t, gfp_flags)
> > +             __field(int, zid)
>
> zid seems wasteful and misleading as it's simply derived by
> gfp_zone(gfp_mask), so I would drop it.
>

I agree with you that may_writepage and zid is wasteful, but I found
they are in other tracepoints in this file,
so I place them in this tracepoint as well.

Seems we'd better drop them from other tracepoints as well ?

> > +     ),
> > +
> > +     TP_fast_assign(
> > +             __entry->nid = nid;
> > +             __entry->order = order;
> > +             __entry->may_writepage = may_writepage;
> > +             __entry->gfp_flags = gfp_flags;
> > +             __entry->zid = zid;
> > +     ),
> > +
> > +     TP_printk("nid=%d zid=%d order=%d may_writepage=%d gfp_flags=%s",
> > +             __entry->nid,
> > +             __entry->zid,
> > +             __entry->order,
> > +             __entry->may_writepage,
> > +             show_gfp_flags(__entry->gfp_flags))
> > +);
> > +
> > +TRACE_EVENT(mm_vmscan_node_reclaim_end,
> > +
> > +     TP_PROTO(int result),
> > +
> > +     TP_ARGS(result),
> > +
> > +     TP_STRUCT__entry(
> > +             __field(int, result)
>
> Reporting sc.nr_reclaimed sounds more useful and in line with other
> reclaim tracepoints. Result (sc.nr_reclaimed >= nr_pages) can then be
> derived by postprocessing as the beginning tracepoint contains 'order'
> thus we know nr_pages?
>

Seems reasonable.
Will change it.

> > +     ),
> > +
> > +     TP_fast_assign(
> > +             __entry->result = result;
> > +     ),
> > +
> > +     TP_printk("result=%d", __entry->result)
> > +);
> >  #endif /* _TRACE_VMSCAN_H */
> >
> >  /* This part must be outside protection */
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index ac4806f..01a0401 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4240,6 +4240,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >               .may_swap = 1,
> >               .reclaim_idx = gfp_zone(gfp_mask),
> >       };
> > +     int result;
> > +
> > +     trace_mm_vmscan_node_reclaim_begin(pgdat->node_id, order,
> > +                                     sc.may_writepage,
> > +                                     sc.gfp_mask,
> > +                                     sc.reclaim_idx);
> >
> >       cond_resched();
> >       fs_reclaim_acquire(sc.gfp_mask);
> > @@ -4267,7 +4273,12 @@ static int __node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned in
> >       current->flags &= ~PF_SWAPWRITE;
> >       memalloc_noreclaim_restore(noreclaim_flag);
> >       fs_reclaim_release(sc.gfp_mask);
> > -     return sc.nr_reclaimed >= nr_pages;
> > +
> > +     result = sc.nr_reclaimed >= nr_pages;
> > +
> > +     trace_mm_vmscan_node_reclaim_end(result);
> > +
> > +     return result;
> >  }
> >
> >  int node_reclaim(struct pglist_data *pgdat, gfp_t gfp_mask, unsigned int order)
> >
>

Thanks
Yafang


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28 10:34   ` Yafang Shao
@ 2019-02-28 10:44     ` Vlastimil Babka
  2019-02-28 10:48       ` Yafang Shao
  0 siblings, 1 reply; 10+ messages in thread
From: Vlastimil Babka @ 2019-02-28 10:44 UTC (permalink / raw)
  To: Yafang Shao
  Cc: Andrew Morton, Michal Hocko, ktkhai, broonie, Johannes Weiner,
	Linux MM, shaoyafang

On 2/28/19 11:34 AM, Yafang Shao wrote:
> On Thu, Feb 28, 2019 at 6:21 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>>
>> On 2/28/19 9:14 AM, Yafang Shao wrote:
>>> In the page alloc fast path, it may do node reclaim, which may cause
>>> latency spike.
>>> We should add tracepoint for this event, and also mesure the latency
>>> it causes.
>>>
>>> So bellow two tracepoints are introduced,
>>>       mm_vmscan_node_reclaim_begin
>>>       mm_vmscan_node_reclaim_end
>>>
>>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
>>> ---
>>>  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
>>>  mm/vmscan.c                   | 13 +++++++++++-
>>>  2 files changed, 60 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
>>> index a1cb913..9310d5b 100644
>>> --- a/include/trace/events/vmscan.h
>>> +++ b/include/trace/events/vmscan.h
>>> @@ -465,6 +465,54 @@
>>>               __entry->ratio,
>>>               show_reclaim_flags(__entry->reclaim_flags))
>>>  );
>>> +
>>> +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
>>> +
>>> +     TP_PROTO(int nid, int order, int may_writepage,
>>> +             gfp_t gfp_flags, int zid),
>>> +
>>> +     TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
>>> +
>>> +     TP_STRUCT__entry(
>>> +             __field(int, nid)
>>> +             __field(int, order)
>>> +             __field(int, may_writepage)
>>
>> For node reclaim may_writepage is statically set in node_reclaim_mode,
>> so I'm not sure it's worth including it.
>>
>>> +             __field(gfp_t, gfp_flags)
>>> +             __field(int, zid)
>>
>> zid seems wasteful and misleading as it's simply derived by
>> gfp_zone(gfp_mask), so I would drop it.
>>
> 
> I agree with you that may_writepage and zid is wasteful, but I found
> they are in other tracepoints in this file,
> so I place them in this tracepoint as well.

I see zid only in kswapd waking tracepoints? That's different kind of
event.

> Seems we'd better drop them from other tracepoints as well ?

Hmm seems may_writepage in other tracepoints depends on laptop_mode
which is also a static setting. do_try_to_free_pages() can override it
due to priority, but that doesn't affect the tracepoints. If they are to
be dropped, it would be a separate patch though.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] mm: vmscan: add tracepoints for node reclaim
  2019-02-28 10:44     ` Vlastimil Babka
@ 2019-02-28 10:48       ` Yafang Shao
  0 siblings, 0 replies; 10+ messages in thread
From: Yafang Shao @ 2019-02-28 10:48 UTC (permalink / raw)
  To: Vlastimil Babka
  Cc: Andrew Morton, Michal Hocko, ktkhai, broonie, Johannes Weiner,
	Linux MM, shaoyafang

On Thu, Feb 28, 2019 at 6:44 PM Vlastimil Babka <vbabka@suse.cz> wrote:
>
> On 2/28/19 11:34 AM, Yafang Shao wrote:
> > On Thu, Feb 28, 2019 at 6:21 PM Vlastimil Babka <vbabka@suse.cz> wrote:
> >>
> >> On 2/28/19 9:14 AM, Yafang Shao wrote:
> >>> In the page alloc fast path, it may do node reclaim, which may cause
> >>> latency spike.
> >>> We should add tracepoint for this event, and also mesure the latency
> >>> it causes.
> >>>
> >>> So bellow two tracepoints are introduced,
> >>>       mm_vmscan_node_reclaim_begin
> >>>       mm_vmscan_node_reclaim_end
> >>>
> >>> Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
> >>> ---
> >>>  include/trace/events/vmscan.h | 48 +++++++++++++++++++++++++++++++++++++++++++
> >>>  mm/vmscan.c                   | 13 +++++++++++-
> >>>  2 files changed, 60 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
> >>> index a1cb913..9310d5b 100644
> >>> --- a/include/trace/events/vmscan.h
> >>> +++ b/include/trace/events/vmscan.h
> >>> @@ -465,6 +465,54 @@
> >>>               __entry->ratio,
> >>>               show_reclaim_flags(__entry->reclaim_flags))
> >>>  );
> >>> +
> >>> +TRACE_EVENT(mm_vmscan_node_reclaim_begin,
> >>> +
> >>> +     TP_PROTO(int nid, int order, int may_writepage,
> >>> +             gfp_t gfp_flags, int zid),
> >>> +
> >>> +     TP_ARGS(nid, order, may_writepage, gfp_flags, zid),
> >>> +
> >>> +     TP_STRUCT__entry(
> >>> +             __field(int, nid)
> >>> +             __field(int, order)
> >>> +             __field(int, may_writepage)
> >>
> >> For node reclaim may_writepage is statically set in node_reclaim_mode,
> >> so I'm not sure it's worth including it.
> >>
> >>> +             __field(gfp_t, gfp_flags)
> >>> +             __field(int, zid)
> >>
> >> zid seems wasteful and misleading as it's simply derived by
> >> gfp_zone(gfp_mask), so I would drop it.
> >>
> >
> > I agree with you that may_writepage and zid is wasteful, but I found
> > they are in other tracepoints in this file,
> > so I place them in this tracepoint as well.
>
> I see zid only in kswapd waking tracepoints? That's different kind of
> event.
>

Pls. see mm_vmscan_wakeup_kswapd and  classzone_idx in
mm_vmscan_direct_reclaim_begin_template.

mm_vmscan_direct_reclaim_begin_template:
    "order=%d may_writepage=%d gfp_flags=%s classzone_idx=%d"

mm_vmscan_wakeup_kswapd:
    "nid=%d zid=%d order=%d gfp_flags=%s"








> > Seems we'd better drop them from other tracepoints as well ?
>
> Hmm seems may_writepage in other tracepoints depends on laptop_mode
> which is also a static setting. do_try_to_free_pages() can override it
> due to priority, but that doesn't affect the tracepoints. If they are to
> be dropped, it would be a separate patch though.

OK.

Thanks
Yafang


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2019-02-28 10:49 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-28  8:14 [PATCH] mm: vmscan: add tracepoints for node reclaim Yafang Shao
2019-02-28  8:59 ` Souptick Joarder
2019-02-28  9:35   ` Yafang Shao
2019-02-28 10:17 ` Michal Hocko
2019-02-28 10:20   ` Yafang Shao
2019-02-28 10:28     ` Michal Hocko
2019-02-28 10:21 ` Vlastimil Babka
2019-02-28 10:34   ` Yafang Shao
2019-02-28 10:44     ` Vlastimil Babka
2019-02-28 10:48       ` Yafang Shao

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox