* [PATCH 0/3] Add tracepoints for lowmem reserves, watermarks and totalreserve_pages
@ 2025-03-03 7:35 Martin Liu
2025-03-03 7:35 ` [PATCH 1/3] mm/page_alloc: Add trace event for per-zone watermark setup Martin Liu
` (2 more replies)
0 siblings, 3 replies; 6+ messages in thread
From: Martin Liu @ 2025-03-03 7:35 UTC (permalink / raw)
To: akpm, hannes; +Cc: linux-mm, surenb, minchan, Martin Liu
This patchset introduces tracepoints to track changes in the lowmem
reserves, watermarks and totalreserve_pages. This makes it possible to
pinpoint the exact timing of such changes and to understand their
relation to reclaim activity.
The tracepoints added are listed below, followed by a short usage sketch:
mm_setup_per_zone_lowmem_reserve
mm_setup_per_zone_wmarks
mm_calculate_totalreserve_pages
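As a quick usage sketch (not part of the series, and assuming tracefs is
mounted at /sys/kernel/tracing), the events could be enabled and read
from userspace roughly as follows; since they are defined in
include/trace/events/kmem.h they show up under the kmem event group:

/* Hypothetical helper: enable the three kmem events and dump trace_pipe. */
#include <stdio.h>

static int enable_event(const char *name)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path),
                 "/sys/kernel/tracing/events/kmem/%s/enable", name);
        f = fopen(path, "w");
        if (!f)
                return -1;
        fputs("1\n", f);
        fclose(f);
        return 0;
}

int main(void)
{
        static const char * const events[] = {
                "mm_setup_per_zone_wmarks",
                "mm_setup_per_zone_lowmem_reserve",
                "mm_calculate_totalreserve_pages",
        };
        char line[512];
        FILE *pipe;
        unsigned int i;

        for (i = 0; i < sizeof(events) / sizeof(events[0]); i++)
                if (enable_event(events[i]))
                        fprintf(stderr, "failed to enable %s\n", events[i]);

        /* The events fire when the reserves are recalculated, e.g. on
         * memory hotplug or when min_free_kbytes or
         * watermark_scale_factor is written.
         */
        pipe = fopen("/sys/kernel/tracing/trace_pipe", "r");
        if (!pipe)
                return 1;
        while (fgets(line, sizeof(line), pipe))
                fputs(line, stdout);
        fclose(pipe);
        return 0;
}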
Martin Liu (3):
mm/page_alloc: Add trace event for per-zone watermark setup
mm/page_alloc: Add trace event for per-zone lowmem reserve setup
mm/page_alloc: Add trace event for totalreserve_pages calculation
include/trace/events/kmem.h | 78 +++++++++++++++++++++++++++++++++++++
mm/page_alloc.c | 4 ++
2 files changed, 82 insertions(+)
--
2.48.1.711.g2feabab25a-goog
* [PATCH 1/3] mm/page_alloc: Add trace event for per-zone watermark setup
2025-03-03 7:35 [PATCH 0/3] Add tracepoints for lowmem reserves, watermarks and totalreserve_pages Martin Liu
@ 2025-03-03 7:35 ` Martin Liu
2025-03-03 7:35 ` [PATCH 2/3] mm/page_alloc: Add trace event for per-zone lowmem reserve setup Martin Liu
2025-03-03 7:35 ` [PATCH 3/3] mm/page_alloc: Add trace event for totalreserve_pages calculation Martin Liu
2 siblings, 0 replies; 6+ messages in thread
From: Martin Liu @ 2025-03-03 7:35 UTC (permalink / raw)
To: akpm, hannes; +Cc: linux-mm, surenb, minchan, Martin Liu
This commit introduces the `mm_setup_per_zone_wmarks` trace event,
which provides detailed insights into the kernel's per-zone watermark
configuration, with precise timing of each change.
While `/proc/zoneinfo` provides some information about zone watermarks,
this trace event offers (an example trace line is sketched after the
list):
1. The ability to link watermark changes to specific kernel events and
   logic.
2. The ability to capture rapid or short-lived watermark changes that
   may be missed by user-space polling.
3. A way to diagnose unexpected kswapd activity or excessive direct
   reclaim triggered by rapidly changing watermarks.
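For illustration, a single emitted event would look roughly like the
following (the zone and the page counts are hypothetical; the field
layout follows the TP_printk format below, and the equal spacing of
min/low/high/promo mirrors the constant `tmp` increment applied in
__setup_per_zone_wmarks()):
  mm_setup_per_zone_wmarks: node_id=0 zone name=Normal watermark min=11264 low=36139 high=61014 promo=85889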
Signed-off-by: Martin Liu <liumartin@google.com>
---
include/trace/events/kmem.h | 33 +++++++++++++++++++++++++++++++++
mm/page_alloc.c | 1 +
2 files changed, 34 insertions(+)
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index b37eb0a7060f..5fd392dae503 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -342,6 +342,39 @@ TRACE_EVENT(mm_alloc_contig_migrate_range_info,
__entry->nr_mapped)
);
+TRACE_EVENT(mm_setup_per_zone_wmarks,
+
+ TP_PROTO(struct zone *zone),
+
+ TP_ARGS(zone),
+
+ TP_STRUCT__entry(
+ __field(int, node_id)
+ __string(name, zone->name)
+ __field(unsigned long, watermark_min)
+ __field(unsigned long, watermark_low)
+ __field(unsigned long, watermark_high)
+ __field(unsigned long, watermark_promo)
+ ),
+
+ TP_fast_assign(
+ __entry->node_id = zone->zone_pgdat->node_id;
+ __assign_str(name);
+ __entry->watermark_min = zone->_watermark[WMARK_MIN];
+ __entry->watermark_low = zone->_watermark[WMARK_LOW];
+ __entry->watermark_high = zone->_watermark[WMARK_HIGH];
+ __entry->watermark_promo = zone->_watermark[WMARK_PROMO];
+ ),
+
+ TP_printk("node_id=%d zone name=%s watermark min=%lu low=%lu high=%lu promo=%lu",
+ __entry->node_id,
+ __get_str(name),
+ __entry->watermark_min,
+ __entry->watermark_low,
+ __entry->watermark_high,
+ __entry->watermark_promo)
+);
+
/*
* Required for uniquely and securely identifying mm in rss_stat tracepoint.
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 579789600a3c..50893061db66 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5920,6 +5920,7 @@ static void __setup_per_zone_wmarks(void)
zone->_watermark[WMARK_LOW] = min_wmark_pages(zone) + tmp;
zone->_watermark[WMARK_HIGH] = low_wmark_pages(zone) + tmp;
zone->_watermark[WMARK_PROMO] = high_wmark_pages(zone) + tmp;
+ trace_mm_setup_per_zone_wmarks(zone);
spin_unlock_irqrestore(&zone->lock, flags);
}
--
2.48.1.711.g2feabab25a-goog
* [PATCH 2/3] mm/page_alloc: Add trace event for per-zone lowmem reserve setup
2025-03-03 7:35 [PATCH 0/3] Add tracepoints for lowmem reserves, watermarks and totalreserve_pages Martin Liu
2025-03-03 7:35 ` [PATCH 1/3] mm/page_alloc: Add trace event for per-zone watermark setup Martin Liu
@ 2025-03-03 7:35 ` Martin Liu
2025-03-03 8:18 ` Kalesh Singh
2025-03-03 7:35 ` [PATCH 3/3] mm/page_alloc: Add trace event for totalreserve_pages calculation Martin Liu
2 siblings, 1 reply; 6+ messages in thread
From: Martin Liu @ 2025-03-03 7:35 UTC (permalink / raw)
To: akpm, hannes; +Cc: linux-mm, surenb, minchan, Martin Liu
This commit introduces the `mm_setup_per_zone_lowmem_reserve` trace
event, which provides detailed insights into the kernel's per-zone
lowmem reserve configuration.
The trace event provides precise timestamps, allowing developers to do
the following (a small worked example is sketched after the list):
1. Correlate lowmem reserve changes with specific kernel events and
   diagnose unexpected kswapd or direct reclaim behavior triggered by
   dynamic changes in the lowmem reserves.
2. Understand memory allocation failures caused by an insufficient
   lowmem reserve by precisely correlating allocation attempts with
   reserve adjustments.
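As a small worked example with hypothetical numbers: the reserve is
computed as `managed_pages / ratio` in setup_per_zone_lowmem_reserve(),
so a DMA32 zone whose upper Normal zone contributes 1048576 managed
pages, with a lowmem_reserve_ratio entry of 256 for DMA32, gets a
reserve of 1048576 / 256 = 4096 pages, and the tracepoint would emit
roughly:
  mm_setup_per_zone_lowmem_reserve: node_id=0 zone name=DMA32 upper_zone name=Normal lowmem_reserve_pages=4096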
Signed-off-by: Martin Liu <liumartin@google.com>
---
include/trace/events/kmem.h | 27 +++++++++++++++++++++++++++
mm/page_alloc.c | 2 ++
2 files changed, 29 insertions(+)
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 5fd392dae503..9623e68d4d26 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -375,6 +375,33 @@ TRACE_EVENT(mm_setup_per_zone_wmarks,
__entry->watermark_promo)
);
+TRACE_EVENT(mm_setup_per_zone_lowmem_reserve,
+
+ TP_PROTO(struct zone *zone, struct zone *upper_zone, long lowmem_reserve),
+
+ TP_ARGS(zone, upper_zone, lowmem_reserve),
+
+ TP_STRUCT__entry(
+ __field(int, node_id)
+ __string(name, zone->name)
+ __string(upper_name, upper_zone->name)
+ __field(long, lowmem_reserve)
+ ),
+
+ TP_fast_assign(
+ __entry->node_id = zone->zone_pgdat->node_id;
+ __assign_str(name);
+ __assign_str(upper_name);
+ __entry->lowmem_reserve = lowmem_reserve;
+ ),
+
+ TP_printk("node_id=%d zone name=%s upper_zone name=%s lowmem_reserve_pages=%ld",
+ __entry->node_id,
+ __get_str(name),
+ __get_str(upper_name),
+ __entry->lowmem_reserve)
+);
+
/*
* Required for uniquely and securely identifying mm in rss_stat tracepoint.
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 50893061db66..48623a2bf1ac 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5857,6 +5857,8 @@ static void setup_per_zone_lowmem_reserve(void)
zone->lowmem_reserve[j] = 0;
else
zone->lowmem_reserve[j] = managed_pages / ratio;
+ trace_mm_setup_per_zone_lowmem_reserve(zone, upper_zone,
+ zone->lowmem_reserve[j]);
}
}
}
--
2.48.1.711.g2feabab25a-goog
* [PATCH 3/3] mm/page_alloc: Add trace event for totalreserve_pages calculation
2025-03-03 7:35 [PATCH 0/3] Add tracepoints for lowmem reserves, watermarks and totalreserve_pages Martin Liu
2025-03-03 7:35 ` [PATCH 1/3] mm/page_alloc: Add trace event for per-zone watermark setup Martin Liu
2025-03-03 7:35 ` [PATCH 2/3] mm/page_alloc: Add trace event for per-zone lowmem reserve setup Martin Liu
@ 2025-03-03 7:35 ` Martin Liu
2 siblings, 0 replies; 6+ messages in thread
From: Martin Liu @ 2025-03-03 7:35 UTC (permalink / raw)
To: akpm, hannes; +Cc: linux-mm, surenb, minchan, Martin Liu
This commit introduces a new trace event,
`mm_calculate_totalreserve_pages`, which reports the new reserve value
at the exact time when it takes effect.
The `totalreserve_pages` value represents the total amount of memory
reserved across all zones and nodes in the system. This reserved memory
is crucial for ensuring that critical kernel operations have access to
sufficient memory, even under memory pressure.
By tracing the `totalreserve_pages` value, developers can gain insight
into how the total reserved memory changes over time.
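For illustration (the value is hypothetical), an emitted event would
look like:
  mm_calculate_totalreserve_pages: totalreserve_pages=87234
Roughly speaking, calculate_totalreserve_pages() sums, for each zone,
its high watermark plus its largest lowmem_reserve entry (capped at the
zone's managed pages), so movements of this value can usually be
explained by the two preceding tracepoints.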
Signed-off-by: Martin Liu <liumartin@google.com>
---
include/trace/events/kmem.h | 18 ++++++++++++++++++
mm/page_alloc.c | 1 +
2 files changed, 19 insertions(+)
diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
index 9623e68d4d26..f74925a6cf69 100644
--- a/include/trace/events/kmem.h
+++ b/include/trace/events/kmem.h
@@ -402,6 +402,24 @@ TRACE_EVENT(mm_setup_per_zone_lowmem_reserve,
__entry->lowmem_reserve)
);
+TRACE_EVENT(mm_calculate_totalreserve_pages,
+
+ TP_PROTO(unsigned long totalreserve_pages),
+
+ TP_ARGS(totalreserve_pages),
+
+ TP_STRUCT__entry(
+ __field(unsigned long, totalreserve_pages)
+ ),
+
+ TP_fast_assign(
+ __entry->totalreserve_pages = totalreserve_pages;
+ ),
+
+ TP_printk("totalreserve_pages=%lu", __entry->totalreserve_pages)
+);
+
+
/*
* Required for uniquely and securely identifying mm in rss_stat tracepoint.
*/
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 48623a2bf1ac..dbe19b0ffb46 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5827,6 +5827,7 @@ static void calculate_totalreserve_pages(void)
}
}
totalreserve_pages = reserve_pages;
+ trace_mm_calculate_totalreserve_pages(totalreserve_pages);
}
/*
--
2.48.1.711.g2feabab25a-goog
* Re: [PATCH 2/3] mm/page_alloc: Add trace event for per-zone lowmem reserve setup
2025-03-03 7:35 ` [PATCH 2/3] mm/page_alloc: Add trace event for per-zone lowmem reserve setup Martin Liu
@ 2025-03-03 8:18 ` Kalesh Singh
2025-03-03 10:11 ` Martin Liu
0 siblings, 1 reply; 6+ messages in thread
From: Kalesh Singh @ 2025-03-03 8:18 UTC (permalink / raw)
To: Martin Liu; +Cc: akpm, hannes, linux-mm, surenb, minchan
On Sun, Mar 2, 2025 at 11:36 PM Martin Liu <liumartin@google.com> wrote:
>
> This commit introduces the `mm_setup_per_zone_lowmem_reserve` trace
> event, which provides detailed insights into the kernel's per-zone
> lowmem reserve configuration.
>
> The trace event provides precise timestamps, allowing developers to:
>
> 1. Correlate lowmem reserve changes with specific kernel events and
>    diagnose unexpected kswapd or direct reclaim behavior triggered by
>    dynamic changes in the lowmem reserves.
>
> 2. Understand memory allocation failures caused by an insufficient
>    lowmem reserve by precisely correlating allocation attempts with
>    reserve adjustments.
>
> Signed-off-by: Martin Liu <liumartin@google.com>
> ---
> include/trace/events/kmem.h | 27 +++++++++++++++++++++++++++
> mm/page_alloc.c | 2 ++
> 2 files changed, 29 insertions(+)
>
> diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> index 5fd392dae503..9623e68d4d26 100644
> --- a/include/trace/events/kmem.h
> +++ b/include/trace/events/kmem.h
> @@ -375,6 +375,33 @@ TRACE_EVENT(mm_setup_per_zone_wmarks,
> __entry->watermark_promo)
> );
>
> +TRACE_EVENT(mm_setup_per_zone_lowmem_reserve,
> +
> + TP_PROTO(struct zone *zone, struct zone *upper_zone, long lowmem_reserve),
> +
> + TP_ARGS(zone, upper_zone, lowmem_reserve),
> +
> + TP_STRUCT__entry(
> + __field(int, node_id)
> + __string(name, zone->name)
> + __string(upper_name, upper_zone->name)
> + __field(long, lowmem_reserve)
> + ),
> +
> + TP_fast_assign(
> + __entry->node_id = zone->zone_pgdat->node_id;
> + __assign_str(name);
> + __assign_str(upper_name);
> + __entry->lowmem_reserve = lowmem_reserve;
> + ),
> +
> + TP_printk("node_id=%d zone name=%s upper_zone name=%s lowmem_reserve_pages=%ld",
> + __entry->node_id,
> + __get_str(name),
> + __get_str(upper_name),
> + __entry->lowmem_reserve)
> +);
> +
> /*
> * Required for uniquely and securely identifying mm in rss_stat tracepoint.
> */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 50893061db66..48623a2bf1ac 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5857,6 +5857,8 @@ static void setup_per_zone_lowmem_reserve(void)
> zone->lowmem_reserve[j] = 0;
> else
> zone->lowmem_reserve[j] = managed_pages / ratio;
> + trace_mm_setup_per_zone_lowmem_reserve(zone, upper_zone,
> + zone->lowmem_reserve[j]);
Hi Martin,
Please use 8-character width tabs for indentation.
-- Kalesh
> }
> }
> }
> --
> 2.48.1.711.g2feabab25a-goog
>
>
* Re: [PATCH 2/3] mm/page_alloc: Add trace event for per-zone lowmem reserve setup
2025-03-03 8:18 ` Kalesh Singh
@ 2025-03-03 10:11 ` Martin Liu
0 siblings, 0 replies; 6+ messages in thread
From: Martin Liu @ 2025-03-03 10:11 UTC (permalink / raw)
To: Kalesh Singh; +Cc: akpm, hannes, linux-mm, surenb, minchan
On Mon, Mar 03, 2025 at 12:18:46AM -0800, Kalesh Singh wrote:
> On Sun, Mar 2, 2025 at 11:36 PM Martin Liu <liumartin@google.com> wrote:
> >
> > This commit introduces the `mm_setup_per_zone_lowmem_reserve` trace
> > event, which provides detailed insights into the kernel's per-zone
> > lowmem reserve configuration.
> >
> > The trace event provides precise timestamps, allowing developers to:
> >
> > 1. Correlate lowmem reserve changes with specific kernel events and
> >    diagnose unexpected kswapd or direct reclaim behavior triggered by
> >    dynamic changes in the lowmem reserves.
> >
> > 2. Understand memory allocation failures caused by an insufficient
> >    lowmem reserve by precisely correlating allocation attempts with
> >    reserve adjustments.
> >
> > Signed-off-by: Martin Liu <liumartin@google.com>
> > ---
> > include/trace/events/kmem.h | 27 +++++++++++++++++++++++++++
> > mm/page_alloc.c | 2 ++
> > 2 files changed, 29 insertions(+)
> >
> > diff --git a/include/trace/events/kmem.h b/include/trace/events/kmem.h
> > index 5fd392dae503..9623e68d4d26 100644
> > --- a/include/trace/events/kmem.h
> > +++ b/include/trace/events/kmem.h
> > @@ -375,6 +375,33 @@ TRACE_EVENT(mm_setup_per_zone_wmarks,
> > __entry->watermark_promo)
> > );
> >
> > +TRACE_EVENT(mm_setup_per_zone_lowmem_reserve,
> > +
> > + TP_PROTO(struct zone *zone, struct zone *upper_zone, long lowmem_reserve),
> > +
> > + TP_ARGS(zone, upper_zone, lowmem_reserve),
> > +
> > + TP_STRUCT__entry(
> > + __field(int, node_id)
> > + __string(name, zone->name)
> > + __string(upper_name, upper_zone->name)
> > + __field(long, lowmem_reserve)
> > + ),
> > +
> > + TP_fast_assign(
> > + __entry->node_id = zone->zone_pgdat->node_id;
> > + __assign_str(name);
> > + __assign_str(upper_name);
> > + __entry->lowmem_reserve = lowmem_reserve;
> > + ),
> > +
> > + TP_printk("node_id=%d zone name=%s upper_zone name=%s lowmem_reserve_pages=%ld",
> > + __entry->node_id,
> > + __get_str(name),
> > + __get_str(upper_name),
> > + __entry->lowmem_reserve)
> > +);
> > +
> > /*
> > * Required for uniquely and securely identifying mm in rss_stat tracepoint.
> > */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 50893061db66..48623a2bf1ac 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -5857,6 +5857,8 @@ static void setup_per_zone_lowmem_reserve(void)
> > zone->lowmem_reserve[j] = 0;
> > else
> > zone->lowmem_reserve[j] = managed_pages / ratio;
> > + trace_mm_setup_per_zone_lowmem_reserve(zone, upper_zone,
> > + zone->lowmem_reserve[j]);
>
> Hi Martin,
>
> Please use 8-character width tabs for indentation.
Hi Kalesh,
Yes, thank you for the reminder. I will address these once I've received
the rest of the feedback :)
>
> -- Kalesh
> > }
> > }
> > }
> > --
> > 2.48.1.711.g2feabab25a-goog
> >
> >