* [PATCH 1/7] mm/damon/core: introduce damos quota goal metrics for memory node utilization
From: SeongJae Park @ 2025-04-20 19:40 UTC
To: Andrew Morton; +Cc: SeongJae Park, damon, kernel-team, linux-kernel, linux-mm
Used and free space ratios for specific NUMA nodes can be useful inputs
for NUMA-specific DAMOS schemes' aggressiveness self-tuning feedback
loop. Implement DAMOS quota goal metrics for such self-tuned schemes.
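For example, a kernel API caller could ask DAMOS to self-tune a scheme aiming
for about 0.5 % free memory on node 0, similarly to what the sample module in
the last patch of this series does.  The snippet below is only an illustrative
sketch; the metric, target value and node id are example choices, and 'scheme'
is assumed to be a pointer to an already constructed struct damos.

	struct damos_quota_goal *goal;

	goal = damos_new_quota_goal(DAMOS_QUOTA_NODE_MEM_FREE_BP, 50);
	if (!goal)
		return -ENOMEM;
	goal->nid = 0;
	damos_add_quota_goal(&scheme->quota, goal);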
Signed-off-by: SeongJae Park <sj@kernel.org>
---
include/linux/damon.h | 6 ++++++
mm/damon/core.c | 27 +++++++++++++++++++++++++++
mm/damon/sysfs-schemes.c | 2 ++
3 files changed, 35 insertions(+)
diff --git a/include/linux/damon.h b/include/linux/damon.h
index 47e36e6ea203..a4011726cb3b 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -145,6 +145,8 @@ enum damos_action {
*
* @DAMOS_QUOTA_USER_INPUT: User-input value.
* @DAMOS_QUOTA_SOME_MEM_PSI_US: System level some memory PSI in us.
+ * @DAMOS_QUOTA_NODE_MEM_USED_BP: MemUsed ratio of a node.
+ * @DAMOS_QUOTA_NODE_MEM_FREE_BP: MemFree ratio of a node.
* @NR_DAMOS_QUOTA_GOAL_METRICS: Number of DAMOS quota goal metrics.
*
* Metrics equal to larger than @NR_DAMOS_QUOTA_GOAL_METRICS are unsupported.
@@ -152,6 +154,8 @@ enum damos_action {
enum damos_quota_goal_metric {
DAMOS_QUOTA_USER_INPUT,
DAMOS_QUOTA_SOME_MEM_PSI_US,
+ DAMOS_QUOTA_NODE_MEM_USED_BP,
+ DAMOS_QUOTA_NODE_MEM_FREE_BP,
NR_DAMOS_QUOTA_GOAL_METRICS,
};
@@ -161,6 +165,7 @@ enum damos_quota_goal_metric {
* @target_value: Target value of @metric to achieve with the tuning.
* @current_value: Current value of @metric.
* @last_psi_total: Last measured total PSI
+ * @nid: Node id.
* @list: List head for siblings.
*
* Data structure for getting the current score of the quota tuning goal. The
@@ -179,6 +184,7 @@ struct damos_quota_goal {
/* metric-dependent fields */
union {
u64 last_psi_total;
+ int nid;
};
struct list_head list;
};
diff --git a/mm/damon/core.c b/mm/damon/core.c
index f0c1676f0599..587fb9a4fef8 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -1889,6 +1889,29 @@ static inline u64 damos_get_some_mem_psi_total(void)
#endif /* CONFIG_PSI */
+#ifdef CONFIG_NUMA
+static __kernel_ulong_t damos_get_node_mem_bp(
+ struct damos_quota_goal *goal)
+{
+ struct sysinfo i;
+ __kernel_ulong_t numerator;
+
+ si_meminfo_node(&i, goal->nid);
+ if (goal->metric == DAMOS_QUOTA_NODE_MEM_USED_BP)
+ numerator = i.totalram - i.freeram;
+ else /* DAMOS_QUOTA_NODE_MEM_FREE_BP */
+ numerator = i.freeram;
+ return numerator * 10000 / i.totalram;
+}
+#else
+static __kernel_ulong_t damos_get_node_mem_bp(
+ struct damos_quota_goal *goal)
+{
+ return 0;
+}
+#endif
+
+
static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
{
u64 now_psi_total;
@@ -1902,6 +1925,10 @@ static void damos_set_quota_goal_current_value(struct damos_quota_goal *goal)
goal->current_value = now_psi_total - goal->last_psi_total;
goal->last_psi_total = now_psi_total;
break;
+ case DAMOS_QUOTA_NODE_MEM_USED_BP:
+ case DAMOS_QUOTA_NODE_MEM_FREE_BP:
+ goal->current_value = damos_get_node_mem_bp(goal);
+ break;
default:
break;
}
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 23b562df0839..98108f082178 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -942,6 +942,8 @@ struct damos_sysfs_quota_goal {
static const char * const damos_sysfs_quota_goal_metric_strs[] = {
"user_input",
"some_mem_psi_us",
+ "node_mem_used_bp",
+ "node_mem_free_bp",
};
static struct damos_sysfs_quota_goal *damos_sysfs_quota_goal_alloc(void)
--
2.39.5
* [PATCH 2/7] mm/damon/sysfs-schemes: implement file for quota goal nid parameter
From: SeongJae Park @ 2025-04-20 19:40 UTC
To: Andrew Morton; +Cc: SeongJae Park, damon, kernel-team, linux-kernel, linux-mm
DAMOS_QUOTA_NODE_MEM_{USED,FREE}_BP DAMOS quota goal metrics require the
node id parameter. However, there is no DAMON user ABI for setting it.
Implement a DAMON sysfs file for that with name 'nid', under the quota
goal directory.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/damon/sysfs-schemes.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 98108f082178..7681ed293b62 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -936,6 +936,7 @@ struct damos_sysfs_quota_goal {
enum damos_quota_goal_metric metric;
unsigned long target_value;
unsigned long current_value;
+ int nid;
};
/* This should match with enum damos_action */
@@ -1016,6 +1017,28 @@ static ssize_t current_value_store(struct kobject *kobj,
return err ? err : count;
}
+static ssize_t nid_show(struct kobject *kobj,
+ struct kobj_attribute *attr, char *buf)
+{
+ struct damos_sysfs_quota_goal *goal = container_of(kobj, struct
+ damos_sysfs_quota_goal, kobj);
+
+ /* todo: return error if the goal is not using nid */
+
+ return sysfs_emit(buf, "%d\n", goal->nid);
+}
+
+static ssize_t nid_store(struct kobject *kobj,
+ struct kobj_attribute *attr, const char *buf, size_t count)
+{
+ struct damos_sysfs_quota_goal *goal = container_of(kobj, struct
+ damos_sysfs_quota_goal, kobj);
+ int err = kstrtoint(buf, 0, &goal->nid);
+
+ /* feed callback should check existence of this file and read value */
+ return err ? err : count;
+}
+
static void damos_sysfs_quota_goal_release(struct kobject *kobj)
{
/* or, notify this release to the feed callback */
@@ -1031,10 +1054,14 @@ static struct kobj_attribute damos_sysfs_quota_goal_target_value_attr =
static struct kobj_attribute damos_sysfs_quota_goal_current_value_attr =
__ATTR_RW_MODE(current_value, 0600);
+static struct kobj_attribute damos_sysfs_quota_goal_nid_attr =
+ __ATTR_RW_MODE(nid, 0600);
+
static struct attribute *damos_sysfs_quota_goal_attrs[] = {
&damos_sysfs_quota_goal_target_metric_attr.attr,
&damos_sysfs_quota_goal_target_value_attr.attr,
&damos_sysfs_quota_goal_current_value_attr.attr,
+ &damos_sysfs_quota_goal_nid_attr.attr,
NULL,
};
ATTRIBUTE_GROUPS(damos_sysfs_quota_goal);
--
2.39.5
* [PATCH 3/7] mm/damon/sysfs-schemes: connect damos_quota_goal nid with core layer
From: SeongJae Park @ 2025-04-20 19:40 UTC
To: Andrew Morton; +Cc: SeongJae Park, damon, kernel-team, linux-kernel, linux-mm
The value of the DAMON sysfs interface file for the DAMOS quota goal's node id
argument is not passed to the core layer. Implement the link.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
mm/damon/sysfs-schemes.c | 11 ++++++++++-
1 file changed, 10 insertions(+), 1 deletion(-)
diff --git a/mm/damon/sysfs-schemes.c b/mm/damon/sysfs-schemes.c
index 7681ed293b62..729fe5f1ef30 100644
--- a/mm/damon/sysfs-schemes.c
+++ b/mm/damon/sysfs-schemes.c
@@ -2149,8 +2149,17 @@ static int damos_sysfs_add_quota_score(
sysfs_goal->target_value);
if (!goal)
return -ENOMEM;
- if (sysfs_goal->metric == DAMOS_QUOTA_USER_INPUT)
+ switch (sysfs_goal->metric) {
+ case DAMOS_QUOTA_USER_INPUT:
goal->current_value = sysfs_goal->current_value;
+ break;
+ case DAMOS_QUOTA_NODE_MEM_USED_BP:
+ case DAMOS_QUOTA_NODE_MEM_FREE_BP:
+ goal->nid = sysfs_goal->nid;
+ break;
+ default:
+ break;
+ }
damos_add_quota_goal(quota, goal);
}
return 0;
--
2.39.5
* [PATCH 4/7] Docs/mm/damon/design: document node_mem_{used,free}_bp
From: SeongJae Park @ 2025-04-20 19:40 UTC
To: Andrew Morton
Cc: SeongJae Park, Jonathan Corbet, damon, kernel-team, linux-doc,
linux-kernel, linux-mm
Add a description of the DAMOS quota goal metrics for NUMA node utilization to
the DAMON design document.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Documentation/mm/damon/design.rst | 13 +++++++++----
1 file changed, 9 insertions(+), 4 deletions(-)
diff --git a/Documentation/mm/damon/design.rst b/Documentation/mm/damon/design.rst
index f12d33749329..728bf5754343 100644
--- a/Documentation/mm/damon/design.rst
+++ b/Documentation/mm/damon/design.rst
@@ -550,10 +550,10 @@ aggressiveness (the quota) of the corresponding scheme. For example, if DAMOS
is under achieving the goal, DAMOS automatically increases the quota. If DAMOS
is over achieving the goal, it decreases the quota.
-The goal can be specified with three parameters, namely ``target_metric``,
-``target_value``, and ``current_value``. The auto-tuning mechanism tries to
-make ``current_value`` of ``target_metric`` be same to ``target_value``.
-Currently, two ``target_metric`` are provided.
+The goal can be specified with four parameters, namely ``target_metric``,
+``target_value``, ``current_value`` and ``nid``. The auto-tuning mechanism
+tries to make ``current_value`` of ``target_metric`` be same to
+``target_value``.
- ``user_input``: User-provided value. Users could use any metric that they
has interest in for the value. Use space main workload's latency or
@@ -565,6 +565,11 @@ Currently, two ``target_metric`` are provided.
in microseconds that measured from last quota reset to next quota reset.
DAMOS does the measurement on its own, so only ``target_value`` need to be
set by users at the initial time. In other words, DAMOS does self-feedback.
+- ``node_mem_used_bp``: Specific NUMA node's used memory ratio in bp (1/10,000).
+- ``node_mem_free_bp``: Specific NUMA node's free memory ratio in bp (1/10,000).
+
+``nid`` is required only for ``node_mem_used_bp`` and
+``node_mem_free_bp``, to point to the specific NUMA node.
To know how user-space can set the tuning goal metric, the target value, and/or
the current value via :ref:`DAMON sysfs interface <sysfs_interface>`, refer to
--
2.39.5
* [PATCH 5/7] Docs/admin-guide/mm/damon/usage: document 'nid' file
From: SeongJae Park @ 2025-04-20 19:40 UTC
To: Andrew Morton
Cc: SeongJae Park, Jonathan Corbet, damon, kernel-team, linux-doc,
linux-kernel, linux-mm
Add a description of the 'nid' file, which is optionally used for specific
DAMOS quota goal metrics such as node_mem_{used,free}_bp, to the DAMON usage
document.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Documentation/admin-guide/mm/damon/usage.rst | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/Documentation/admin-guide/mm/damon/usage.rst b/Documentation/admin-guide/mm/damon/usage.rst
index ced2013db3df..d960aba72b82 100644
--- a/Documentation/admin-guide/mm/damon/usage.rst
+++ b/Documentation/admin-guide/mm/damon/usage.rst
@@ -81,7 +81,7 @@ comma (",").
│ │ │ │ │ │ │ :ref:`quotas <sysfs_quotas>`/ms,bytes,reset_interval_ms,effective_bytes
│ │ │ │ │ │ │ │ weights/sz_permil,nr_accesses_permil,age_permil
│ │ │ │ │ │ │ │ :ref:`goals <sysfs_schemes_quota_goals>`/nr_goals
- │ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value
+ │ │ │ │ │ │ │ │ │ 0/target_metric,target_value,current_value,nid
│ │ │ │ │ │ │ :ref:`watermarks <sysfs_watermarks>`/metric,interval_us,high,mid,low
│ │ │ │ │ │ │ :ref:`{core_,ops_,}filters <sysfs_filters>`/nr_filters
│ │ │ │ │ │ │ │ 0/type,matching,allow,memcg_path,addr_start,addr_end,target_idx,min,max
@@ -390,11 +390,11 @@ number (``N``) to the file creates the number of child directories named ``0``
to ``N-1``. Each directory represents each goal and current achievement.
Among the multiple feedback, the best one is used.
-Each goal directory contains three files, namely ``target_metric``,
-``target_value`` and ``current_value``. Users can set and get the three
-parameters for the quota auto-tuning goals that specified on the :ref:`design
-doc <damon_design_damos_quotas_auto_tuning>` by writing to and reading from each
-of the files. Note that users should further write
+Each goal directory contains four files, namely ``target_metric``,
+``target_value``, ``current_value`` and ``nid``. Users can set and get the
+four parameters for the quota auto-tuning goals that are specified on the
+:ref:`design doc <damon_design_damos_quotas_auto_tuning>` by writing to and
+reading from each of the files. Note that users should further write
``commit_schemes_quota_goals`` to the ``state`` file of the :ref:`kdamond
directory <sysfs_kdamond>` to pass the feedback to DAMON.
--
2.39.5
* [PATCH 6/7] Docs/ABI/damon: document nid file
From: SeongJae Park @ 2025-04-20 19:40 UTC
To: Andrew Morton; +Cc: SeongJae Park, damon, kernel-team, linux-kernel, linux-mm
Add a description of the 'nid' file, which is optionally used for specific
DAMOS quota goal metrics such as node_mem_{used,free}_bp, to the DAMON
sysfs ABI document.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
Documentation/ABI/testing/sysfs-kernel-mm-damon | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/Documentation/ABI/testing/sysfs-kernel-mm-damon b/Documentation/ABI/testing/sysfs-kernel-mm-damon
index 293197f180ad..5697ab154c1f 100644
--- a/Documentation/ABI/testing/sysfs-kernel-mm-damon
+++ b/Documentation/ABI/testing/sysfs-kernel-mm-damon
@@ -283,6 +283,12 @@ Contact: SeongJae Park <sj@kernel.org>
Description: Writing to and reading from this file sets and gets the current
value of the goal metric.
+What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/goals/<G>/nid
+Date: Apr 2025
+Contact: SeongJae Park <sj@kernel.org>
+Description: Writing to and reading from this file sets and gets the nid
+ parameter of the goal.
+
What: /sys/kernel/mm/damon/admin/kdamonds/<K>/contexts/<C>/schemes/<S>/quotas/weights/sz_permil
Date: Mar 2022
Contact: SeongJae Park <sj@kernel.org>
--
2.39.5
* [PATCH 7/7] samples/damon: implement a DAMON module for memory tiering
From: SeongJae Park @ 2025-04-20 19:40 UTC
To: Andrew Morton; +Cc: SeongJae Park, damon, kernel-team, linux-kernel, linux-mm
Implement a sample DAMON module that shows how self-tuned DAMON-based
memory tiering can be written. It is a sample since the purpose is to
give an idea about how it can be implemented and perform, rather than be
used on general production setups. In particular, it supports only two-tier
memory setups having only one CPU-attached NUMA node.
Signed-off-by: SeongJae Park <sj@kernel.org>
---
samples/damon/Kconfig | 13 ++++
samples/damon/Makefile | 1 +
samples/damon/mtier.c | 167 +++++++++++++++++++++++++++++++++++++++++
3 files changed, 181 insertions(+)
create mode 100644 samples/damon/mtier.c
diff --git a/samples/damon/Kconfig b/samples/damon/Kconfig
index 564c49ed69a2..cbf96fd8a8bf 100644
--- a/samples/damon/Kconfig
+++ b/samples/damon/Kconfig
@@ -27,4 +27,17 @@ config SAMPLE_DAMON_PRCL
If unsure, say N.
+config SAMPLE_DAMON_MTIER
+ bool "DAMON sample module for memory tiering"
+ depends on DAMON && DAMON_PADDR
+ help
+ This builds the DAMON sample module for memory tiering.
+
+ The module assumes the system is constructed with two NUMA nodes,
+ which seem to be local and remote nodes to all CPUs. For example,
+ node0 is for DDR5 DRAMs connected via DIMM, while node1 is for DDR4
+ DRAMs connected via CXL.
+
+ If unsure, say N.
+
endmenu
diff --git a/samples/damon/Makefile b/samples/damon/Makefile
index 7f155143f237..72f68cbf422a 100644
--- a/samples/damon/Makefile
+++ b/samples/damon/Makefile
@@ -2,3 +2,4 @@
obj-$(CONFIG_SAMPLE_DAMON_WSSE) += wsse.o
obj-$(CONFIG_SAMPLE_DAMON_PRCL) += prcl.o
+obj-$(CONFIG_SAMPLE_DAMON_MTIER) += mtier.o
diff --git a/samples/damon/mtier.c b/samples/damon/mtier.c
new file mode 100644
index 000000000000..3390b7d30c26
--- /dev/null
+++ b/samples/damon/mtier.c
@@ -0,0 +1,167 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * memory tiering: migrate cold pages in node 0 and hot pages in node 1 to node
+ * 1 and node 0, respectively. Adjust the hotness/coldness threshold aiming
+ * for a resulting 99.6 % node 0 utilization ratio.
+ */
+
+#define pr_fmt(fmt) "damon_sample_mtier: " fmt
+
+#include <linux/damon.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+static unsigned long node0_start_addr __read_mostly;
+module_param(node0_start_addr, ulong, 0600);
+
+static unsigned long node0_end_addr __read_mostly;
+module_param(node0_end_addr, ulong, 0600);
+
+static unsigned long node1_start_addr __read_mostly;
+module_param(node1_start_addr, ulong, 0600);
+
+static unsigned long node1_end_addr __read_mostly;
+module_param(node1_end_addr, ulong, 0600);
+
+static int damon_sample_mtier_enable_store(
+ const char *val, const struct kernel_param *kp);
+
+static const struct kernel_param_ops enable_param_ops = {
+ .set = damon_sample_mtier_enable_store,
+ .get = param_get_bool,
+};
+
+static bool enable __read_mostly;
+module_param_cb(enable, &enable_param_ops, &enable, 0600);
+MODULE_PARM_DESC(enable, "Enable or disable DAMON_SAMPLE_MTIER");
+
+static struct damon_ctx *ctxs[2];
+
+static struct damon_ctx *damon_sample_mtier_build_ctx(bool promote)
+{
+ struct damon_ctx *ctx;
+ struct damon_target *target;
+ struct damon_region *region;
+ struct damos *scheme;
+ struct damos_quota_goal *quota_goal;
+ struct damos_filter *filter;
+
+ ctx = damon_new_ctx();
+ if (!ctx)
+ return NULL;
+ /*
+ * auto-tune sampling and aggregation interval aiming 4% DAMON-observed
+ * accesses ratio, keeping sampling interval in [5ms, 10s] range.
+ */
+ ctx->attrs.intervals_goal = (struct damon_intervals_goal) {
+ .access_bp = 400, .aggrs = 3,
+ .min_sample_us = 5000, .max_sample_us = 10000000,
+ };
+ if (damon_select_ops(ctx, DAMON_OPS_PADDR))
+ goto free_out;
+
+ target = damon_new_target();
+ if (!target)
+ goto free_out;
+ damon_add_target(ctx, target);
+ region = damon_new_region(
+ promote ? node1_start_addr : node0_start_addr,
+ promote ? node1_end_addr : node0_end_addr);
+ if (!region)
+ goto free_out;
+ damon_add_region(region, target);
+
+ scheme = damon_new_scheme(
+ /* access pattern */
+ &(struct damos_access_pattern) {
+ .min_sz_region = PAGE_SIZE,
+ .max_sz_region = ULONG_MAX,
+ .min_nr_accesses = promote ? 1 : 0,
+ .max_nr_accesses = promote ? UINT_MAX : 0,
+ .min_age_region = 0,
+ .max_age_region = UINT_MAX},
+ /* action */
+ promote ? DAMOS_MIGRATE_HOT : DAMOS_MIGRATE_COLD,
+ 1000000, /* apply interval (1s) */
+ &(struct damos_quota){
+ /* 200 MiB per sec at most */
+ .reset_interval = 1000,
+ .sz = 200 * 1024 * 1024,
+ /* ignore size of region when prioritizing */
+ .weight_sz = 0,
+ .weight_nr_accesses = 100,
+ .weight_age = 100,
+ },
+ &(struct damos_watermarks){},
+ promote ? 0 : 1); /* migrate target node id */
+ if (!scheme)
+ goto free_out;
+ damon_set_schemes(ctx, &scheme, 1);
+ quota_goal = damos_new_quota_goal(
+ promote ? DAMOS_QUOTA_NODE_MEM_USED_BP :
+ DAMOS_QUOTA_NODE_MEM_FREE_BP,
+ promote ? 9970 : 50);
+ if (!quota_goal)
+ goto free_out;
+ quota_goal->nid = 0;
+ damos_add_quota_goal(&scheme->quota, quota_goal);
+ filter = damos_new_filter(DAMOS_FILTER_TYPE_YOUNG, true, promote);
+ if (!filter)
+ goto free_out;
+ damos_add_filter(scheme, filter);
+ return ctx;
+free_out:
+ damon_destroy_ctx(ctx);
+ return NULL;
+}
+
+static int damon_sample_mtier_start(void)
+{
+ struct damon_ctx *ctx;
+
+ ctx = damon_sample_mtier_build_ctx(true);
+ if (!ctx)
+ return -ENOMEM;
+ ctxs[0] = ctx;
+ ctx = damon_sample_mtier_build_ctx(false);
+ if (!ctx) {
+ damon_destroy_ctx(ctxs[0]);
+ return -ENOMEM;
+ }
+ ctxs[1] = ctx;
+ return damon_start(ctxs, 2, true);
+}
+
+static void damon_sample_mtier_stop(void)
+{
+ damon_stop(ctxs, 2);
+ damon_destroy_ctx(ctxs[0]);
+ damon_destroy_ctx(ctxs[1]);
+}
+
+static int damon_sample_mtier_enable_store(
+ const char *val, const struct kernel_param *kp)
+{
+ bool enabled = enable;
+ int err;
+
+ err = kstrtobool(val, &enable);
+ if (err)
+ return err;
+
+ if (enable == enabled)
+ return 0;
+
+ if (enable)
+ return damon_sample_mtier_start();
+ damon_sample_mtier_stop();
+ return 0;
+}
+
+static int __init damon_sample_mtier_init(void)
+{
+ return 0;
+}
+
+module_init(damon_sample_mtier_init);
--
2.39.5
* Re: [PATCH 0/7] mm/damon: auto-tune DAMOS for NUMA setups including tiered memory
From: SeongJae Park @ 2025-04-20 19:47 UTC
To: SeongJae Park
Cc: Andrew Morton, Jonathan Corbet, damon, kernel-team, linux-doc,
linux-kernel, linux-mm
On Sun, 20 Apr 2025 12:40:23 -0700 SeongJae Park <sj@kernel.org> wrote:
> Utilizing DAMON for memory tiering usually requires manual tuning and/or
> tedious controls. Let it self-tune hotness and coldness thresholds for
> promotion and demotion aiming high utilization of high memory tiers, by
> introducing new DAMOS quota goal metrics representing the used and the
> free memory ratios of specific NUMA nodes. And introduce a sample DAMON
> module that demonstrates how the new feature can be used for memory
> tiering use cases.
[...]
> References
> ==========
>
> [1] https://lore.kernel.org/20231112195602.61525-1-sj@kernel.org/
> [2] https://lore.kernel.org/20250303221726.484227-1-sj@kernel.org
> [3] https://github.com/facebookresearch/DCPerf/blob/main/packages/tao_bench/README.md
I forgot to add the below, sorry.
Revision History
================
Changes from RFC
(https://lore.kernel.org/20250320053937.57734-1-sj@kernel.org)
- Wordsmith commit messages
- Add documentations
[...]
* Re: [PATCH 0/7] mm/damon: auto-tune DAMOS for NUMA setups including tiered memory
From: Yunjeong Mun @ 2025-05-02 7:38 UTC
To: SeongJae Park
Cc: Jonathan Corbet, damon, kernel-team, linux-doc, linux-kernel,
linux-mm, Andrew Morton, kernel_team
Hi SeongJae, thanks for your helpful auto-tuning patchset, which optimizes
the ease of use of DAMON on tiered memory systems. I have tested the demotion
mechanism with a microbenchmark and would like to share the result.
On Sun, 20 Apr 2025 12:40:23 -0700 SeongJae Park <sj@kernel.org> wrote:
[..snip..]
> Utilizing DAMON for memory tiering usually requires manual tuning and/or
> Evaluation Limitations
> ----------------------
>
> As mentioned above, this evaluation shows only comparison of promotion
> mechanisms. DAMON-based tiering is recommended to be used together with
> reclaim-based demotion as a faster backup under significant memory
> pressure, though.
>
> From some perspective, the modified version of Taobench may seem to make
> the picture distorted too much. It would be better to evaluate with
> more realistic workload, or more finely tuned micro benchmarks.
>
Hardware.
- Node 0: 512GB DRAM
- Node 1: 0GB (memoryless)
- Node 2: 96GB CXL memory
Kernel
- RFC patchset on top of v6.14-rc7
https://lore.kernel.org/damon/20250320053937.57734-1-sj@kernel.org/
Workload
- Microbenchmark creates hot and cold regions based on the specified parameters.
$ ./hot_cold 1g 100g
It repetitively performs memset on a 1GB hot region, but only performs memset
once on a 100GB cold region (a rough sketch of this program is included after
the DAMON setup below).
DAMON setup
- My intention is to demote most of the cold memory regions from node 0 to
node 2. So, I started damo with the below yaml configuration:
...
# damo v2.7.2 from https://git.kernel.org/pub/scm/linux/kernel/git/sj/damo.git/
schemes:
- action: migrate_cold
target_nid: 2
...
apply_interval_us: 0
quotas:
time_ms: 0 s
sz_bytes: 0 GiB
reset_interval_ms: 6 s
goals:
- metric: node_mem_free_bp
target_value: 99%
nid: 0
current_value: 1
effective_sz_bytes: 0 B
...
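As referenced above, the hot_cold program is roughly equivalent to the
following sketch (a from-memory reconstruction for illustration only, not the
exact benchmark source; the region sizes are read from the command line as GiB):

	/* hot_cold.c: touch a cold region once, then keep a hot region busy */
	#include <stdio.h>
	#include <stdlib.h>
	#include <string.h>

	int main(int argc, char *argv[])
	{
		size_t hot_sz, cold_sz;
		char *hot, *cold;

		if (argc != 3) {
			fprintf(stderr, "usage: %s <hot GiB> <cold GiB>\n", argv[0]);
			return 1;
		}
		hot_sz = (size_t)strtoul(argv[1], NULL, 0) << 30;
		cold_sz = (size_t)strtoul(argv[2], NULL, 0) << 30;
		hot = malloc(hot_sz);
		cold = malloc(cold_sz);
		if (!hot || !cold)
			return 1;

		memset(cold, 0, cold_sz);	/* the cold region is written only once */
		for (;;)
			memset(hot, 1, hot_sz);	/* the hot region is written repeatedly */
		return 0;
	}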
Results
I've run the hot_cold benchmark for approximately 2 days, and have monitored
the memory usage of each node as follows:
$ numastat -c -p hot_cold
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Node 3 Total
--------------- ------ ------ ------ ------ ------
2689746 (watch) 2 0 0 1 3
2690067 (hot_col 100122 0 3303 0 103426
3770656 (watch) 0 0 0 1 1
3770657 (sh) 2 0 0 0 2
--------------- ------ ------ ------ ------ ------
Total 100127 0 3303 1 103432
I expected that most of the cold data from node 0 would be demoted to node 2, but it wasn't.
In this situation, DAMON's variables are displayed as follows:
[2067202.863431] totalram 131938449 free 84504526 used 47433923 numerator 84504526
[2067202.863446] goal->current_value: 6404
[2067202.863452] score: 6468
[2067202.863455] quota->esz: 1844674407370955
`score` 6468 means the goal hasn't been achieved yet, and the `quota->esz`,
which specifies the aggressiveness of the demotion action, has reached
ULONG_MAX. However, the demotion has not occurred.
[..snip..]
I think there may be some errors or misunderstanding in my experiment.
I would be grateful for any insights or feedback you might have regarding these
results.
Best Regards,
Yunjeong
* Re: [PATCH 0/7] mm/damon: auto-tune DAMOS for NUMA setups including tiered memory
From: SeongJae Park @ 2025-05-02 15:49 UTC
To: Yunjeong Mun
Cc: SeongJae Park, Jonathan Corbet, damon, kernel-team, linux-doc,
linux-kernel, linux-mm, Andrew Morton, kernel_team
Hi Yunjeong,
On Fri, 2 May 2025 16:38:48 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:
> Hi SeongJae, thanks for your helpful auto-tuning patchset, which optimizes
> the ease of use of DAMON on tiered memory systems. I have tested the demotion
> mechanism with a microbenchmark and would like to share the result.
Thank you for sharing your test result!
[...]
> Hardware.
> - Node 0: 512GB DRAM
> - Node 1: 0GB (memoryless)
> - Node 2: 96GB CXL memory
>
> Kernel
> - RFC patchset on top of v6.14-rc7
> https://lore.kernel.org/damon/20250320053937.57734-1-sj@kernel.org/
>
> Workload
> - Microbenchmark creates hot and cold regions based on the specified parameters.
> $ ./hot_cold 1g 100g
> It repetitively performs memset on a 1GB hot region, but only performs memset
> once on a 100GB cold region.
>
> DAMON setup
> - My intention is to demote most of all regions of cold memory from node 0 to
> node 2. So, damo start with below yaml configuration:
> ...
> # damo v2.7.2 from https://git.kernel.org/pub/scm/linux/kernel/git/sj/damo.git/
> schemes:
> - action: migrate_cold
> target_nid: 2
> ...
> apply_interval_us: 0
> quotas:
> time_ms: 0 s
> sz_bytes: 0 GiB
> reset_interval_ms: 6 s
> goals:
> - metric: node_mem_free_bp
> target_value: 99%
> nid: 0
> current_value: 1
> effective_sz_bytes: 0 B
> ...
Sharing the DAMON parameters you used can be helpful, thank you! Can you further
share the full parameters? I'm especially interested in how the parameters for
monitoring targets and the migrate_cold scheme's target access pattern are set,
and if there are other DAMON contexts or DAMOS schemes running together.
>
> Results
> I've run the hot_cold benchmark for approximately 2 days, and have monitored
> the memory usage of each node as follows:
>
> $ numastat -c -p hot_cold
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Node 3 Total
> --------------- ------ ------ ------ ------ ------
> 2689746 (watch) 2 0 0 1 3
> 2690067 (hot_col 100122 0 3303 0 103426
> 3770656 (watch) 0 0 0 1 1
> 3770657 (sh) 2 0 0 0 2
> --------------- ------ ------ ------ ------ ------
> Total 100127 0 3303 1 103432
>
> I expected that most of cold data from node 0 would be demoted to node 2, but it isn't.
> In this situation, DAMON's variables are displayed as follows:
>
> [2067202.863431] totalram 131938449 free 84504526 used 47433923 numerator 84504526
> [2067202.863446] goal->current_value: 6404
> [2067202.863452] score: 6468
> [2067202.863455] quota->esz: 1844674407370955
>
> `score` 6468 means the goal hasn't been achieved yet, and the `quota->esz`,
> which specifies the aggressiveness of the demotion action, has reached
> ULONG_MAX. However, the demotion has not occurred.
Yes, as you interpret, seems the auto-tuning is working as designed, but
migration has not successfully happened. I'm curious if migration is tried but
failed. DAMOS stats[1] may let us know that. Can you check and share those?
>
> [..snip..]
>
> I think there may be some errors or misunderstanding in my experiment.
> I would be grateful for any insights or feedback you might have regarding these
> results.
I don't have a clear idea at the moment, sorry. It would be helpful if you
could share the things I asked about above.
Also, it seems you suspect the auto-tuning as one of the root causes. I'm
curious if you tried some different tests (e.g., the same one without
auto-tuning) that gave you some theories. If so, could you please share those?
[1] https://origin.kernel.org/doc/html/latest/mm/damon/design.html#statistics
Thanks,
SJ
[...]
* Re: [PATCH 0/7] mm/damon: auto-tune DAMOS for NUMA setups including tiered memory
From: Yunjeong Mun @ 2025-05-08 9:28 UTC
To: SeongJae Park
Cc: honggyu.kim, Jonathan Corbet, damon, kernel-team, linux-doc,
linux-kernel, linux-mm, Andrew Morton, kernel_team
Hi Seongjae, I'm sorry for the delayed response due to the holidays.
On Fri, 2 May 2025 08:49:49 -0700 SeongJae Park <sj@kernel.org> wrote:
> Hi Yunjeong,
>
> On Fri, 2 May 2025 16:38:48 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:
>
> > Hi SeongJae, thanks for your helpful auto-tuning patchset, which optimizes
> > the ease of used of DAMON on tiered memory systems. I have tested demotion
> > mechanism with a microbenchmark and would like to share the result.
>
> Thank you for sharing your test result!
>
> [...]
> > Hardware.
> > - Node 0: 512GB DRAM
> > - Node 1: 0GB (memoryless)
> > - Node 2: 96GB CXL memory
> >
> > Kernel
> > - RFC patchset on top of v6.14-rc7
> > https://lore.kernel.org/damon/20250320053937.57734-1-sj@kernel.org/
> >
> > Workload
> > - Microbenchmark creates hot and cold regions based on the specified parameters.
> > $ ./hot_cold 1g 100g
> > It repetitively performs memset on a 1GB hot region, but only performs memset
> > once on a 100GB cold region.
> >
> > DAMON setup
> > - My intention is to demote most of all regions of cold memory from node 0 to
> > node 2. So, damo start with below yaml configuration:
> > ...
> > # damo v2.7.2 from https://git.kernel.org/pub/scm/linux/kernel/git/sj/damo.git/
> > schemes:
> > - action: migrate_cold
> > target_nid: 2
> > ...
> > apply_interval_us: 0
> > quotas:
> > time_ms: 0 s
> > sz_bytes: 0 GiB
> > reset_interval_ms: 6 s
> > goals:
> > - metric: node_mem_free_bp
> > target_value: 99%
> > nid: 0
> > current_value: 1
> > effective_sz_bytes: 0 B
> > ...
>
> Sharing DAMON parameters you used can be helpful, thank you! Can you further
> share full parameters? I'm especially interested in how the parameters for
> monitoring targets and migrate_cold scheme's target access pattern, and if
> there are other DAMON contexts or DAMOS schemes running together.
>
Actually, I realized that the 'regions' field in my YAML configuration is
incorrect. I've been using a configuration file that was created on another
server, not the testing server. As a result, the scheme was applied to the wrong
region, causing the results to appear confusing. I've fixed the issue and
confirmed that the demotion occurred successfully. I'm sorry for any confusion
this may have caused.
After fixing it up, Honggyu and I tested this patch again. I would like to
share two issues: 1) slow start of action, 2) action does not stop even when
target is achieved. Below are the test configurations:
Hardware
- node 0: 64GB DRAM
- node 1: 0GB (memoryless)
- node 2: 96GB CXL memory
Kernel
- This patchset on top of v6.15-rc4
Workload: a microbenchmark that `mmap`s and `memset`s the given size (in GB) once
$ ./mmap 50
DAMON setup: just one contexts and schemes.
...
schemes:
- action: migrate_cold
target_nid: 2
access_pattern:
sz_bytes:
min: 4.000 KiB
max: max
nr_accesses:
min: 0 %
max: 0 %
age:
min: 10 s
max: max
apply_interval_us: 0
quotas:
time_ms: 0 s
sz_bytes: 0 GiB
reset_interval_ms: 20 s
goals:
- metric: node_mem_free_bp
target_value: 50%
nid: 0
current_value: 1
...
Two issues mentioned above are both caused by the calculation logic of
`quota->esz`, which grows too slowly and increases gradually.
Slow start: 50GB of data is allocated on node 0, and the demotion first occurs
after about 15 minutes. This is because `quota->esz` is growing slowly even
when the `current` is lower than the `target`.
Not stop: the `target` is to maintain 50% free space on node 0, which we expect
to be about 32GB. However, it demoted more than intended, maintaining about 90%
free space as follows:
Per-node process memory usage (in MBs)
PID Node 0 Node 1 Node 2 Total
------------ ------ ------ ------ -----
1182 (watch) 2 0 0 2
1198 (mmap) 7015 0 44187 51201
------------ ------ ------ ------ -----
Total 7017 0 44187 51204
This is because the `esz` decreased slowly after achieving the `target`.
In the end, the demotion occurred more excessively than intended.
We believe that as the difference between `target` and `current` increases, the
`esz` should be raised more rapidly to increase the aggressiveness of the action.
In the current implementation, the `esz` remains low even when the `current` is
below the `target`, leading to a slow start issue. Also, there is a not-stop
issue where a high `esz` persists (decreasing slowly) even in an over-achieved
state.
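As a purely hypothetical sketch of the idea (not how DAMON currently computes
the quota, and not a concrete proposal), the growth step could scale with the
gap between the two values, for example:

	/*
	 * Hypothetical sketch only: grow the quota faster when the current
	 * value is far from the target.  GAP_SCALE is an arbitrary example
	 * factor, and overflow handling is omitted for brevity.
	 */
	#define GAP_SCALE	4

	static unsigned long grow_esz(unsigned long esz, unsigned long current_bp,
			unsigned long target_bp)
	{
		unsigned long gap_bp;

		if (current_bp >= target_bp)
			return esz;	/* achieved; shrinking is left aside here */
		gap_bp = target_bp - current_bp;
		return esz + esz * GAP_SCALE * gap_bp / target_bp;
	}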
>
> > Yes, as you interpret, seems the auto-tuning is working as designed, but
> > migration has not successfully happened. I'm curious if migration is tried but
> failed. DAMOS stats[1] may let us know that. Can you check and share those?
>
Thank you for providing the DAMOS stats information. I will use it when
analyzing with DAMON. I would appreciate any feedback you might have on the new
results.
Best Regards,
Yunjeong
[..snip..]
* Re: [PATCH 0/7] mm/damon: auto-tune DAMOS for NUMA setups including tiered memory
From: SeongJae Park @ 2025-05-08 16:35 UTC
To: Yunjeong Mun
Cc: SeongJae Park, honggyu.kim, Jonathan Corbet, damon, kernel-team,
linux-doc, linux-kernel, linux-mm, Andrew Morton, kernel_team
On Thu, 8 May 2025 18:28:27 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:
> Hi Seongjae, I'm sorry for the delayed response due to the holidays.
No worry, hope you had a good break :)
>
> On Fri, 2 May 2025 08:49:49 -0700 SeongJae Park <sj@kernel.org> wrote:
> > Hi Yunjeong,
> >
> > On Fri, 2 May 2025 16:38:48 +0900 Yunjeong Mun <yunjeong.mun@sk.com> wrote:
> >
> > > Hi SeongJae, thanks for your helpful auto-tuning patchset, which optimizes
> > > the ease of use of DAMON on tiered memory systems. I have tested the demotion
> > > mechanism with a microbenchmark and would like to share the result.
> >
> > Thank you for sharing your test result!
> >
> > [...]
> > > Hardware.
> > > - Node 0: 512GB DRAM
> > > - Node 1: 0GB (memoryless)
> > > - Node 2: 96GB CXL memory
> > >
> > > Kernel
> > > - RFC patchset on top of v6.14-rc7
> > > https://lore.kernel.org/damon/20250320053937.57734-1-sj@kernel.org/
> > >
> > > Workload
> > > - Microbenchmark creates hot and cold regions based on the specified parameters.
> > > $ ./hot_cold 1g 100g
> > > It repetitively performs memset on a 1GB hot region, but only performs memset
> > > once on a 100GB cold region.
> > >
> > > DAMON setup
> > > - My intention is to demote most of all regions of cold memory from node 0 to
> > > node 2. So, damo start with below yaml configuration:
> > > ...
> > > # damo v2.7.2 from https://git.kernel.org/pub/scm/linux/kernel/git/sj/damo.git/
> > > schemes:
> > > - action: migrate_cold
> > > target_nid: 2
> > > ...
> > > apply_interval_us: 0
> > > quotas:
> > > time_ms: 0 s
> > > sz_bytes: 0 GiB
> > > reset_interval_ms: 6 s
> > > goals:
> > > - metric: node_mem_free_bp
> > > target_value: 99%
> > > nid: 0
> > > current_value: 1
> > > effective_sz_bytes: 0 B
> > > ...
> >
> > Sharing DAMON parameters you used can be helpful, thank you! Can you further
> > share full parameters? I'm especially interested in how the parameters for
> > monitoring targets and migrate_cold scheme's target access pattern, and if
> > there are other DAMON contexts or DAMOS schemes running together.
> >
>
> Actually, I realized that the 'regions' field in my YAML configuration is
> incorrect. I've been using a configuration file that was created on another
> server, not the testing server.
To my understanding, you use the YAML configuration because the DAMON
user-space tool doesn't provide a good interface for multiple kdamonds setups.
Starting from v2.7.5, the DAMON user-space tool supports multiple kdamonds
setups from the command line, and it supports setting target regions as NUMA
nodes (--numa_node). Using those might be a better option for you.
> As a result, the scheme was applied to the wrong
> region, causing the results to appear confusing. I've fixed the issue and
> confirmed that the demotion occurred successfully. I'm sorry for any confusion
> this may have caused.
Glad to hear that the issue is fixed.
>
> After fixing it up, Honggyu and I tested this patch again. I would like to
> share two issues: 1) slow start of action, 2) action does not stop even when
> target is achieved. Below are the test configurations:
>
> Hardware
> - node 0: 64GB DRAM
> - node 1: 0GB (memoryless)
> - node 2: 96GB CXL memory
>
> Kernel
> - This patchset on top of v6.15-rc4
>
> Workload: a microbenchmark that `mmap`s and `memset`s the given size (in GB) once
> $ ./mmap 50
>
> DAMON setup: just one contexts and schemes.
> ...
> schemes:
> - action: migrate_cold
> target_nid: 2
> access_pattern:
> sz_bytes:
> min: 4.000 KiB
> max: max
> nr_accesses:
> min: 0 %
> max: 0 %
> age:
> min: 10 s
> max: max
> apply_interval_us: 0
> quotas:
> time_ms: 0 s
> sz_bytes: 0 GiB
> reset_interval_ms: 20 s
> goals:
> - metric: node_mem_free_bp
> target_value: 50%
> nid: 0
> current_value: 1
> ...
>
> Two issues mentioned above are both caused by the calculation logic of
> `quota->esz`, which grows too slowly and increases gradually.
>
> Slow start: 50GB of data is allocated on node 0, and the demotion first occurs
> after about 15 minutes. This is because `quota->esz` is growing slowly even
> when the `current` is lower than the `target`.
This is an intended design, to avoid taking unnecessary actions for only
temporary access patterns. On realistic workloads having a time scale, I think
some delay is not a big problem. I agree 15 minutes is too long, though. But
the speed also depends on reset_interval_ms. The quota grows by up to 100% once
per reset_interval_ms. The quota size is 1 byte at minimum, so it takes at
least 12 reset_interval_ms to make the size quota at least a single 4K page.
Because reset_interval_ms is 20 seconds in this setup, 12 reset_interval_ms is
four minutes (240 seconds).
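The worst-case ramp-up can be sketched like below; this is only an
illustration of the arithmetic, assuming (as described above) that the quota
can at most double per reset interval:

	#include <stdio.h>

	int main(void)
	{
		unsigned long esz = 1;	/* minimum quota: 1 byte */
		unsigned int intervals = 0;

		while (esz < 4096) {	/* one 4 KiB page */
			esz *= 2;	/* at most +100% per reset_interval_ms */
			intervals++;
		}
		/* with reset_interval_ms of 20s: 12 intervals -> 240 seconds */
		printf("%u intervals, %u seconds\n", intervals, intervals * 20);
		return 0;
	}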
My intended use of reset_interval_ms is setting it just not too short, to
reduce unnecessary quota calculation overhead. From my perspective, 20 seconds
feels too long. Is there a reason to set it so long? If there is no reason,
I'd recommend starting with a 1 second reset_interval_ms and adjusting for your
setup if it doesn't work.
And I realize this would better be documented. I will try to clarify this in
the documentation when I get time. Please feel free to submit a patch if you
find time for it faster than me :)
>
> Not stop: the `target` is to maintain 50% free space on node 0, which we expect
> to be about 32GB. However, it demoted more than intended, maintaining about 90%
> free space as follows:
>
> Per-node process memory usage (in MBs)
> PID Node 0 Node 1 Node 2 Total
> ------------ ------ ------ ------ -----
> 1182 (watch) 2 0 0 2
> 1198 (mmap) 7015 0 44187 51201
> ------------ ------ ------ ------ -----
> Total 7017 0 44187 51204
>
> This is because the `esz` decreased slowly after achieving the `target`.
> In the end, the demotion occurred more excessively than intended.
>
> We believe that as the difference between `target` and `current` increases, the
> `esz` should be raised more rapidly to increase the aggressiveness of the action.
> In the current implementation, the `esz` remains low even when the `current` is
> below the `target`, leading to a slow start issue. Also, there is a not-stop
> issue where a high `esz` persists (decreasing slowly) even in an over-achieved
> state.
This is yet another intended design. The aim-oriented quota auto-tuning
feature assumes there is an ideal amount of quota that fits the current
situation, which could dynamically change. For example, proactively reclaiming
cold memory aiming at a modest level of memory pressure.
For this case, I think you should have another scheme for promotion. Please
refer to the design and example implementation of the sample module. Or, do
you have a special reason to utilize only a demotion scheme like in this setup?
If so, please share.
If you really need a feature that turns DAMOS on and off for a given situation,
DAMOS watermarks may be the right feature to look at. You could also override
the tuned quota from user space. So you could monitor the free size of the given
NUMA node and set the tuned quota to zero immediately, or just remove the scheme.
Again, this might be due to the poor documentation. Sorry about the poor
documentation and thank you for letting me find this. I'll try to make the
documentation better.
>
> >
> > Yes, as you interpret, seems the auto-tuning is working as designed, but
> > migration has not successfully happened. I'm curious if migration is tried but
> > failed. DAMOS stats[1] may let us know that. Can you check and share those?
> >
>
> Thank you for providing the DAMOS stats information.
> I will use it when analyzing with DAMON.
Maybe the easiest way to monitor it is
'damo report access --tried_regions_of X Y Z --style temperature-sz-hist'.
> I would appreciate any feedback you
> might have on the new
> results.
I hope my replies above help a bit, and I'm looking forward to hearing about
anything I missed, or your special reasons for your setup if you have any.
Thanks,
SJ
[...]