linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V3 0/1] Add swappiness argument to memory.reclaim
@ 2023-12-11 14:04 Dan Schatzberg
  2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Schatzberg @ 2023-12-11 14:04 UTC (permalink / raw)
  To: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang
  Cc: linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song,
	Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang,
	Vishal Moola (Oracle),
	Mina Almasry, Yue Zhao, Hugh Dickins

Changes since V2:
  * No functional change
  * Used int consistently rather than a pointer

Changes since V1:
  * Added documentation

This patch proposes augmenting the memory.reclaim interface with a
swappiness=<val> argument that overrides the swappiness value for that instance
of proactive reclaim.

Userspace proactive reclaimers use the memory.reclaim interface to trigger
reclaim. The memory.reclaim interface does not allow for any way to effect the
balance of file vs anon during proactive reclaim. The only approach is to adjust
the vm.swappiness setting. However, there are a few reasons we look to control
the balance of file vs anon during proactive reclaim, separately from reactive
reclaim:

* Swapout should be limited to manage SSD write endurance. In near-OOM
  situations we are fine with lots of swap-out to avoid OOMs. As these are
  typically rare events, they have relatively little impact on write endurance.
  However, proactive reclaim runs continuously and so its impact on SSD write
  endurance is more significant. Therefore it is desireable to control swap-out
  for proactive reclaim separately from reactive reclaim

* Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap
  exhaustion. This makes sense if the swap exhaustion is triggered due to
  reactive reclaim but less so if it is triggered due to proactive reclaim (e.g.
  one could see OOMs when free memory is ample but anon is just particularly
  cold). Therefore, it's desireable to have proactive reclaim reduce or stop
  swap-out before the threshold at which OOM killing occurs.

In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before
writes to memory.reclaim[2]. This has been in production for nearly two years
and has addressed our needs to control proactive vs reactive reclaim behavior
but is still not ideal for a number of reasons:

* vm.swappiness is a global setting, adjusting it can race/interfere with other
  system administration that wishes to control vm.swappiness. In our case, we
  need to disable Senpai before adjusting vm.swappiness.

* vm.swappiness is stateful - so a crash or restart of Senpai can leave a
  misconfigured setting. This requires some additional management to record the
  "desired" setting and ensure Senpai always adjusts to it.

With this patch, we avoid these downsides of adjusting vm.swappiness globally.

Previously, this exact interface addition was proposed by Yosry[3]. In response,
Roman proposed instead an interface to specify precise file/anon/slab reclaim
amounts[4]. More recently Huan also proposed this as well[5] and others
similarly questioned if this was the proper interface.

Previous proposals sought to use this to allow proactive reclaimers to
effectively perform a custom reclaim algorithm by issuing proactive reclaim with
different settings to control file vs anon reclaim (e.g. to only reclaim anon
from some applications). Responses argued that adjusting swappiness is a poor
interface for custom reclaim.

In contrast, I argue in favor of a swappiness setting not as a way to implement
custom reclaim algorithms but rather to bias the balance of anon vs file due to
differences of proactive vs reactive reclaim. In this context, swappiness is the
existing interface for controlling this balance and this patch simply allows for
it to be configured differently for proactive vs reactive reclaim.

Specifying explicit amounts of anon vs file pages to reclaim feels inappropriate
for this prupose. Proactive reclaimers are un-aware of the relative age of file
vs anon for a cgroup which makes it difficult to manage proactive reclaim of
different memory pools. A proactive reclaimer would need some amount of anon
reclaim attempts separate from the amount of file reclaim attempts which seems
brittle given that it's difficult to observe the impact.

[1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html
[2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598
[3]https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/
[4]https://lore.kernel.org/linux-mm/YoPHtHXzpK51F%2F1Z@carbon/
[5]https://lore.kernel.org/lkml/20231108065818.19932-1-link@vivo.com/

Dan Schatzberg (1):
  mm: add swapiness= arg to memory.reclaim

 Documentation/admin-guide/cgroup-v2.rst | 15 ++++++-
 include/linux/swap.h                    |  3 +-
 mm/memcontrol.c                         | 55 ++++++++++++++++++++-----
 mm/vmscan.c                             | 13 +++++-
 4 files changed, 70 insertions(+), 16 deletions(-)

-- 
2.34.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-11 14:04 [PATCH V3 0/1] Add swappiness argument to memory.reclaim Dan Schatzberg
@ 2023-12-11 14:04 ` Dan Schatzberg
  2023-12-11 19:41   ` Yosry Ahmed
  2023-12-12  1:06   ` Chris Li
  0 siblings, 2 replies; 10+ messages in thread
From: Dan Schatzberg @ 2023-12-11 14:04 UTC (permalink / raw)
  To: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang
  Cc: linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song,
	Andrew Morton, David Hildenbrand, Matthew Wilcox, Chris Li,
	Kefeng Wang, Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

Allow proactive reclaimers to submit an additional swappiness=<val>
argument to memory.reclaim. This overrides the global or per-memcg
swappiness setting for that reclaim attempt.

For example:

echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim

will perform reclaim on the rootcg with a swappiness setting of 0 (no
swap) regardless of the vm.swappiness sysctl setting.

Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
---
 Documentation/admin-guide/cgroup-v2.rst | 15 ++++++-
 include/linux/swap.h                    |  3 +-
 mm/memcontrol.c                         | 55 ++++++++++++++++++++-----
 mm/vmscan.c                             | 13 +++++-
 4 files changed, 70 insertions(+), 16 deletions(-)

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 3f85254f3cef..fc2b379dbd0f 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back.
 	This is a simple interface to trigger memory reclaim in the
 	target cgroup.
 
-	This file accepts a single key, the number of bytes to reclaim.
-	No nested keys are currently supported.
+	This file accepts a string which containers thhe number of bytes
+	to reclaim.
 
 	Example::
 
@@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back.
 	This means that the networking layer will not adapt based on
 	reclaim induced by memory.reclaim.
 
+	This file also allows the user to specify the swappiness value
+	to be used for the reclaim. For example:
+
+	  echo "1G swappiness=60" > memory.reclaim
+
+	The above instructs the kernel to perform the reclaim with
+	a swappiness value of 60. Note that this has the same semantics
+	as the vm.swappiness sysctl - it sets the relative IO cost of
+	reclaiming anon vs file memory but does not allow for reclaiming
+	specific amounts of anon or file memory.
+
   memory.peak
 	A read-only single value file which exists on non-root
 	cgroups.
diff --git a/include/linux/swap.h b/include/linux/swap.h
index f6dd6575b905..ebc20d094609 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -410,7 +410,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
 extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 						  unsigned long nr_pages,
 						  gfp_t gfp_mask,
-						  unsigned int reclaim_options);
+						  unsigned int reclaim_options,
+						  int swappiness);
 extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
 						gfp_t gfp_mask, bool noswap,
 						pg_data_t *pgdat,
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 1c1061df9cd1..74598c17d3cc 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -63,6 +63,7 @@
 #include <linux/resume_user_mode.h>
 #include <linux/psi.h>
 #include <linux/seq_buf.h>
+#include <linux/parser.h>
 #include <linux/sched/isolation.h>
 #include "internal.h"
 #include <net/sock.h>
@@ -2449,7 +2450,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
 		psi_memstall_enter(&pflags);
 		nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
 							gfp_mask,
-							MEMCG_RECLAIM_MAY_SWAP);
+							MEMCG_RECLAIM_MAY_SWAP, -1);
 		psi_memstall_leave(&pflags);
 	} while ((memcg = parent_mem_cgroup(memcg)) &&
 		 !mem_cgroup_is_root(memcg));
@@ -2740,7 +2741,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
 
 	psi_memstall_enter(&pflags);
 	nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
-						    gfp_mask, reclaim_options);
+						    gfp_mask, reclaim_options, -1);
 	psi_memstall_leave(&pflags);
 
 	if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
@@ -3660,7 +3661,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
 		}
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-					memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP)) {
+					memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, -1)) {
 			ret = -EBUSY;
 			break;
 		}
@@ -3774,7 +3775,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
 			return -EINTR;
 
 		if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
-						  MEMCG_RECLAIM_MAY_SWAP))
+						  MEMCG_RECLAIM_MAY_SWAP, -1))
 			nr_retries--;
 	}
 
@@ -6720,7 +6721,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
 		}
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP);
+					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1);
 
 		if (!reclaimed && !nr_retries--)
 			break;
@@ -6769,7 +6770,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
 
 		if (nr_reclaims) {
 			if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
-					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP))
+					GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1))
 				nr_reclaims--;
 			continue;
 		}
@@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
 	return nbytes;
 }
 
+enum {
+	MEMORY_RECLAIM_SWAPPINESS = 0,
+	MEMORY_RECLAIM_NULL,
+};
+
+static const match_table_t if_tokens = {
+	{ MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
+	{ MEMORY_RECLAIM_NULL, NULL },
+};
+
 static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 			      size_t nbytes, loff_t off)
 {
@@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 	unsigned int nr_retries = MAX_RECLAIM_RETRIES;
 	unsigned long nr_to_reclaim, nr_reclaimed = 0;
 	unsigned int reclaim_options;
-	int err;
+	char *old_buf, *start;
+	substring_t args[MAX_OPT_ARGS];
+	int swappiness = -1;
 
 	buf = strstrip(buf);
-	err = page_counter_memparse(buf, "", &nr_to_reclaim);
-	if (err)
-		return err;
+
+	old_buf = buf;
+	nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
+	if (buf == old_buf)
+		return -EINVAL;
+
+	buf = strstrip(buf);
+
+	while ((start = strsep(&buf, " ")) != NULL) {
+		if (!strlen(start))
+			continue;
+		switch (match_token(start, if_tokens, args)) {
+		case MEMORY_RECLAIM_SWAPPINESS:
+			if (match_int(&args[0], &swappiness))
+				return -EINVAL;
+			if (swappiness < 0 || swappiness > 200)
+				return -EINVAL;
+			break;
+		default:
+			return -EINVAL;
+		}
+	}
 
 	reclaim_options	= MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE;
 	while (nr_reclaimed < nr_to_reclaim) {
@@ -6926,7 +6958,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
 
 		reclaimed = try_to_free_mem_cgroup_pages(memcg,
 					min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
-					GFP_KERNEL, reclaim_options);
+					GFP_KERNEL, reclaim_options,
+					swappiness);
 
 		if (!reclaimed && !nr_retries--)
 			return -EAGAIN;
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 506f8220c5fe..a20965b20177 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -136,6 +136,9 @@ struct scan_control {
 	/* Always discard instead of demoting to lower tier memory */
 	unsigned int no_demotion:1;
 
+	/* Swappiness value for reclaim, if -1 use memcg/global value */
+	int swappiness;
+
 	/* Allocation order */
 	s8 order;
 
@@ -2327,7 +2330,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
 	unsigned long anon_cost, file_cost, total_cost;
-	int swappiness = mem_cgroup_swappiness(memcg);
+	int swappiness = sc->swappiness != -1 ?
+		sc->swappiness : mem_cgroup_swappiness(memcg);
 	u64 fraction[ANON_AND_FILE];
 	u64 denominator = 0;	/* gcc */
 	enum scan_balance scan_balance;
@@ -2608,6 +2612,9 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
 	    mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
 		return 0;
 
+	if (sc->swappiness != -1)
+		return sc->swappiness;
+
 	return mem_cgroup_swappiness(memcg);
 }
 
@@ -6433,7 +6440,8 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
 unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 					   unsigned long nr_pages,
 					   gfp_t gfp_mask,
-					   unsigned int reclaim_options)
+					   unsigned int reclaim_options,
+					   int swappiness)
 {
 	unsigned long nr_reclaimed;
 	unsigned int noreclaim_flag;
@@ -6448,6 +6456,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
 		.may_unmap = 1,
 		.may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
 		.proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
+		.swappiness = swappiness,
 	};
 	/*
 	 * Traverse the ZONELIST_FALLBACK zonelist of the current node to put
-- 
2.34.1



^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg
@ 2023-12-11 19:41   ` Yosry Ahmed
  2023-12-12 21:27     ` Dan Schatzberg
  2023-12-12  1:06   ` Chris Li
  1 sibling, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2023-12-11 19:41 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel,
	cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton,
	David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> Allow proactive reclaimers to submit an additional swappiness=<val>
> argument to memory.reclaim. This overrides the global or per-memcg
> swappiness setting for that reclaim attempt.
>
> For example:
>
> echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim
>
> will perform reclaim on the rootcg with a swappiness setting of 0 (no
> swap) regardless of the vm.swappiness sysctl setting.
>
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 15 ++++++-
>  include/linux/swap.h                    |  3 +-
>  mm/memcontrol.c                         | 55 ++++++++++++++++++++-----
>  mm/vmscan.c                             | 13 +++++-
>  4 files changed, 70 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 3f85254f3cef..fc2b379dbd0f 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back.
>         This is a simple interface to trigger memory reclaim in the
>         target cgroup.
>
> -       This file accepts a single key, the number of bytes to reclaim.
> -       No nested keys are currently supported.
> +       This file accepts a string which containers thhe number of bytes

contains* the*

I think this statement was only important because no keys were
supported, so I think we can remove it completely and rely on
documenting the supported keys below like other interfaces, see my
next comment.

> +       to reclaim.
>
>         Example::
>
> @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back.
>         This means that the networking layer will not adapt based on
>         reclaim induced by memory.reclaim.
>
> +       This file also allows the user to specify the swappiness value
> +       to be used for the reclaim. For example:
> +
> +         echo "1G swappiness=60" > memory.reclaim
> +
> +       The above instructs the kernel to perform the reclaim with
> +       a swappiness value of 60. Note that this has the same semantics
> +       as the vm.swappiness sysctl - it sets the relative IO cost of
> +       reclaiming anon vs file memory but does not allow for reclaiming
> +       specific amounts of anon or file memory.
> +

Can we instead follow the same format used by other nested-keyed files
(e.g. io.max)? This usually involves a table of supported keys and
such.

>    memory.peak
>         A read-only single value file which exists on non-root
>         cgroups.
[..]
> @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>         unsigned int nr_retries = MAX_RECLAIM_RETRIES;
>         unsigned long nr_to_reclaim, nr_reclaimed = 0;
>         unsigned int reclaim_options;
> -       int err;
> +       char *old_buf, *start;
> +       substring_t args[MAX_OPT_ARGS];
> +       int swappiness = -1;
>
>         buf = strstrip(buf);
> -       err = page_counter_memparse(buf, "", &nr_to_reclaim);
> -       if (err)
> -               return err;
> +
> +       old_buf = buf;
> +       nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
> +       if (buf == old_buf)
> +               return -EINVAL;
> +
> +       buf = strstrip(buf);
> +
> +       while ((start = strsep(&buf, " ")) != NULL) {
> +               if (!strlen(start))
> +                       continue;
> +               switch (match_token(start, if_tokens, args)) {
> +               case MEMORY_RECLAIM_SWAPPINESS:
> +                       if (match_int(&args[0], &swappiness))
> +                               return -EINVAL;
> +                       if (swappiness < 0 || swappiness > 200)

I am not a fan of extending the hardcoded 0 and 200 values, and now
the new -1 value. Maybe it's time to create constants for the min and
max swappiness values instead of hardcoding them everywhere? This can
be a separate preparatory patch. Then, -1 (or any invalid value) can
also be added as a constant with a useful name, instead of passing -1
to all other callers.

This should make the code a little bit more readable and easier to extend.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg
  2023-12-11 19:41   ` Yosry Ahmed
@ 2023-12-12  1:06   ` Chris Li
  2023-12-12 21:43     ` Dan Schatzberg
  1 sibling, 1 reply; 10+ messages in thread
From: Chris Li @ 2023-12-12  1:06 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang,
	linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song,
	Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

Hi Dan,

Thank you for the patch.

On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> Allow proactive reclaimers to submit an additional swappiness=<val>
> argument to memory.reclaim. This overrides the global or per-memcg
> swappiness setting for that reclaim attempt.

I am curious what prompted you to develop this patch. I understand
what this patch does, just want to know more of your background story
why this is needed.

>
> For example:
>
> echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim
>
> will perform reclaim on the rootcg with a swappiness setting of 0 (no
> swap) regardless of the vm.swappiness sysctl setting.
>
> Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
> ---
>  Documentation/admin-guide/cgroup-v2.rst | 15 ++++++-
>  include/linux/swap.h                    |  3 +-
>  mm/memcontrol.c                         | 55 ++++++++++++++++++++-----
>  mm/vmscan.c                             | 13 +++++-
>  4 files changed, 70 insertions(+), 16 deletions(-)
>
> diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> index 3f85254f3cef..fc2b379dbd0f 100644
> --- a/Documentation/admin-guide/cgroup-v2.rst
> +++ b/Documentation/admin-guide/cgroup-v2.rst
> @@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back.
>         This is a simple interface to trigger memory reclaim in the
>         target cgroup.
>
> -       This file accepts a single key, the number of bytes to reclaim.
> -       No nested keys are currently supported.
> +       This file accepts a string which containers thhe number of bytes
Just as Yosry points out, there is some typo here.

"contains the"

> +       to reclaim.
>
>         Example::
>
> @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back.
>         This means that the networking layer will not adapt based on
>         reclaim induced by memory.reclaim.
>
> +       This file also allows the user to specify the swappiness value
> +       to be used for the reclaim. For example:
> +
> +         echo "1G swappiness=60" > memory.reclaim
> +
> +       The above instructs the kernel to perform the reclaim with
> +       a swappiness value of 60. Note that this has the same semantics
> +       as the vm.swappiness sysctl - it sets the relative IO cost of
> +       reclaiming anon vs file memory but does not allow for reclaiming
> +       specific amounts of anon or file memory.
> +
>    memory.peak
>         A read-only single value file which exists on non-root
>         cgroups.
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index f6dd6575b905..ebc20d094609 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -410,7 +410,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order,
>  extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>                                                   unsigned long nr_pages,
>                                                   gfp_t gfp_mask,
> -                                                 unsigned int reclaim_options);
> +                                                 unsigned int reclaim_options,
> +                                                 int swappiness);
>  extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem,
>                                                 gfp_t gfp_mask, bool noswap,
>                                                 pg_data_t *pgdat,
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 1c1061df9cd1..74598c17d3cc 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -63,6 +63,7 @@
>  #include <linux/resume_user_mode.h>
>  #include <linux/psi.h>
>  #include <linux/seq_buf.h>
> +#include <linux/parser.h>
>  #include <linux/sched/isolation.h>
>  #include "internal.h"
>  #include <net/sock.h>
> @@ -2449,7 +2450,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg,
>                 psi_memstall_enter(&pflags);
>                 nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages,
>                                                         gfp_mask,
> -                                                       MEMCG_RECLAIM_MAY_SWAP);
> +                                                       MEMCG_RECLAIM_MAY_SWAP, -1);

Instead of passing -1, maybe we can use mem_cgroup_swappiness(memcg);

And maybe remove the -1 test from the get_scan_count().

""
static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
   unsigned long *nr)
{
struct pglist_data *pgdat = lruvec_pgdat(lruvec);
struct mem_cgroup *memcg = lruvec_memcg(lruvec);
unsigned long anon_cost, file_cost, total_cost;
int swappiness = sc->swappiness != -1 ?
sc->swappiness : mem_cgroup_swappiness(memcg);
""

In other words, I feel it is cleaner if try_to_free_mem_cgroup_pages
accept the swappiness as it is. There is no second guessing the value
if it is -1, then the internal function gets the swappiness from
mem_cgroup_swappiness().
Maybe we can completely remove the -1 special default value?

>                 psi_memstall_leave(&pflags);
>         } while ((memcg = parent_mem_cgroup(memcg)) &&
>                  !mem_cgroup_is_root(memcg));
> @@ -2740,7 +2741,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask,
>
>         psi_memstall_enter(&pflags);
>         nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages,
> -                                                   gfp_mask, reclaim_options);
> +                                                   gfp_mask, reclaim_options, -1);

Same here.

>         psi_memstall_leave(&pflags);
>
>         if (mem_cgroup_margin(mem_over_limit) >= nr_pages)
> @@ -3660,7 +3661,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg,
>                 }
>
>                 if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
> -                                       memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP)) {
> +                                       memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, -1)) {

Same here.

>                         ret = -EBUSY;
>                         break;
>                 }
> @@ -3774,7 +3775,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg)
>                         return -EINTR;
>
>                 if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL,
> -                                                 MEMCG_RECLAIM_MAY_SWAP))
> +                                                 MEMCG_RECLAIM_MAY_SWAP, -1))

Here too.

>                         nr_retries--;
>         }
>
> @@ -6720,7 +6721,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of,
>                 }
>
>                 reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high,
> -                                       GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP);
> +                                       GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1);

Here too.

>
>                 if (!reclaimed && !nr_retries--)
>                         break;
> @@ -6769,7 +6770,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of,
>
>                 if (nr_reclaims) {
>                         if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max,
> -                                       GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP))
> +                                       GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1))

Here too.

>                                 nr_reclaims--;
>                         continue;
>                 }
> @@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
>         return nbytes;
>  }
>
> +enum {
> +       MEMORY_RECLAIM_SWAPPINESS = 0,
> +       MEMORY_RECLAIM_NULL,
> +};
> +
> +static const match_table_t if_tokens = {

What this is called "if_tokens"? I am trying to figure out what "if" refers to.

> +       { MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
> +       { MEMORY_RECLAIM_NULL, NULL },
> +};
> +

Do we foresee a lot of tunable for the try to free page? I see. You
want to use match_token() to do the keyword parsing.

>  static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>                               size_t nbytes, loff_t off)
>  {
> @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>         unsigned int nr_retries = MAX_RECLAIM_RETRIES;
>         unsigned long nr_to_reclaim, nr_reclaimed = 0;
>         unsigned int reclaim_options;
> -       int err;
> +       char *old_buf, *start;
> +       substring_t args[MAX_OPT_ARGS];
> +       int swappiness = -1;
>
>         buf = strstrip(buf);
> -       err = page_counter_memparse(buf, "", &nr_to_reclaim);
> -       if (err)
> -               return err;
> +
> +       old_buf = buf;
> +       nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
> +       if (buf == old_buf)
> +               return -EINVAL;
> +
> +       buf = strstrip(buf);
> +
> +       while ((start = strsep(&buf, " ")) != NULL) {
> +               if (!strlen(start))
> +                       continue;
> +               switch (match_token(start, if_tokens, args)) {
> +               case MEMORY_RECLAIM_SWAPPINESS:
> +                       if (match_int(&args[0], &swappiness))
> +                               return -EINVAL;
> +                       if (swappiness < 0 || swappiness > 200)

Agree with Yosry on the 200 magic value.

I am also wondering if there is an easier way to just parse one
keyword. Will using strcmp("swappiness=") be a bad idea? I haven't
tried it myself though.

> +                               return -EINVAL;
> +                       break;
> +               default:
> +                       return -EINVAL;
> +               }
> +       }
>
>         reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE;
>         while (nr_reclaimed < nr_to_reclaim) {
> @@ -6926,7 +6958,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
>
>                 reclaimed = try_to_free_mem_cgroup_pages(memcg,
>                                         min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX),
> -                                       GFP_KERNEL, reclaim_options);
> +                                       GFP_KERNEL, reclaim_options,
> +                                       swappiness);
>
>                 if (!reclaimed && !nr_retries--)
>                         return -EAGAIN;
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 506f8220c5fe..a20965b20177 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -136,6 +136,9 @@ struct scan_control {
>         /* Always discard instead of demoting to lower tier memory */
>         unsigned int no_demotion:1;
>
> +       /* Swappiness value for reclaim, if -1 use memcg/global value */
> +       int swappiness;
> +

It would be great if we can get rid of the -1 case.

>         /* Allocation order */
>         s8 order;
>
> @@ -2327,7 +2330,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc,
>         struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>         struct mem_cgroup *memcg = lruvec_memcg(lruvec);
>         unsigned long anon_cost, file_cost, total_cost;
> -       int swappiness = mem_cgroup_swappiness(memcg);
> +       int swappiness = sc->swappiness != -1 ?
> +               sc->swappiness : mem_cgroup_swappiness(memcg);

Let's see if we can get rid of the -1. Now I see the -1 is actually
introduced by this patch, should be doable.

>         u64 fraction[ANON_AND_FILE];
>         u64 denominator = 0;    /* gcc */
>         enum scan_balance scan_balance;
> @@ -2608,6 +2612,9 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc)
>             mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH)
>                 return 0;
>
> +       if (sc->swappiness != -1)
> +               return sc->swappiness;
> +

Get rid of that too.

Thank you.

Chris


>         return mem_cgroup_swappiness(memcg);
>  }
>
> @@ -6433,7 +6440,8 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg,
>  unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>                                            unsigned long nr_pages,
>                                            gfp_t gfp_mask,
> -                                          unsigned int reclaim_options)
> +                                          unsigned int reclaim_options,
> +                                          int swappiness)
>  {
>         unsigned long nr_reclaimed;
>         unsigned int noreclaim_flag;
> @@ -6448,6 +6456,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg,
>                 .may_unmap = 1,
>                 .may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP),
>                 .proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE),
> +               .swappiness = swappiness,
>         };
>         /*
>          * Traverse the ZONELIST_FALLBACK zonelist of the current node to put
> --
> 2.34.1
>
>


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-11 19:41   ` Yosry Ahmed
@ 2023-12-12 21:27     ` Dan Schatzberg
  2023-12-12 21:31       ` Yosry Ahmed
  2023-12-12 21:32       ` Chris Li
  0 siblings, 2 replies; 10+ messages in thread
From: Dan Schatzberg @ 2023-12-12 21:27 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel,
	cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton,
	David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

On Mon, Dec 11, 2023 at 11:41:24AM -0800, Yosry Ahmed wrote:
> On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> 
> contains* the*
> 
> I think this statement was only important because no keys were
> supported, so I think we can remove it completely and rely on
> documenting the supported keys below like other interfaces, see my
> next comment.
> 
> > +       to reclaim.
> >
> >         Example::
> >
> > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back.
> >         This means that the networking layer will not adapt based on
> >         reclaim induced by memory.reclaim.
> >
> > +       This file also allows the user to specify the swappiness value
> > +       to be used for the reclaim. For example:
> > +
> > +         echo "1G swappiness=60" > memory.reclaim
> > +
> > +       The above instructs the kernel to perform the reclaim with
> > +       a swappiness value of 60. Note that this has the same semantics
> > +       as the vm.swappiness sysctl - it sets the relative IO cost of
> > +       reclaiming anon vs file memory but does not allow for reclaiming
> > +       specific amounts of anon or file memory.
> > +
> 
> Can we instead follow the same format used by other nested-keyed files
> (e.g. io.max)? This usually involves a table of supported keys and
> such.

Thanks, both are good suggestions. Will address these.

> > +       while ((start = strsep(&buf, " ")) != NULL) {
> > +               if (!strlen(start))
> > +                       continue;
> > +               switch (match_token(start, if_tokens, args)) {
> > +               case MEMORY_RECLAIM_SWAPPINESS:
> > +                       if (match_int(&args[0], &swappiness))
> > +                               return -EINVAL;
> > +                       if (swappiness < 0 || swappiness > 200)
> 
> I am not a fan of extending the hardcoded 0 and 200 values, and now
> the new -1 value. Maybe it's time to create constants for the min and
> max swappiness values instead of hardcoding them everywhere? This can
> be a separate preparatory patch. Then, -1 (or any invalid value) can
> also be added as a constant with a useful name, instead of passing -1
> to all other callers.
> 
> This should make the code a little bit more readable and easier to extend.

I'm not sure I understand the concern. This check just validates that
the swappiness value inputted is between 0 and 200 (inclusive)
otherwise the interface returns -EINVAL. Are you just concerned that
these constants are not named explicitly so they can be reused
elsewhere in the code?


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-12 21:27     ` Dan Schatzberg
@ 2023-12-12 21:31       ` Yosry Ahmed
  2023-12-12 21:46         ` Dan Schatzberg
  2023-12-12 21:32       ` Chris Li
  1 sibling, 1 reply; 10+ messages in thread
From: Yosry Ahmed @ 2023-12-12 21:31 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel,
	cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton,
	David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

On Tue, Dec 12, 2023 at 1:27 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> On Mon, Dec 11, 2023 at 11:41:24AM -0800, Yosry Ahmed wrote:
> > On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> >
> > contains* the*
> >
> > I think this statement was only important because no keys were
> > supported, so I think we can remove it completely and rely on
> > documenting the supported keys below like other interfaces, see my
> > next comment.
> >
> > > +       to reclaim.
> > >
> > >         Example::
> > >
> > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back.
> > >         This means that the networking layer will not adapt based on
> > >         reclaim induced by memory.reclaim.
> > >
> > > +       This file also allows the user to specify the swappiness value
> > > +       to be used for the reclaim. For example:
> > > +
> > > +         echo "1G swappiness=60" > memory.reclaim
> > > +
> > > +       The above instructs the kernel to perform the reclaim with
> > > +       a swappiness value of 60. Note that this has the same semantics
> > > +       as the vm.swappiness sysctl - it sets the relative IO cost of
> > > +       reclaiming anon vs file memory but does not allow for reclaiming
> > > +       specific amounts of anon or file memory.
> > > +
> >
> > Can we instead follow the same format used by other nested-keyed files
> > (e.g. io.max)? This usually involves a table of supported keys and
> > such.
>
> Thanks, both are good suggestions. Will address these.
>
> > > +       while ((start = strsep(&buf, " ")) != NULL) {
> > > +               if (!strlen(start))
> > > +                       continue;
> > > +               switch (match_token(start, if_tokens, args)) {
> > > +               case MEMORY_RECLAIM_SWAPPINESS:
> > > +                       if (match_int(&args[0], &swappiness))
> > > +                               return -EINVAL;
> > > +                       if (swappiness < 0 || swappiness > 200)
> >
> > I am not a fan of extending the hardcoded 0 and 200 values, and now
> > the new -1 value. Maybe it's time to create constants for the min and
> > max swappiness values instead of hardcoding them everywhere? This can
> > be a separate preparatory patch. Then, -1 (or any invalid value) can
> > also be added as a constant with a useful name, instead of passing -1
> > to all other callers.
> >
> > This should make the code a little bit more readable and easier to extend.
>
> I'm not sure I understand the concern. This check just validates that
> the swappiness value inputted is between 0 and 200 (inclusive)
> otherwise the interface returns -EINVAL. Are you just concerned that
> these constants are not named explicitly so they can be reused
> elsewhere in the code?

Yes. The 0 and 200 values are already hardcoded in multiple places,
and we are adding more places now and more hardcoded values (i.e. -1).


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-12 21:27     ` Dan Schatzberg
  2023-12-12 21:31       ` Yosry Ahmed
@ 2023-12-12 21:32       ` Chris Li
  1 sibling, 0 replies; 10+ messages in thread
From: Chris Li @ 2023-12-12 21:32 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Yosry Ahmed, Johannes Weiner, Roman Gushchin, Huan Yang,
	linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song,
	Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

Hi Dan,

On Tue, Dec 12, 2023 at 1:27 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> > > +       while ((start = strsep(&buf, " ")) != NULL) {
> > > +               if (!strlen(start))
> > > +                       continue;
> > > +               switch (match_token(start, if_tokens, args)) {
> > > +               case MEMORY_RECLAIM_SWAPPINESS:
> > > +                       if (match_int(&args[0], &swappiness))
> > > +                               return -EINVAL;
> > > +                       if (swappiness < 0 || swappiness > 200)
> >
> > I am not a fan of extending the hardcoded 0 and 200 values, and now
> > the new -1 value. Maybe it's time to create constants for the min and
> > max swappiness values instead of hardcoding them everywhere? This can
> > be a separate preparatory patch. Then, -1 (or any invalid value) can
> > also be added as a constant with a useful name, instead of passing -1
> > to all other callers.
> >
> > This should make the code a little bit more readable and easier to extend.
>
> I'm not sure I understand the concern. This check just validates that
> the swappiness value inputted is between 0 and 200 (inclusive)
> otherwise the interface returns -EINVAL. Are you just concerned that
> these constants are not named explicitly so they can be reused
> elsewhere in the code?
>

I think the concern is why 200? Why not 400 or 600?

The user might write bigger values and expect the reclaim to work with
those values.
If there is some hard coded limit enforced somewhere else so writing
more than 200 does not make sense. It would be nice to have those
other places reference this limit as well. Thus give 200 a name and
use it in other places of the code as well.

Chris


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-12  1:06   ` Chris Li
@ 2023-12-12 21:43     ` Dan Schatzberg
  2023-12-13  0:12       ` Chris Li
  0 siblings, 1 reply; 10+ messages in thread
From: Dan Schatzberg @ 2023-12-12 21:43 UTC (permalink / raw)
  To: Chris Li
  Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang,
	linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song,
	Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

On Mon, Dec 11, 2023 at 05:06:54PM -0800, Chris Li wrote:
> Hi Dan,
> 
> Thank you for the patch.
> 
> On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> >
> > Allow proactive reclaimers to submit an additional swappiness=<val>
> > argument to memory.reclaim. This overrides the global or per-memcg
> > swappiness setting for that reclaim attempt.
> 
> I am curious what prompted you to develop this patch. I understand
> what this patch does, just want to know more of your background story
> why this is needed.

I wrote about this in some detail in the cover letter (0/1). Take a
look and let me know if the rationale is still unclear.

> Instead of passing -1, maybe we can use mem_cgroup_swappiness(memcg);
>

Yeah this makes sense, I'll go ahead and make that change and
eliminate the -1.

> >                                 nr_reclaims--;
> >                         continue;
> >                 }
> > @@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
> >         return nbytes;
> >  }
> >
> > +enum {
> > +       MEMORY_RECLAIM_SWAPPINESS = 0,
> > +       MEMORY_RECLAIM_NULL,
> > +};
> > +
> > +static const match_table_t if_tokens = {
> 
> What this is called "if_tokens"? I am trying to figure out what "if" refers to.

I used the same logic as in "mm: Add nodes= arg to memory.reclaim". I
can just call it tokens.

> 
> > +       { MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"},
> > +       { MEMORY_RECLAIM_NULL, NULL },
> > +};
> > +
> 
> Do we foresee a lot of tunable for the try to free page? I see. You
> want to use match_token() to do the keyword parsing.

See below

> 
> >  static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
> >                               size_t nbytes, loff_t off)
> >  {
> > @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf,
> >         unsigned int nr_retries = MAX_RECLAIM_RETRIES;
> >         unsigned long nr_to_reclaim, nr_reclaimed = 0;
> >         unsigned int reclaim_options;
> > -       int err;
> > +       char *old_buf, *start;
> > +       substring_t args[MAX_OPT_ARGS];
> > +       int swappiness = -1;
> >
> >         buf = strstrip(buf);
> > -       err = page_counter_memparse(buf, "", &nr_to_reclaim);
> > -       if (err)
> > -               return err;
> > +
> > +       old_buf = buf;
> > +       nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
> > +       if (buf == old_buf)
> > +               return -EINVAL;
> > +
> > +       buf = strstrip(buf);
> > +
> > +       while ((start = strsep(&buf, " ")) != NULL) {
> > +               if (!strlen(start))
> > +                       continue;
> > +               switch (match_token(start, if_tokens, args)) {
> > +               case MEMORY_RECLAIM_SWAPPINESS:
> > +                       if (match_int(&args[0], &swappiness))
> > +                               return -EINVAL;
> > +                       if (swappiness < 0 || swappiness > 200)
> 
> Agree with Yosry on the 200 magic value.
> 
> I am also wondering if there is an easier way to just parse one
> keyword. Will using strcmp("swappiness=") be a bad idea? I haven't
> tried it myself though.

As above, "mm: Add nodes= arg to memory.reclaim" was previously in the
mm tree doing it this way, so I duplicated it. I think given that
there have been lots of discussions about extending this interface,
this match table has some potential future value and I don't see a
major downside to using it in favor of strcmp.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-12 21:31       ` Yosry Ahmed
@ 2023-12-12 21:46         ` Dan Schatzberg
  0 siblings, 0 replies; 10+ messages in thread
From: Dan Schatzberg @ 2023-12-12 21:46 UTC (permalink / raw)
  To: Yosry Ahmed
  Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel,
	cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet,
	Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton,
	David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

On Tue, Dec 12, 2023 at 01:31:46PM -0800, Yosry Ahmed wrote:
> On Tue, Dec 12, 2023 at 1:27 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> >
> > On Mon, Dec 11, 2023 at 11:41:24AM -0800, Yosry Ahmed wrote:
> > > On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
> > >
> > > contains* the*
> > >
> > > I think this statement was only important because no keys were
> > > supported, so I think we can remove it completely and rely on
> > > documenting the supported keys below like other interfaces, see my
> > > next comment.
> > >
> > > > +       to reclaim.
> > > >
> > > >         Example::
> > > >
> > > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back.
> > > >         This means that the networking layer will not adapt based on
> > > >         reclaim induced by memory.reclaim.
> > > >
> > > > +       This file also allows the user to specify the swappiness value
> > > > +       to be used for the reclaim. For example:
> > > > +
> > > > +         echo "1G swappiness=60" > memory.reclaim
> > > > +
> > > > +       The above instructs the kernel to perform the reclaim with
> > > > +       a swappiness value of 60. Note that this has the same semantics
> > > > +       as the vm.swappiness sysctl - it sets the relative IO cost of
> > > > +       reclaiming anon vs file memory but does not allow for reclaiming
> > > > +       specific amounts of anon or file memory.
> > > > +
> > >
> > > Can we instead follow the same format used by other nested-keyed files
> > > (e.g. io.max)? This usually involves a table of supported keys and
> > > such.
> >
> > Thanks, both are good suggestions. Will address these.
> >
> > > > +       while ((start = strsep(&buf, " ")) != NULL) {
> > > > +               if (!strlen(start))
> > > > +                       continue;
> > > > +               switch (match_token(start, if_tokens, args)) {
> > > > +               case MEMORY_RECLAIM_SWAPPINESS:
> > > > +                       if (match_int(&args[0], &swappiness))
> > > > +                               return -EINVAL;
> > > > +                       if (swappiness < 0 || swappiness > 200)
> > >
> > > I am not a fan of extending the hardcoded 0 and 200 values, and now
> > > the new -1 value. Maybe it's time to create constants for the min and
> > > max swappiness values instead of hardcoding them everywhere? This can
> > > be a separate preparatory patch. Then, -1 (or any invalid value) can
> > > also be added as a constant with a useful name, instead of passing -1
> > > to all other callers.
> > >
> > > This should make the code a little bit more readable and easier to extend.
> >
> > I'm not sure I understand the concern. This check just validates that
> > the swappiness value inputted is between 0 and 200 (inclusive)
> > otherwise the interface returns -EINVAL. Are you just concerned that
> > these constants are not named explicitly so they can be reused
> > elsewhere in the code?
> 
> Yes. The 0 and 200 values are already hardcoded in multiple places,
> and we are adding more places now and more hardcoded values (i.e. -1).

Understood, I'll add a preparatory patch which adds DEFINEs for
MIN_SWAPPINESS and MAX_SWAPPINESS and change the usages of 0 and 200
to those. I'll also eliminate the use of -1 as Chris suggested.


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim
  2023-12-12 21:43     ` Dan Schatzberg
@ 2023-12-13  0:12       ` Chris Li
  0 siblings, 0 replies; 10+ messages in thread
From: Chris Li @ 2023-12-13  0:12 UTC (permalink / raw)
  To: Dan Schatzberg
  Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang,
	linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li,
	Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song,
	Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang,
	Vishal Moola (Oracle),
	Yue Zhao, Hugh Dickins

Hi Dan,

On Tue, Dec 12, 2023 at 1:43 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote:
>
> > I am curious what prompted you to develop this patch. I understand
> > what this patch does, just want to know more of your background story
> > why this is needed.
>
> I wrote about this in some detail in the cover letter (0/1). Take a
> look and let me know if the rationale is still unclear.

Ah, found it. I was not CC on the cover letter but CC on the 1/1
patch.  That is why I did not pick up the cover letter.

Yes, the cover letter explanation was great. Exactly what I am looking for.


>
> > Instead of passing -1, maybe we can use mem_cgroup_swappiness(memcg);
> >
>
> Yeah this makes sense, I'll go ahead and make that change and
> eliminate the -1.

Thanks

>
> > >                                 nr_reclaims--;
> > >                         continue;
> > >                 }
> > > @@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of,
> > >         return nbytes;
> > >  }
> > >
> > > +enum {
> > > +       MEMORY_RECLAIM_SWAPPINESS = 0,
> > > +       MEMORY_RECLAIM_NULL,
> > > +};
> > > +
> > > +static const match_table_t if_tokens = {
> >
> > What this is called "if_tokens"? I am trying to figure out what "if" refers to.
>
> I used the same logic as in "mm: Add nodes= arg to memory.reclaim". I
> can just call it tokens.

Thanks. I will take a look at that change.


> > > +
> > > +       old_buf = buf;
> > > +       nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE;
> > > +       if (buf == old_buf)
> > > +               return -EINVAL;
> > > +
> > > +       buf = strstrip(buf);
> > > +
> > > +       while ((start = strsep(&buf, " ")) != NULL) {
> > > +               if (!strlen(start))
> > > +                       continue;
> > > +               switch (match_token(start, if_tokens, args)) {
> > > +               case MEMORY_RECLAIM_SWAPPINESS:
> > > +                       if (match_int(&args[0], &swappiness))
> > > +                               return -EINVAL;
> > > +                       if (swappiness < 0 || swappiness > 200)
> >
> > Agree with Yosry on the 200 magic value.
> >
> > I am also wondering if there is an easier way to just parse one
> > keyword. Will using strcmp("swappiness=") be a bad idea? I haven't
> > tried it myself though.
>
> As above, "mm: Add nodes= arg to memory.reclaim" was previously in the
> mm tree doing it this way, so I duplicated it. I think given that
> there have been lots of discussions about extending this interface,
> this match table has some potential future value and I don't see a
> major downside to using it in favor of strcmp.

Yes, that is totally your call. I am fine as it is. Just the micro
optimization of me trying to see if there is a slimmer way to do it.

Chris


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2023-12-13  0:12 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-12-11 14:04 [PATCH V3 0/1] Add swappiness argument to memory.reclaim Dan Schatzberg
2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg
2023-12-11 19:41   ` Yosry Ahmed
2023-12-12 21:27     ` Dan Schatzberg
2023-12-12 21:31       ` Yosry Ahmed
2023-12-12 21:46         ` Dan Schatzberg
2023-12-12 21:32       ` Chris Li
2023-12-12  1:06   ` Chris Li
2023-12-12 21:43     ` Dan Schatzberg
2023-12-13  0:12       ` Chris Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox