* [PATCH V3 0/1] Add swappiness argument to memory.reclaim @ 2023-12-11 14:04 Dan Schatzberg 2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg 0 siblings, 1 reply; 10+ messages in thread From: Dan Schatzberg @ 2023-12-11 14:04 UTC (permalink / raw) To: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang Cc: linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang, Vishal Moola (Oracle), Mina Almasry, Yue Zhao, Hugh Dickins Changes since V2: * No functional change * Used int consistently rather than a pointer Changes since V1: * Added documentation This patch proposes augmenting the memory.reclaim interface with a swappiness=<val> argument that overrides the swappiness value for that instance of proactive reclaim. Userspace proactive reclaimers use the memory.reclaim interface to trigger reclaim. The memory.reclaim interface does not allow for any way to effect the balance of file vs anon during proactive reclaim. The only approach is to adjust the vm.swappiness setting. However, there are a few reasons we look to control the balance of file vs anon during proactive reclaim, separately from reactive reclaim: * Swapout should be limited to manage SSD write endurance. In near-OOM situations we are fine with lots of swap-out to avoid OOMs. As these are typically rare events, they have relatively little impact on write endurance. However, proactive reclaim runs continuously and so its impact on SSD write endurance is more significant. Therefore it is desireable to control swap-out for proactive reclaim separately from reactive reclaim * Some userspace OOM killers like systemd-oomd[1] support OOM killing on swap exhaustion. This makes sense if the swap exhaustion is triggered due to reactive reclaim but less so if it is triggered due to proactive reclaim (e.g. one could see OOMs when free memory is ample but anon is just particularly cold). Therefore, it's desireable to have proactive reclaim reduce or stop swap-out before the threshold at which OOM killing occurs. In the case of Meta's Senpai proactive reclaimer, we adjust vm.swappiness before writes to memory.reclaim[2]. This has been in production for nearly two years and has addressed our needs to control proactive vs reactive reclaim behavior but is still not ideal for a number of reasons: * vm.swappiness is a global setting, adjusting it can race/interfere with other system administration that wishes to control vm.swappiness. In our case, we need to disable Senpai before adjusting vm.swappiness. * vm.swappiness is stateful - so a crash or restart of Senpai can leave a misconfigured setting. This requires some additional management to record the "desired" setting and ensure Senpai always adjusts to it. With this patch, we avoid these downsides of adjusting vm.swappiness globally. Previously, this exact interface addition was proposed by Yosry[3]. In response, Roman proposed instead an interface to specify precise file/anon/slab reclaim amounts[4]. More recently Huan also proposed this as well[5] and others similarly questioned if this was the proper interface. Previous proposals sought to use this to allow proactive reclaimers to effectively perform a custom reclaim algorithm by issuing proactive reclaim with different settings to control file vs anon reclaim (e.g. to only reclaim anon from some applications). Responses argued that adjusting swappiness is a poor interface for custom reclaim. In contrast, I argue in favor of a swappiness setting not as a way to implement custom reclaim algorithms but rather to bias the balance of anon vs file due to differences of proactive vs reactive reclaim. In this context, swappiness is the existing interface for controlling this balance and this patch simply allows for it to be configured differently for proactive vs reactive reclaim. Specifying explicit amounts of anon vs file pages to reclaim feels inappropriate for this prupose. Proactive reclaimers are un-aware of the relative age of file vs anon for a cgroup which makes it difficult to manage proactive reclaim of different memory pools. A proactive reclaimer would need some amount of anon reclaim attempts separate from the amount of file reclaim attempts which seems brittle given that it's difficult to observe the impact. [1]https://www.freedesktop.org/software/systemd/man/latest/systemd-oomd.service.html [2]https://github.com/facebookincubator/oomd/blob/main/src/oomd/plugins/Senpai.cpp#L585-L598 [3]https://lore.kernel.org/linux-mm/CAJD7tkbDpyoODveCsnaqBBMZEkDvshXJmNdbk51yKSNgD7aGdg@mail.gmail.com/ [4]https://lore.kernel.org/linux-mm/YoPHtHXzpK51F%2F1Z@carbon/ [5]https://lore.kernel.org/lkml/20231108065818.19932-1-link@vivo.com/ Dan Schatzberg (1): mm: add swapiness= arg to memory.reclaim Documentation/admin-guide/cgroup-v2.rst | 15 ++++++- include/linux/swap.h | 3 +- mm/memcontrol.c | 55 ++++++++++++++++++++----- mm/vmscan.c | 13 +++++- 4 files changed, 70 insertions(+), 16 deletions(-) -- 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-11 14:04 [PATCH V3 0/1] Add swappiness argument to memory.reclaim Dan Schatzberg @ 2023-12-11 14:04 ` Dan Schatzberg 2023-12-11 19:41 ` Yosry Ahmed 2023-12-12 1:06 ` Chris Li 0 siblings, 2 replies; 10+ messages in thread From: Dan Schatzberg @ 2023-12-11 14:04 UTC (permalink / raw) To: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang Cc: linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins Allow proactive reclaimers to submit an additional swappiness=<val> argument to memory.reclaim. This overrides the global or per-memcg swappiness setting for that reclaim attempt. For example: echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim will perform reclaim on the rootcg with a swappiness setting of 0 (no swap) regardless of the vm.swappiness sysctl setting. Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com> --- Documentation/admin-guide/cgroup-v2.rst | 15 ++++++- include/linux/swap.h | 3 +- mm/memcontrol.c | 55 ++++++++++++++++++++----- mm/vmscan.c | 13 +++++- 4 files changed, 70 insertions(+), 16 deletions(-) diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst index 3f85254f3cef..fc2b379dbd0f 100644 --- a/Documentation/admin-guide/cgroup-v2.rst +++ b/Documentation/admin-guide/cgroup-v2.rst @@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back. This is a simple interface to trigger memory reclaim in the target cgroup. - This file accepts a single key, the number of bytes to reclaim. - No nested keys are currently supported. + This file accepts a string which containers thhe number of bytes + to reclaim. Example:: @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back. This means that the networking layer will not adapt based on reclaim induced by memory.reclaim. + This file also allows the user to specify the swappiness value + to be used for the reclaim. For example: + + echo "1G swappiness=60" > memory.reclaim + + The above instructs the kernel to perform the reclaim with + a swappiness value of 60. Note that this has the same semantics + as the vm.swappiness sysctl - it sets the relative IO cost of + reclaiming anon vs file memory but does not allow for reclaiming + specific amounts of anon or file memory. + memory.peak A read-only single value file which exists on non-root cgroups. diff --git a/include/linux/swap.h b/include/linux/swap.h index f6dd6575b905..ebc20d094609 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -410,7 +410,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, - unsigned int reclaim_options); + unsigned int reclaim_options, + int swappiness); extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem, gfp_t gfp_mask, bool noswap, pg_data_t *pgdat, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 1c1061df9cd1..74598c17d3cc 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -63,6 +63,7 @@ #include <linux/resume_user_mode.h> #include <linux/psi.h> #include <linux/seq_buf.h> +#include <linux/parser.h> #include <linux/sched/isolation.h> #include "internal.h" #include <net/sock.h> @@ -2449,7 +2450,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg, psi_memstall_enter(&pflags); nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages, gfp_mask, - MEMCG_RECLAIM_MAY_SWAP); + MEMCG_RECLAIM_MAY_SWAP, -1); psi_memstall_leave(&pflags); } while ((memcg = parent_mem_cgroup(memcg)) && !mem_cgroup_is_root(memcg)); @@ -2740,7 +2741,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, psi_memstall_enter(&pflags); nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages, - gfp_mask, reclaim_options); + gfp_mask, reclaim_options, -1); psi_memstall_leave(&pflags); if (mem_cgroup_margin(mem_over_limit) >= nr_pages) @@ -3660,7 +3661,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg, } if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP)) { + memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, -1)) { ret = -EBUSY; break; } @@ -3774,7 +3775,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg) return -EINTR; if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, - MEMCG_RECLAIM_MAY_SWAP)) + MEMCG_RECLAIM_MAY_SWAP, -1)) nr_retries--; } @@ -6720,7 +6721,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, } reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high, - GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP); + GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1); if (!reclaimed && !nr_retries--) break; @@ -6769,7 +6770,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of, if (nr_reclaims) { if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max, - GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP)) + GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1)) nr_reclaims--; continue; } @@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of, return nbytes; } +enum { + MEMORY_RECLAIM_SWAPPINESS = 0, + MEMORY_RECLAIM_NULL, +}; + +static const match_table_t if_tokens = { + { MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"}, + { MEMORY_RECLAIM_NULL, NULL }, +}; + static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, size_t nbytes, loff_t off) { @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, unsigned int nr_retries = MAX_RECLAIM_RETRIES; unsigned long nr_to_reclaim, nr_reclaimed = 0; unsigned int reclaim_options; - int err; + char *old_buf, *start; + substring_t args[MAX_OPT_ARGS]; + int swappiness = -1; buf = strstrip(buf); - err = page_counter_memparse(buf, "", &nr_to_reclaim); - if (err) - return err; + + old_buf = buf; + nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE; + if (buf == old_buf) + return -EINVAL; + + buf = strstrip(buf); + + while ((start = strsep(&buf, " ")) != NULL) { + if (!strlen(start)) + continue; + switch (match_token(start, if_tokens, args)) { + case MEMORY_RECLAIM_SWAPPINESS: + if (match_int(&args[0], &swappiness)) + return -EINVAL; + if (swappiness < 0 || swappiness > 200) + return -EINVAL; + break; + default: + return -EINVAL; + } + } reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE; while (nr_reclaimed < nr_to_reclaim) { @@ -6926,7 +6958,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, reclaimed = try_to_free_mem_cgroup_pages(memcg, min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX), - GFP_KERNEL, reclaim_options); + GFP_KERNEL, reclaim_options, + swappiness); if (!reclaimed && !nr_retries--) return -EAGAIN; diff --git a/mm/vmscan.c b/mm/vmscan.c index 506f8220c5fe..a20965b20177 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -136,6 +136,9 @@ struct scan_control { /* Always discard instead of demoting to lower tier memory */ unsigned int no_demotion:1; + /* Swappiness value for reclaim, if -1 use memcg/global value */ + int swappiness; + /* Allocation order */ s8 order; @@ -2327,7 +2330,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, struct pglist_data *pgdat = lruvec_pgdat(lruvec); struct mem_cgroup *memcg = lruvec_memcg(lruvec); unsigned long anon_cost, file_cost, total_cost; - int swappiness = mem_cgroup_swappiness(memcg); + int swappiness = sc->swappiness != -1 ? + sc->swappiness : mem_cgroup_swappiness(memcg); u64 fraction[ANON_AND_FILE]; u64 denominator = 0; /* gcc */ enum scan_balance scan_balance; @@ -2608,6 +2612,9 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc) mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH) return 0; + if (sc->swappiness != -1) + return sc->swappiness; + return mem_cgroup_swappiness(memcg); } @@ -6433,7 +6440,8 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg, unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, unsigned long nr_pages, gfp_t gfp_mask, - unsigned int reclaim_options) + unsigned int reclaim_options, + int swappiness) { unsigned long nr_reclaimed; unsigned int noreclaim_flag; @@ -6448,6 +6456,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, .may_unmap = 1, .may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP), .proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE), + .swappiness = swappiness, }; /* * Traverse the ZONELIST_FALLBACK zonelist of the current node to put -- 2.34.1 ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg @ 2023-12-11 19:41 ` Yosry Ahmed 2023-12-12 21:27 ` Dan Schatzberg 2023-12-12 1:06 ` Chris Li 1 sibling, 1 reply; 10+ messages in thread From: Yosry Ahmed @ 2023-12-11 19:41 UTC (permalink / raw) To: Dan Schatzberg Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > Allow proactive reclaimers to submit an additional swappiness=<val> > argument to memory.reclaim. This overrides the global or per-memcg > swappiness setting for that reclaim attempt. > > For example: > > echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim > > will perform reclaim on the rootcg with a swappiness setting of 0 (no > swap) regardless of the vm.swappiness sysctl setting. > > Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com> > --- > Documentation/admin-guide/cgroup-v2.rst | 15 ++++++- > include/linux/swap.h | 3 +- > mm/memcontrol.c | 55 ++++++++++++++++++++----- > mm/vmscan.c | 13 +++++- > 4 files changed, 70 insertions(+), 16 deletions(-) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > index 3f85254f3cef..fc2b379dbd0f 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back. > This is a simple interface to trigger memory reclaim in the > target cgroup. > > - This file accepts a single key, the number of bytes to reclaim. > - No nested keys are currently supported. > + This file accepts a string which containers thhe number of bytes contains* the* I think this statement was only important because no keys were supported, so I think we can remove it completely and rely on documenting the supported keys below like other interfaces, see my next comment. > + to reclaim. > > Example:: > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back. > This means that the networking layer will not adapt based on > reclaim induced by memory.reclaim. > > + This file also allows the user to specify the swappiness value > + to be used for the reclaim. For example: > + > + echo "1G swappiness=60" > memory.reclaim > + > + The above instructs the kernel to perform the reclaim with > + a swappiness value of 60. Note that this has the same semantics > + as the vm.swappiness sysctl - it sets the relative IO cost of > + reclaiming anon vs file memory but does not allow for reclaiming > + specific amounts of anon or file memory. > + Can we instead follow the same format used by other nested-keyed files (e.g. io.max)? This usually involves a table of supported keys and such. > memory.peak > A read-only single value file which exists on non-root > cgroups. [..] > @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, > unsigned int nr_retries = MAX_RECLAIM_RETRIES; > unsigned long nr_to_reclaim, nr_reclaimed = 0; > unsigned int reclaim_options; > - int err; > + char *old_buf, *start; > + substring_t args[MAX_OPT_ARGS]; > + int swappiness = -1; > > buf = strstrip(buf); > - err = page_counter_memparse(buf, "", &nr_to_reclaim); > - if (err) > - return err; > + > + old_buf = buf; > + nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE; > + if (buf == old_buf) > + return -EINVAL; > + > + buf = strstrip(buf); > + > + while ((start = strsep(&buf, " ")) != NULL) { > + if (!strlen(start)) > + continue; > + switch (match_token(start, if_tokens, args)) { > + case MEMORY_RECLAIM_SWAPPINESS: > + if (match_int(&args[0], &swappiness)) > + return -EINVAL; > + if (swappiness < 0 || swappiness > 200) I am not a fan of extending the hardcoded 0 and 200 values, and now the new -1 value. Maybe it's time to create constants for the min and max swappiness values instead of hardcoding them everywhere? This can be a separate preparatory patch. Then, -1 (or any invalid value) can also be added as a constant with a useful name, instead of passing -1 to all other callers. This should make the code a little bit more readable and easier to extend. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-11 19:41 ` Yosry Ahmed @ 2023-12-12 21:27 ` Dan Schatzberg 2023-12-12 21:31 ` Yosry Ahmed 2023-12-12 21:32 ` Chris Li 0 siblings, 2 replies; 10+ messages in thread From: Dan Schatzberg @ 2023-12-12 21:27 UTC (permalink / raw) To: Yosry Ahmed Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins On Mon, Dec 11, 2023 at 11:41:24AM -0800, Yosry Ahmed wrote: > On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > contains* the* > > I think this statement was only important because no keys were > supported, so I think we can remove it completely and rely on > documenting the supported keys below like other interfaces, see my > next comment. > > > + to reclaim. > > > > Example:: > > > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back. > > This means that the networking layer will not adapt based on > > reclaim induced by memory.reclaim. > > > > + This file also allows the user to specify the swappiness value > > + to be used for the reclaim. For example: > > + > > + echo "1G swappiness=60" > memory.reclaim > > + > > + The above instructs the kernel to perform the reclaim with > > + a swappiness value of 60. Note that this has the same semantics > > + as the vm.swappiness sysctl - it sets the relative IO cost of > > + reclaiming anon vs file memory but does not allow for reclaiming > > + specific amounts of anon or file memory. > > + > > Can we instead follow the same format used by other nested-keyed files > (e.g. io.max)? This usually involves a table of supported keys and > such. Thanks, both are good suggestions. Will address these. > > + while ((start = strsep(&buf, " ")) != NULL) { > > + if (!strlen(start)) > > + continue; > > + switch (match_token(start, if_tokens, args)) { > > + case MEMORY_RECLAIM_SWAPPINESS: > > + if (match_int(&args[0], &swappiness)) > > + return -EINVAL; > > + if (swappiness < 0 || swappiness > 200) > > I am not a fan of extending the hardcoded 0 and 200 values, and now > the new -1 value. Maybe it's time to create constants for the min and > max swappiness values instead of hardcoding them everywhere? This can > be a separate preparatory patch. Then, -1 (or any invalid value) can > also be added as a constant with a useful name, instead of passing -1 > to all other callers. > > This should make the code a little bit more readable and easier to extend. I'm not sure I understand the concern. This check just validates that the swappiness value inputted is between 0 and 200 (inclusive) otherwise the interface returns -EINVAL. Are you just concerned that these constants are not named explicitly so they can be reused elsewhere in the code? ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-12 21:27 ` Dan Schatzberg @ 2023-12-12 21:31 ` Yosry Ahmed 2023-12-12 21:46 ` Dan Schatzberg 2023-12-12 21:32 ` Chris Li 1 sibling, 1 reply; 10+ messages in thread From: Yosry Ahmed @ 2023-12-12 21:31 UTC (permalink / raw) To: Dan Schatzberg Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins On Tue, Dec 12, 2023 at 1:27 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > On Mon, Dec 11, 2023 at 11:41:24AM -0800, Yosry Ahmed wrote: > > On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > > > contains* the* > > > > I think this statement was only important because no keys were > > supported, so I think we can remove it completely and rely on > > documenting the supported keys below like other interfaces, see my > > next comment. > > > > > + to reclaim. > > > > > > Example:: > > > > > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back. > > > This means that the networking layer will not adapt based on > > > reclaim induced by memory.reclaim. > > > > > > + This file also allows the user to specify the swappiness value > > > + to be used for the reclaim. For example: > > > + > > > + echo "1G swappiness=60" > memory.reclaim > > > + > > > + The above instructs the kernel to perform the reclaim with > > > + a swappiness value of 60. Note that this has the same semantics > > > + as the vm.swappiness sysctl - it sets the relative IO cost of > > > + reclaiming anon vs file memory but does not allow for reclaiming > > > + specific amounts of anon or file memory. > > > + > > > > Can we instead follow the same format used by other nested-keyed files > > (e.g. io.max)? This usually involves a table of supported keys and > > such. > > Thanks, both are good suggestions. Will address these. > > > > + while ((start = strsep(&buf, " ")) != NULL) { > > > + if (!strlen(start)) > > > + continue; > > > + switch (match_token(start, if_tokens, args)) { > > > + case MEMORY_RECLAIM_SWAPPINESS: > > > + if (match_int(&args[0], &swappiness)) > > > + return -EINVAL; > > > + if (swappiness < 0 || swappiness > 200) > > > > I am not a fan of extending the hardcoded 0 and 200 values, and now > > the new -1 value. Maybe it's time to create constants for the min and > > max swappiness values instead of hardcoding them everywhere? This can > > be a separate preparatory patch. Then, -1 (or any invalid value) can > > also be added as a constant with a useful name, instead of passing -1 > > to all other callers. > > > > This should make the code a little bit more readable and easier to extend. > > I'm not sure I understand the concern. This check just validates that > the swappiness value inputted is between 0 and 200 (inclusive) > otherwise the interface returns -EINVAL. Are you just concerned that > these constants are not named explicitly so they can be reused > elsewhere in the code? Yes. The 0 and 200 values are already hardcoded in multiple places, and we are adding more places now and more hardcoded values (i.e. -1). ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-12 21:31 ` Yosry Ahmed @ 2023-12-12 21:46 ` Dan Schatzberg 0 siblings, 0 replies; 10+ messages in thread From: Dan Schatzberg @ 2023-12-12 21:46 UTC (permalink / raw) To: Yosry Ahmed Cc: Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Chris Li, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins On Tue, Dec 12, 2023 at 01:31:46PM -0800, Yosry Ahmed wrote: > On Tue, Dec 12, 2023 at 1:27 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > > > On Mon, Dec 11, 2023 at 11:41:24AM -0800, Yosry Ahmed wrote: > > > On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > > > > > contains* the* > > > > > > I think this statement was only important because no keys were > > > supported, so I think we can remove it completely and rely on > > > documenting the supported keys below like other interfaces, see my > > > next comment. > > > > > > > + to reclaim. > > > > > > > > Example:: > > > > > > > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back. > > > > This means that the networking layer will not adapt based on > > > > reclaim induced by memory.reclaim. > > > > > > > > + This file also allows the user to specify the swappiness value > > > > + to be used for the reclaim. For example: > > > > + > > > > + echo "1G swappiness=60" > memory.reclaim > > > > + > > > > + The above instructs the kernel to perform the reclaim with > > > > + a swappiness value of 60. Note that this has the same semantics > > > > + as the vm.swappiness sysctl - it sets the relative IO cost of > > > > + reclaiming anon vs file memory but does not allow for reclaiming > > > > + specific amounts of anon or file memory. > > > > + > > > > > > Can we instead follow the same format used by other nested-keyed files > > > (e.g. io.max)? This usually involves a table of supported keys and > > > such. > > > > Thanks, both are good suggestions. Will address these. > > > > > > + while ((start = strsep(&buf, " ")) != NULL) { > > > > + if (!strlen(start)) > > > > + continue; > > > > + switch (match_token(start, if_tokens, args)) { > > > > + case MEMORY_RECLAIM_SWAPPINESS: > > > > + if (match_int(&args[0], &swappiness)) > > > > + return -EINVAL; > > > > + if (swappiness < 0 || swappiness > 200) > > > > > > I am not a fan of extending the hardcoded 0 and 200 values, and now > > > the new -1 value. Maybe it's time to create constants for the min and > > > max swappiness values instead of hardcoding them everywhere? This can > > > be a separate preparatory patch. Then, -1 (or any invalid value) can > > > also be added as a constant with a useful name, instead of passing -1 > > > to all other callers. > > > > > > This should make the code a little bit more readable and easier to extend. > > > > I'm not sure I understand the concern. This check just validates that > > the swappiness value inputted is between 0 and 200 (inclusive) > > otherwise the interface returns -EINVAL. Are you just concerned that > > these constants are not named explicitly so they can be reused > > elsewhere in the code? > > Yes. The 0 and 200 values are already hardcoded in multiple places, > and we are adding more places now and more hardcoded values (i.e. -1). Understood, I'll add a preparatory patch which adds DEFINEs for MIN_SWAPPINESS and MAX_SWAPPINESS and change the usages of 0 and 200 to those. I'll also eliminate the use of -1 as Chris suggested. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-12 21:27 ` Dan Schatzberg 2023-12-12 21:31 ` Yosry Ahmed @ 2023-12-12 21:32 ` Chris Li 1 sibling, 0 replies; 10+ messages in thread From: Chris Li @ 2023-12-12 21:32 UTC (permalink / raw) To: Dan Schatzberg Cc: Yosry Ahmed, Johannes Weiner, Roman Gushchin, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins Hi Dan, On Tue, Dec 12, 2023 at 1:27 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > > > + while ((start = strsep(&buf, " ")) != NULL) { > > > + if (!strlen(start)) > > > + continue; > > > + switch (match_token(start, if_tokens, args)) { > > > + case MEMORY_RECLAIM_SWAPPINESS: > > > + if (match_int(&args[0], &swappiness)) > > > + return -EINVAL; > > > + if (swappiness < 0 || swappiness > 200) > > > > I am not a fan of extending the hardcoded 0 and 200 values, and now > > the new -1 value. Maybe it's time to create constants for the min and > > max swappiness values instead of hardcoding them everywhere? This can > > be a separate preparatory patch. Then, -1 (or any invalid value) can > > also be added as a constant with a useful name, instead of passing -1 > > to all other callers. > > > > This should make the code a little bit more readable and easier to extend. > > I'm not sure I understand the concern. This check just validates that > the swappiness value inputted is between 0 and 200 (inclusive) > otherwise the interface returns -EINVAL. Are you just concerned that > these constants are not named explicitly so they can be reused > elsewhere in the code? > I think the concern is why 200? Why not 400 or 600? The user might write bigger values and expect the reclaim to work with those values. If there is some hard coded limit enforced somewhere else so writing more than 200 does not make sense. It would be nice to have those other places reference this limit as well. Thus give 200 a name and use it in other places of the code as well. Chris ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg 2023-12-11 19:41 ` Yosry Ahmed @ 2023-12-12 1:06 ` Chris Li 2023-12-12 21:43 ` Dan Schatzberg 1 sibling, 1 reply; 10+ messages in thread From: Chris Li @ 2023-12-12 1:06 UTC (permalink / raw) To: Dan Schatzberg Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins Hi Dan, Thank you for the patch. On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > Allow proactive reclaimers to submit an additional swappiness=<val> > argument to memory.reclaim. This overrides the global or per-memcg > swappiness setting for that reclaim attempt. I am curious what prompted you to develop this patch. I understand what this patch does, just want to know more of your background story why this is needed. > > For example: > > echo "2M swappiness=0" > /sys/fs/cgroup/memory.reclaim > > will perform reclaim on the rootcg with a swappiness setting of 0 (no > swap) regardless of the vm.swappiness sysctl setting. > > Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com> > --- > Documentation/admin-guide/cgroup-v2.rst | 15 ++++++- > include/linux/swap.h | 3 +- > mm/memcontrol.c | 55 ++++++++++++++++++++----- > mm/vmscan.c | 13 +++++- > 4 files changed, 70 insertions(+), 16 deletions(-) > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst > index 3f85254f3cef..fc2b379dbd0f 100644 > --- a/Documentation/admin-guide/cgroup-v2.rst > +++ b/Documentation/admin-guide/cgroup-v2.rst > @@ -1282,8 +1282,8 @@ PAGE_SIZE multiple when read back. > This is a simple interface to trigger memory reclaim in the > target cgroup. > > - This file accepts a single key, the number of bytes to reclaim. > - No nested keys are currently supported. > + This file accepts a string which containers thhe number of bytes Just as Yosry points out, there is some typo here. "contains the" > + to reclaim. > > Example:: > > @@ -1304,6 +1304,17 @@ PAGE_SIZE multiple when read back. > This means that the networking layer will not adapt based on > reclaim induced by memory.reclaim. > > + This file also allows the user to specify the swappiness value > + to be used for the reclaim. For example: > + > + echo "1G swappiness=60" > memory.reclaim > + > + The above instructs the kernel to perform the reclaim with > + a swappiness value of 60. Note that this has the same semantics > + as the vm.swappiness sysctl - it sets the relative IO cost of > + reclaiming anon vs file memory but does not allow for reclaiming > + specific amounts of anon or file memory. > + > memory.peak > A read-only single value file which exists on non-root > cgroups. > diff --git a/include/linux/swap.h b/include/linux/swap.h > index f6dd6575b905..ebc20d094609 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -410,7 +410,8 @@ extern unsigned long try_to_free_pages(struct zonelist *zonelist, int order, > extern unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, > unsigned long nr_pages, > gfp_t gfp_mask, > - unsigned int reclaim_options); > + unsigned int reclaim_options, > + int swappiness); > extern unsigned long mem_cgroup_shrink_node(struct mem_cgroup *mem, > gfp_t gfp_mask, bool noswap, > pg_data_t *pgdat, > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > index 1c1061df9cd1..74598c17d3cc 100644 > --- a/mm/memcontrol.c > +++ b/mm/memcontrol.c > @@ -63,6 +63,7 @@ > #include <linux/resume_user_mode.h> > #include <linux/psi.h> > #include <linux/seq_buf.h> > +#include <linux/parser.h> > #include <linux/sched/isolation.h> > #include "internal.h" > #include <net/sock.h> > @@ -2449,7 +2450,7 @@ static unsigned long reclaim_high(struct mem_cgroup *memcg, > psi_memstall_enter(&pflags); > nr_reclaimed += try_to_free_mem_cgroup_pages(memcg, nr_pages, > gfp_mask, > - MEMCG_RECLAIM_MAY_SWAP); > + MEMCG_RECLAIM_MAY_SWAP, -1); Instead of passing -1, maybe we can use mem_cgroup_swappiness(memcg); And maybe remove the -1 test from the get_scan_count(). "" static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, unsigned long *nr) { struct pglist_data *pgdat = lruvec_pgdat(lruvec); struct mem_cgroup *memcg = lruvec_memcg(lruvec); unsigned long anon_cost, file_cost, total_cost; int swappiness = sc->swappiness != -1 ? sc->swappiness : mem_cgroup_swappiness(memcg); "" In other words, I feel it is cleaner if try_to_free_mem_cgroup_pages accept the swappiness as it is. There is no second guessing the value if it is -1, then the internal function gets the swappiness from mem_cgroup_swappiness(). Maybe we can completely remove the -1 special default value? > psi_memstall_leave(&pflags); > } while ((memcg = parent_mem_cgroup(memcg)) && > !mem_cgroup_is_root(memcg)); > @@ -2740,7 +2741,7 @@ static int try_charge_memcg(struct mem_cgroup *memcg, gfp_t gfp_mask, > > psi_memstall_enter(&pflags); > nr_reclaimed = try_to_free_mem_cgroup_pages(mem_over_limit, nr_pages, > - gfp_mask, reclaim_options); > + gfp_mask, reclaim_options, -1); Same here. > psi_memstall_leave(&pflags); > > if (mem_cgroup_margin(mem_over_limit) >= nr_pages) > @@ -3660,7 +3661,7 @@ static int mem_cgroup_resize_max(struct mem_cgroup *memcg, > } > > if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, > - memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP)) { > + memsw ? 0 : MEMCG_RECLAIM_MAY_SWAP, -1)) { Same here. > ret = -EBUSY; > break; > } > @@ -3774,7 +3775,7 @@ static int mem_cgroup_force_empty(struct mem_cgroup *memcg) > return -EINTR; > > if (!try_to_free_mem_cgroup_pages(memcg, 1, GFP_KERNEL, > - MEMCG_RECLAIM_MAY_SWAP)) > + MEMCG_RECLAIM_MAY_SWAP, -1)) Here too. > nr_retries--; > } > > @@ -6720,7 +6721,7 @@ static ssize_t memory_high_write(struct kernfs_open_file *of, > } > > reclaimed = try_to_free_mem_cgroup_pages(memcg, nr_pages - high, > - GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP); > + GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1); Here too. > > if (!reclaimed && !nr_retries--) > break; > @@ -6769,7 +6770,7 @@ static ssize_t memory_max_write(struct kernfs_open_file *of, > > if (nr_reclaims) { > if (!try_to_free_mem_cgroup_pages(memcg, nr_pages - max, > - GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP)) > + GFP_KERNEL, MEMCG_RECLAIM_MAY_SWAP, -1)) Here too. > nr_reclaims--; > continue; > } > @@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of, > return nbytes; > } > > +enum { > + MEMORY_RECLAIM_SWAPPINESS = 0, > + MEMORY_RECLAIM_NULL, > +}; > + > +static const match_table_t if_tokens = { What this is called "if_tokens"? I am trying to figure out what "if" refers to. > + { MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"}, > + { MEMORY_RECLAIM_NULL, NULL }, > +}; > + Do we foresee a lot of tunable for the try to free page? I see. You want to use match_token() to do the keyword parsing. > static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, > size_t nbytes, loff_t off) > { > @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, > unsigned int nr_retries = MAX_RECLAIM_RETRIES; > unsigned long nr_to_reclaim, nr_reclaimed = 0; > unsigned int reclaim_options; > - int err; > + char *old_buf, *start; > + substring_t args[MAX_OPT_ARGS]; > + int swappiness = -1; > > buf = strstrip(buf); > - err = page_counter_memparse(buf, "", &nr_to_reclaim); > - if (err) > - return err; > + > + old_buf = buf; > + nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE; > + if (buf == old_buf) > + return -EINVAL; > + > + buf = strstrip(buf); > + > + while ((start = strsep(&buf, " ")) != NULL) { > + if (!strlen(start)) > + continue; > + switch (match_token(start, if_tokens, args)) { > + case MEMORY_RECLAIM_SWAPPINESS: > + if (match_int(&args[0], &swappiness)) > + return -EINVAL; > + if (swappiness < 0 || swappiness > 200) Agree with Yosry on the 200 magic value. I am also wondering if there is an easier way to just parse one keyword. Will using strcmp("swappiness=") be a bad idea? I haven't tried it myself though. > + return -EINVAL; > + break; > + default: > + return -EINVAL; > + } > + } > > reclaim_options = MEMCG_RECLAIM_MAY_SWAP | MEMCG_RECLAIM_PROACTIVE; > while (nr_reclaimed < nr_to_reclaim) { > @@ -6926,7 +6958,8 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, > > reclaimed = try_to_free_mem_cgroup_pages(memcg, > min(nr_to_reclaim - nr_reclaimed, SWAP_CLUSTER_MAX), > - GFP_KERNEL, reclaim_options); > + GFP_KERNEL, reclaim_options, > + swappiness); > > if (!reclaimed && !nr_retries--) > return -EAGAIN; > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 506f8220c5fe..a20965b20177 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -136,6 +136,9 @@ struct scan_control { > /* Always discard instead of demoting to lower tier memory */ > unsigned int no_demotion:1; > > + /* Swappiness value for reclaim, if -1 use memcg/global value */ > + int swappiness; > + It would be great if we can get rid of the -1 case. > /* Allocation order */ > s8 order; > > @@ -2327,7 +2330,8 @@ static void get_scan_count(struct lruvec *lruvec, struct scan_control *sc, > struct pglist_data *pgdat = lruvec_pgdat(lruvec); > struct mem_cgroup *memcg = lruvec_memcg(lruvec); > unsigned long anon_cost, file_cost, total_cost; > - int swappiness = mem_cgroup_swappiness(memcg); > + int swappiness = sc->swappiness != -1 ? > + sc->swappiness : mem_cgroup_swappiness(memcg); Let's see if we can get rid of the -1. Now I see the -1 is actually introduced by this patch, should be doable. > u64 fraction[ANON_AND_FILE]; > u64 denominator = 0; /* gcc */ > enum scan_balance scan_balance; > @@ -2608,6 +2612,9 @@ static int get_swappiness(struct lruvec *lruvec, struct scan_control *sc) > mem_cgroup_get_nr_swap_pages(memcg) < MIN_LRU_BATCH) > return 0; > > + if (sc->swappiness != -1) > + return sc->swappiness; > + Get rid of that too. Thank you. Chris > return mem_cgroup_swappiness(memcg); > } > > @@ -6433,7 +6440,8 @@ unsigned long mem_cgroup_shrink_node(struct mem_cgroup *memcg, > unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, > unsigned long nr_pages, > gfp_t gfp_mask, > - unsigned int reclaim_options) > + unsigned int reclaim_options, > + int swappiness) > { > unsigned long nr_reclaimed; > unsigned int noreclaim_flag; > @@ -6448,6 +6456,7 @@ unsigned long try_to_free_mem_cgroup_pages(struct mem_cgroup *memcg, > .may_unmap = 1, > .may_swap = !!(reclaim_options & MEMCG_RECLAIM_MAY_SWAP), > .proactive = !!(reclaim_options & MEMCG_RECLAIM_PROACTIVE), > + .swappiness = swappiness, > }; > /* > * Traverse the ZONELIST_FALLBACK zonelist of the current node to put > -- > 2.34.1 > > ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-12 1:06 ` Chris Li @ 2023-12-12 21:43 ` Dan Schatzberg 2023-12-13 0:12 ` Chris Li 0 siblings, 1 reply; 10+ messages in thread From: Dan Schatzberg @ 2023-12-12 21:43 UTC (permalink / raw) To: Chris Li Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins On Mon, Dec 11, 2023 at 05:06:54PM -0800, Chris Li wrote: > Hi Dan, > > Thank you for the patch. > > On Mon, Dec 11, 2023 at 6:04 AM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > > > Allow proactive reclaimers to submit an additional swappiness=<val> > > argument to memory.reclaim. This overrides the global or per-memcg > > swappiness setting for that reclaim attempt. > > I am curious what prompted you to develop this patch. I understand > what this patch does, just want to know more of your background story > why this is needed. I wrote about this in some detail in the cover letter (0/1). Take a look and let me know if the rationale is still unclear. > Instead of passing -1, maybe we can use mem_cgroup_swappiness(memcg); > Yeah this makes sense, I'll go ahead and make that change and eliminate the -1. > > nr_reclaims--; > > continue; > > } > > @@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of, > > return nbytes; > > } > > > > +enum { > > + MEMORY_RECLAIM_SWAPPINESS = 0, > > + MEMORY_RECLAIM_NULL, > > +}; > > + > > +static const match_table_t if_tokens = { > > What this is called "if_tokens"? I am trying to figure out what "if" refers to. I used the same logic as in "mm: Add nodes= arg to memory.reclaim". I can just call it tokens. > > > + { MEMORY_RECLAIM_SWAPPINESS, "swappiness=%d"}, > > + { MEMORY_RECLAIM_NULL, NULL }, > > +}; > > + > > Do we foresee a lot of tunable for the try to free page? I see. You > want to use match_token() to do the keyword parsing. See below > > > static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, > > size_t nbytes, loff_t off) > > { > > @@ -6902,12 +6913,33 @@ static ssize_t memory_reclaim(struct kernfs_open_file *of, char *buf, > > unsigned int nr_retries = MAX_RECLAIM_RETRIES; > > unsigned long nr_to_reclaim, nr_reclaimed = 0; > > unsigned int reclaim_options; > > - int err; > > + char *old_buf, *start; > > + substring_t args[MAX_OPT_ARGS]; > > + int swappiness = -1; > > > > buf = strstrip(buf); > > - err = page_counter_memparse(buf, "", &nr_to_reclaim); > > - if (err) > > - return err; > > + > > + old_buf = buf; > > + nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE; > > + if (buf == old_buf) > > + return -EINVAL; > > + > > + buf = strstrip(buf); > > + > > + while ((start = strsep(&buf, " ")) != NULL) { > > + if (!strlen(start)) > > + continue; > > + switch (match_token(start, if_tokens, args)) { > > + case MEMORY_RECLAIM_SWAPPINESS: > > + if (match_int(&args[0], &swappiness)) > > + return -EINVAL; > > + if (swappiness < 0 || swappiness > 200) > > Agree with Yosry on the 200 magic value. > > I am also wondering if there is an easier way to just parse one > keyword. Will using strcmp("swappiness=") be a bad idea? I haven't > tried it myself though. As above, "mm: Add nodes= arg to memory.reclaim" was previously in the mm tree doing it this way, so I duplicated it. I think given that there have been lots of discussions about extending this interface, this match table has some potential future value and I don't see a major downside to using it in favor of strcmp. ^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH V3 1/1] mm: add swapiness= arg to memory.reclaim 2023-12-12 21:43 ` Dan Schatzberg @ 2023-12-13 0:12 ` Chris Li 0 siblings, 0 replies; 10+ messages in thread From: Chris Li @ 2023-12-13 0:12 UTC (permalink / raw) To: Dan Schatzberg Cc: Johannes Weiner, Roman Gushchin, Yosry Ahmed, Huan Yang, linux-kernel, cgroups, linux-mm, Tejun Heo, Zefan Li, Jonathan Corbet, Michal Hocko, Shakeel Butt, Muchun Song, Andrew Morton, David Hildenbrand, Matthew Wilcox, Kefeng Wang, Vishal Moola (Oracle), Yue Zhao, Hugh Dickins Hi Dan, On Tue, Dec 12, 2023 at 1:43 PM Dan Schatzberg <schatzberg.dan@gmail.com> wrote: > > > I am curious what prompted you to develop this patch. I understand > > what this patch does, just want to know more of your background story > > why this is needed. > > I wrote about this in some detail in the cover letter (0/1). Take a > look and let me know if the rationale is still unclear. Ah, found it. I was not CC on the cover letter but CC on the 1/1 patch. That is why I did not pick up the cover letter. Yes, the cover letter explanation was great. Exactly what I am looking for. > > > Instead of passing -1, maybe we can use mem_cgroup_swappiness(memcg); > > > > Yeah this makes sense, I'll go ahead and make that change and > eliminate the -1. Thanks > > > > nr_reclaims--; > > > continue; > > > } > > > @@ -6895,6 +6896,16 @@ static ssize_t memory_oom_group_write(struct kernfs_open_file *of, > > > return nbytes; > > > } > > > > > > +enum { > > > + MEMORY_RECLAIM_SWAPPINESS = 0, > > > + MEMORY_RECLAIM_NULL, > > > +}; > > > + > > > +static const match_table_t if_tokens = { > > > > What this is called "if_tokens"? I am trying to figure out what "if" refers to. > > I used the same logic as in "mm: Add nodes= arg to memory.reclaim". I > can just call it tokens. Thanks. I will take a look at that change. > > > + > > > + old_buf = buf; > > > + nr_to_reclaim = memparse(buf, &buf) / PAGE_SIZE; > > > + if (buf == old_buf) > > > + return -EINVAL; > > > + > > > + buf = strstrip(buf); > > > + > > > + while ((start = strsep(&buf, " ")) != NULL) { > > > + if (!strlen(start)) > > > + continue; > > > + switch (match_token(start, if_tokens, args)) { > > > + case MEMORY_RECLAIM_SWAPPINESS: > > > + if (match_int(&args[0], &swappiness)) > > > + return -EINVAL; > > > + if (swappiness < 0 || swappiness > 200) > > > > Agree with Yosry on the 200 magic value. > > > > I am also wondering if there is an easier way to just parse one > > keyword. Will using strcmp("swappiness=") be a bad idea? I haven't > > tried it myself though. > > As above, "mm: Add nodes= arg to memory.reclaim" was previously in the > mm tree doing it this way, so I duplicated it. I think given that > there have been lots of discussions about extending this interface, > this match table has some potential future value and I don't see a > major downside to using it in favor of strcmp. Yes, that is totally your call. I am fine as it is. Just the micro optimization of me trying to see if there is a slimmer way to do it. Chris ^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2023-12-13 0:12 UTC | newest] Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-12-11 14:04 [PATCH V3 0/1] Add swappiness argument to memory.reclaim Dan Schatzberg 2023-12-11 14:04 ` [PATCH V3 1/1] mm: add swapiness= arg " Dan Schatzberg 2023-12-11 19:41 ` Yosry Ahmed 2023-12-12 21:27 ` Dan Schatzberg 2023-12-12 21:31 ` Yosry Ahmed 2023-12-12 21:46 ` Dan Schatzberg 2023-12-12 21:32 ` Chris Li 2023-12-12 1:06 ` Chris Li 2023-12-12 21:43 ` Dan Schatzberg 2023-12-13 0:12 ` Chris Li
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox