[PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs


* [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
@ 2025-11-28  2:53 Zicheng Wang
  2025-11-28  2:53 ` [PATCH 1/3] mm/lru_gen: add procfs support for lru_gen interfaces Zicheng Wang
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Zicheng Wang @ 2025-11-28  2:53 UTC (permalink / raw)
  To: akpm, hannes, david, axelrasmussen, yuanchu
  Cc: mhocko, zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, linux-mm,
	linux-doc, linux-kernel, Zicheng Wang

This patchset moves the lru_gen control interface from debugfs to procfs.
Exposing the interface in procfs lets *commercial products* such as
Android use proactive aging and reclaim.

Two main reasons:
1. MGLRU has reached the stage where its control interface can be
considered production-ready, not just for experiments or debugging.
In specific scenarios, proactive aging with reclaim can improve
overall system performance.
2. Commercial products like Android prohibit mounting debugfs for
security reasons (SELinux neverallow). Without moving the interface
to procfs, Android cannot utilize lru_gen.

Case study:
A widely observed issue on Android is that after application launch,
the oldest anon generation often becomes empty, and file pages
are over-reclaimed.

In Android, each application owns its own memcg. When an app is swiped
away seconds or minutes after (cold) launch, it will be frozen and
part of its memory is proactively reclaimed.
At this time, both file pages and anonymous pages are temporarily unused,
while the system load is also typically low, making it an ideal time to
prefer reclaiming anon pages while retaining file cache.

Keeping more file cache benefits the system in multiple ways:
1. The device can retain a larger page cache, reducing IO.
2. When memory is tight, evicting clean cache pages is fast.
3. Super-apps such as the camera benefit from a reduced chance of
slow direct reclaim on the critical startup path.

Experiments:

- after cold launch
```
Kernel version v6.6
memcg    54 /apps/some_app
node     0
1     119804          0       85461
2     119804          0           5
3     119804     181719       18667
4       1752        392         244

Kernel version v6.12
memcg    84 /apps/some_app
node     0
1      38428          0       16424
2      38428      24307       14997
3      38428     126529       56452
4      37980         27           1
```

- after 2-3 rounds of proactive aging
```
Kernel version v6.6
memcg    54 /apps/some_app
 node     0
          3     172432     102532      103441
          4      54380      74803         854
          5      28892       6496         229
          6       1588         26           0  

Kernel version v6.12
memcg    84 /apps/some_app
 node     0
          3     819624      98726      166045
          4     819176      14849        1543
          5      40000      41328        7633
          6        960          0           0     
```
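
For readers unfamiliar with the control file, the aging step is triggered by
writing a command line to lru_gen. Below is a minimal userspace sketch,
assuming the `+ memcg_id node_id max_gen_nr [can_swap [force_scan]]` syntax
from Documentation/admin-guide/mm/multigen_lru.rst. The helper name
`lru_gen_format_aging_cmd` is hypothetical and not part of this series, and
the optional arguments are omitted.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of this series): format an aging command
 * for the lru_gen control file. Returns the command length in bytes, or
 * -1 if the buffer is too small.
 */
int lru_gen_format_aging_cmd(char *buf, size_t len, unsigned int memcg_id,
			     unsigned int node_id, unsigned long max_gen_nr)
{
	int n;

	assert(buf && len > 0);
	/* "+" requests aging; max_gen_nr must be the current newest generation */
	n = snprintf(buf, len, "+ %u %u %lu\n", memcg_id, node_id, max_gen_nr);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

With the v6.6 sample above (memcg 54, node 0, newest generation 4), the
helper produces `+ 54 0 4`. The caller would then write that string to the
control file, wherever the kernel configuration places it.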

In continuous app-launch scenarios (e.g., after boot, retail demo
loops, tech review testing), our measurements on v6.6 show:
1. Available memory improves by 400–800 MB.
2. Direct reclaim frequency and latency drop by more than 24%.
3. MemAvailable/Cached levels align with those of the traditional LRU.

Summary of average available memory (MB):
mglru without proactive aging: 6060
mglru with proactive aging test1~3: 6988/6432/6837
traditional LRU: 6982

Camera direct reclaim (10-run average):
mglru without aging: 1050 events / 460 ms
mglru with aging: 789 events / 316 ms
(25% fewer events, 31% lower latency)
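
The percentages above follow from the raw camera numbers. A small sketch of
the arithmetic, where `percent_reduction` is a hypothetical helper name used
only for illustration:

```c
#include <assert.h>

/* Relative reduction from "before" to "after", in percent. */
double percent_reduction(double before, double after)
{
	assert(before > 0.0);
	return (before - after) * 100.0 / before;
}
```

Applied to the figures above, `percent_reduction(1050, 789)` is about 24.9
(events) and `percent_reduction(460, 316)` is about 31.3 (latency in ms).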

Signed-off-by: Zicheng Wang <wangzicheng@honor.com>

 Documentation/admin-guide/mm/multigen_lru.rst | 19 ++++++++++
 mm/Kconfig                                    | 10 +++++
 mm/vmscan.c                                   | 37 ++++++++++++++++++-
 3 files changed, 64 insertions(+), 2 deletions(-)

-- 
2.25.1


* [PATCH 1/3] mm/lru_gen: add procfs support for lru_gen interfaces
  2025-11-28  2:53 [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Zicheng Wang
@ 2025-11-28  2:53 ` Zicheng Wang
  2025-11-28  2:53 ` [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen Zicheng Wang
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 24+ messages in thread
From: Zicheng Wang @ 2025-11-28  2:53 UTC (permalink / raw)
  To: akpm, hannes, david, axelrasmussen, yuanchu
  Cc: mhocko, zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, linux-mm,
	linux-doc, linux-kernel, Zicheng Wang

This patch refactors the lru_gen and lru_gen_full control files to allow
their interfaces to be exposed under either debugfs or procfs.

Two main changes:
1. minimal code modification by reusing the existing seq_operations.
2. lru_gen file mode update from 0644 to 0664, so that Android's group
"system" can write to the file when procfs interface is enabled.

Signed-off-by: Zicheng Wang <wangzicheng@honor.com>
---
 mm/vmscan.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 674999999..dd30f3949 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -58,6 +58,7 @@
 #include <linux/random.h>
 #include <linux/mmu_notifier.h>
 #include <linux/parser.h>
+#include <linux/proc_fs.h>
 
 #include <asm/tlbflush.h>
 #include <asm/div64.h>
@@ -5324,9 +5325,17 @@ static const struct attribute_group lru_gen_attr_group = {
 };
 
 /******************************************************************************
- *                          debugfs interface
+ *                          lru_gen interface
  ******************************************************************************/
 
+static inline bool lru_gen_show_is_full(const struct file *file)
+{
+	/* procfs: i_private = (void *)1 means full
+	 * debugfs: also works because debugfs sets i_private
+	 */
+	return file->f_inode->i_private != NULL;
+}
+
 static void *lru_gen_seq_start(struct seq_file *m, loff_t *pos)
 {
 	struct mem_cgroup *memcg;
@@ -5435,7 +5444,7 @@ static void lru_gen_seq_show_full(struct seq_file *m, struct lruvec *lruvec,
 static int lru_gen_seq_show(struct seq_file *m, void *v)
 {
 	unsigned long seq;
-	bool full = debugfs_get_aux_num(m->file);
+	bool full = lru_gen_show_is_full(m->file);
 	struct lruvec *lruvec = v;
 	struct lru_gen_folio *lrugen = &lruvec->lrugen;
 	int nid = lruvec_pgdat(lruvec)->node_id;
@@ -5671,6 +5680,7 @@ static int lru_gen_seq_open(struct inode *inode, struct file *file)
 	return seq_open(file, &lru_gen_seq_ops);
 }
 
+#ifndef CONFIG_LRU_GEN_PROCFS_CTRL
 static const struct file_operations lru_gen_rw_fops = {
 	.open = lru_gen_seq_open,
 	.read = seq_read,
@@ -5685,6 +5695,22 @@ static const struct file_operations lru_gen_ro_fops = {
 	.llseek = seq_lseek,
 	.release = seq_release,
 };
+#else
+static const struct proc_ops lru_gen_proc_rw_ops = {
+	.proc_open    = lru_gen_seq_open,
+	.proc_read    = seq_read,
+	.proc_write   = lru_gen_seq_write,
+	.proc_lseek   = seq_lseek,
+	.proc_release = seq_release,
+};
+
+static const struct proc_ops lru_gen_proc_ro_ops = {
+	.proc_open    = lru_gen_seq_open,
+	.proc_read    = seq_read,
+	.proc_lseek   = seq_lseek,
+	.proc_release = seq_release,
+};
+#endif
 
 /******************************************************************************
  *                          initialization
@@ -5772,10 +5798,17 @@ static int __init init_lru_gen(void)
 	if (sysfs_create_group(mm_kobj, &lru_gen_attr_group))
 		pr_err("lru_gen: failed to create sysfs group\n");
 
+#ifndef CONFIG_LRU_GEN_PROCFS_CTRL
 	debugfs_create_file_aux_num("lru_gen", 0644, NULL, NULL, false,
 				    &lru_gen_rw_fops);
 	debugfs_create_file_aux_num("lru_gen_full", 0444, NULL, NULL, true,
 				    &lru_gen_ro_fops);
+#else
+	proc_create_data("lru_gen", 0664, NULL,
+					&lru_gen_proc_rw_ops, NULL);
+	proc_create_data("lru_gen_full", 0444, NULL,
+					&lru_gen_proc_ro_ops, (void *)1);
+#endif
 
 	return 0;
 };
-- 
2.25.1




* [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen
  2025-11-28  2:53 [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Zicheng Wang
  2025-11-28  2:53 ` [PATCH 1/3] mm/lru_gen: add procfs support for lru_gen interfaces Zicheng Wang
@ 2025-11-28  2:53 ` Zicheng Wang
  2025-11-28  4:33   ` Randy Dunlap
  2025-12-01 21:35   ` Yuanchu Xie
  2025-11-28  2:53 ` [PATCH 3/3] mm/lru_gen: document procfs interface " Zicheng Wang
  2025-11-28 15:16 ` [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Matthew Wilcox
  3 siblings, 2 replies; 24+ messages in thread
From: Zicheng Wang @ 2025-11-28  2:53 UTC (permalink / raw)
  To: akpm, hannes, david, axelrasmussen, yuanchu
  Cc: mhocko, zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, linux-mm,
	linux-doc, linux-kernel, Zicheng Wang

Signed-off-by: Zicheng Wang <wangzicheng@honor.com>
---
 mm/Kconfig | 10 ++++++++++
 1 file changed, 10 insertions(+)

diff --git a/mm/Kconfig b/mm/Kconfig
index e443fe8cd..be7efa794 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -1325,6 +1325,16 @@ config LRU_GEN_STATS
 config LRU_GEN_WALKS_MMU
 	def_bool y
 	depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
+
+config LRU_GEN_PROCFS_CTRL
+	bool "Move lru_gen files from debugfs to procfs"
+	depends on LRU_GEN && PROC_FS
+	help
+	  Move lru_gen management from debugfs to procfs (/proc/lru_gen).
+	  This production-ready feature provides critical memory reclaim
+	  prediction and control. It is no longer experimental.
+	  The migration ensures availability in commercial products where
+	  debugfs may be disabled.
 # }
 
 config ARCH_SUPPORTS_PER_VMA_LOCK
-- 
2.25.1




* [PATCH 3/3] mm/lru_gen: document procfs interface for lru_gen
  2025-11-28  2:53 [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Zicheng Wang
  2025-11-28  2:53 ` [PATCH 1/3] mm/lru_gen: add procfs support for lru_gen interfaces Zicheng Wang
  2025-11-28  2:53 ` [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen Zicheng Wang
@ 2025-11-28  2:53 ` Zicheng Wang
  2025-11-28 15:16 ` [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Matthew Wilcox
  3 siblings, 0 replies; 24+ messages in thread
From: Zicheng Wang @ 2025-11-28  2:53 UTC (permalink / raw)
  To: akpm, hannes, david, axelrasmussen, yuanchu
  Cc: mhocko, zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, linux-mm,
	linux-doc, linux-kernel, Zicheng Wang

Signed-off-by: Zicheng Wang <wangzicheng@honor.com>
---
 Documentation/admin-guide/mm/multigen_lru.rst | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/Documentation/admin-guide/mm/multigen_lru.rst b/Documentation/admin-guide/mm/multigen_lru.rst
index 9cb54b4ff..d9927b254 100644
--- a/Documentation/admin-guide/mm/multigen_lru.rst
+++ b/Documentation/admin-guide/mm/multigen_lru.rst
@@ -161,3 +161,22 @@ cold pages because of the overestimation, it retries on the next
 server according to the ranking result obtained from the working set
 estimation step. This less forceful approach limits the impacts on the
 existing jobs.
+
+Procfs Migration
+================
+The multi-gen LRU control interface has been moved from debugfs to procfs
+via ``CONFIG_LRU_GEN_PROCFS_CTRL``:
+
+New Path
+--------
+- Control interface: ``/proc/lru_gen``
+- Replaces debugfs path: ``/sys/kernel/debug/lru_gen``
+
+Key Advantages
+--------------
+1. Production-ready availability (works when debugfs is not allowed)
+2. Maintains identical ABI to original debugfs interface
+3. Preserves all core functionality (working set estimation, proactive reclaim)
+4. Standardized location matching memory management conventions
+
+Note: Requires both ``CONFIG_PROC_FS`` and ``CONFIG_LRU_GEN``
-- 
2.25.1




* Re: [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen
  2025-11-28  2:53 ` [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen Zicheng Wang
@ 2025-11-28  4:33   ` Randy Dunlap
  2025-11-28  7:19     ` wangzicheng
  2025-12-01 21:35   ` Yuanchu Xie
  1 sibling, 1 reply; 24+ messages in thread
From: Randy Dunlap @ 2025-11-28  4:33 UTC (permalink / raw)
  To: Zicheng Wang, akpm, hannes, david, axelrasmussen, yuanchu
  Cc: mhocko, zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, linux-mm,
	linux-doc, linux-kernel



On 11/27/25 6:53 PM, Zicheng Wang wrote:
> Signed-off-by: Zicheng Wang <wangzicheng@honor.com>
> ---
>  mm/Kconfig | 10 ++++++++++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e443fe8cd..be7efa794 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1325,6 +1325,16 @@ config LRU_GEN_STATS
>  config LRU_GEN_WALKS_MMU
>  	def_bool y
>  	depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
> +
> +config LRU_GEN_PROCFS_CTRL
> +	bool "Move lru_gen files from debugfs to procfs"
> +	depends on LRU_GEN && PROC_FS
> +	help
> +	  Move lru_gen management from debugfs to procfs (/proc/lru_gen).
> +	  This production-ready feature provides critical memory reclaim
> +	  prediction and control. It is no longer experimental.
> +	  The migration ensures availability in commercial products where
> +	  debugfs may be disabled.

A. missing patch description
B. The help message above sounds like a patch description.

If someone does not enable this kconfig option, what happens?
a. lru_gen files stay in debugfs
b. lru_gen files are not present
c. something else. If so, what?


-- 
~Randy




* RE: [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen
  2025-11-28  4:33   ` Randy Dunlap
@ 2025-11-28  7:19     ` wangzicheng
  0 siblings, 0 replies; 24+ messages in thread
From: wangzicheng @ 2025-11-28  7:19 UTC (permalink / raw)
  To: Randy Dunlap, akpm, hannes, david, axelrasmussen, yuanchu
  Cc: mhocko, zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, linux-mm,
	linux-doc, linux-kernel

> On 11/27/25 6:53 PM, Zicheng Wang wrote:
> > Signed-off-by: Zicheng Wang <wangzicheng@honor.com>
> > ---
> >  mm/Kconfig | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index e443fe8cd..be7efa794 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -1325,6 +1325,16 @@ config LRU_GEN_STATS
> >  config LRU_GEN_WALKS_MMU
> >  	def_bool y
> >  	depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
> > +
> > +config LRU_GEN_PROCFS_CTRL
> > +	bool "Move lru_gen files from debugfs to procfs"
> > +	depends on LRU_GEN && PROC_FS
> > +	help
> > +	  Move lru_gen management from debugfs to procfs (/proc/lru_gen).
> > +	  This production-ready feature provides critical memory reclaim
> > +	  prediction and control. It is no longer experimental.
> > +	  The migration ensures availability in commercial products where
> > +	  debugfs may be disabled.
> 
> A. missing patch description
> B. The help message above sounds like a patch description.
> 

Thanks for the comments; this will be fixed in the next version.

> If someone does not enable this kconfig option, what happens?
> a. lru_gen files stay in debugfs
> b. lru_gen files are not present
> c. something else. If so, what?
> 
> 
> --
> ~Randy

Regarding the question:
If this Kconfig option is enabled, the `lru_gen` and `lru_gen_full` files will appear under /proc/.
If it is not enabled, the files remain under debugfs, which is the current default behavior.
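
A userspace tool that must work with either configuration could probe both
locations. Below is a minimal sketch, assuming debugfs is mounted at the
conventional /sys/kernel/debug. The function names are hypothetical and
nothing like this is part of the series.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Location with CONFIG_LRU_GEN_PROCFS_CTRL=y, per the Kconfig help text. */
const char lru_gen_procfs_path[] = "/proc/lru_gen";
/* Current default location, assuming the usual debugfs mount point. */
const char lru_gen_debugfs_path[] = "/sys/kernel/debug/lru_gen";

/* Prefer procfs when present, fall back to debugfs, else NULL. */
const char *lru_gen_pick_path(int procfs_present, int debugfs_present)
{
	if (procfs_present)
		return lru_gen_procfs_path;
	if (debugfs_present)
		return lru_gen_debugfs_path;
	return NULL;
}

/* Probe the running system for whichever control file exists. */
const char *lru_gen_find_path(void)
{
	return lru_gen_pick_path(access(lru_gen_procfs_path, F_OK) == 0,
				 access(lru_gen_debugfs_path, F_OK) == 0);
}
```

Since the option selects one location at build time, at most one of the two
probes should succeed on a given kernel. The procfs-first order only matters
if that assumption ever changes.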

Thanks,
Zicheng


* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-11-28  2:53 [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Zicheng Wang
                   ` (2 preceding siblings ...)
  2025-11-28  2:53 ` [PATCH 3/3] mm/lru_gen: document procfs interface " Zicheng Wang
@ 2025-11-28 15:16 ` Matthew Wilcox
  2025-11-28 16:13   ` Liam R. Howlett
  3 siblings, 1 reply; 24+ messages in thread
From: Matthew Wilcox @ 2025-11-28 15:16 UTC (permalink / raw)
  To: Zicheng Wang
  Cc: akpm, hannes, david, axelrasmussen, yuanchu, mhocko,
	zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	Liam.Howlett, vbabka, rppt, surenb, mhocko, corbet, linux-mm,
	linux-doc, linux-kernel

On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> Case study:
> A widely observed issue on Android is that after application launch,
> the oldest anon generation often becomes empty, and file pages
> are over-reclaimed.

You should fix the bug, not move the debug interface to procfs.  NACK.



* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-11-28 15:16 ` [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Matthew Wilcox
@ 2025-11-28 16:13   ` Liam R. Howlett
  2025-12-01  4:13     ` Barry Song
  0 siblings, 1 reply; 24+ messages in thread
From: Liam R. Howlett @ 2025-11-28 16:13 UTC (permalink / raw)
  To: Matthew Wilcox
  Cc: Zicheng Wang, akpm, hannes, david, axelrasmussen, yuanchu,
	mhocko, zhengqi.arch, shakeel.butt, lorenzo.stoakes, weixugc,
	vbabka, rppt, surenb, mhocko, corbet, linux-mm, linux-doc,
	linux-kernel

* Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > Case study:
> > A widely observed issue on Android is that after application launch,

What do you mean by application launch?  What does this mean in the
kernel context?

> > the oldest anon generation often becomes empty, and file pages
> > are over-reclaimed.
> 
> You should fix the bug, not move the debug interface to procfs.  NACK.

Barry recently sent an RFC [1] to affect LRU in the exit path for
Android.  This was proven incorrect by Johannes, iirc, in another thread
I cannot find (destroys performance of calling the same command).

These ideas both seem related, as they point to a suboptimal LRU in the
Android ecosystem, at least.  It seems to stem from Android's life
(cycle) choices :)

I strongly agree with Willy.  We don't want another userspace daemon
and/or interface, but this time to play with the LRU to avoid trying to
define and fix the problem.

Do you know if this affects others or why it is android specific?

[1].  https://lore.kernel.org/all/20250514070820.51793-1-21cnbao@gmail.com/


* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-11-28 16:13   ` Liam R. Howlett
@ 2025-12-01  4:13     ` Barry Song
  2025-12-01  6:50       ` wangzicheng
  2025-12-01  7:13       ` zhongjinji
  0 siblings, 2 replies; 24+ messages in thread
From: Barry Song @ 2025-12-01  4:13 UTC (permalink / raw)
  To: Liam R. Howlett, Matthew Wilcox, Zicheng Wang, akpm, hannes,
	david, axelrasmussen, yuanchu, mhocko, zhengqi.arch,
	shakeel.butt, lorenzo.stoakes, weixugc, vbabka, rppt, surenb,
	mhocko, corbet, linux-mm, linux-doc, linux-kernel

Hi Liam,

I saw you mentioned me, so I just wanted to join in :-)

On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett
<Liam.Howlett@oracle.com> wrote:
>
> * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > Case study:
> > > A widely observed issue on Android is that after application launch,
>
> What do you mean by application launch?  What does this mean in the
> kernel context?

I think there are two cases. First, a cold start: a new process is
forked to launch the app. Second, when the app switches from background
to foreground, for example when we bring it back to the screen after it
has been running in the background.

In the first case, you reboot your phone and tap the YouTube icon to
start the app (cold launch). In the second case, you are watching a
video in YouTube, then switch to Facebook, and later tap the YouTube
icon again to bring it from background to foreground.

>
> > > the oldest anon generation often becomes empty, and file pages
> > > are over-reclaimed.
> >
> > You should fix the bug, not move the debug interface to procfs.  NACK.
>
> Barry recently sent an RFC [1] to affect LRU in the exit path for
> Android.  This was proven incorrect by Johannes, iirc, in another thread
> I cannot find (destroys performance of calling the same command).

My understanding is that affecting the LRU in the exit path is not
generally correct, but it still highlights a requirement: Linux LRU
needs a way to understand app-cycling behavior in an Android-like
system.

>
> These ideas both seem related, as they point to a suboptimal LRU in the
> Android ecosystem, at least.  It seems to stem from Android's life
> (cycle) choices :)
>
> I strongly agree with Willy.  We don't want another userspace daemon
> and/or interface, but this time to play with the LRU to avoid trying to
> define and fix the problem.
>
> Do you know if this affects others or why it is android specific?

The behavior Zicheng probably wants is a proactive memory reclamation
interface. For example, since each app may be in a different memcg, if an
app has been in the background for a long time, he wants to reclaim its
memory proactively rather than waiting until kswapd hits the watermarks.

This may help a newly launched app obtain memory more quickly, avoiding
delays from reclamation, since a new app typically requires a substantial
amount of memory.

Zicheng, please let me know if I’m misunderstanding anything.

>
> [1].  https://lore.kernel.org/all/20250514070820.51793-1-21cnbao@gmail.com/
>

Thanks
Barry


* RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  4:13     ` Barry Song
@ 2025-12-01  6:50       ` wangzicheng
  2025-12-01  7:02         ` wangzicheng
  2025-12-01  7:45         ` Barry Song
  2025-12-01  7:13       ` zhongjinji
  1 sibling, 2 replies; 24+ messages in thread
From: wangzicheng @ 2025-12-01  6:50 UTC (permalink / raw)
  To: Barry Song, Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326

Hi Barry,

> Hi Liam,
> 
> I saw you mentioned me, so I just wanted to join in :-)
> 
> On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett <Liam.Howlett@oracle.com>
> wrote:
> >
> > * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > > Case study:
> > > > A widely observed issue on Android is that after application
> > > > launch,
> >
> > What do you mean by application launch?  What does this mean in the
> > kernel context?
> 
> I think there are two cases. First, a cold start: a new process is forked to
> launch the app. Second, when the app switches from background to
> foreground, for example when we bring it back to the screen after it has
> been running in the background.
> 
> In the first case, you reboot your phone and tap the YouTube icon to start
> the app (cold launch). In the second case, you are watching a video in
> YouTube, then switch to Facebook, and later tap the YouTube icon again to
> bring it from background to foreground.
> 
Thanks for the explanation; that's exactly what I meant.

The Android lifecycle model isn't obvious outside the Android context. I’ll make that
clearer in the next version.
> >
> > > > the oldest anon generation often becomes empty, and file pages are
> > > > over-reclaimed.
> > >
> > > You should fix the bug, not move the debug interface to procfs.  NACK.
> >
> > Barry recently sent an RFC [1] to affect LRU in the exit path for
> > Android.  This was proven incorrect by Johannes, iirc, in another
> > thread I cannot find (destroys performance of calling the same command).
> 
> My understanding is that affecting the LRU in the exit path is not generally
> correct, but it still highlights a requirement: Linux LRU needs a way to
> understand app-cycling behavior in an Android-like system.
> 
> >
> > These ideas both seem related, as they point to a suboptimal LRU in the
> > Android ecosystem, at least.  It seems to stem from Android's life
> > (cycle) choices :)
> >
> > I strongly agree with Willy.  We don't want another userspace daemon
> > and/or interface, but this time to play with the LRU to avoid trying
> > to define and fix the problem.
> >
> > Do you know if this affects others or why it is android specific?
> 
> The behavior Zicheng probably wants is a proactive memory reclamation
> interface. For example, since each app may be in a different memcg, if an
> app has been in the background for a long time, he wants to reclaim its
> memory proactively rather than waiting until kswapd hits the watermarks.
> 
> This may help a newly launched app obtain memory more quickly, avoiding
> delays from reclamation, since a new app typically requires a substantial
> amount of memory.
> 
> Zicheng, please let me know if I’m misunderstanding anything.

Yes, but that is not all.

1. Proactive memory reclaim: yes, that's what we are after.
When an app is swiped away, kept in the background, and not used for a while,
proactively reclaiming its memcg can help new foreground apps get memory
faster (instead of paying the cost of direct reclaim).

2. Anon vs. file: *bias reclaim more towards anonymous* pages for background apps.
With MGLRU, however, the oldest generations often contain almost no anon pages,
so simply tuning swappiness cannot achieve that -- reclaim will still clear file cache
in the old generations first.
To some extent, file caches are `over-reclaimed` in such a scenario, leading to a disaster
when user-interaction threads get stuck in direct reclaim of anon pages.

See the case in the cover letter.
```
memcg    54 /apps/some_app
node     0
1     119804          0       85461
2     119804          0           5
3     119804     181719       18667
4       1752        392         244
```
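
To make the columns explicit: assuming the per-generation layout documented
in Documentation/admin-guide/mm/multigen_lru.rst (generation number, age in
milliseconds, anon pages, file pages), each generation line can be parsed as
sketched below. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* One per-generation line of lru_gen output (assumed column order). */
struct lru_gen_line {
	unsigned long gen_nr;	/* generation number */
	unsigned long age_ms;	/* age of the generation in milliseconds */
	unsigned long nr_anon;	/* anon pages in this generation */
	unsigned long nr_file;	/* file pages in this generation */
};

/* Parse one generation line; returns 0 on success, -1 if it is not one. */
int lru_gen_parse_line(const char *line, struct lru_gen_line *out)
{
	assert(line && out);
	if (sscanf(line, "%lu %lu %lu %lu", &out->gen_nr, &out->age_ms,
		   &out->nr_anon, &out->nr_file) != 4)
		return -1;
	return 0;
}
```

Read this way, generations 1 and 2 in the sample hold 0 anon pages but 85461
and 5 file pages, so evicting from the oldest generations can only take file
cache.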

> 
> >
> > [1].
> > https://lore.kernel.org/all/20250514070820.51793-1-21cnbao@gmail.com/
> >
> 
> Thanks
> Barry

The semantic gap between user space and kernel space will always exist, so
it would be of great benefit to leave some APIs for user hints, just like
madvise/userfaultfd/para-virtualization.
Exposing such hints to the kernel can help improve overall system performance.

Best,
Zicheng 


* RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  6:50       ` wangzicheng
@ 2025-12-01  7:02         ` wangzicheng
  2025-12-01  7:45         ` Barry Song
  1 sibling, 0 replies; 24+ messages in thread
From: wangzicheng @ 2025-12-01  7:02 UTC (permalink / raw)
  To: Barry Song, Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326

> > >
> > > [1].
> > > https://lore.kernel.org/all/20250514070820.51793-1-21cnbao@gmail.com
> > > /
> > >
> >
> > Thanks
> > Barry
> 
> The semantic gap between user space and kernel space will always exist, so
> it would be of great benefit to leave some APIs for user hints, just like
> madvise/userfaultfd/para-virtualization.
> Exposing such hints to the kernel can help improve overall system
> performance.
> 
> Best,
> Zicheng

More precisely, it’s a form of *proactive scanning and aging*.

The goal is to ensure a more even generational distribution between file and anonymous pages.

Best,
Zicheng


* RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  4:13     ` Barry Song
  2025-12-01  6:50       ` wangzicheng
@ 2025-12-01  7:13       ` zhongjinji
  1 sibling, 0 replies; 24+ messages in thread
From: zhongjinji @ 2025-12-01  7:13 UTC (permalink / raw)
  To: 21cnbao
  Cc: Liam.Howlett, akpm, axelrasmussen, corbet, david, hannes,
	linux-doc, linux-kernel, linux-mm, lorenzo.stoakes, mhocko,
	mhocko, rppt, shakeel.butt, surenb, vbabka, wangzicheng, weixugc,
	willy, yuanchu, zhengqi.arch

> Hi Liam,
> 
> I saw you mentioned me, so I just wanted to join in :-)
> 
> On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett
> <Liam.Howlett@oracle.com> wrote:
> >
> > * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > > Case study:
> > > > A widely observed issue on Android is that after application launch,
> >
> > What do you mean by application launch?  What does this mean in the
> > kernel context?
> 
> I think there are two cases. First, a cold start: a new process is
> forked to launch the app. Second, when the app switches from background
> to foreground, for example when we bring it back to the screen after it
> has been running in the background.
> 
> In the first case, you reboot your phone and tap the YouTube icon to
> start the app (cold launch). In the second case, you are watching a
> video in YouTube, then switch to Facebook, and later tap the YouTube
> icon again to bring it from background to foreground.
> 
> >
> > > > the oldest anon generation often becomes empty, and file pages
> > > > are over-reclaimed.
> > >
> > > You should fix the bug, not move the debug interface to procfs.  NACK.
> >
> > Barry recently sent an RFC [1] to affect LRU in the exit path for
> > Android.  This was proven incorrect by Johannes, iirc, in another thread
> > I cannot find (destroys performance of calling the same command).
> 
> My understanding is that affecting the LRU in the exit path is not
> generally correct, but it still highlights a requirement: Linux LRU
> needs a way to understand app-cycling behavior in an Android-like
> system.
> 
> >
> > These ideas both seem related, as they point to a suboptimal LRU in the
> > Android ecosystem, at least.  It seems to stem from Android's life
> > (cycle) choices :)
> >
> > I strongly agree with Willy.  We don't want another userspace daemon
> > and/or interface, but this time to play with the LRU to avoid trying to
> > define and fix the problem.
> >
> > Do you know if this affects others or why it is android specific?
> 
> The behavior Zicheng probably wants is a proactive memory reclamation
> interface. For example, since each app may be in a different memcg, if an
> app has been in the background for a long time, he wants to reclaim its
> memory proactively rather than waiting until kswapd hits the watermarks.

Yes, we need a mechanism for proactive memory reclamation that supports
proactive aging. Zicheng and I were just discussing this issue, and it
seems that supporting proactive aging during proactive memory reclamation
(such as reclamation of only anonymous pages) is a better approach, which
can be enabled by adding the parameter `force`. For example, something like the
following code, though it still has other details to handle:

+static bool proactive_aging(struct lruvec *lruvec, struct scan_control *sc,
+                           int swappiness)
+{
+       int type;
+       bool should_age = false;
+
+       /* only age here when forced proactive reclaim was requested */
+       if (likely(!sc->proactive || !sc->proactive_force))
+               return false;
+
+       for_each_evictable_type(type, swappiness) {
+               if (get_nr_gens(lruvec, type) != MIN_NR_GENS)
+                       continue;
+               should_age = true;
+       }
+       return should_age;
+}
 /*
  * For future optimizations:
  * 1. Defer try_to_inc_max_seq() to workqueues to reduce latency for memcg
@@ -4845,6 +4860,8 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int s
        if (mem_cgroup_below_min(sc->target_mem_cgroup, memcg))
                return -1;

+       if (proactive_aging(lruvec, sc, swappiness))
+               goto aging;
        success = should_run_aging(lruvec, max_seq, swappiness, &nr_to_scan);

        /* try to scrape all its memory if this memcg was deleted */
@@ -4856,7 +4873,7 @@ static long get_nr_to_scan(struct lruvec *lruvec, struct scan_control *sc, int s
        /* try to get away with not aging at the default priority */
        if (!success || sc->priority == DEF_PRIORITY)
                return nr_to_scan >> sc->priority;
-
+aging:
        /* stop scanning this lruvec as it's low on cold folios */
        return try_to_inc_max_seq(lruvec, max_seq, swappiness, false) ? -1 : 0;
 }
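
To make the intended behaviour easier to discuss, here is a small userspace
model of the check above. It is only a sketch, not kernel code: the
`model_*` names are made up, the swappiness-to-type mapping (0 skips anon,
201 skips file) is an assumption, and it assumes the intent is that aging is
forced only when the reclaim request is proactive and carries `force`.

```c
#include <assert.h>
#include <stdbool.h>

#define MIN_NR_GENS 2         /* same value as discussed in this thread */
#define MODEL_ANON_ONLY 201   /* the swappiness=201 value mentioned in this thread */

enum { TYPE_ANON = 0, TYPE_FILE = 1, NR_TYPES = 2 };

/* Assumed mapping: swappiness 0 skips anon, MODEL_ANON_ONLY skips file. */
static bool model_type_evictable(int type, int swappiness)
{
	if (type == TYPE_ANON)
		return swappiness != 0;
	return swappiness != MODEL_ANON_ONLY;
}

/* True if a forced proactive reclaim should age before evicting. */
static bool model_proactive_aging(const int nr_gens[NR_TYPES], int swappiness,
				  bool proactive, bool force)
{
	int type;

	assert(swappiness >= 0 && swappiness <= MODEL_ANON_ONLY);
	if (!proactive || !force)
		return false;

	for (type = 0; type < NR_TYPES; type++) {
		/* an evictable type already at the floor cannot be scanned */
		if (model_type_evictable(type, swappiness) &&
		    nr_gens[type] == MIN_NR_GENS)
			return true;
	}
	return false;
}
```

With anon at 2 generations and file at 4, the model asks for aging at
swappiness 200 but not at swappiness 0, since anon is then not a candidate.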


> This may help a newly launched app obtain memory more quickly, avoiding
> delays from reclamation, since a new app typically requires a substantial
> amount of memory.
> Zicheng, please let me know if I’m misunderstanding anything.
> 
> >
> > [1].  https://lore.kernel.org/all/20250514070820.51793-1-21cnbao@gmail.com/
> >
> 
> Thanks
> Barry




* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  6:50       ` wangzicheng
  2025-12-01  7:02         ` wangzicheng
@ 2025-12-01  7:45         ` Barry Song
  2025-12-01  8:14           ` wangzicheng
  2025-12-01  9:00           ` Kairui Song
  1 sibling, 2 replies; 24+ messages in thread
From: Barry Song @ 2025-12-01  7:45 UTC (permalink / raw)
  To: wangzicheng
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song

On Mon, Dec 1, 2025 at 2:50 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi Barry,
>
> > Hi Liam,
> >
> > I saw you mentioned me, so I just wanted to join in :-)
> >
> > On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett <Liam.Howlett@oracle.com>
> > wrote:
> > >
> > > * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > > > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > > > Case study:
> > > > > A widely observed issue on Android is that after application
> > > > > launch,
> > >
> > > What do you mean by application launch?  What does this mean in the
> > > kernel context?
> >
> > I think there are two cases. First, a cold start: a new process is forked to
> > launch the app. Second, when the app switches from background to
> > foreground, for example when we bring it back to the screen after it has
> > been running in the background.
> >
> > In the first case, you reboot your phone and tap the YouTube icon to start
> > the app (cold launch). In the second case, you are watching a video in
> > YouTube, then switch to Facebook, and later tap the YouTube icon again to
> > bring it from background to foreground.
> >
> Thanks for the explanation, that's exactly what I meant.
>
> The Android lifecycle model isn't obvious outside the Android context. I’ll make that
> clearer in the next version.
> > >
> > > > > the oldest anon generation often becomes empty, and file pages are
> > > > > over-reclaimed.
> > > >
> > > > You should fix the bug, not move the debug interface to procfs.  NACK.
> > >
> > > Barry recently sent an RFC [1] to affect LRU in the exit path for
> > > Android.  This was proven incorrect by Johannes, iirc, in another
> > > thread I cannot find (destroys performance of calling the same command).
> >
> > My understanding is that affecting the LRU in the exit path is not generally
> > correct, but it still highlights a requirement: Linux LRU needs a way to
> > understand app-cycling behavior in an Android-like system.
> >
> > >
> > > These ideas seem both related as it points to a suboptimal LRU in the
> > > Android ecosystem, at least.  It seems to stem from Androids life
> > > (cycle) choices :)
> > >
> > > I strongly agree with Willy.  We don't want another userspace daemon
> > > and/or interface, but this time to play with the LRU to avoid trying
> > > to define and fix the problem.
> > >
> > > Do you know if this affects others or why it is android specific?
> >
> > The behavior Zicheng probably wants is a proactive memory reclamation
> > interface. For example, since each app may be in a different memcg, if an
> > app has been in the background for a long time, he wants to reclaim its
> > memory proactively rather than waiting until kswapd hits the watermarks.
> >
> > This may help a newly launched app obtain memory more quickly, avoiding
> > delays from reclamation, since a new app typically requires a substantial
> > amount of memory.
> >
> > Zicheng, please let me know if I’m misunderstanding anything.
>
> Yes, but that is not all.
>
> 1. Proactive memory reclaim: yes, that's what we are after.
> When an app is swiped away, kept in the background, and not used for a while,
> proactively reclaiming its memcg can help new foreground apps get memory
> faster (instead of paying the cost of direct reclaim).
>
> 2. Anon vs. file: *bias more towards anonymous* pages for background apps.
> With MGLRU, however, the oldest generations often contain almost no anon pages,
> so simply tuning swappiness cannot achieve that -- reclaim will still clear file cache
> in the old generations first.
> To some extent, file caches are `over-reclaimed` in such a scenario, leading to a disaster
> when user-interaction threads get stuck in direct reclaim of anon pages.

I strongly recommend separating this from your patchset. Avoid including
unrelated changes in a single patchset.

MGLRU has a mechanism to ensure that file and anon pages can keep pace
with each other. In the newest kernel, the minimum number of generations
(MIN_NR_GENS) is 2. For example, if anon has only 2 generations left and
we decide to reclaim anon folios, we will fall back to reclaiming file
pages. Sometimes, this means that anon reclamation is insufficient while
file pages are over-reclaimed.

static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
                       struct scan_control *sc, int type, int tier,
                       struct list_head *list)
{
        ...
        if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
                return 0;
        ...
}

This is probably not a bug, but this design can sometimes work
suboptimally.
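
To make the fallback concrete, here is a minimal userspace model; it is a
sketch, not kernel code. `model_lruvec`, `model_scan_folios()` and
`model_pick_type()` are made-up names, and the "try the other type" step is a
simplification of what the real reclaim path does. Only `MIN_NR_GENS` and the
early return mirror the snippet above.

```c
#include <assert.h>

#define MIN_NR_GENS 2   /* mirrors the kernel constant discussed above */

enum { TYPE_ANON = 0, TYPE_FILE = 1, NR_TYPES = 2 };

/* Hypothetical, simplified stand-in for the per-type generation state. */
struct model_lruvec {
	int nr_gens[NR_TYPES];          /* generations currently held per type */
	unsigned long oldest[NR_TYPES]; /* folios in the oldest generation */
};

/* Like scan_folios(): refuse to scan a type that is down to MIN_NR_GENS. */
static unsigned long model_scan_folios(const struct model_lruvec *v, int type)
{
	assert(type == TYPE_ANON || type == TYPE_FILE);
	if (v->nr_gens[type] == MIN_NR_GENS)
		return 0;
	return v->oldest[type];
}

/*
 * Try the preferred type first; if nothing can be scanned, fall back to the
 * other type. Returns the type actually reclaimed from, or -1 if neither
 * type can be scanned without aging first.
 */
static int model_pick_type(const struct model_lruvec *v, int preferred)
{
	if (model_scan_folios(v, preferred))
		return preferred;
	if (model_scan_folios(v, !preferred))
		return !preferred;
	return -1;
}
```

In this model, a caller that prefers anon while anon sits at 2 generations and
file at 4 ends up reclaiming file pages, which is the over-reclaim pattern
described above.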

Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
(from the Android side) have observed it. This should be addressed in
MGLRU's code, and we already have kernel code for that. It is unrelated
to your patchset, so you shouldn’t include so many unrelated changes in
a single patchset.

Please keep your patchset focused solely on whether the MGLRU proactive
reclamation interface should be promoted to sysfs (LRU_GEN already has a
folder in sysfs) instead of debugfs, if there is a v2.

The following is quoted from
`Documentation/admin-guide/mm/multigen_lru.rst`.

Proactive reclaim
-----------------
Proactive reclaim induces page reclaim when there is no memory
pressure. It usually targets cold pages only. E.g., when a new job
comes in, the job scheduler wants to proactively reclaim cold pages on
the server it selected, to improve the chance of successfully landing
this new job.

Users can write the following command to ``lru_gen`` to evict
generations less than or equal to ``min_gen_nr``.

    ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
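
As an illustration only, the eviction command could be built from userspace
like this. `format_evict_cmd()` is a made-up helper, not an existing API, and
the optional `nr_to_reclaim` field is left out to keep the sketch short.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Format "- memcg_id node_id min_gen_nr swappiness" into buf.
 * Returns the string length, or -1 if buf is too small.
 */
static int format_evict_cmd(char *buf, size_t len, unsigned int memcg_id,
			    unsigned int node_id, unsigned long min_gen_nr,
			    unsigned int swappiness)
{
	int n;

	assert(buf && len > 0);
	n = snprintf(buf, len, "- %u %u %lu %u", memcg_id, node_id,
		     min_gen_nr, swappiness);
	/* snprintf() reports the untruncated length, so n >= len means overflow */
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For memcg 54 on node 0 (the IDs from the cover letter example), with an
illustrative min_gen_nr of 2 and swappiness of 200, this produces
`- 54 0 2 200`, which would then be written to the `lru_gen` file that
currently lives under debugfs.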


>
> See the case in the cover letter.
> ```
> memcg    54 /apps/some_app
> node     0
> 1     119804          0       85461
> 2     119804          0           5
> 3     119804     181719       18667
> 4       1752        392         244
> ```
>
>
> Since the semantic gap between user and kernel space will always exist,
> it would be of great benefit to leave some APIs for user hints, just like
> madvise/userfaultfd/para-virtualization.

Nope. This is just an internal detail of MGLRU and shouldn’t be exposed
as an interface.
Hopefully, Kairui or I will send a patchset soon to address the balance
issue between file and anon pages. For now, you can use `swappiness=201`
as a temporary workaround. Take a look at bytedance's patchset.[1]

> Exposing such hints to the kernel can help improve overall system performance.

[1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/

Thanks
Barry



* RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  7:45         ` Barry Song
@ 2025-12-01  8:14           ` wangzicheng
  2025-12-01  8:48             ` Barry Song
  2025-12-01  9:00           ` Kairui Song
  1 sibling, 1 reply; 24+ messages in thread
From: wangzicheng @ 2025-12-01  8:14 UTC (permalink / raw)
  To: Barry Song
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song

> 
> I strongly recommend separating this from your patchset. Avoid including
> unrelated changes in a single patchset.
> 
Thank you for the clarification; separating it from our patchset makes sense.

Recall that imbalanced file/anon generations are one of the reasons for moving the
`lru_gen` files out of debugfs.

> MGLRU has a mechanism to ensure that file and anon pages can keep pace
> with each other. In the newest kernel, the minimum generation is 2. For
> example, if anon has only 2 generations left and we decide to reclaim anon
> folios, we will fall back to reclaiming file pages. Sometimes, this means that
> anon reclamation is insufficient while file pages are over-reclaimed.
> 
> static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>                        struct scan_control *sc, int type, int tier,
>                        struct list_head *list)
> {
>         ...
>         if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
>                 return 0;
>         ...
> }
> 
> This is probably not a bug, but this design can sometimes work suboptimally.
> 

Yes, our patchset also aims to solve similar cases by proactively aging 2~3 gens.

> Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
> (from the Android side) have observed it. This should be addressed in
> MGLRU's code, and we already have kernel code for that. It is unrelated to
> your patchset, so you shouldn’t include so many unrelated changes in a
> single patchset.
> 
> Please keep your patchset focused solely on whether the MGLRU proactive
> reclamation interface should be promoted to sysfs (LRU_GEN already has a
> folder in sysfs) instead of debugfs, if there is a v2.
> 
> The following is quoted from
> `Documentation/admin-guide/mm/multigen_lru.rst`.
> 
> Proactive reclaim
> -----------------
> Proactive reclaim induces page reclaim when there is no memory pressure. It
> usually targets cold pages only. E.g., when a new job comes in, the job
> scheduler wants to proactively reclaim cold pages on the server it selected,
> to improve the chance of successfully landing this new job.
> 
> Users can write the following command to ``lru_gen`` to evict generations
> less than or equal to ``min_gen_nr``.
> 
>     ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
> 
> 
> >
> > See the case in the cover letter.
> > ```
> > memcg    54 /apps/some_app
> > node     0
> > 1     119804          0       85461
> > 2     119804          0           5
> > 3     119804     181719       18667
> > 4       1752        392         244
> > ```
> >
> >
> > Since the semantic gap between user and kernel space will always exist,
> > it would be of great benefit to leave some APIs for user hints, just
> > like madvise/userfaultfd/para-virtualization.
> 
> Nope. This is just an internal detail of MGLRU and shouldn’t be exposed as an
> interface.
> Hopefully, Kairui or I will send a patchset soon to address the balance issue
> between file and anon pages. For now, you can use `swappiness=201` as a
> temporary workaround. Take a look at bytedance's patchset.[1]
> 
Sounds great :), we are looking forward to it.

> > Exposing such hints to the kernel can help improve overall system
> > performance.
> 
> [1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/
> 
And thank you for the `swappiness=201` workaround; we will look into it.

> Thanks
> Barry

Best,
Zicheng


* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  8:14           ` wangzicheng
@ 2025-12-01  8:48             ` Barry Song
  2025-12-01  9:54               ` wangzicheng
  0 siblings, 1 reply; 24+ messages in thread
From: Barry Song @ 2025-12-01  8:48 UTC (permalink / raw)
  To: wangzicheng
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song

On Mon, Dec 1, 2025 at 4:14 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> >
> > I strongly recommend separating this from your patchset. Avoid including
> > unrelated changes in a single patchset.
> >
> Thank you for the clarification, separating it from our patchset makes sense.
>

Also note that memcg already has an interface for proactive reclamation,
so I’m not certain whether your patchset can coexist with it or extend
it to meet your requirements, which seems quite impossible to me.

memory.reclaim
        A write-only nested-keyed file which exists for all cgroups.

        This is a simple interface to trigger memory reclaim in the
        target cgroup.

        Example::

          echo "1G" > memory.reclaim

        Please note that the kernel can over or under reclaim from
        the target cgroup. If less bytes are reclaimed than the
        specified amount, -EAGAIN is returned.
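
For reference, a minimal sketch of driving this file from a userspace
program. `request_reclaim()` is a made-up helper, and the cgroup path is
supplied by the caller because it depends on how the system lays out its
cgroups. The only behaviour taken from the documentation above is that
-EAGAIN means less was reclaimed than requested.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a request such as "1G" to a memory.reclaim file.
 * Returns 0 on success, 1 if the kernel reclaimed less than requested
 * (EAGAIN), or a negative errno value for any other failure.
 */
static int request_reclaim(const char *reclaim_path, const char *request)
{
	ssize_t written;
	int fd, err;

	assert(reclaim_path && request);
	fd = open(reclaim_path, O_WRONLY);
	if (fd < 0)
		return -errno;

	written = write(fd, request, strlen(request));
	err = written < 0 ? errno : 0; /* save errno before close() can change it */
	close(fd);

	if (err == EAGAIN)
		return 1;
	return -err;
}
```

Treating EAGAIN as a partial result rather than a hard error lets the caller
decide whether to retry with a smaller amount or give up.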

Thanks
Barry



* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  7:45         ` Barry Song
  2025-12-01  8:14           ` wangzicheng
@ 2025-12-01  9:00           ` Kairui Song
  2025-12-01 12:01             ` zhongjinji
  1 sibling, 1 reply; 24+ messages in thread
From: Kairui Song @ 2025-12-01  9:00 UTC (permalink / raw)
  To: Barry Song
  Cc: wangzicheng, Liam R. Howlett, Matthew Wilcox, akpm, hannes,
	david, axelrasmussen, yuanchu, mhocko, zhengqi.arch,
	shakeel.butt, lorenzo.stoakes, weixugc, vbabka, rppt, surenb,
	mhocko, corbet, linux-mm, linux-doc, linux-kernel, wangtao,
	wangzhen 00021541, zhongjinji 00025326

On Mon, Dec 1, 2025 at 3:46 PM Barry Song <21cnbao@gmail.com> wrote:
>
> On Mon, Dec 1, 2025 at 2:50 PM wangzicheng <wangzicheng@honor.com> wrote:
> >
> > Hi Barry,
> >
> > > Hi Liam,
> > >
> > > I saw you mentioned me, so I just wanted to join in :-)
> > >
> > > On Sat, Nov 29, 2025 at 12:16 AM Liam R. Howlett <Liam.Howlett@oracle.com>
> > > wrote:
> > > >
> > > > * Matthew Wilcox <willy@infradead.org> [251128 10:16]:
> > > > > On Fri, Nov 28, 2025 at 10:53:12AM +0800, Zicheng Wang wrote:
> > > > > > Case study:
> > > > > > A widely observed issue on Android is that after application
> > > > > > launch,
> > > >
> > > > What do you mean by application launch?  What does this mean in the
> > > > kernel context?
> > >
> > > I think there are two cases. First, a cold start: a new process is forked to
> > > launch the app. Second, when the app switches from background to
> > > foreground, for example when we bring it back to the screen after it has
> > > been running in the background.
> > >
> > > In the first case, you reboot your phone and tap the YouTube icon to start
> > > the app (cold launch). In the second case, you are watching a video in
> > > YouTube, then switch to Facebook, and later tap the YouTube icon again to
> > > bring it from background to foreground.
> > >
> > Thanks for the explanation, that's exactly what I meant.
> >
> > The Android lifecycle model isn't obvious outside the Android context. I’ll make that
> > clearer in the next version.
> > > >
> > > > > > the oldest anon generation often becomes empty, and file pages are
> > > > > > over-reclaimed.
> > > > >
> > > > > You should fix the bug, not move the debug interface to procfs.  NACK.
> > > >
> > > > Barry recently sent an RFC [1] to affect LRU in the exit path for
> > > > Android.  This was proven incorrect by Johannes, iirc, in another
> > > > thread I cannot find (destroys performance of calling the same command).
> > >
> > > My understanding is that affecting the LRU in the exit path is not generally
> > > correct, but it still highlights a requirement: Linux LRU needs a way to
> > > understand app-cycling behavior in an Android-like system.
> > >
> > > >
> > > > These ideas seem both related as it points to a suboptimal LRU in the
> > > > Android ecosystem, at least.  It seems to stem from Androids life
> > > > (cycle) choices :)
> > > >
> > > > I strongly agree with Willy.  We don't want another userspace daemon
> > > > and/or interface, but this time to play with the LRU to avoid trying
> > > > to define and fix the problem.
> > > >
> > > > Do you know if this affects others or why it is android specific?
> > >
> > > The behavior Zicheng probably wants is a proactive memory reclamation
> > > interface. For example, since each app may be in a different memcg, if an
> > > app has been in the background for a long time, he wants to reclaim its
> > > memory proactively rather than waiting until kswapd hits the watermarks.
> > >
> > > This may help a newly launched app obtain memory more quickly, avoiding
> > > delays from reclamation, since a new app typically requires a substantial
> > > amount of memory.
> > >
> > > Zicheng, please let me know if I’m misunderstanding anything.
> >
> > Yes, but that is not all.
> >
> > 1. Proactive memory reclaim: yes, that's what we are after.
> > When an app is swiped away, kept in the background, and not used for a while,
> > proactively reclaiming its memcg can help new foreground apps get memory
> > faster (instead of paying the cost of direct reclaim).
> >
> > 2. Anon vs. file: *bias more towards anonymous* pages for background apps.
> > With MGLRU, however, the oldest generations often contain almost no anon pages,
> > so simply tuning swappiness cannot achieve that -- reclaim will still clear file cache
> > in the old generations first.
> > To some extent, file caches are `over-reclaimed` in such a scenario, leading to a disaster
> > when user-interaction threads get stuck in direct reclaim of anon pages.
> I strongly recommend separating this from your patchset. Avoid including
> unrelated changes in a single patchset.
>
> MGLRU has a mechanism to ensure that file and anon pages can keep pace
> with each other. In the newest kernel, the minimum generation is 2. For
> example, if anon has only 2 generations left and we decide to reclaim
> anon folios, we will fall back to reclaiming file pages. Sometimes,
> this means that anon reclamation is insufficient while file pages are
> over-reclaimed.
>
> static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
>                        struct scan_control *sc, int type, int tier,
>                        struct list_head *list)
> {
>         ...
>         if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
>                 return 0;
>         ...
> }
>
> This is probably not a bug, but this design can sometimes work
> suboptimally.
>
> Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
> (from the Android side) have observed it. This should be addressed in
> MGLRU's code, and we already have kernel code for that. It is unrelated
> to your patchset, so you shouldn’t include so many unrelated changes in
> a single patchset.

Thanks for including me in the discussion.

Right, we are seeing similar problems on our servers too. To work around
it, we force an aging iteration before reclaiming when it happens, which
isn't the best choice. When the LRU is long and folios of the opposite
type to the one we want to reclaim are piling up in the oldest gen, a
forced aging has to move all of these folios, which leads to long-tail
latency issues. Let's work on a reasonable solution for that.

>
> Please keep your patchset focused solely on whether the MGLRU proactive
> reclamation interface should be promoted to sysfs (LRU_GEN already has a
> folder in sysfs) instead of debugfs, if there is a v2.
>
> The following is quoted from
> `Documentation/admin-guide/mm/multigen_lru.rst`.
>
> Proactive reclaim
> -----------------
> Proactive reclaim induces page reclaim when there is no memory
> pressure. It usually targets cold pages only. E.g., when a new job
> comes in, the job scheduler wants to proactively reclaim cold pages on
> the server it selected, to improve the chance of successfully landing
> this new job.
>
> Users can write the following command to ``lru_gen`` to evict
> generations less than or equal to ``min_gen_nr``.
>
>     ``- memcg_id node_id min_gen_nr [swappiness [nr_to_reclaim]]``
>
>
> >
> > See the case in the cover letter.
> > ```
> > memcg    54 /apps/some_app
> > node     0
> > 1     119804          0       85461
> > 2     119804          0           5
> > 3     119804     181719       18667
> > 4       1752        392         244
> > ```
> >
> >
> > Since the semantic gap between user and kernel space will always exist,
> > it would be of great benefit to leave some APIs for user hints, just like
> > madvise/userfaultfd/para-virtualization.
>
> Nope. This is just an internal detail of MGLRU and shouldn’t be exposed
> as an interface.
> Hopefully, Kairui or I will send a patchset soon to address the balance
> issue between file and anon pages. For now, you can use `swappiness=201`
> as a temporary workaround. Take a look at bytedance's patchset.[1]

Agree, Thanks!



* RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  8:48             ` Barry Song
@ 2025-12-01  9:54               ` wangzicheng
  2025-12-01 10:39                 ` Barry Song
  0 siblings, 1 reply; 24+ messages in thread
From: wangzicheng @ 2025-12-01  9:54 UTC (permalink / raw)
  To: Barry Song
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song

Hi Barry,

Thank you for the comment. We are aware of the cgroup file.

What we really need is to *proactively age 2~3 gens* before proactive reclaim
(especially after cold launches, when there are no anon pages in the oldest gens).

The proactive aging also helps distribute the anon and file pages evenly across
the MGLRU gens. And reclaiming won't fall into file caches.

> Also note that memcg already has an interface for proactive reclamation,
> so I’m not certain whether your patchset can coexist with it or extend
> it to meet your requirements—which seems quite impossible to me
> 
> memory.reclaim
>         A write-only nested-keyed file which exists for all cgroups.
> 
>         This is a simple interface to trigger memory reclaim in the
>         target cgroup.
> 
>         Example::
> 
>           echo "1G" > memory.reclaim
> 
>         Please note that the kernel can over or under reclaim from
>         the target cgroup. If less bytes are reclaimed than the
>         specified amount, -EAGAIN is returned.
> 
This reminds me that adding a `memory.aging` file under the memcg directories,
rather than adding new procfs files, is also a great option.

> Thanks
> Barry

Thanks,
Zicheng


* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  9:54               ` wangzicheng
@ 2025-12-01 10:39                 ` Barry Song
  2025-12-01 13:32                   ` wangzicheng
  0 siblings, 1 reply; 24+ messages in thread
From: Barry Song @ 2025-12-01 10:39 UTC (permalink / raw)
  To: wangzicheng
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song

Hi Zicheng,

On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi Barry,
>
> Thank you for the comment. We are aware of the cgroup file.
>
> What we really need is to *proactively age 2~3 gens* before proactive reclaim
> (especially after cold launches, when there are no anon pages in the oldest gens).
>
> The proactive aging also helps distribute the anon and file pages evenly across
> the MGLRU gens. And reclaiming won't fall into file caches.

I’m not quite sure what you mean by “reclaiming won’t fall into file caches.”

I assume you mean you configured a high swappiness for MGLRU proactive
reclamation, so when both anon and file have four generations,
`get_type_to_scan()` effectively always returns anon?

>
> > Also note that memcg already has an interface for proactive reclamation,
> > so I’m not certain whether your patchset can coexist with it or extend
> > it to meet your requirements—which seems quite impossible to me
> >
> > memory.reclaim
> >         A write-only nested-keyed file which exists for all cgroups.
> >
> >         This is a simple interface to trigger memory reclaim in the
> >         target cgroup.
> >
> >         Example::
> >
> >           echo "1G" > memory.reclaim
> >
> >         Please note that the kernel can over or under reclaim from
> >         the target cgroup. If less bytes are reclaimed than the
> >         specified amount, -EAGAIN is returned.
> >
> This reminds me that adding a `memory.aging` file under the memcg directories,
> rather than adding new procfs files, is also a great option.

I still don’t understand why. Aging is something MGLRU itself should
handle; components outside MGLRU, such as cgroup v2, do not need to be
aware of this concept at all. Exposing it will likely lead to another
immediate NAK.

In short, aging should remain within MGLRU’s internal scope.

But it seems you do want some policy control for your proactive
reclamation, such as always reclaiming anon pages or reclaiming them
more aggressively than file pages. I assume Zhongkun’s patch [1] we
mentioned earlier should provide support for that, correct?

As a workaround, you can set `swappiness=max` for `memory.reclaim` before
we internally improve the handling of the aging issue. In short,
“proactive aging” and similar mechanisms should be handled automatically
and internally within the scope of the MGLRU code.
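
A sketch of how such a request string could be assembled, for illustration
only. `format_reclaim_request()` and `MODEL_SWAPPINESS_MAX` are made-up
names; the sketch assumes a kernel that carries the patchset in [1], where
`swappiness=max` is accepted by `memory.reclaim` and biases reclaim entirely
towards anon. Older kernels may reject the key.

```c
#include <assert.h>
#include <stdio.h>

#define MODEL_SWAPPINESS_MAX 200 /* largest plain numeric value, per this thread */

/*
 * Build a memory.reclaim request. If anon_only is set, "swappiness=max" is
 * used; otherwise swappiness (0..200) is written numerically.
 * Returns the string length, or -1 on bad input or a too-small buffer.
 */
static int format_reclaim_request(char *buf, size_t len,
				  unsigned long long bytes,
				  int anon_only, unsigned int swappiness)
{
	int n;

	assert(buf && len > 0);
	if (anon_only)
		n = snprintf(buf, len, "%llu swappiness=max", bytes);
	else if (swappiness <= MODEL_SWAPPINESS_MAX)
		n = snprintf(buf, len, "%llu swappiness=%u", bytes, swappiness);
	else
		return -1;
	/* snprintf() reports the untruncated length, so n >= len means overflow */
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}
```

For a 1 MiB anon-only request this yields `1048576 swappiness=max`, which is
the string that would be written to the target cgroup's `memory.reclaim`.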

[1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/

Thanks
Barry


* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01  9:00           ` Kairui Song
@ 2025-12-01 12:01             ` zhongjinji
  0 siblings, 0 replies; 24+ messages in thread
From: zhongjinji @ 2025-12-01 12:01 UTC (permalink / raw)
  To: ryncsn
  Cc: 21cnbao, Liam.Howlett, akpm, axelrasmussen, corbet, david,
	hannes, linux-doc, linux-kernel, linux-mm, lorenzo.stoakes,
	mhocko, mhocko, rppt, shakeel.butt, surenb, tao.wangtao, vbabka,
	wangzhen5, wangzicheng, weixugc, willy, yuanchu, zhengqi.arch,
	zhongjinji

> > I strongly recommend separating this from your patchset. Avoid including
> > unrelated changes in a single patchset.
> >
> > MGLRU has a mechanism to ensure that file and anon pages can keep pace
> > with each other. In the newest kernel, the minimum generation is 2. For
> > example, if anon has only 2 generations left and we decide to reclaim
> > anon folios, we will fall back to reclaiming file pages. Sometimes,
> > this means that anon reclamation is insufficient while file pages are
> > over-reclaimed.
> >
> > static int scan_folios(unsigned long nr_to_scan, struct lruvec *lruvec,
> >                        struct scan_control *sc, int type, int tier,
> >                        struct list_head *list)
> > {
> >         ...
> >         if (get_nr_gens(lruvec, type) == MIN_NR_GENS)
> >                 return 0;
> >         ...
> > }
> >
> > This is probably not a bug, but this design can sometimes work
> > suboptimally.
> >
> > Regarding this issue, both Kairui (from the Linux server side, cc-ed) and I
> > (from the Android side) have observed it. This should be addressed in
> > MGLRU's code, and we already have kernel code for that. It is unrelated
> > to your patchset, so you shouldn’t include so many unrelated changes in
> > a single patchset.
> 
> Thanks for including me in the discussion.
> 
> Right, we are seeing similar problems on our server too. To workaround
> it we force an age iteration before reclaiming when it happens, which
> isn't the best choice. When the LRU is long and the opposite type of
> the folios we want to reclaim is piling up in the oldest gen, a forced
> age will have to move all these folios, which leads to long tailing
> issues. Let's work on a reasonable solution for that.

We have encountered the same issue on Android. When an app is frozen
(which may mean the app will not be used for a long time), we want to
reclaim the app's anonymous pages. After all inactive anonymous pages
are reclaimed, the reclamation cannot proceed further. If we actively trigger
aging on anonymous pages at this point, the number of inactive file pages
may become very large.

To address this issue, I have tried using different max_seq values for
anonymous and file pages. When reclaiming anonymous pages through memory.reclaim,
we can age only the anonymous pages. However, this approach requires extensive
code changes, and it does not seem worthwhile to implement.




* RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01 10:39                 ` Barry Song
@ 2025-12-01 13:32                   ` wangzicheng
  2025-12-01 16:57                     ` Barry Song
  0 siblings, 1 reply; 24+ messages in thread
From: wangzicheng @ 2025-12-01 13:32 UTC (permalink / raw)
  To: Barry Song
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song

Hi Barry,

Thank you for the follow-up questions.

It seems that our main testbeds (kernel v6.6/v6.12 on the latest devices)
don't have the SWAPPINESS_ANON_ONLY/201-related patches yet.

Since the max swappiness there is 200, there are quite a few scenarios in
which file pages are the only option.

Quoting Kairui's reply:
> Right, we are seeing similar problems on our server too. To workaround
> it we force an age iteration before reclaiming when it happens, which
> isn't the best choice. When the LRU is long and the opposite type of
> the folios we want to reclaim is piling up in the oldest gen, a forced
> age will have to move all these folios, which leads to long tailing
> issues. Let's work on a reasonable solution for that.

Again, thank you for your guidance. We will carefully evaluate the
patchset [1] you recommended.

> Hi Zicheng,
> 
> On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com> wrote:
> >
> > Hi Barry,
> >
> > Thank you for the comment, actually we do know the cgroup file.
> >
> > What we really need is to *proactive aging 2~3 gens* before proactive
> > reclaim.
> > (especially after cold launches when no anon pages in the oldest gens)
> >
> > The proactive aging also helps distribute the anon and file pages evenly in
> > MGLRU gens. And reclaiming won't fall into file caches.
> 
> I’m not quite sure what you mean by “reclaiming won’t fall into file caches.”
> 
> I assume you mean you configured a high swappiness for MGLRU proactive
> reclamation, so when both anon and file have four generations,
> `get_type_to_scan()` effectively always returns anon?
> 
> >
> > > Also note that memcg already has an interface for proactive reclamation,
> > > so I’m not certain whether your patchset can coexist with it or extend
> > > it to meet your requirements—which seems quite impossible to me
> > >
> > > memory.reclaim
> > >         A write-only nested-keyed file which exists for all cgroups.
> > >
> > >         This is a simple interface to trigger memory reclaim in the
> > >         target cgroup.
> > >
> > >         Example::
> > >
> > >           echo "1G" > memory.reclaim
> > >
> > >         Please note that the kernel can over or under reclaim from
> > >         the target cgroup. If less bytes are reclaimed than the
> > >         specified amount, -EAGAIN is returned.
> > >
> > This reminds me that adding a `memory.aging` under memcg directories
> > rather than adding new procfs files is also a great option.
> 
> I still don’t understand why. Aging is something MGLRU itself should
> handle; components outside MGLRU, such as cgroup v2, do not need to be
> aware of this concept at all. Exposing it will likely lead to another
> immediate NAK.
> 
> In short, aging should remain within MGLRU’s internal scope.

I would like to express a different point of view. We are working on something
interesting in this area and will share it once it is ready.

> 
> But it seems you do want some policy control for your proactive
> reclamation, such as always reclaiming anon pages or reclaiming them
> more aggressively than file pages. I assume Zhongkun’s patch [1] we
> mentioned earlier should provide support for that, correct?
> 
> As a workaround, you can set `swappiness=max` for `memory.reclaim` before
> we internally improve the handling of the aging issue. In short,
> “proactive aging” and similar mechanisms should be handled automatically
> and internally within the scope of the MGLRU code.

Sure, we will make a careful evaluation.

> 
> [1] https://lore.kernel.org/linux-mm/cover.1744169302.git.hezhongkun.hzk@bytedance.com/
> 
> Thanks
> Barry

Thanks
Zicheng


* Re: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01 13:32                   ` wangzicheng
@ 2025-12-01 16:57                     ` Barry Song
  2025-12-02  2:28                       ` wangzicheng
  0 siblings, 1 reply; 24+ messages in thread
From: Barry Song @ 2025-12-01 16:57 UTC (permalink / raw)
  To: wangzicheng
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song

On Mon, Dec 1, 2025 at 9:32 PM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi Barry,
>
> Thank you for the follow-up questions.
>
> It seems that our main testbed (kernel v6.6/v6.12 for latest devices),
> don't have SWAPPINESS_ANON_ONLY/201 - related patches yet.

Then please check with Suren whether it is possible to backport this to
the Android common kernel.
My understanding is that this should already be present in the Android 6.12
kernel.

>
> Since the max swappiness is 200, there are quite scenarios that file
> pages are the only option.
>
> Quote from kairui's reply:
> > Right, we are seeing similar problems on our server too. To workaround
> > it we force an age iteration before reclaiming when it happens, which
> > isn't the best choice. When the LRU is long and the opposite type of
> > the folios we want to reclaim is piling up in the oldest gen, a forced
> > age will have to move all these folios, which leads to long tailing
> > issues. Let's work on a reasonable solution for that.
>

We all agree that MGLRU has this generation issue. You mentioned it, I agreed
and noted that both Kairui and I had observed it. Then Kairui replied that he
had indeed seen it as well. Now you are using Kairui’s reply to argue against
me, and I honestly don’t understand the logic behind your responses.

> Again, thank you for your guidance. We will carefully evaluate the
> Patchset[1] you recommended.
>
> > Hi Zicheng,
> >
> > On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com> wrote:
> > >
> > > Hi Barry,
> > >
> > > Thank you for the comment, actually we do know the cgroup file.
> > >
> > > What we really need is to *proactive aging 2~3 gens* before proactive
> > > reclaim.
> > > (especially after cold launches when no anon pages in the oldest gens)
> > >
> > > The proactive aging also helps distribute the anon and file pages evenly in
> > > MGLRU gens. And reclaiming won't fall into file caches.
> >
> > I’m not quite sure what you mean by “reclaiming won’t fall into file caches.”
> >
> > I assume you mean you configured a high swappiness for MGLRU proactive
> > reclamation, so when both anon and file have four generations,
> > `get_type_to_scan()` effectively always returns anon?
> >
> > >
> > > > Also note that memcg already has an interface for proactive reclamation,
> > > > so I’m not certain whether your patchset can coexist with it or extend
> > > > it to meet your requirements—which seems quite impossible to me
> > > >
> > > > memory.reclaim
> > > >         A write-only nested-keyed file which exists for all cgroups.
> > > >
> > > >         This is a simple interface to trigger memory reclaim in the
> > > >         target cgroup.
> > > >
> > > >         Example::
> > > >
> > > >           echo "1G" > memory.reclaim
> > > >
> > > >         Please note that the kernel can over or under reclaim from
> > > >         the target cgroup. If less bytes are reclaimed than the
> > > >         specified amount, -EAGAIN is returned.
> > > >
> > > This reminds me that adding a `memory.aging` under memcg directories
> > > rather than adding new procfs files is also a great option.
> >
> > I still don’t understand why. Aging is something MGLRU itself should
> > handle; components outside MGLRU, such as cgroup v2, do not need to be
> > aware of this concept at all. Exposing it will likely lead to another
> > immediate NAK.
> >
> > In short, aging should remain within MGLRU’s internal scope.
>
> I would like to express a different point of view. We are working on something
> Interesting on it, will be shared once ready.

You are always welcome to share, but please understand that memory.aging is
not of interest to any module outside the scope of MGLRU itself. An interface
is an interface, and internal implementation should remain internal. In other
words, there is no reason for cgroupv2 to be aware of what “aging” is.

You may submit your new code as a "fix" for the generation issue without
introducing a new interface. That would be a good starting point for
discussing how to resolve the problem.

>
> >
> > But it seems you do want some policy control for your proactive
> > reclamation, such as always reclaiming anon pages or reclaiming them
> > more aggressively than file pages. I assume Zhongkun’s patch [1] we
> > mentioned earlier should provide support for that, correct?
> >
> > As a workaround, you can set `swappiness=max` for `memory.reclaim` before
> > we internally improve the handling of the aging issue. In short,
> > “proactive aging” and similar mechanisms should be handled automatically
> > and internally within the scope of the MGLRU code.
>
> Sure, we will make a careful evaluation.

Thanks
Barry



* Re: [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen
  2025-11-28  2:53 ` [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen Zicheng Wang
  2025-11-28  4:33   ` Randy Dunlap
@ 2025-12-01 21:35   ` Yuanchu Xie
  2025-12-02  2:53     ` wangzicheng
  1 sibling, 1 reply; 24+ messages in thread
From: Yuanchu Xie @ 2025-12-01 21:35 UTC (permalink / raw)
  To: Zicheng Wang
  Cc: akpm, hannes, david, axelrasmussen, mhocko, zhengqi.arch,
	shakeel.butt, lorenzo.stoakes, weixugc, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, corbet, linux-mm, linux-doc, linux-kernel

On Thu, Nov 27, 2025 at 8:54 PM Zicheng Wang <wangzicheng@honor.com> wrote:
>
> Signed-off-by: Zicheng Wang <wangzicheng@honor.com>
> ---
>  mm/Kconfig | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/mm/Kconfig b/mm/Kconfig
> index e443fe8cd..be7efa794 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -1325,6 +1325,16 @@ config LRU_GEN_STATS
>  config LRU_GEN_WALKS_MMU
>         def_bool y
>         depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
> +
> +config LRU_GEN_PROCFS_CTRL
> +       bool "Move lru_gen files from debugfs to procfs"
> +       depends on LRU_GEN && PROC_FS
> +       help
> +         Move lru_gen management from debugfs to procfs (/proc/lru_gen).
> +         This production-ready feature provides critical memory reclaim
> +         prediction and control. It is no longer experimental.
> +         The migration ensures availability in commercial products where
> +         debugfs may be disabled.
Hi Zicheng,

A config option determining where LRU_gen files reside creates a
fragile procfs interface. Consider adding a similar interface with
less implementation detail to /sys/kernel/mm/lru_gen/ if the goal is
to stabilize the debugfs APIs.

If the goal is to proactively age lruvecs that have been at
MIN_NR_GENS for some time/events/etc, is it possible to integrate this
into the kernel and avoid leaking MGLRU implementation details into
userspace?

Thanks,
Yuanchu



* RE: [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs
  2025-12-01 16:57                     ` Barry Song
@ 2025-12-02  2:28                       ` wangzicheng
  0 siblings, 0 replies; 24+ messages in thread
From: wangzicheng @ 2025-12-02  2:28 UTC (permalink / raw)
  To: Barry Song
  Cc: Liam R. Howlett, Matthew Wilcox, akpm, hannes, david,
	axelrasmussen, yuanchu, mhocko, zhengqi.arch, shakeel.butt,
	lorenzo.stoakes, weixugc, vbabka, rppt, surenb, mhocko, corbet,
	linux-mm, linux-doc, linux-kernel, wangtao, wangzhen 00021541,
	zhongjinji 00025326, Kairui Song, Yuanchu Xie

Hi Barry,

> Then please check with Suren whether it is possible to backport this to
> the Android common kernel.
> My understanding is that this should already be present in the Android 6.12
> kernel.
> 
Thanks for the reminder.

> >
> > Since the max swappiness is 200, there are quite scenarios that file
> > pages are the only option.
> >
> > Quote from kairui's reply:
> > > Right, we are seeing similar problems on our server too. To workaround
> > > it we force an age iteration before reclaiming when it happens, which
> > > isn't the best choice. When the LRU is long and the opposite type of
> > > the folios we want to reclaim is piling up in the oldest gen, a forced
> > > age will have to move all these folios, which leads to long tailing
> > > issues. Let's work on a reasonable solution for that.
> >
> 
> We all agree that MGLRU has this generation issue. You mentioned it, I agreed
> and noted that both Kairui and I had observed it. Then Kairui replied that he
> had indeed seen it as well. Now you are using Kairui’s reply to argue against
> me, and I honestly don’t understand the logic behind your responses.
> 

My apologies if my previous wording caused any confusion.

The only thing the patchset is meant to do is force aging of 2~3 gens right before
proactive reclaim. This helps reclaim more anon pages and preserve more file pages
under certain workloads (a 400~800MB MemAvailable improvement).

The reason for quoting Kairui's reply: from my understanding, `force aging 2~3 gens
before reclaim` is roughly similar in spirit to what Kairui described as
`force an age iteration before reclaiming`.

If my understanding is inaccurate, please feel free to correct me.
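
As a rough illustration of that sequence, the Python sketch below builds the
aging command lines accepted by the existing debugfs `lru_gen` file, following
the format documented in Documentation/admin-guide/mm/multigen_lru.rst
(`+ memcg_id node_id max_gen_nr [can_swap [force_scan]]`). The helper name and
the example IDs are hypothetical. It assumes each command names the current
max_gen_nr and creates generation max_gen_nr+1, so aging N times means N
commands with increasing numbers.

```python
from typing import List


def aging_commands(memcg_id: int, node_id: int, max_gen_nr: int,
                   rounds: int, can_swap: bool = True,
                   force_scan: bool = False) -> List[str]:
    """Return one lru_gen aging command per requested aging round.

    Hypothetical helper. 'max_gen_nr' is the newest generation number read
    from the lru_gen file before the first round.
    """
    if rounds < 0:
        raise ValueError("rounds must be non-negative")
    cmds = []
    for i in range(rounds):
        # Assumption: after each successful round the newest generation
        # number grows by one, so the next command must name it.
        cmds.append(
            f"+ {memcg_id} {node_id} {max_gen_nr + i} "
            f"{int(can_swap)} {int(force_scan)}"
        )
    return cmds
```

Each string would be written to the `lru_gen` control file, one per write,
before triggering proactive reclaim for the same memcg.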

> > Again, thank you for your guidance. We will carefully evaluate the
> > Patchset[1] you recommended.
> >
> > > Hi Zicheng,
> > >
> > > On Mon, Dec 1, 2025 at 5:55 PM wangzicheng <wangzicheng@honor.com> wrote:
> > > >
> > > > Hi Barry,
> > > >
> > > > Thank you for the comment, actually we do know the cgroup file.
> > > >
> > > > What we really need is to *proactive aging 2~3 gens* before proactive
> > > > reclaim.
> > > > (especially after cold launches when no anon pages in the oldest gens)
> > > >
> > > > The proactive aging also helps distribute the anon and file pages evenly in
> > > > MGLRU gens. And reclaiming won't fall into file caches.
> > >
> > > I’m not quite sure what you mean by “reclaiming won’t fall into file caches.”
> > >
> > > I assume you mean you configured a high swappiness for MGLRU proactive
> > > reclamation, so when both anon and file have four generations,
> > > `get_type_to_scan()` effectively always returns anon?
> > >
> > > >
> > > > > Also note that memcg already has an interface for proactive reclamation,
> > > > > so I’m not certain whether your patchset can coexist with it or extend
> > > > > it to meet your requirements—which seems quite impossible to me
> > > > >
> > > > > memory.reclaim
> > > > >         A write-only nested-keyed file which exists for all cgroups.
> > > > >
> > > > >         This is a simple interface to trigger memory reclaim in the
> > > > >         target cgroup.
> > > > >
> > > > >         Example::
> > > > >
> > > > >           echo "1G" > memory.reclaim
> > > > >
> > > > >         Please note that the kernel can over or under reclaim from
> > > > >         the target cgroup. If less bytes are reclaimed than the
> > > > >         specified amount, -EAGAIN is returned.
> > > > >
> > > > This reminds me that adding a `memory.aging` under memcg directories
> > > > rather than adding new procfs files is also a great option.
> > >
> > > I still don’t understand why. Aging is something MGLRU itself should
> > > handle; components outside MGLRU, such as cgroup v2, do not need to be
> > > aware of this concept at all. Exposing it will likely lead to another
> > > immediate NAK.
> > >
> > > In short, aging should remain within MGLRU’s internal scope.
> >
> > I would like to express a different point of view. We are working on something
> > Interesting on it, will be shared once ready.
> 
> You are always welcome to share, but please understand that memory.aging is
> not of interest to any module outside the scope of MGLRU itself. An interface
> is an interface, and internal implementation should remain internal. In other
> words, there is no reason for cgroupv2 to be aware of what “aging” is.
> 
> You may submit your new code as a "fix" for the generation issue without
> introducing a new interface. That would be a good starting point for
> discussing how to resolve the problem.
> 

I completely agree with your guidance.
We will revisit the design for the next version and try to keep the
mechanism internal to MGLRU.

> >
> > >
> > > But it seems you do want some policy control for your proactive
> > > reclamation, such as always reclaiming anon pages or reclaiming them
> > > more aggressively than file pages. I assume Zhongkun’s patch [1] we
> > > mentioned earlier should provide support for that, correct?
> > >
> > > As a workaround, you can set `swappiness=max` for `memory.reclaim` before
> > > we internally improve the handling of the aging issue. In short,
> > > “proactive aging” and similar mechanisms should be handled automatically
> > > and internally within the scope of the MGLRU code.
> >
> > Sure, we will make a careful evaluation.
> 
> Thanks
> Barry

Best,
Zicheng


* RE: [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen
  2025-12-01 21:35   ` Yuanchu Xie
@ 2025-12-02  2:53     ` wangzicheng
  0 siblings, 0 replies; 24+ messages in thread
From: wangzicheng @ 2025-12-02  2:53 UTC (permalink / raw)
  To: Yuanchu Xie
  Cc: akpm, hannes, david, axelrasmussen, mhocko, zhengqi.arch,
	shakeel.butt, lorenzo.stoakes, weixugc, Liam.Howlett, vbabka,
	rppt, surenb, mhocko, corbet, linux-mm, linux-doc, linux-kernel,
	wangtao, wangzhen 00021541, zhongjinji 00025326, Kairui Song,
	Yuanchu Xie, Barry Song

Hi Yuanchu,

> 
> On Thu, Nov 27, 2025 at 8:54 PM Zicheng Wang <wangzicheng@honor.com> wrote:
> >
> > Signed-off-by: Zicheng Wang <wangzicheng@honor.com>
> > ---
> >  mm/Kconfig | 10 ++++++++++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index e443fe8cd..be7efa794 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -1325,6 +1325,16 @@ config LRU_GEN_STATS
> >  config LRU_GEN_WALKS_MMU
> >         def_bool y
> >         depends on LRU_GEN && ARCH_HAS_HW_PTE_YOUNG
> > +
> > +config LRU_GEN_PROCFS_CTRL
> > +       bool "Move lru_gen files from debugfs to procfs"
> > +       depends on LRU_GEN && PROC_FS
> > +       help
> > +         Move lru_gen management from debugfs to procfs (/proc/lru_gen).
> > +         This production-ready feature provides critical memory reclaim
> > +         prediction and control. It is no longer experimental.
> > +         The migration ensures availability in commercial products where
> > +         debugfs may be disabled.
> Hi Zicheng,
> 
> A config option determining where LRU_gen files reside creates a
> fragile procfs interface. Consider adding a similar interface with
> less implementation detail to /sys/kernel/mm/lru_gen/ if the goal is
> to stabilize the debugfs APIs.

Thank you for the comments.
Your suggestion on sysfs really makes sense. The only challenge is that the
show buffers of sysfs are limited to PAGE_SIZE, according to the kernel doc.
A sysfs file can hardly show the info for all memcgs in a single read, as the
debugfs file currently does.
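
To put the PAGE_SIZE constraint in rough numbers, here is a small Python
sketch. It assumes 4 KiB pages and a per-memcg text record loosely modeled on
the current debugfs output (one header line, one node line, one line per
generation). The field widths, helper names, and sample values are made up for
illustration and are not the kernel's exact format.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages; other page sizes exist


def format_memcg(memcg_id, path, gens):
    """Render one memcg record as text (illustrative layout only)."""
    lines = [f"memcg {memcg_id:5d} {path}", " node     0"]
    for gen_nr, age_ms, nr_anon, nr_file in gens:
        lines.append(f"{gen_nr:11d} {age_ms:10d} {nr_anon:10d} {nr_file:10d}")
    return "\n".join(lines) + "\n"


def fits_in_one_show(records, page_size=PAGE_SIZE):
    """True if all records together fit in a single show() buffer."""
    total = sum(len(r.encode()) for r in records)
    return total <= page_size
```

With four generations per memcg, each record in this layout takes a couple of
hundred bytes, so a device with one memcg per application can exceed a single
page once a few dozen memcgs exist.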

> 
> If the goal is to proactively age lruvecs that have been at
> MIN_NR_GENS for some time/events/etc, is it possible to integrate this
> into the kernel and avoid leaking MGLRU implementation details into
> userspace?

We will explore the possible options in the next version. :-)

> 
> Thanks,
> Yuanchu

Best,
Zicheng


Thread overview: 24+ messages
2025-11-28  2:53 [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Zicheng Wang
2025-11-28  2:53 ` [PATCH 1/3] mm/lru_gen: add procfs support for lru_gen interfaces Zicheng Wang
2025-11-28  2:53 ` [PATCH 2/3] mm/lru_gen: add configuration option to select debugfs/procfs for lru_gen Zicheng Wang
2025-11-28  4:33   ` Randy Dunlap
2025-11-28  7:19     ` wangzicheng
2025-12-01 21:35   ` Yuanchu Xie
2025-12-02  2:53     ` wangzicheng
2025-11-28  2:53 ` [PATCH 3/3] mm/lru_gen: document procfs interface " Zicheng Wang
2025-11-28 15:16 ` [PATCH 0/3] mm/lru_gen: move lru_gen control interface from debugfs to procfs Matthew Wilcox
2025-11-28 16:13   ` Liam R. Howlett
2025-12-01  4:13     ` Barry Song
2025-12-01  6:50       ` wangzicheng
2025-12-01  7:02         ` wangzicheng
2025-12-01  7:45         ` Barry Song
2025-12-01  8:14           ` wangzicheng
2025-12-01  8:48             ` Barry Song
2025-12-01  9:54               ` wangzicheng
2025-12-01 10:39                 ` Barry Song
2025-12-01 13:32                   ` wangzicheng
2025-12-01 16:57                     ` Barry Song
2025-12-02  2:28                       ` wangzicheng
2025-12-01  9:00           ` Kairui Song
2025-12-01 12:01             ` zhongjinji
2025-12-01  7:13       ` zhongjinji