RE: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges

From: wangzicheng <wangzicheng@honor.com>
To: Barry Song <21cnbao@gmail.com>
Cc: "lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	wangxin 00023513 <wangxin23@honor.com>, gao xu <gaoxu2@honor.com>,
	wangtao <tao.wangtao@honor.com>,
	liulu 00013167 <liulu.liu@honor.com>,
	zhouxiaolong <zhouxiaolong9@honor.com>,
	linkunli <linkunli@honor.com>,
	"kasong@tencent.com" <kasong@tencent.com>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	"axelrasmussen@google.com" <axelrasmussen@google.com>,
	"yuanchu@google.com" <yuanchu@google.com>,
	"weixugc@google.com" <weixugc@google.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"willy@infradead.org" <willy@infradead.org>,
	"surenb@google.com" <surenb@google.com>,
	yangxuzhe 00017436 <yangxuzhe@honor.com>
Subject: RE: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
Date: Wed, 25 Feb 2026 10:43:42 +0000
Message-ID: <7ecdf6b68ace402f90a4685684bbe995@honor.com>
In-Reply-To: <CAGsJ_4zatnuLkCJyDe_o_yXmhngpc54kV6b7M3tu6JyvF-ZxDw@mail.gmail.com>

> 
> On Tue, Feb 24, 2026 at 11:17 AM wangzicheng <wangzicheng@honor.com>
> wrote:
> >
> > Hi,
> >
> > I previously sent a similar email which unfortunately had encoding issues.
> > I'm resending a cleaned-up version here so it's easier to read and discuss.
> >
> > MGLRU has been available on Android for about four years, but many
> > OEM vendors still choose not to enable it in production.
> > HONOR is a major Android OEM shipping tens of millions of devices
> > per year, and we run MGLRU on all our devices across multiple kernel
> > versions (5.15~6.12) and RAM configurations (4G~24G), backed by
> > large-scale beta and field data. From this deployment, we have identified
> > four concrete issues (Q1-Q4) and current workarounds, and would like to
> > work with the community to design upstream solutions.
> > We would also like to discuss MGLRU's future direction on Android.
> >
> > Below is a short summary of what we see.
> >
> > Q1: anon/file imbalance and drop in available memory
> > Android app workloads show a persistent anon/file generational
> > imbalance under MGLRU:
> > anon pages tend to stay in the youngest 2 generations;
> > file pages are spread across multiple generations and over-reclaimed.
> > Tuning swappiness to 200 and ANON_ONLY does not fully fix this.
> > On a 16G media workload we see:
> > MGLRU: MemAvailable ~ 6060 MB
> > legacy: MemAvailable ~ 6982 MB (differs by ~1G)
> > Today we mitigate this via explicit memcg aging in Android
> > userspace [1], which is a vendor-only workaround.
> 
> One fundamental design of MGLRU is that file generations and anon
> generations catch up with each other when the generation gap reaches
> two or more. As a result, even if swappiness is set very high, its
> effect on aggressively reclaiming anonymous pages is much smaller
> than with the traditional LRU.
> 
> One workaround is to force old file folios to be promoted to relatively
> younger generations, but this could also cause problems by clustering
> file folios in the newer generations.
> 
> I wonder if anon and file generations can progress separately to some
> extent.
> 
Hi Barry,
Thanks for the detailed feedback and observations.
We also feel this direction might be more reasonable.

> >
> > Q2: Hard to control reclaim amount and stopping conditions (memcg)
> > For memcg reclaim it is hard to stop near a target reclaim amount:
> > kswapd can continue reclaiming even after watermarks are met
> > (e.g. to satisfy higher-order or memcg allocations);
> 
> High-order is an interesting topic. Sometimes, vmscan over-reclaims to
> satisfy high-order allocations, reclaiming many zero-order pages even
> when free pages of the required order are sufficient[1]. You’ve revealed
> another aspect: high-order pages may already exist, but reclamation
> doesn’t push them out in time.
>
Yes.

> > reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
> > semantics and can overshoot the intended reclaim amount.
> > We currently use OEM hooks [2] to early-exit or bypass reclaim under
> > some conditions.
> >
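
One way to read the "target amount + abort condition" idea is sketched
below as a small user-space model. It is not kernel code: bounded_reclaim(),
the two callback types, and struct demo_pool are hypothetical names invented
for illustration. The assumption is a loop that reclaims in batches, never
asks for more than the remaining target, and checks an abort predicate
before every batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical callbacks; nothing with these names exists in the kernel. */
typedef unsigned long (*reclaim_batch_fn)(unsigned long nr_to_reclaim, void *ctx);
typedef bool (*abort_fn)(void *ctx);

struct reclaim_result {
	unsigned long reclaimed;	/* pages actually reclaimed */
	bool aborted;			/* true if the abort predicate fired */
};

/* Reclaim up to 'target' pages in chunks of at most 'batch' pages. */
struct reclaim_result bounded_reclaim(unsigned long target, unsigned long batch,
				      reclaim_batch_fn reclaim,
				      abort_fn should_abort, void *ctx)
{
	struct reclaim_result res = { 0, false };

	assert(batch > 0 && reclaim != NULL);

	while (res.reclaimed < target) {
		unsigned long want = target - res.reclaimed;
		unsigned long got;

		/* Check the abort condition before every batch. */
		if (should_abort && should_abort(ctx)) {
			res.aborted = true;
			break;
		}
		if (want > batch)
			want = batch;
		got = reclaim(want, ctx);
		if (!got)
			break;		/* no progress: stop instead of spinning */
		res.reclaimed += got;
	}
	return res;
}

/* Demo backend (hypothetical): a pool of pages and a foreground flag. */
struct demo_pool {
	unsigned long reclaimable;
	bool foreground;
};

unsigned long demo_reclaim(unsigned long nr, void *ctx)
{
	struct demo_pool *pool = ctx;
	unsigned long got = nr < pool->reclaimable ? nr : pool->reclaimable;

	pool->reclaimable -= got;
	return got;
}

bool demo_abort(void *ctx)
{
	const struct demo_pool *pool = ctx;

	return pool->foreground;	/* e.g. the app moved to the foreground */
}
```

With a pool of 100 reclaimable pages, a target of 64 and a batch of 32,
the loop stops at exactly 64 pages. With the foreground flag set, it
returns before reclaiming anything and reports aborted.
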
> > Q3: High reclaim cost and long uninterruptible sleep on lower-end
> > devices
> > On lower-end devices, reclaim cost and latency are harder to control:
> > throttle_direct_reclaim can make tasks wait for kswapd instead of
> > doing direct reclaim;
> > sometimes the target generations in many memcgs have very few
> > reclaimable pages, so the CPU spends time scanning with little
> > progress.
> > We observe tasks staying in uninterruptible sleep in try_to_free_pages().
> > We have not found a proper way to fix it.
> 
> Have you identified the exact line of code where direct reclaim enters
> uninterruptible sleep? Is it waiting on a lock or something else?
> 
Yes, we’ve identified the exact code locations where this happens:

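In the allocation slow path, at the start of try_to_free_pages(). The
wait_event_killable() branch sleeps in TASK_KILLABLE, which is reported
as D state: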

static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
					nodemask_t *nodemask)
{
...
	if (!(gfp_mask & __GFP_FS))
		wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
			allow_direct_reclaim(pgdat), HZ);
	else
		/* Throttle until kswapd wakes the process */
		wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
			allow_direct_reclaim(pgdat));
...
}

In kswapd, which wakes any throttled tasks before it goes to sleep:

static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order,
				int highest_zoneidx)
{
...
	if (waitqueue_active(&pgdat->pfmemalloc_wait))
		wake_up_all(&pgdat->pfmemalloc_wait);
...
}
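
Both snippets depend on allow_direct_reclaim(pgdat). A rough user-space
model of that condition is sketched below. It assumes the mainline rule
that throttling ends once free pages exceed half of the summed min
watermarks. struct zone_model and model_allow_direct_reclaim() are
hypothetical names. The real function also returns early when kswapd
has failed repeatedly and wakes kswapd when the check fails, which this
model leaves out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-zone counters the kernel consults. */
struct zone_model {
	unsigned long min_wmark_pages;		/* min watermark, in pages */
	unsigned long free_pages;		/* currently free pages */
	unsigned long reclaimable_pages;	/* pages reclaim could still free */
};

/*
 * Returns true when a throttled direct reclaimer would be let through,
 * false when it would keep sleeping on pfmemalloc_wait.
 */
bool model_allow_direct_reclaim(const struct zone_model *zones, size_t nr_zones)
{
	unsigned long reserve = 0, free_pages = 0;
	size_t i;

	assert(zones != NULL || nr_zones == 0);

	for (i = 0; i < nr_zones; i++) {
		/* Zones with nothing reclaimable do not count. */
		if (!zones[i].reclaimable_pages)
			continue;
		reserve += zones[i].min_wmark_pages;
		free_pages += zones[i].free_pages;
	}

	/* No reserve at all: do not throttle. */
	if (!reserve)
		return true;

	return free_pages > reserve / 2;
}
```

In this model, a single zone with a min watermark of 1000 pages keeps
tasks throttled at 400 free pages and releases them at 501. On a
low-RAM device that window can stay closed for a long time if kswapd
makes little progress.
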

> >
> > Q4: Lack of global hot/cold + priority view with per-app memcg
> > Android uses a per-app memcg model and foreground/background levels
> > for resource control. root reclaim lacks a cross-memcg hot/cold and
> > priority view;
> > foreground app file pages may be reclaimed and reloaded frequently,
> > causing visible stalls;
> > We currently use a hook [3] to skip reclaim for foreground apps.
> 
> Interesting. This somehow reflects that the LRU lacks the user’s
> context, especially on Android systems—for example, which apps are in
> the foreground, which are in the background, and how long an app has
> been in the background.
> 
> But this is not specific to MGLRU; it can also be an issue for the
> active/inactive LRU?
> 
That's right, this affects both MGLRU and the traditional LRU.
We believe this comes from a semantic gap between the kernel and Android
(e.g. fg/bg, per-app priorities), and this is one of the main topics we’d
like to discuss. Additionally, even with MGLRU’s memcg-LRU, this issue is
still not fully resolved in our workloads.

> 
> Additionally, I’d like to add Q5 based on my observations:
> Q5:
> MGLRU places readahead folios in the newest generation. For example, if
> a page fault occurs at address 5, readahead fetches addresses 1–16, and
> all 16 folios are put in the youngest generation, even though many may
> not be needed. This can seriously impact reclamation performance, as
> these cold readahead folios occupy active slots.
> 
> See the code below and the checks performed by lru_gen_in_fault().
> 
> void folio_add_lru(struct folio *folio)
> {
>         ...
>         /* see the comment in lru_gen_folio_seq() */
>         if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
>             lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
>                 folio_set_active(folio);
> 
>         folio_batch_add_and_move(folio, lru_add);
> }
> EXPORT_SYMBOL(folio_add_lru);
> 
> I could have submitted a patchset to address this by initially marking
> only address 5 as active, and activating the other addresses later when
> they are actually mapped or accessed.
> 
This sounds very reasonable to us, and we look forward to discussing
and evaluating this direction together.
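
As a starting point for that evaluation, the difference between the
two policies can be modelled in user space as below. This is only a
sketch of the idea described in Q5, not a patch: struct ra_folio and the
ra_*() helpers are hypothetical names, and the model ignores everything
folio_add_lru() does beyond setting the active flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one folio brought in by a readahead window. */
struct ra_folio {
	unsigned long index;	/* page-cache index */
	bool active;		/* true: starts in the youngest generation */
};

/* Fill a window of 'nr' folios starting at index 'first', all inactive. */
void ra_fill_window(struct ra_folio *folios, unsigned long first, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++) {
		folios[i].index = first + i;
		folios[i].active = false;
	}
}

/* Behaviour described in Q5: every folio added during a fault is active. */
void ra_mark_all_active(struct ra_folio *folios, size_t nr)
{
	size_t i;

	for (i = 0; i < nr; i++)
		folios[i].active = true;
}

/* Suggested alternative: only the faulting index starts out active. */
size_t ra_mark_fault_only(struct ra_folio *folios, size_t nr,
			  unsigned long fault_index)
{
	size_t i, activated = 0;

	assert(folios != NULL || nr == 0);

	for (i = 0; i < nr; i++) {
		folios[i].active = (folios[i].index == fault_index);
		if (folios[i].active)
			activated++;
	}
	return activated;
}

/* Later promotion, when a readahead folio is actually mapped or accessed. */
void ra_note_access(struct ra_folio *folio)
{
	folio->active = true;
}
```

For the example in Q5 (fault at index 5, window 1-16), the first policy
leaves 16 folios in the youngest generation while the second leaves one.
The other 15 become active only if ra_note_access() is called for them.
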

> >
> > Discussion
> >
> > - Vendor-only workarounds -> generic mechanisms (Q1-Q4)
> > Our current fixes (userspace memcg aging [1], OEM reclaim hooks
> > [2,3]) are Android/vendor-only. Which parts should be turned into
> > generic MGLRU/kernel mechanisms, and which kept as Android policy?
> > We need guidance from the community.
> >
> > - How much control should MGLRU expose to Android? (Q1-Q3)
> > For Q1/Q2, Android has strong fg/bg and priority semantics that
> > the kernel does not see. Should MGLRU provide more explicit control
> > points (e.g. anon-vs-file / generation steering,
> > "target amount + abort condition" memcg reclaim) so Android can
> > safely trade complexity and risk for better performance and bounded
> > reclaim latency (Q3)?
> >
> > - MGLRU evolution without memcg LRU: global hot/cold & scanning (Q4)
> > If memcg LRU is removed [4], how should we maintain a cross-memcg
> > global hot/cold view and per-app priority on Android?
> > Given that much of the power benefit seems to come from page-table
> > scanning while generations are complex, is it reasonable to decouple
> > page-scanning functionality from MGLRU and make it a separate kernel
> > configuration option?
> >
> > We are happy to share more detailed data and experiments and to help
> > with PoCs and large-scale validation if there is interest in
> > pursuing these directions.
> 
> This is very welcome.
> 
> [1] https://lore.kernel.org/linux-mm/20251013101636.69220-1-21cnbao@gmail.com/
> 
> Best Regards
> Barry

Best regards,
Zicheng
