From: wangzicheng <wangzicheng@honor.com>
To: "lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Cc: wangxin 00023513 <wangxin23@honor.com>, gao xu <gaoxu2@honor.com>,
	wangtao <tao.wangtao@honor.com>,
	liulu 00013167 <liulu.liu@honor.com>,
	zhouxiaolong <zhouxiaolong9@honor.com>,
	linkunli <linkunli@honor.com>
Subject: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
Date: Sat, 14 Feb 2026 10:06:04 +0000	[thread overview]
Message-ID: <cb0c0a0bfc7247cf85858eecf0db6eca@honor.com> (raw)

Hi,

MGLRU has been available on Android for about four years, but many
OEM vendors still choose not to enable it in production.
HONOR is a major Android OEM shipping tens of millions of devices
per year, and we run MGLRU on all our devices across multiple kernel
versions (5.15-6.12) and RAM configurations (4G-24G), backed by
large-scale beta and field data. From this deployment we have identified
four concrete issues (Q1-Q4) and their current workarounds, and would
like to work with the community on upstream solutions.
We would also like to discuss MGLRU's future direction on Android.

Below is a short summary of what we see.

Q1: anon/file imbalance and drop in available memory
Android app workloads show a persistent anon/file generational
imbalance under MGLRU:
anon pages tend to stay in the youngest two generations, while
file pages are spread across multiple generations and get over-reclaimed.
Tuning swappiness to 200 and using ANON_ONLY does not fully fix this.
On a 16G media workload we see:
MGLRU:  MemAvailable ~6060 MB
legacy: MemAvailable ~6982 MB (a ~1G difference)
Today we mitigate this via explicit memcg aging in Android
userspace [1], which is a vendor-only workaround.
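
As a rough illustration of what "explicit memcg aging from userspace"
means, below is a minimal C sketch using the lru_gen debugfs interface
documented in Documentation/admin-guide/mm/multigen_lru.rst. The
memcg/node/generation IDs are placeholders (a real agent would parse
them from reading the same file first), and this is not necessarily
how [1] is implemented:

#include <stdio.h>

#define LRU_GEN_CTRL "/sys/kernel/debug/lru_gen"

/*
 * Ask MGLRU to create a new generation for one memcg on one node:
 * "+ memcg_id node_id max_gen [can_swap [force_scan]]"
 */
static int age_memcg(int memcg_id, int node_id, int max_gen)
{
        FILE *f = fopen(LRU_GEN_CTRL, "w");

        if (!f)
                return -1;
        fprintf(f, "+ %d %d %d 1 0\n", memcg_id, node_id, max_gen);
        return fclose(f);
}

int main(void)
{
        /* Placeholder IDs, for illustration only. */
        if (age_memcg(1, 0, 4))
                perror("age_memcg");
        return 0;
}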

Q2: Hard to control reclaim amount and stopping conditions (memcg)
For memcg reclaim it is hard to stop near a target reclaim amount:
kswapd can continue reclaiming even after watermarks are met
(e.g. to satisfy higher-order or memcg allocations);
reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
semantics and can overshoot the intended reclaim amount.
We currently use OEM hooks [2] to exit early or bypass reclaim under
some conditions.
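
For context, the closest generic knob today is cgroup v2
memory.reclaim, which ends up in the same try_to_free_mem_cgroup_pages()
path. The sketch below (the per-app cgroup path is an assumption about
an Android-style hierarchy) shows that the only input is a byte target,
with no way to express an abort or progress condition:

#include <stdio.h>

static int reclaim_bytes(const char *cgroup, unsigned long bytes)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), "%s/memory.reclaim", cgroup);
        f = fopen(path, "w");
        if (!f)
                return -1;
        /* Best-effort target only; the kernel may overshoot. */
        fprintf(f, "%lu\n", bytes);
        return fclose(f);
}

int main(void)
{
        /* Hypothetical per-app memcg path; reclaim ~32MB from it. */
        return reclaim_bytes("/sys/fs/cgroup/apps/background/uid_10123",
                             32UL << 20);
}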

Q3: High reclaim cost and long uninterruptible sleep on lower-end
devices
On lower-end devices, reclaim cost and latency are harder to control:
throttle_direct_reclaim() can make tasks wait for kswapd instead of
doing direct reclaim themselves;
sometimes the target generations in many memcgs have very few reclaimable
pages, so the CPU spends time scanning with little progress.
We observe tasks staying in uninterruptible sleep in try_to_free_pages(),
and we have not found a proper way to fix this.
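
One way to quantify these stalls is the standard vmscan tracepoints.
The sketch below (assuming tracefs is mounted at /sys/kernel/tracing)
enables mm_vmscan_direct_reclaim_begin/end and streams the raw events;
pairing begin/end per pid to get per-task latency is left to offline
processing:

#include <stdio.h>
#include <string.h>

#define EVT "/sys/kernel/tracing/events/vmscan/"

static void write_str(const char *path, const char *val)
{
        FILE *f = fopen(path, "w");

        if (f) {
                fputs(val, f);
                fclose(f);
        }
}

int main(void)
{
        char line[512];
        FILE *pipe;

        write_str(EVT "mm_vmscan_direct_reclaim_begin/enable", "1");
        write_str(EVT "mm_vmscan_direct_reclaim_end/enable", "1");

        pipe = fopen("/sys/kernel/tracing/trace_pipe", "r");
        if (!pipe)
                return 1;
        /* Each line carries a timestamp and pid; pair them offline. */
        while (fgets(line, sizeof(line), pipe))
                if (strstr(line, "direct_reclaim"))
                        fputs(line, stdout);
        fclose(pipe);
        return 0;
}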

Q4: Lack of global hot/cold + priority view with per-app memcg
Android uses a per-app memcg model and foreground/background levels
for resource control. Root reclaim lacks a cross-memcg hot/cold and
priority view, so foreground app file pages may be reclaimed and
reloaded frequently, causing visible stalls.
We currently use a hook [3] to skip reclaim for foreground apps.
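
For comparison, a generic alternative to the hook in [3] would be
cgroup v2 memory.low protection on the foreground app's memcg, sketched
below (the path and value are illustrative assumptions). The limitation
is that memory.low is a static byte threshold rather than a hot/cold or
priority signal, which is exactly the gap Q4 describes:

#include <stdio.h>

static int protect_memcg(const char *cgroup, unsigned long low_bytes)
{
        char path[256];
        FILE *f;

        snprintf(path, sizeof(path), "%s/memory.low", cgroup);
        f = fopen(path, "w");
        if (!f)
                return -1;
        /* Reclaim mostly avoids this memcg while it stays under low_bytes. */
        fprintf(f, "%lu\n", low_bytes);
        return fclose(f);
}

int main(void)
{
        /* Hypothetical foreground app memcg; protect ~512MB of it. */
        return protect_memcg("/sys/fs/cgroup/apps/foreground/uid_10001",
                             512UL << 20);
}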

Discussion

- Vendor-only workarounds -> generic mechanisms (Q1-Q4)
Our current fixes (userspace memcg aging [1], OEM reclaim hooks
[2,3]) are Android/vendor-only. Which parts should be turned into
generic MGLRU/kernel mechanisms, and which should stay Android policy?
We would like guidance from the community.

- How much control should MGLRU expose to Android? (Q1-Q3)
For Q1/Q2, Android has strong fg/bg and priority semantics that
the kernel does not see. Should MGLRU provide more explicit control
points (e.g. anon-vs-file / generation steering, 
"target amount + abort condition" memcg reclaim) so Android can
safely trade complexity and risk for better performance and bounded
reclaim latency (Q3)?
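
To make the "target amount + abort condition" idea concrete, here is
a purely hypothetical sketch written as if memory.reclaim accepted
extra keys. Neither key exists in any kernel today; it is only meant
to anchor the discussion:

#include <stdio.h>

int main(void)
{
        FILE *f = fopen("/sys/fs/cgroup/apps/background/uid_10123/memory.reclaim",
                        "w");

        if (!f)
                return 1;
        /*
         * HYPOTHETICAL syntax, not supported by any kernel: reclaim up
         * to 32M, but stop early if a pass frees less than 1M or more
         * than 20ms of CPU time has been spent.
         */
        fprintf(f, "32M min_progress=1M max_cpu_ms=20\n");
        return fclose(f);
}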

- MGLRU evolution without memcg LRU: global hot/cold & scanning (Q4)
If the memcg LRU is removed [4], how should we maintain a cross-memcg
global hot/cold view and per-app priority on Android?
Given that much of the power benefit appears to come from page-table
scanning, while generations add complexity, would it be reasonable to
decouple the page-table scanning functionality from MGLRU and make it
a separate kernel configuration option?

We are happy to share more detailed data and experiments and to help
with PoCs and large-scale validation if there is interest in
pursuing these directions.

References
[1] https://lore.kernel.org/linux-mm/20251128025315.3520689-1-wangzicheng@honor.com/
[2] https://android-review.googlesource.com/c/kernel/common/+/3866554
[3] https://android-review.googlesource.com/c/kernel/common/+/3870920
[4] https://lwn.net/Articles/1051882/

--
Best,
Zicheng Wang
