RE: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges

* RE: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
@ 2026-02-24  3:17 wangzicheng
  2026-02-24 17:10 ` Suren Baghdasaryan
  2026-02-24 20:23 ` Barry Song
  0 siblings, 2 replies; 4+ messages in thread
From: wangzicheng @ 2026-02-24  3:17 UTC (permalink / raw)
  To: lsf-pc, linux-mm
  Cc: wangxin 00023513, gao xu, wangtao, liulu 00013167, zhouxiaolong,
	linkunli, kasong, 21cnbao, akpm, axelrasmussen, yuanchu, weixugc,
	Randy Dunlap, Liam.Howlett, willy

Hi,

I previously sent a similar email which unfortunately had encoding issues.
I'm resending a cleaned-up version here so it's easier to read and discuss.

MGLRU has been available on Android for about four years, but many
OEM vendors still choose not to enable it in production.
HONOR is a major Android OEM shipping tens of millions of devices
per year, and we run MGLRU on all our devices across multiple kernel
versions (5.15~6.12) and RAM configurations (4G~24G), backed by
large-scale beta and field data. From this deployment, we have identified
four concrete issues (Q1-Q4) and current workarounds, and would like to
work with the community to design upstream solutions.
We would also like to discuss MGLRU's future direction on Android.

Below is a short summary of what we see.

Q1: anon/file imbalance and drop in available memory
Android app workloads show a persistent anon/file generational
imbalance under MGLRU:
- anon pages tend to stay in the youngest 2 generations;
- file pages are spread across multiple generations and over-reclaimed.
Tuning swappiness to 200 and ANON_ONLY does not fully fix this.
On a 16G media workload we see:
  MGLRU:  MemAvailable ~ 6060 MB
  legacy: MemAvailable ~ 6982 MB (about 920 MB higher)
Today we mitigate this via explicit memcg aging in Android
userspace [1], which is a vendor-only workaround.
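
For anyone who wants to compare against their own devices, here is a
minimal userspace sketch (not our production tooling) that extracts the
per-generation anon/file counts. It assumes the row layout documented in
Documentation/admin-guide/mm/multigen_lru.rst for
/sys/kernel/debug/lru_gen (sequence number, age in ms, anon pages, file
pages). struct gen_row, parse_gen_row() and dump_generations() are
illustrative names, not existing APIs.

```c
#include <stdio.h>

/* One generation row as printed below a "node" line (assumed layout). */
struct gen_row {
	unsigned long seq;     /* generation sequence number */
	unsigned long age_ms;  /* age of the generation in milliseconds */
	unsigned long nr_anon; /* anon pages in this generation */
	unsigned long nr_file; /* file pages in this generation */
};

/*
 * Parse one generation row. Returns 1 on success and 0 for any other
 * line, such as the "memcg ..." and "node ..." header lines.
 */
static int parse_gen_row(const char *line, struct gen_row *row)
{
	return sscanf(line, "%lu %lu %lu %lu", &row->seq, &row->age_ms,
		      &row->nr_anon, &row->nr_file) == 4;
}

/* Print the anon/file split of every generation row read from 'in'. */
static void dump_generations(FILE *in, FILE *out)
{
	char line[256];
	struct gen_row row;

	while (fgets(line, sizeof(line), in))
		if (parse_gen_row(line, &row))
			fprintf(out, "gen %lu: anon=%lu file=%lu\n",
				row.seq, row.nr_anon, row.nr_file);
}
```

Running dump_generations() over the debugfs file (root, with debugfs
mounted) prints one line per generation, which makes it easy to check
whether anon pages sit only in the youngest generations.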

Q2: Hard to control reclaim amount and stopping conditions (memcg)
For memcg reclaim it is hard to stop near a target reclaim amount:
- kswapd can continue reclaiming even after watermarks are met
  (e.g. to satisfy higher-order or memcg allocations);
- reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
  semantics and can overshoot the intended reclaim amount.
We currently use OEM hooks [2] to early-exit or bypass reclaim under
some conditions.
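
To illustrate the semantics we have in mind, here is a toy userspace
model of a "target amount + abort condition" loop. It is only a sketch
under our own assumptions: reclaim_bounded(), the two callback types and
the toy_* helpers are hypothetical and do not correspond to any existing
kernel interface or to the hooks in [2].

```c
#include <stddef.h>

/* Hypothetical callbacks; neither exists in the kernel under these names. */
typedef size_t (*reclaim_batch_fn)(size_t nr_pages, void *ctx); /* pages freed */
typedef int (*should_abort_fn)(void *ctx);                      /* nonzero: stop */

/*
 * Reclaim in batches until 'target' pages are freed, the abort condition
 * fires, or a batch makes no progress. Each request is clamped to the
 * remaining amount, so any overshoot can only come from a single batch
 * freeing more than it was asked for.
 */
static size_t reclaim_bounded(size_t target, size_t batch,
			      reclaim_batch_fn reclaim,
			      should_abort_fn abort_now, void *ctx)
{
	size_t freed = 0;

	while (freed < target) {
		size_t want = target - freed;
		size_t got;

		if (abort_now && abort_now(ctx))
			break;
		if (want > batch)
			want = batch;
		got = reclaim(want, ctx);
		if (got == 0)
			break; /* no progress: give up instead of spinning */
		freed += got;
	}
	return freed;
}

/* Toy backend: a pool with a fixed number of reclaimable pages. */
struct toy_pool {
	size_t reclaimable;       /* pages that can still be freed */
	size_t calls;             /* batches issued so far */
	size_t abort_after_calls; /* 0 means never abort */
};

static size_t toy_reclaim(size_t nr_pages, void *ctx)
{
	struct toy_pool *pool = ctx;
	size_t got = nr_pages < pool->reclaimable ? nr_pages : pool->reclaimable;

	pool->reclaimable -= got;
	pool->calls++;
	return got;
}

static int toy_abort(void *ctx)
{
	struct toy_pool *pool = ctx;

	return pool->abort_after_calls && pool->calls >= pool->abort_after_calls;
}
```

In this model the caller supplies both the target and the stop
condition, so the policy (for example "stop once the foreground app
needs the CPU") can stay in Android while the loop itself stays generic.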

Q3: High reclaim cost and long uninterruptible sleep on lower-end
devices
On lower-end devices, reclaim cost and latency are harder to control:
- throttle_direct_reclaim() can make tasks wait for kswapd instead of
  doing direct reclaim;
- sometimes the target generations in many memcgs have very few
  reclaimable pages, so the CPU spends time scanning with little
  progress.
We observe tasks staying in uninterruptible sleep in try_to_free_pages().
We have not found a proper way to fix this.

Q4: Lack of global hot/cold + priority view with per-app memcg
Android uses a per-app memcg model and foreground/background levels
for resource control. Root reclaim lacks a cross-memcg hot/cold and
priority view, so foreground app file pages may be reclaimed and
reloaded frequently, causing visible stalls.
We currently use a hook [3] to skip reclaim for foreground apps.

Discussion

- Vendor-only workarounds -> generic mechanisms (Q1-Q4)
Our current fixes (userspace memcg aging [1], OEM reclaim hooks
[2,3]) are Android/vendor-only. Which parts should be turned into
generic MGLRU/kernel mechanisms, and which kept as Android policy?
We would appreciate guidance from the community.

- How much control should MGLRU expose to Android? (Q1-Q3)
For Q1/Q2, Android has strong fg/bg and priority semantics that
the kernel does not see. Should MGLRU provide more explicit control
points (e.g. anon-vs-file / generation steering,
"target amount + abort condition" memcg reclaim) so Android can
safely trade complexity and risk for better performance and bounded
reclaim latency (Q3)?

- MGLRU evolution without memcg LRU: global hot/cold & scanning (Q4)
If the memcg LRU is removed [4], how should we maintain a cross-memcg
global hot/cold view and per-app priority on Android?
Given that much of the power benefit seems to come from page-table
scanning while generations are complex, is it reasonable to decouple
the page-scanning functionality from MGLRU and make it a separate
kernel configuration option?

We are happy to share more detailed data and experiments and to help
with PoCs and large-scale validation if there is interest in
pursuing these directions.

References
[1] https://lore.kernel.org/linux-mm/20251128025315.3520689-1-wangzicheng@honor.com/
[2] https://android-review.googlesource.com/c/kernel/common/+/3866554
[3] https://android-review.googlesource.com/c/kernel/common/+/3870920
[4] https://lwn.net/Articles/1051882/

--
Best regards, and wishing you a prosperous Year of the Horse,
Zicheng Wang

* Re: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
  2026-02-24  3:17 [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges wangzicheng
@ 2026-02-24 17:10 ` Suren Baghdasaryan
  2026-02-24 20:23 ` Barry Song
  1 sibling, 0 replies; 4+ messages in thread
From: Suren Baghdasaryan @ 2026-02-24 17:10 UTC (permalink / raw)
  To: wangzicheng
  Cc: lsf-pc, linux-mm, wangxin 00023513, gao xu, wangtao,
	liulu 00013167, zhouxiaolong, linkunli, kasong, 21cnbao, akpm,
	axelrasmussen, yuanchu, weixugc, Randy Dunlap, Liam.Howlett,
	willy, Kalesh Singh

For obvious reasons I'm interested in this discussion. We also notice
some shortcomings of MGLRU and would like to collaborate on resolving
them.
Thanks,
Suren.

* Re: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
  2026-02-24  3:17 [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges wangzicheng
  2026-02-24 17:10 ` Suren Baghdasaryan
@ 2026-02-24 20:23 ` Barry Song
  1 sibling, 0 replies; 4+ messages in thread
From: Barry Song @ 2026-02-24 20:23 UTC (permalink / raw)
  To: wangzicheng
  Cc: lsf-pc, linux-mm, wangxin 00023513, gao xu, wangtao,
	liulu 00013167, zhouxiaolong, linkunli, kasong, akpm,
	axelrasmussen, yuanchu, weixugc, Randy Dunlap, Liam.Howlett,
	willy

On Tue, Feb 24, 2026 at 11:17 AM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi,
>
> I previously sent a similar email which unfortunately had encoding issues.
> I'm resending a cleaned-up version here so it's easier to read and discuss.
>
> MGLRU has been available on Android for about four years, but many
> OEM vendors still choose not to enable it in production.
> HONOR is a major Android OEM shipping tens of millions of devices
> per year, and we run MGLRU on all our devices across multiple kernel
> versions (5.15~6.12) and RAM configurations (4G~24G), backed by
> large-scale beta and field data. From this deployment, we have identified
> four concrete issues (Q1-Q4) and current workarounds, and would like to
> work with the community to design upstream solutions.
> Also we would like to discuss MGLRU's future direction on Android.
>
> Below is a short summary of what we see.
>
> Q1: anon/file imbalance and drop in available memory
> Android app workloads show a persistent anon/file generational
> imbalance under MGLRU:
> anon pages tend to stay in the youngest 2 generations;
> file pages are spread across multiple generations and over-reclaimed.
> Tuning swappiness to 200 and ANON_ONLY does not fully fix this.
> On a 16G media workload we see:
> MGLRU: MemAvailable ~ 6060 MB
> legacy: MemAvailable ~ 6982 MB (differs by ~1G)
> Today we mitigate this via explicit memcg aging in Android
> userspace [1], which is a vendor-only workaround.

One fundamental design of MGLRU is that file generations and anon
generations catch up with each other when the generation gap reaches
two or more. As a result, even if swappiness is set very high, its
effect on aggressively reclaiming anonymous pages is much smaller
than with the traditional LRU.

One workaround is to force old file folios to be promoted to relatively
younger generations, but this could also cause problems by clustering
file folios in the newer generations.

I wonder if anon and file generations can progress separately to some
extent.

>
> Q2: Hard to control reclaim amount and stopping conditions (memcg)
> For memcg reclaim it is hard to stop near a target reclaim amount:
> kswapd can continue reclaiming even after watermarks are met
> (e.g. to satisfy higher-order or memcg allocations);

High-order is an interesting topic. Sometimes, vmscan over-reclaims to
satisfy high-order allocations, reclaiming many zero-order pages even
when free pages of the required order are sufficient [1]. You’ve revealed
another aspect: high-order pages may already exist, but reclamation
doesn’t push them out in time.

> reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
> semantics and can overshoot the intended reclaim amount.
> We currently use OEM hooks [2] to early-exit or bypass reclaim under
> some conditions.
>
> Q3: High reclaim cost and long uninterruptible sleep on lower-end
> devices
> On lower-end devices, reclaim cost and latency are harder to control:
> throttle_direct_reclaim can make tasks wait for kswapd instead of
> doing direct reclaim;
> sometimes the target generations in many memcgs have very few
> reclaimable pages, so the CPU spends time scanning with little progress.
> We observe tasks staying in uninterruptible sleep in try_to_free_pages().
> We have not found a proper way to fix this.

Have you identified the exact line of code where direct reclaim enters
uninterruptible sleep? Is it waiting on a lock or something else?
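
In case it helps to pin this down, here is a minimal sketch for spotting
such tasks from userspace. It assumes the usual /proc/<pid>/stat layout,
where the state letter follows the parenthesized comm field;
stat_state() is a hypothetical helper name, not an existing API.

```c
#include <string.h>

/*
 * Extract the state character from one line of /proc/<pid>/stat.
 * The comm field is wrapped in parentheses and may itself contain
 * spaces or ')' characters, so look for the last ')' instead of
 * splitting on spaces. Returns the state letter (e.g. 'R', 'S', 'D')
 * or 0 if the line is malformed.
 */
static char stat_state(const char *stat_line)
{
	const char *close = strrchr(stat_line, ')');

	if (!close || close[1] != ' ' || close[2] == '\0')
		return 0;
	return close[2];
}
```

For tasks reported as 'D', /proc/<pid>/wchan or /proc/<pid>/stack (the
latter needs root and CONFIG_STACKTRACE) should show where in the
reclaim path the task is sleeping.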

>
> Q4: Lack of global hot/cold + priority view with per-app memcg
> Android uses a per-app memcg model and foreground/background levels
> for resource control. root reclaim lacks a cross-memcg hot/cold and
> priority view;
> foreground app file pages may be reclaimed and reloaded frequently,
> causing visible stalls;
> We currently use a hook [3] to skip reclaim for foreground apps.

Interesting. This somehow reflects that the LRU lacks the user’s
context, especially on Android systems—for example, which apps are in
the foreground, which are in the background, and how long an app has
been in the background.

But this is not specific to MGLRU; it can also be an issue for the
active/inactive LRU?


Additionally, I’d like to add Q5 based on my observations:
Q5:
MGLRU places readahead folios in the newest generation. For example, if
a page fault occurs at address 5, readahead fetches addresses 1–16, and
all 16 folios are put in the youngest generation, even though many may
not be needed. This can seriously impact reclamation performance, as
these cold readahead folios occupy active slots.

See the code below and the checks performed by lru_gen_in_fault().

void folio_add_lru(struct folio *folio)
{
        ...
        /* see the comment in lru_gen_folio_seq() */
        if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
            lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
                folio_set_active(folio);

        folio_batch_add_and_move(folio, lru_add);
}
EXPORT_SYMBOL(folio_add_lru);

I could have submitted a patchset to address this by initially marking
only address 5 as active, and activating the other addresses later when
they are actually mapped or accessed.
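
To make the difference concrete, here is a toy userspace model of the
two policies over one readahead window. It is not kernel code, and both
function names are made up for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Behaviour described above: every folio brought in by the fault-time
 * readahead is marked active. Returns the number of active folios.
 */
static size_t activate_all(bool *active, size_t nr)
{
	for (size_t i = 0; i < nr; i++)
		active[i] = true;
	return nr;
}

/*
 * Alternative described above: mark only the folio that actually
 * faulted and leave the rest inactive until they are mapped or
 * accessed. Returns the number of active folios.
 */
static size_t activate_faulting_only(bool *active, size_t nr, size_t fault_idx)
{
	for (size_t i = 0; i < nr; i++)
		active[i] = (i == fault_idx);
	return fault_idx < nr ? 1 : 0;
}
```

For the example above (fault at address 5 in a window covering
addresses 1-16, i.e. index 4 of 16), the first policy leaves 16 folios
in the youngest generation and the second leaves 1.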

>
> Discussion
>
> - Vendor-only workarounds -> generic mechanisms (Q1-Q4)
> Our current fixes (userspace memcg aging [1], OEM reclaim hooks
> [2,3]) are Android/vendor-only—what parts should be turned into
> generic MGLRU/kernel mechanisms vs. kept as Android policy?
> We need guidance from community.
>
> - How much control should MGLRU expose to Android? (Q1-Q3)
> For Q1/Q2, Android has strong fg/bg and priority semantics that
> the kernel does not see. Should MGLRU provide more explicit control
> points (e.g. anon-vs-file / generation steering,
> "target amount + abort condition" memcg reclaim) so Android can
> safely trade complexity and risk for better performance and bounded
> reclaim latency (Q3)?
>
> - MGLRU evolution without memcg LRU: global hot/cold & scanning (Q4)
> If the memcg LRU is removed [4], how should we maintain a cross-memcg
> global hot/cold view and per-app priority on Android?
> Given that much of the power benefit seems to come from page-table
> scanning while generations are complex, is it reasonable to decouple
> the page-scanning functionality from MGLRU and make it a separate
> kernel configuration option?
>
> We are happy to share more detailed data and experiments and to help
> with PoCs and large-scale validation if there is interest in
> pursuing these directions.

This is very welcome.

[1] https://lore.kernel.org/linux-mm/20251013101636.69220-1-21cnbao@gmail.com/

Best Regards
Barry


* [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
@ 2026-02-14 10:06 wangzicheng
  0 siblings, 0 replies; 4+ messages in thread
From: wangzicheng @ 2026-02-14 10:06 UTC (permalink / raw)
  To: lsf-pc, linux-mm
  Cc: wangxin 00023513, gao xu, wangtao, liulu 00013167, zhouxiaolong,
	linkunli

Hi,

MGLRU has been available on Android for about four years, but many
OEM vendors still choose not to enable it in production.
HONOR is a major Android OEM shipping tens of millions of devices
per year, and we run MGLRU on all our devices across multiple kernel
versions (5.15~6.12) and RAM configurations (4G~24G), backed by
large-scale beta and field data. From this deployment, we have identified
four concrete issues (Q1-Q4) and current workarounds, and would like to
work with the community to design upstream solutions.
We would also like to discuss MGLRU's future direction on Android.

Below is a short summary of what we see.

Q1: anon/file imbalance and drop in available memory
Android app workloads show a persistent anon/file generational
imbalance under MGLRU:
anon pages tend to stay in the youngest 2 generations;
file pages are spread across multiple generations and over-reclaimed.
Tuning swappiness to 200 and ANON_ONLY does not fully fix this.
On a 16G media workload we see:
MGLRU: MemAvailable ≈ 6060 MB
legacy: MemAvailable ≈ 6982 MB (differs by ~1G)
Today we mitigate this via explicit memcg aging in Android
userspace [1], which is a vendor-only workaround.

Q2: Hard to control reclaim amount and stopping conditions (memcg)
For memcg reclaim it is hard to stop near a target reclaim amount:
kswapd can continue reclaiming even after watermarks are met
(e.g. to satisfy higher-order or memcg allocations);
reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
semantics and can overshoot the intended reclaim amount.
We currently use OEM hooks [2] to early-exit or bypass reclaim under
some conditions.

Q3: High reclaim cost and long uninterruptible sleep on lower-end
devices
On lower-end devices, reclaim cost and latency are harder to control:
throttle_direct_reclaim can make tasks wait for kswapd instead of
doing direct reclaim;
sometimes the target generations in many memcgs have very few reclaimable
pages, so the CPU spends time scanning with little progress.
We observe tasks staying in uninterruptible sleep in try_to_free_pages().
We have not found a proper way to fix this.

Q4: Lack of global hot/cold + priority view with per-app memcg
Android uses a per-app memcg model and foreground/background levels
for resource control. root reclaim lacks a cross-memcg hot/cold and
priority view;
foreground app file pages may be reclaimed and reloaded frequently,
causing visible stalls;
We currently use a hook [3] to skip reclaim for foreground apps.

Discussion

- Vendor-only workarounds -> generic mechanisms (Q1-Q4)
Our current fixes (userspace memcg aging [1], OEM reclaim hooks
[2,3]) are Android/vendor-only. Which parts should be turned into
generic MGLRU/kernel mechanisms, and which kept as Android policy?
We would appreciate guidance from the community.

- How much control should MGLRU expose to Android? (Q1-Q3)
For Q1/Q2, Android has strong fg/bg and priority semantics that
the kernel does not see. Should MGLRU provide more explicit control
points (e.g. anon-vs-file / generation steering, 
"target amount + abort condition" memcg reclaim) so Android can
safely trade complexity and risk for better performance and bounded
reclaim latency (Q3)?

- MGLRU evolution without memcg LRU: global hot/cold & scanning (Q4)
If the memcg LRU is removed [4], how should we maintain a cross-memcg
global hot/cold view and per-app priority on Android?
Given that much of the power benefit seems to come from page-table
scanning while generations are complex, is it reasonable to decouple
the page-scanning functionality from MGLRU and make it a separate
kernel configuration option?

We are happy to share more detailed data and experiments and to help
with PoCs and large-scale validation if there is interest in
pursuing these directions.

References
[1] https://lore.kernel.org/linux-mm/20251128025315.3520689-1-wangzicheng@honor.com/
[2] https://android-review.googlesource.com/c/kernel/common/+/3866554
[3] https://android-review.googlesource.com/c/kernel/common/+/3870920
[4] https://lwn.net/Articles/1051882/

--
Best,
Zicheng Wang
