Re: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges

From: Barry Song <21cnbao@gmail.com>
To: wangzicheng <wangzicheng@honor.com>
Cc: "lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	 "linux-mm@kvack.org" <linux-mm@kvack.org>,
	wangxin 00023513 <wangxin23@honor.com>, gao xu <gaoxu2@honor.com>,
	 wangtao <tao.wangtao@honor.com>,
	liulu 00013167 <liulu.liu@honor.com>,
	 zhouxiaolong <zhouxiaolong9@honor.com>,
	linkunli <linkunli@honor.com>,
	 "kasong@tencent.com" <kasong@tencent.com>,
	 "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	 "axelrasmussen@google.com" <axelrasmussen@google.com>,
	"yuanchu@google.com" <yuanchu@google.com>,
	 "weixugc@google.com" <weixugc@google.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	 "Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"willy@infradead.org" <willy@infradead.org>
Subject: Re: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
Date: Wed, 25 Feb 2026 04:23:38 +0800
Message-ID: <CAGsJ_4zatnuLkCJyDe_o_yXmhngpc54kV6b7M3tu6JyvF-ZxDw@mail.gmail.com>
In-Reply-To: <e5c8b5b9e7574dce8e4d4744595123e2@honor.com>

On Tue, Feb 24, 2026 at 11:17 AM wangzicheng <wangzicheng@honor.com> wrote:
>
> Hi,
>
> I previously sent a similar email which unfortunately had encoding issues.
> I'm resending a cleaned-up version here so it's easier to read and discuss.
>
> MGLRU has been available on Android for about four years, but many
> OEM vendors still choose not to enable it in production.
> HONOR is a major Android OEM shipping tens of millions of devices
> per year, and we run MGLRU on all our devices across multiple kernel
> versions (5.15~6.12) and RAM configurations (4G~24G), backed by
> large-scale beta and field data. From this deployment, we have identified
> four concrete issues (Q1-Q4) and current workarounds, and would like to
> work with the community to design upstream solutions.
> Also we would like to discuss MGLRU's future direction on Android.
>
> Below is a short summary of what we see.
>
> Q1: anon/file imbalance and drop in available memory
> Android app workloads show a persistent anon/file generational
> imbalance under MGLRU:
> anon pages tend to stay in the youngest 2 generations;
> file pages are spread across multiple generations and over-reclaimed.
> Tuning swappiness to 200 and ANON_ONLY does not fully fix this.
> On a 16G media workload we see:
> MGLRU: MemAvailable ~ 6060 MB
> legacy: MemAvailable ~ 6982 MB (differs by ~1G)
> Today we mitigate this via explicit memcg aging in Android
> userspace [1], which is a vendor-only workaround.

One fundamental design of MGLRU is that file generations and anon
generations catch up with each other when the generation gap reaches
two or more. As a result, even if swappiness is set very high, its
effect on aggressively reclaiming anonymous pages is much smaller
than with the traditional LRU.

One workaround is to force old file folios to be promoted to relatively
younger generations, but this could also cause problems by clustering
file folios in the newer generations.

I wonder if anon and file generations can progress separately to some
extent.
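
To illustrate the catch-up behavior described above, here is a toy model
in C. It is only a sketch of the rule as stated in this mail, not the
kernel's code: pick_type(), CATCH_UP_GAP and the swappiness cutoff of 100
are all made up for illustration.

```c
#include <assert.h>

enum lru_type { TYPE_ANON, TYPE_FILE };

/* Hypothetical threshold, taken from the "two or more" wording above. */
#define CATCH_UP_GAP 2UL

/*
 * Toy model: each type has its own oldest generation number (min_seq).
 * A smaller number means that type still holds older folios.
 */
static enum lru_type pick_type(unsigned long anon_min_seq,
                               unsigned long file_min_seq,
                               int swappiness)
{
        assert(swappiness >= 0 && swappiness <= 200);

        /* One type lags by CATCH_UP_GAP or more: it must catch up first. */
        if (anon_min_seq + CATCH_UP_GAP <= file_min_seq)
                return TYPE_ANON;
        if (file_min_seq + CATCH_UP_GAP <= anon_min_seq)
                return TYPE_FILE;

        /* Within the gap, swappiness decides (simplified cutoff, an assumption). */
        return swappiness > 100 ? TYPE_ANON : TYPE_FILE;
}
```

With anon_min_seq = 10 and file_min_seq = 8, pick_type() returns TYPE_FILE
even at swappiness 200, which matches the over-reclaim of file pages
reported in Q1.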

>
> Q2: Hard to control reclaim amount and stopping conditions (memcg)
> For memcg reclaim it is hard to stop near a target reclaim amount:
> kswapd can continue reclaiming even after watermarks are met
> (e.g. to satisfy higher-order or memcg allocations);

High-order allocation is an interesting topic. Sometimes, vmscan
over-reclaims to satisfy high-order allocations, reclaiming many
zero-order pages even when free pages of the required order are
sufficient [1]. You’ve revealed
another aspect: high-order pages may already exist, but reclamation
doesn’t push them out in time.
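
A rough sketch in C of the kind of check that would avoid the first
problem: reclaim for a high-order request stops as soon as a free block
of that order (or larger) exists. This is a toy model, not the kernel's
watermark logic; TOY_NR_ORDERS, order_available() and
should_continue_reclaim() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical number of block orders tracked by this toy model. */
#define TOY_NR_ORDERS 11

/* nr_free[o] is the number of free blocks of order o. */
static bool order_available(const unsigned long nr_free[TOY_NR_ORDERS],
                            unsigned int order)
{
        assert(order < TOY_NR_ORDERS);

        /* A larger free block can be split to serve the request. */
        for (unsigned int o = order; o < TOY_NR_ORDERS; o++)
                if (nr_free[o] > 0)
                        return true;
        return false;
}

/* Keep reclaiming only while the request cannot be served and budget remains. */
static bool should_continue_reclaim(const unsigned long nr_free[TOY_NR_ORDERS],
                                    unsigned int order,
                                    unsigned long nr_reclaimed,
                                    unsigned long nr_to_reclaim)
{
        return !order_available(nr_free, order) && nr_reclaimed < nr_to_reclaim;
}
```

In this model, plenty of free order-0 pages do not stop reclaim for an
order-2 request, but a single free order-3 block does.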

> reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
> semantics and can overshoot the intended reclaim amount.
> We currently use OEM hooks [2] to early-exit or bypass reclaim under
> some conditions
>
> Q3: High reclaim cost and long uninterruptible sleep on lower-end
> devices
> On lower-end devices, reclaim cost and latency are harder to control:
> throttle_direct_reclaim can make tasks wait for kswapd instead of
> doing direct reclaim;
> sometimes the target generations in many memcgs have very few
> reclaimable pages, so the CPU spends time scanning with little progress.
> We observe tasks staying in uninterruptible sleep in try_to_free_pages().
> We haven't found a proper way to fix it.

Have you identified the exact line of code where direct reclaim enters
uninterruptible sleep? Is it waiting on a lock or something else?

>
> Q4: Lack of global hot/cold + priority view with per-app memcg
> Android uses a per-app memcg model and foreground/background levels
> for resource control. root reclaim lacks a cross-memcg hot/cold and
> priority view;
> foreground app file pages may be reclaimed and reloaded frequently,
> causing visible stalls;
> We currently use a hook [3] to skip reclaim for foreground apps.

Interesting. This suggests that the LRU lacks user context, especially
on Android: for example, which apps are in the foreground, which are in
the background, and how long an app has been in the background.

But is this specific to MGLRU? It looks like it could equally be an
issue for the active/inactive LRU.


Additionally, I’d like to add Q5 based on my observations:
Q5:
MGLRU places readahead folios in the youngest generation. For example, if
a page fault occurs at address 5 and readahead fetches addresses 1–16,
all 16 folios are put in the youngest generation, even though many may
never be accessed. This can seriously hurt reclaim, as these cold
readahead folios sit in the youngest generation alongside genuinely hot
folios.

See the code below and the checks performed by lru_gen_in_fault().

void folio_add_lru(struct folio *folio)
{
        ...
        /* see the comment in lru_gen_folio_seq() */
        if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
            lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
                folio_set_active(folio);

        folio_batch_add_and_move(folio, lru_add);
}
EXPORT_SYMBOL(folio_add_lru);

I could submit a patchset to address this: initially mark only the folio
at address 5 as active, and activate the others later, when they are
actually mapped or accessed.
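
A toy model of that idea in C, just to make the intended behavior
concrete. This is not kernel code: struct ra_folio,
fill_readahead_window() and touch_folio() are hypothetical names, and a
plain bool stands in for the folio's active flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a folio brought in by readahead. */
struct ra_folio {
        unsigned long index;    /* address/offset of the folio */
        bool active;            /* models the folio's active flag */
};

/*
 * Fill a readahead window [start, start + nr) and mark only the folio
 * that actually faulted as active. Returns the number of active folios.
 */
static size_t fill_readahead_window(struct ra_folio *win, size_t nr,
                                    unsigned long start,
                                    unsigned long fault_index)
{
        size_t nr_active = 0;

        assert(win != NULL);

        for (size_t i = 0; i < nr; i++) {
                win[i].index = start + i;
                win[i].active = (win[i].index == fault_index);
                if (win[i].active)
                        nr_active++;
        }
        return nr_active;
}

/* A later map or access promotes a speculative folio. */
static void touch_folio(struct ra_folio *folio)
{
        folio->active = true;
}
```

For the example above (fault at address 5, readahead of 1-16), only one
of the 16 entries starts out active; the rest become active only once
touch_folio() is called on them.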

>
> Discussion
>
> - Vendor-only workarounds -> generic mechanisms (Q1-Q4)
> Our current fixes (userspace memcg aging [1], OEM reclaim hooks
> [2,3]) are Android/vendor-only—what parts should be turned into
> generic MGLRU/kernel mechanisms vs. kept as Android policy?
> We need guidance from the community.
>
> - How much control should MGLRU expose to Android? (Q1-Q3)
> For Q1/Q2, Android has strong fg/bg and priority semantics that
> the kernel does not see. Should MGLRU provide more explicit control
> points (e.g. anon-vs-file / generation steering,
> "target amount + abort condition" memcg reclaim) so Android can
> safely trade complexity and risk for better performance and bounded
> reclaim latency (Q3)?
>
> - MGLRU evolution without memcg LRU: global hot/cold & scanning (Q4)
> If memcg LRU will be removed [4], how should we maintain a cross-memcg
> global hot/cold view and per-app priority on Android?
> Given that much of the power benefit seems to come from page-table
> scanning while generations are complex, is it reasonable to decouple
> page-scanning functionality from MGLRU and make it a separate kernel
> configuration option?
>
> We are happy to share more detailed data and experiments and to help
> with PoCs and large-scale validation if there is interest in
> pursuing these directions.

This is very welcome.

[1] https://lore.kernel.org/linux-mm/20251013101636.69220-1-21cnbao@gmail.com/

Best Regards
Barry
