From: wangzicheng <wangzicheng@honor.com>
To: Barry Song <21cnbao@gmail.com>
Cc: "lsf-pc@lists.linux-foundation.org"
<lsf-pc@lists.linux-foundation.org>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
wangxin 00023513 <wangxin23@honor.com>, gao xu <gaoxu2@honor.com>,
wangtao <tao.wangtao@honor.com>,
liulu 00013167 <liulu.liu@honor.com>,
zhouxiaolong <zhouxiaolong9@honor.com>,
linkunli <linkunli@honor.com>,
"kasong@tencent.com" <kasong@tencent.com>,
"akpm@linux-foundation.org" <akpm@linux-foundation.org>,
"axelrasmussen@google.com" <axelrasmussen@google.com>,
"yuanchu@google.com" <yuanchu@google.com>,
"weixugc@google.com" <weixugc@google.com>,
Randy Dunlap <rdunlap@infradead.org>,
"Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
"willy@infradead.org" <willy@infradead.org>,
"surenb@google.com" <surenb@google.com>,
yangxuzhe 00017436 <yangxuzhe@honor.com>,
Kalesh Singh <kaleshsingh@google.com>,
android-mm <android-mm@google.com>
Subject: RE: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
Date: Thu, 26 Feb 2026 13:29:40 +0000
Message-ID: <d91b347b9639488580182a9032ba1f2f@honor.com>
In-Reply-To: <CAGsJ_4zPYpTDY4U0wSGhoa9dVHus_FChXaaD45H170TSXJ+RvQ@mail.gmail.com>
> > > > reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
> > > > semantics and can overshoot the intended reclaim amount.
> > > > We currently use OEM hooks [2] to early-exit or bypass reclaim under
> > > > some conditions
> > > >
> > > > Q3: High reclaim cost and long uninterruptible sleep on lower-end
> > > > devices
> > > > On lower-end devices, reclaim cost and latency are harder to control:
> > > > throttle_direct_reclaim can make tasks wait for kswapd instead of
> > > > doing direct reclaim;
> > > > sometimes the target generations in many memcgs have very few
> > > > reclaimable
> > > > pages, so the CPU spends time scanning with little progress.
> > > > We observe tasks staying in uninterruptible sleep in
> > > > try_to_free_pages().
> > > > We haven't found a proper way to fix it.
> > >
> > > Have you identified the exact line of code where direct reclaim enters
> > > uninterruptible sleep? Is it waiting on a lock or something else?
> > >
> > Yes, we’ve identified the exact code locations where this happens:
> >
> > in slow path
> >
> > static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
> >                                     nodemask_t *nodemask)
> > {
> >         ...
> >         if (!(gfp_mask & __GFP_FS))
> >                 wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
> >                         allow_direct_reclaim(pgdat), HZ);
> >         else
> >                 /* Throttle until kswapd wakes the process */
> >                 wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
> >                         allow_direct_reclaim(pgdat));
> >         ...
> > }
> >
> > in kswapd
> >
> > static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order,
> >                                  int highest_zoneidx)
> > {
> >         ...
> >         if (waitqueue_active(&pgdat->pfmemalloc_wait))
> >                 wake_up_all(&pgdat->pfmemalloc_wait);
> >         ...
> > }
>
> Thanks. I understand it could be problematic if throttling occurs,
> especially on threads related to user experience.
>
Thank you for the detailed follow-up.
For Q3, we agree that throttling is dangerous for UX-critical threads. Kalesh also
shared similar observations about long direct reclaim tail latencies.
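To illustrate why the __GFP_FS branch worries us most, here is a toy userspace
model of the two branches quoted above. It is only a sketch: toy_throttle_ticks(),
TOY_NEVER and the tick values are made up for illustration, and nothing here is
kernel code.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Toy value meaning "kswapd never wakes the waiter". */
#define TOY_NEVER ULONG_MAX

/*
 * Ticks a throttled task sleeps in this toy model.
 *   has_gfp_fs: whether the allocation carries __GFP_FS
 *   wake_tick:  tick at which kswapd would wake the waiter
 *               (TOY_NEVER if it never does)
 *   hz:         ticks per second, standing in for the kernel's HZ
 */
unsigned long toy_throttle_ticks(bool has_gfp_fs, unsigned long wake_tick,
                                 unsigned long hz)
{
        assert(hz > 0);

        /* Without __GFP_FS the wait is bounded by a one-second timeout. */
        if (!has_gfp_fs)
                return wake_tick < hz ? wake_tick : hz;

        /*
         * With __GFP_FS there is no timeout: only a wakeup ends the wait
         * (a fatal signal would too, but that is not modelled here).
         */
        return wake_tick;
}
```

In this model a caller without __GFP_FS never waits longer than hz ticks, while a
caller with __GFP_FS waits for as long as kswapd takes, which matches the long
tails we see on UX-critical threads.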
> >
> > > >
> > > > Q4: Lack of global hot/cold + priority view with per-app memcg
> > > > Android uses a per-app memcg model and foreground/background
> > > > levels
> > > > for resource control. root reclaim lacks a cross-memcg hot/cold and
> > > > priority view;
> > > > foreground app file pages may be reclaimed and reloaded frequently,
> > > > causing visible stalls;
> > > > We currently use a hook [3] to skip reclaim for foreground apps.
> > >
> > > Interesting. This somehow reflects that the LRU lacks the user’s
> > > context, especially on Android systems—for example, which apps are in
> > > the foreground, which are in the background, and how long an app has
> > > been in the background.
> > >
> > > But this is not specific to MGLRU; it can also be an issue for the
> > > active/inactive LRU?
> > >
> > That's right, this affects both MGLRU and the traditional LRU.
> > We believe this comes from a semantic gap between the kernel and Android
> > (e.g. fg/bg, per-app priorities), and this is one of the main topics we’d like
> > to discuss.
> > Additionally, even with MGLRU’s memcg-LRU, this issue is still not fully
> > resolved in our workloads.
>
> We might be able to leverage some existing infrastructure. MGLRU
> maintains an LRU of LRUs, and within this structure, it may be
> possible to adjust positions based on whether a cgroup is in the
> foreground or background. In other words, the LRU of LRUs could
> receive hints from userspace to influence a cgroup’s position.
>
Regarding the LRU-of-LRUs idea, that does sound like a promising direction
(compared to vendor hooks).
However, it seems hard to support some more complex policies with it, e.g.:
- reclaiming from bg apps first, while capping the reclaim amount and protecting the bg 'super apps'
- preserving memcg MGLRU generation info after an app is frozen and no longer running.
Looking forward to the discussion.
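As an illustration of the first policy above, here is a minimal userspace sketch.
struct app_memcg, plan_bg_reclaim() and the per-app cap are hypothetical names
chosen for this example; they are not existing kernel interfaces, and the sketch
says nothing about how such hints would reach the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-app descriptor; none of these fields exist in the kernel. */
struct app_memcg {
        int foreground;            /* 1 if the app is in the foreground */
        int protected_bg;          /* 1 for a background 'super app' to protect */
        unsigned long reclaimable; /* pages that could be reclaimed */
        unsigned long planned;     /* output: pages this plan takes from the app */
};

/*
 * Plan reclaim of up to 'target' pages from unprotected background apps only,
 * taking at most 'per_app_cap' pages from each. Returns the number of pages
 * planned, which may be less than 'target'; handling the shortfall is left to
 * the caller.
 */
unsigned long plan_bg_reclaim(struct app_memcg *apps, size_t n,
                              unsigned long target, unsigned long per_app_cap)
{
        unsigned long total = 0;

        assert(apps != NULL || n == 0);

        for (size_t i = 0; i < n; i++) {
                unsigned long take;

                apps[i].planned = 0;

                /* Skip foreground apps and protected background apps. */
                if (apps[i].foreground || apps[i].protected_bg)
                        continue;
                if (total >= target)
                        continue;

                take = apps[i].reclaimable;
                if (take > per_app_cap)
                        take = per_app_cap;        /* per-app cap */
                if (take > target - total)
                        take = target - total;     /* do not overshoot */

                apps[i].planned = take;
                total += take;
        }
        return total;
}
```

Even this small policy needs per-app state (fg/bg, protected or not) and a
reclaim budget, which is more than a position in the LRU of LRUs can express.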
> >
> > >
> > > Additionally, I’d like to add Q5 based on my observations:
> > > Q5:
> > > MGLRU places readahead folios in the newest generation. For example, if
> > > a page fault occurs at address 5, readahead fetches addresses 1–16, and
> > > all 16 folios are put in the youngest generation, even though many may
> > > not be needed. This can seriously impact reclamation performance, as
> > > these cold readahead folios occupy active slots.
> > >
> > > See the code below and the checks performed by lru_gen_in_fault().
> > >
> > > void folio_add_lru(struct folio *folio)
> > > {
> > >         ...
> > >         /* see the comment in lru_gen_folio_seq() */
> > >         if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> > >             lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> > >                 folio_set_active(folio);
> > >
> > >         folio_batch_add_and_move(folio, lru_add);
> > > }
> > > EXPORT_SYMBOL(folio_add_lru);
> > >
> > > I could have submitted a patchset to address this by initially marking
> > > only address 5 as active, and activating the other addresses later when
> > > they are actually mapped or accessed.
> > >
> > This sounds very reasonable to us, and we look forward to discussing
> > and evaluating this direction together.
>
> I sent an RFC today for Q5. I hope you can review, comment,
> and test it together:
>
For Q5, we are happy to see your RFC. We’ve already commented on the thread
and will run it on our workloads to provide feedback.
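For reference, this is how we read the Q5 idea as described in this thread:
activate only the faulting folio, and promote the rest of the readahead window
when it is actually accessed. The sketch below is a userspace toy, not the RFC's
implementation; struct toy_folio, RA_WINDOW and the toy_*() helpers are
hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RA_WINDOW 16 /* readahead window size used in the Q5 example */

struct toy_folio {
        unsigned long index; /* address/offset of the folio */
        bool active;         /* stands in for "placed in the youngest generation" */
};

/* Fill the window starting at 'start'; only the faulting index starts active. */
void toy_readahead(struct toy_folio *win, unsigned long start,
                   unsigned long fault_index)
{
        assert(fault_index >= start && fault_index - start < RA_WINDOW);

        for (size_t i = 0; i < RA_WINDOW; i++) {
                win[i].index = start + i;
                win[i].active = (win[i].index == fault_index);
        }
}

/* A later access promotes the folio; indexes outside the window are ignored. */
void toy_access(struct toy_folio *win, unsigned long index)
{
        for (size_t i = 0; i < RA_WINDOW; i++)
                if (win[i].index == index)
                        win[i].active = true;
}

/* Number of folios in the window currently marked active. */
size_t toy_count_active(const struct toy_folio *win)
{
        size_t n = 0;

        for (size_t i = 0; i < RA_WINDOW; i++)
                n += win[i].active ? 1 : 0;
        return n;
}
```

With a fault at address 5 and a window of addresses 1-16, one folio is active
right after readahead instead of 16, and the count grows only as other addresses
are touched.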
Thanks,
Zicheng
> https://lore.kernel.org/linux-mm/20260225223712.3685-1-21cnbao@gmail.com/
>
> Thanks
> Barry