From: Barry Song <21cnbao@gmail.com>
To: wangzicheng <wangzicheng@honor.com>
Cc: "lsf-pc@lists.linux-foundation.org"
	<lsf-pc@lists.linux-foundation.org>,
	 "linux-mm@kvack.org" <linux-mm@kvack.org>,
	wangxin 00023513 <wangxin23@honor.com>, gao xu <gaoxu2@honor.com>,
	 wangtao <tao.wangtao@honor.com>,
	liulu 00013167 <liulu.liu@honor.com>,
	 zhouxiaolong <zhouxiaolong9@honor.com>,
	linkunli <linkunli@honor.com>,
	 "kasong@tencent.com" <kasong@tencent.com>,
	 "akpm@linux-foundation.org" <akpm@linux-foundation.org>,
	 "axelrasmussen@google.com" <axelrasmussen@google.com>,
	"yuanchu@google.com" <yuanchu@google.com>,
	 "weixugc@google.com" <weixugc@google.com>,
	Randy Dunlap <rdunlap@infradead.org>,
	 "Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"willy@infradead.org" <willy@infradead.org>,
	 "surenb@google.com" <surenb@google.com>,
	yangxuzhe 00017436 <yangxuzhe@honor.com>
Subject: Re: [LSF/MM/BPF TOPIC] MGLRU on Android: Real-World Problems and Challenges
Date: Thu, 26 Feb 2026 16:03:08 +0800
Message-ID: <CAGsJ_4zPYpTDY4U0wSGhoa9dVHus_FChXaaD45H170TSXJ+RvQ@mail.gmail.com>
In-Reply-To: <7ecdf6b68ace402f90a4685684bbe995@honor.com>

On Wed, Feb 25, 2026 at 6:43 PM wangzicheng <wangzicheng@honor.com> wrote:
[...]
> > > reclaim via try_to_free_mem_cgroup_pages() lacks clear abort
> > > semantics and can overshoot the intended reclaim amount.
> > > We currently use OEM hooks [2] to early-exit or bypass reclaim under
> > > some conditions
> > >
> > > Q3: High reclaim cost and long uninterruptible sleep on lower-end
> > > devices
> > > On lower-end devices, reclaim cost and latency are harder to control:
> > > throttle_direct_reclaim can make tasks wait for kswapd instead of
> > > doing direct reclaim;
> > > sometimes the target generations in many memcgs have very few
> > > reclaimable
> > > pages, so the CPU spends time scanning with little progress.
> > > We observe tasks staying in uninterruptible sleep in try_to_free_pages().
> > > We haven't found any proper way to fix it.
> >
> > Have you identified the exact line of code where direct reclaim enters
> > uninterruptible sleep? Is it waiting on a lock or something else?
> >
> Yes, we’ve identified the exact code locations where this happens:
>
> in slow path
>
> static bool throttle_direct_reclaim(gfp_t gfp_mask, struct zonelist *zonelist,
>                                         nodemask_t *nodemask)
> {
> ...
>         if (!(gfp_mask & __GFP_FS))
>                 wait_event_interruptible_timeout(pgdat->pfmemalloc_wait,
>                         allow_direct_reclaim(pgdat), HZ);
>         else
>                 /* Throttle until kswapd wakes the process */
>                 wait_event_killable(zone->zone_pgdat->pfmemalloc_wait,
>                         allow_direct_reclaim(pgdat));
> ...
> }
>
> in kswapd
>
> static bool prepare_kswapd_sleep(pg_data_t *pgdat, int order,
>                                 int highest_zoneidx)
> {
> ...
>         if (waitqueue_active(&pgdat->pfmemalloc_wait))
>                 wake_up_all(&pgdat->pfmemalloc_wait);
> ...
> }

Thanks. I understand it could be problematic if throttling occurs,
especially on threads related to user experience.

>
> > >
> > > Q4: Lack of global hot/cold + priority view with per-app memcg
> > > Android uses a per-app memcg model and foreground/background levels
> > > for resource control. root reclaim lacks a cross-memcg hot/cold and
> > > priority view;
> > > foreground app file pages may be reclaimed and reloaded frequently,
> > > causing visible stalls;
> > > We currently use a hook [3] to skip reclaim for foreground apps.
> >
> > Interesting. This somehow reflects that the LRU lacks the user’s
> > context, especially on Android systems—for example, which apps are in
> > the foreground, which are in the background, and how long an app has
> > been in the background.
> >
> > But this is not specific to MGLRU; it can also be an issue for the
> > active/inactive LRU?
> >
> That's right, this affects both MGLRU and the traditional LRU.
> We believe this comes from a semantic gap between the kernel and Android
> (e.g. fg/bg, per-app priorities), and this is one of the main topics we’d like to discuss.
> Additionally, even with MGLRU’s memcg-LRU, this issue is still not fully resolved
> in our workloads.

We might be able to leverage some existing infrastructure. MGLRU
maintains an LRU of LRUs, and within this structure, it may be
possible to adjust positions based on whether a cgroup is in the
foreground or background. In other words, the LRU of LRUs could
receive hints from userspace to influence a cgroup’s position.
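
To make the idea concrete, here is a minimal userspace toy model, not
kernel code. Every name in it (toy_memcg, toy_hint, toy_apply_hint,
toy_pick_victim, TOY_NR_GENS) is hypothetical, and the policy of moving a
foreground cgroup to the youngest generation and a background cgroup to
the oldest is only one assumption about how such a hint might be used.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, none are kernel APIs. */
#define TOY_NR_GENS 3	/* assumed number of memcg generations */

enum toy_hint { TOY_HINT_FOREGROUND, TOY_HINT_BACKGROUND };

struct toy_memcg {
	const char *name;
	int gen;	/* 0 = oldest (reclaimed first), TOY_NR_GENS - 1 = youngest */
};

/* Apply a userspace hint by repositioning the cgroup in the LRU of LRUs. */
static void toy_apply_hint(struct toy_memcg *cg, enum toy_hint hint)
{
	if (hint == TOY_HINT_FOREGROUND)
		cg->gen = TOY_NR_GENS - 1;	/* protect: reclaim it last */
	else
		cg->gen = 0;			/* expose: reclaim it first */
}

/* Pick the next reclaim target: the first cgroup in the oldest generation. */
static struct toy_memcg *toy_pick_victim(struct toy_memcg *cgs, size_t n)
{
	struct toy_memcg *victim = NULL;

	for (size_t i = 0; i < n; i++)
		if (!victim || cgs[i].gen < victim->gen)
			victim = &cgs[i];
	return victim;
}
```

With three cgroups that start in the same generation, hinting one as
foreground and another as background makes the background one the next
reclaim target, while the foreground one is visited last.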

>
> >
> > Additionally, I’d like to add Q5 based on my observations:
> > Q5:
> > MGLRU places readahead folios in the newest generation. For example, if
> > a page fault occurs at address 5, readahead fetches addresses 1–16, and
> > all 16 folios are put in the youngest generation, even though many may
> > not be needed. This can seriously impact reclamation performance, as
> > these cold readahead folios occupy active slots.
> >
> > See the code below and the checks performed by lru_gen_in_fault().
> >
> > void folio_add_lru(struct folio *folio)
> > {
> >         ...
> >         /* see the comment in lru_gen_folio_seq() */
> >         if (lru_gen_enabled() && !folio_test_unevictable(folio) &&
> >             lru_gen_in_fault() && !(current->flags & PF_MEMALLOC))
> >                 folio_set_active(folio);
> >
> >         folio_batch_add_and_move(folio, lru_add);
> > }
> > EXPORT_SYMBOL(folio_add_lru);
> >
> > I could have submitted a patchset to address this by initially marking
> > only address 5 as active, and activating the other addresses later when
> > they are actually mapped or accessed.
> >
> This sounds very reasonable to us, and we look forward to discussing
> and evaluating this direction together.

I sent an RFC for Q5 today. I hope you can help review, comment on,
and test it:

https://lore.kernel.org/linux-mm/20260225223712.3685-1-21cnbao@gmail.com/
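
For readers who have not opened the RFC, the Q5 idea can be sketched
with a userspace toy model. This is not the RFC's code: toy_folio,
toy_readahead, toy_access and toy_count_active are hypothetical names,
and "active" here only stands in for "starts in the youngest
generation".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: all names are hypothetical, none are kernel APIs. */
struct toy_folio {
	unsigned long index;
	bool active;	/* true = would start in the youngest generation */
};

/*
 * Fill a readahead window [start, start + nr) around one faulting index.
 * Only the folio that actually faulted is marked active.
 */
static void toy_readahead(struct toy_folio *folios, size_t nr,
			  unsigned long start, unsigned long fault_index)
{
	for (size_t i = 0; i < nr; i++) {
		folios[i].index = start + i;
		folios[i].active = (folios[i].index == fault_index);
	}
}

/* A later real access (mapping or read) activates the folio. */
static void toy_access(struct toy_folio *folio)
{
	folio->active = true;
}

/* Count how many folios in the window are currently active. */
static size_t toy_count_active(const struct toy_folio *folios, size_t nr)
{
	size_t count = 0;

	for (size_t i = 0; i < nr; i++)
		if (folios[i].active)
			count++;
	return count;
}
```

With a fault at address 5 and a window covering addresses 1-16, only one
of the 16 folios starts out active in this model; the other 15 stay
inactive until something actually touches them.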

Thanks
Barry

