linux-mm.kvack.org archive mirror
From: Yuanchu Xie <yuanchu@google.com>
To: Henry Huang <henry.hj@antgroup.com>
Cc: rientjes@google.com, akpm@linux-foundation.org,
	谈鉴锋 <henry.tjf@antgroup.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	"朱辉(茶水)" <teawater@antgroup.com>,
	yuzhao@google.com
Subject: Re: [RFC v2] mm: Multi-Gen LRU: fix use mm/page_idle/bitmap
Date: Wed, 10 Jan 2024 11:24:02 -0800
Message-ID: <CAJj2-QG3jJcA=71n5imx+OjhMapPMN-1bfT5XQRjswxOPG9MvA@mail.gmail.com>
In-Reply-To: <20231222154037.62823-1-henry.hj@antgroup.com>

On Fri, Dec 22, 2023 at 7:40 AM Henry Huang <henry.hj@antgroup.com> wrote:
> > - are pages ever shared between different memcg hierarchies?  You
> >   mentioned sharing between processes in A and A/B, but I'm wondering
> >   if there is sharing between two different memcg hierarchies where root
> >   is the only common ancestor?
>
> Yes, there is another really common case:
> If the docker graph driver is overlayfs, different docker containers that use
> the same image, or share the same lower layers, would share the file cache of
> common binaries or libraries (e.g. libc.so).
Does this present a problem with setting memcg limits or OOMs? It
seems like deterministically charging shared pages would be highly
desirable. Mina Almasry previously proposed a memcg= mount option to
implement deterministic charging[1], but it wasn't a generic sharing
mechanism. Nonetheless, the problem remains, and it would be
interesting to learn if this presents any issues for you.

[1] https://lore.kernel.org/linux-mm/20211120045011.3074840-1-almasrymina@google.com/
>
> > - do you anticipate a shorter scan period at some point?  Proactively
> >   reclaiming all memory colder than one hour is a long time :)  Are you
> >   concerned at all about the cost of doing your current idle bit
> >   harvesting approach becoming too expensive if you significantly reduce
> >   the scan period?
>
> We don't want the owner of the application to feel a significant
> performance degradation when using swap. There is a high risk in reclaiming
> pages whose idle age is less than 1 hour. We have internal tests and
> data analysis to support this.
>
> We disabled global swappiness and memcg swappiness.
> Only proactive reclaim can swap anon pages.
>
> What's more, we see that mglru has a more efficient way to scan the pte accessed bit.
> We would prefer to have the mglru scan help us scan and select idle pages.
I'm working on a kernel driver/per-memcg interface to perform aging
with MGLRU, including configuration for the MGLRU page scanning
optimizations. I suspect scanning the PTE accessed bits for pages
charged to a foreign memcg ad-hoc has some performance implications,
and the more general solution is to charge in a predetermined way,
which makes the scanning on behalf of the foreign memcg a bit cleaner.
This is possible nonetheless, but a bit hacky. Let me know if you have
any ideas.
>
> > - is proactive reclaim being driven by writing to memory.reclaim, by
> >   enforcing a smaller memory.high, or something else?
>
> Because all page info and idle ages are stored in userspace, the kernel can't get
> this information directly. We have a private patch that includes a new reclaim
> interface to support reclaiming pages with specific pfns.
Thanks for sharing! It's been enlightening to learn about different
prod environments.
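
For reference, the memory.reclaim route mentioned in the question above
could be driven from userspace roughly as follows. This is a minimal
sketch, not the private pfn-based interface described in the reply: it
assumes a cgroup v2 hierarchy whose kernel exposes memory.reclaim, and
the helper name request_reclaim and the cgroup path in the usage note
are hypothetical.

```python
import errno
import os


def request_reclaim(cgroup_dir: str, nbytes: int) -> bool:
    """Ask the kernel to reclaim nbytes from the cgroup at cgroup_dir.

    Hypothetical helper: writes the byte count to the cgroup's
    memory.reclaim file. Returns True if the write succeeded and False
    if the kernel reported EAGAIN, which it does when it reclaimed
    less than the requested amount.
    """
    if nbytes <= 0:
        raise ValueError("nbytes must be positive")
    path = os.path.join(cgroup_dir, "memory.reclaim")
    try:
        # The buffered write is flushed when the file is closed, so any
        # error from the kernel surfaces inside this try block.
        with open(path, "w") as f:
            f.write(str(nbytes))
    except OSError as e:
        if e.errno == errno.EAGAIN:
            return False
        raise
    return True
```

A caller might run request_reclaim("/sys/fs/cgroup/A/B", 64 << 20) once
per scan period. Note that this reclaims by amount and leaves page
selection to the kernel, unlike a reclaim interface that takes
specific pfns.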

