Re: [RFC PATCH 0/2] memcg: add nomlock to avoid folios beling mlocked in a memcg

From: Michal Hocko <mhocko@suse.com>
To: Yafang Shao <laoar.shao@gmail.com>
Cc: hannes@cmpxchg.org, roman.gushchin@linux.dev,
	shakeel.butt@linux.dev, muchun.song@linux.dev,
	akpm@linux-foundation.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH 0/2] memcg: add nomlock to avoid folios beling mlocked in a memcg
Date: Mon, 6 Jan 2025 13:28:24 +0100
Message-ID: <Z3vMaDNdC1_fIVKn@tiehlicka>
In-Reply-To: <CALOAHbBf7=TenpguNYYkuuFJA4dGGQEO+fJ8FwG8+Sv67Piu_g@mail.gmail.com>

On Sun 22-12-24 10:34:12, Yafang Shao wrote:
> On Sat, Dec 21, 2024 at 3:21 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Fri 20-12-24 19:52:16, Yafang Shao wrote:
> > > On Fri, Dec 20, 2024 at 6:23 PM Michal Hocko <mhocko@suse.com> wrote:
> > > >
> > > > On Sun 15-12-24 15:34:13, Yafang Shao wrote:
> > > > > Implementation Options
> > > > > ----------------------
> > > > >
> > > > > - Solution A: Allow file caches on the unevictable list to become
> > > > >   reclaimable.
> > > > >   This approach would require significant refactoring of the page reclaim
> > > > >   logic.
> > > > >
> > > > > - Solution B: Prevent file caches from being moved to the unevictable list
> > > > >   during mlock and ignore the VM_LOCKED flag during page reclaim.
> > > > >   This is a more straightforward solution and is the one we have chosen.
> > > > >   If the file caches are reclaimed from the download-proxy's memcg and
> > > > >   subsequently accessed by tasks in the application’s memcg, a filemap
> > > > >   fault will occur. A new file cache will be faulted in, charged to the
> > > > >   application’s memcg, and locked there.
> > > >
> > > > Both options are silently breaking userspace because a non failing mlock
> > > > doesn't give guarantees it is supposed to AFAICS.
> > >
> > > It does not bypass the mlock mechanism; rather, it defers the actual
> > > locking operation to the page fault path. Could you clarify what you
> > > mean by "a non-failing mlock"? From what I can see, mlock can indeed
> > > fail if there isn’t sufficient memory available. With this change, we
> > > are simply shifting the potential failure point to the page fault path
> > > instead.
> >
> > Your change will cause mlocked pages (as mlock syscall returns success)
> > to be reclaimable later on. That breaks the basic mlock contract.
> 
> AFAICS, the mlock() behavior was originally designed with only a
> single root memory cgroup in mind. In other words, when mlock() was
> introduced, all locked pages were confined to the same memcg.

Yes, and this is the case for any other syscall that might have an impact
on memory consumption. This is by design. The memory cgroup controller
aims to provide completely transparent resource control without any
modifications to applications. This is the case for all other cgroup
controllers. If memcg (or another controller) affects a specific syscall's
behavior then this has to be communicated explicitly to the caller.

The purpose of the mlock syscall is to _guarantee_ that memory stays resident
(never swapped out). There might be additional constraints that prevent
mlock from succeeding, e.g. rlimit, or memcg aiming to control the amount
of mlocked memory, but those failures need to be explicitly
communicated via syscall failure.
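
To illustrate the contract described above, here is a minimal userspace
sketch. It is not part of the patch series, and lock_region() is a
hypothetical helper name chosen for this example. It assumes the caller
treats a zero return from mlock(2) as a residency guarantee, so any limit
(rlimit, or a memcg limit if one existed) would have to surface as a failing
return code rather than as silent reclaim later on.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper, for illustration only: try to pin the pages
 * covering [addr, addr + len) in RAM.
 *
 * Returns 0 on success, or a negative errno value on failure.
 */
static int lock_region(const void *addr, size_t len)
{
	if (mlock(addr, len) == 0)
		return 0;	/* caller may now rely on the pages staying resident */

	/*
	 * Documented mlock(2) failures include:
	 *   ENOMEM - RLIMIT_MEMLOCK would be exceeded, or part of the
	 *            range is not mapped
	 *   EPERM  - caller is unprivileged and its RLIMIT_MEMLOCK is 0
	 *   EAGAIN - some or all of the range could not be locked
	 * Either way the caller is told explicitly that there is no guarantee.
	 */
	return -errno;
}
```

A caller would check the return value and fall back (or abort) on a negative
result. What it cannot do is detect a lock that was reported as successful
and then quietly dropped.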

> However, this changed with the introduction of memcg support. Now,
> mlock() can lock pages that belong to a different memcg than the
> current task. This behavior is not explicitly defined in the mlock()
> documentation, which could lead to confusion.

This is more a problem of cgroup configurations where different
resource domains share resources. It is not much different when
other resources (e.g. shmem) are shared across unrelated cgroups.

> To clarify, I propose updating the mlock() documentation as follows:

This is not really possible because you are effectively breaking
existing userspace.
-- 
Michal Hocko
SUSE Labs

Thread overview: 15+ messages
2024-12-15  7:34 Yafang Shao
2024-12-15  7:34 ` [RFC PATCH 1/2] mm/memcontrol: add a new cgroup file memory.nomlock Yafang Shao
2024-12-15  7:34 ` [RFC PATCH 2/2] mm: Add support for nomlock to avoid folios beling mlocked in a memcg Yafang Shao
2024-12-20 10:23 ` [RFC PATCH 0/2] memcg: add " Michal Hocko
2024-12-20 11:52   ` Yafang Shao
2024-12-21  7:21     ` Michal Hocko
2024-12-22  2:34       ` Yafang Shao
2024-12-25  2:23         ` Yafang Shao
2025-01-06 12:30           ` Michal Hocko
2025-01-06 14:04             ` Yafang Shao
2025-01-07  8:39               ` Michal Hocko
2025-01-07  9:43                 ` Yafang Shao
2025-01-06 12:28         ` Michal Hocko [this message]
2025-01-06 13:59           ` Yafang Shao
2025-01-07 10:04             ` Michal Hocko
