From: Shakeel Butt <shakeel.butt@linux.dev>
To: Christian Brauner <brauner@kernel.org>
Cc: "Michal Koutný" <mkoutny@suse.com>,
"Andrew Morton" <akpm@linux-foundation.org>,
"Johannes Weiner" <hannes@cmpxchg.org>,
"Michal Hocko" <mhocko@kernel.org>,
"Roman Gushchin" <roman.gushchin@linux.dev>,
"Muchun Song" <muchun.song@linux.dev>,
"Yosry Ahmed" <yosry.ahmed@linux.dev>,
"Tejun Heo" <tj@kernel.org>, "Greg Thelen" <gthelen@google.com>,
linux-mm@kvack.org, cgroups@vger.kernel.org,
linux-kernel@vger.kernel.org,
"Meta kernel team" <kernel-team@meta.com>
Subject: Re: [PATCH v2] memcg: introduce non-blocking limit setting option
Date: Tue, 22 Apr 2025 08:40:14 -0700
Message-ID: <rha4tmnnrhncn2ryoml2hbu5hxt3qnbg2rurl6tkssnegrc5wn@isui3jn3cu4h>
In-Reply-To: <20250422-synergie-bauabschnitt-5f724f1d9866@brauner>
On Tue, Apr 22, 2025 at 11:48:23AM +0200, Christian Brauner wrote:
> On Tue, Apr 22, 2025 at 11:31:23AM +0200, Michal Koutný wrote:
> > On Tue, Apr 22, 2025 at 11:23:17AM +0200, Christian Brauner <brauner@kernel.org> wrote:
> > > As written this isn't restricted to admin processes though, no? So any
> > > unprivileged container can open that file O_NONBLOCK and avoid
> > > synchronous reclaim?
> > >
> > > Which might be fine I have no idea but it's something to explicitly
> > > point out
> >
> > It occurred to me as well but I think this is fine -- changing the
> > limits of a container is (should be) a privileged operation already
> > (ensured by file permissions at opening).
> > IOW, this doesn't allow bypassing the limits to anyone who couldn't have
> > been able to change them already.
>
> Hm, can you explain what you mean by a privileged operation here? If I
> have nested containers with user namespaces with delegated cgroup trees,
> i.e., chowned to them and then some PID 1 or privileged container
> _within the user namespace_ lowers the limit and uses O_NONBLOCK then it
> won't trigger synchronous reclaim. Again, this might all be fine I'm
> just trying to understand.

I think Michal's point (which I agree with) is that if a process has the
privilege to change the limit of a cgroup, then it is OK for that process
to use O_NONBLOCK to avoid sync reclaim. This new functionality does not
enable anyone to bypass their limits.

In your example of PID 1 or a privileged container: yes, with O_NONBLOCK
the limit updater will not trigger sync reclaim, but whoever is running
in that cgroup will hit sync reclaim on their next charge request.
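A minimal sketch of what the limit updater in the example above might do, assuming a kernel that carries this patch's O_NONBLOCK handling for memory.max. The helper name set_memory_max_nonblock() and any cgroup path passed to it are hypothetical. The sketch also shows why no bypass is possible: O_NONBLOCK only changes what the write does after open() has already passed the normal permission check on the file.

```c
/*
 * Sketch only: lower a cgroup v2 memory.max limit without making the
 * calling (admin) process perform synchronous reclaim. Assumes a kernel
 * with the O_NONBLOCK behaviour proposed in this patch; on older kernels
 * the flag is presumably ignored and the write reclaims as before.
 */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#ifndef PATH_MAX
#define PATH_MAX 4096
#endif

/*
 * Hypothetical helper: write `limit` (e.g. "512M" or "max") to
 * <cgroup_dir>/memory.max. Returns 0 on success or a negative errno.
 */
int set_memory_max_nonblock(const char *cgroup_dir, const char *limit)
{
    char path[PATH_MAX];
    size_t len;
    ssize_t written;
    int fd, n, ret = 0;

    assert(cgroup_dir != NULL && limit != NULL);

    n = snprintf(path, sizeof(path), "%s/memory.max", cgroup_dir);
    if (n < 0 || (size_t)n >= sizeof(path))
        return -ENAMETOOLONG;

    /*
     * open() enforces the usual file permissions (e.g. a delegated,
     * chowned subtree), so a caller that could not change the limit
     * before gets EACCES here regardless of O_NONBLOCK.
     */
    fd = open(path, O_WRONLY | O_NONBLOCK);
    if (fd < 0)
        return -errno;

    /*
     * With the patch, this write sets the limit without reclaiming in
     * the writer; tasks in the cgroup reclaim on their next charge.
     */
    len = strlen(limit);
    written = write(fd, limit, len);
    if (written < 0)
        ret = -errno;
    else if ((size_t)written != len)
        ret = -EIO; /* short write: treat as failure */

    if (close(fd) < 0 && ret == 0)
        ret = -errno;
    return ret;
}
```

For instance, set_memory_max_nonblock("/sys/fs/cgroup/job", "512M") (hypothetical path) would return without the updater spending its own CPU on reclaim, even if the job's usage is above 512M; the job itself pays for reclaim on its next charge request.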