From: Johannes Weiner <hannes@cmpxchg.org>
To: Mina Almasry <almasrymina@google.com>
Cc: Tejun Heo <tj@kernel.org>, Yafang Shao <laoar.shao@gmail.com>,
	Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>, Martin Lau <kafai@fb.com>,
	Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
	john fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
	jolsa@kernel.org, Michal Hocko <mhocko@kernel.org>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	Shakeel Butt <shakeelb@google.com>,
	Muchun Song <songmuchun@bytedance.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Zefan Li <lizefan.x@bytedance.com>,
	Cgroups <cgroups@vger.kernel.org>,
	netdev <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
	Linux MM <linux-mm@kvack.org>,
	Yosry Ahmed <yosryahmed@google.com>,
	Dan Schatzberg <schatzberg.dan@gmail.com>,
	Lennart Poettering <lennart@poettering.net>
Subject: Re: [RFD RESEND] cgroup: Persistent memory usage tracking
Date: Mon, 22 Aug 2022 17:19:50 -0400
Message-ID: <YwPy9hervVxfuuYE@cmpxchg.org>
In-Reply-To: <CAHS8izNvEpX3Lv7eFn-vu=4ZT96Djk2dU-VU+zOueZaZZbnWNw@mail.gmail.com>

On Mon, Aug 22, 2022 at 12:02:48PM -0700, Mina Almasry wrote:
> On Mon, Aug 22, 2022 at 4:29 AM Tejun Heo <tj@kernel.org> wrote:
> > b. Let userspace specify which cgroup to charge for some constructs like
> >    tmpfs and bpf maps. The key problems with this approach are
> >
> >    1. How to grant/deny what can be charged where. We must ensure that a
> >       descendant can't move charges up or across the tree without the
> >       ancestors allowing it.
> >
> >    2. How to specify the cgroup to charge. While specifying the target
> >       cgroup directly might seem like an obvious solution, it has a couple
> >       rather serious problems. First, if the descendant is inside a cgroup
> >       namespace, it might not be able to see the target cgroup at all. Second,
> >       it's an interface which is likely to cause misunderstandings on how it
> >       can be used. It's too broad an interface.
> >
> 
> This is pretty much the solution I sent out for review about a year
> ago and yes, it suffers from the issues you've brought up:
> https://lore.kernel.org/linux-mm/20211120045011.3074840-1-almasrymina@google.com/
> 
> 
> >    One solution that I can think of is leveraging the resource domain
> >    concept which is currently only used for threaded cgroups. All memory
> >    usages of threaded cgroups are charged to their resource domain cgroup
> >    which hosts the processes for those threads. The persistent usages have a
> >    similar pattern, so maybe the service level cgroup can declare that it's
> >    the encompassing resource domain and the instance cgroup can say whether
> >    it's gonna charge e.g. the tmpfs instance to its own or the encompassing
> >    resource domain.
> >
> 
> I think this sounds excellent and addresses our use cases. Basically
> the tmpfs/bpf memory would get charged to the encompassing resource
> domain cgroup rather than the instance cgroup, making the memory usage
> of the first and second+ instances consistent and predictable.
> 
> Would love to hear from other memcg folks what they would think of
> such an approach. I would also love to hear what kind of interface you
> have in mind. Perhaps a cgroup tunable that says whether it's going to
> charge the tmpfs/bpf instance to itself or to the encompassing
> resource domain?

I like this too. It makes shared charging predictable, with a coherent
resource hierarchy (congruent OOM, CPU, IO domains), and without the
need for cgroup paths in tmpfs mounts or similar.

As for who gets to declare what, though: if the instance groups can
declare arbitrary files/objects persistent or shared, they'd be able
to abuse this, sneaking private memory past local limits and burdening
the wider persistent/shared domain with it.
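
To make the concern concrete, here is a toy model of hierarchical
charging. It is only a sketch: the Cgroup class, the limits, and the
declared_shared_by_instance flag are all made up for illustration and
do not correspond to any kernel structure or interface.

```python
# Toy model only: hypothetical names, not kernel code.
class Cgroup:
    def __init__(self, name, limit, parent=None):
        self.name = name
        self.limit = limit      # bytes allowed in this subtree
        self.usage = 0          # bytes currently charged to this subtree
        self.parent = parent

    def ancestors_and_self(self):
        node = self
        while node is not None:
            yield node
            node = node.parent

    def try_charge(self, nbytes):
        """Charge hierarchically; fail if any level would exceed its limit."""
        chain = list(self.ancestors_and_self())
        if any(cg.usage + nbytes > cg.limit for cg in chain):
            return False
        for cg in chain:
            cg.usage += nbytes
        return True


service = Cgroup("service", limit=1000)
instance = Cgroup("service/instance-1", limit=100, parent=service)

# Private memory charged normally: the local limit of 100 stops it.
first = instance.try_charge(80)     # fits
second = instance.try_charge(80)    # would exceed the instance limit

# Assumption for illustration: the instance may unilaterally mark an
# object "shared". The same allocation then skips the local limit and
# lands on the service level only.
declared_shared_by_instance = True
target = service if declared_shared_by_instance else instance
third = target.try_charge(80)

print(first, second, third, instance.usage, service.usage)
```

The second charge is refused by the instance's own limit, while the
third one goes through because it never touches that limit. That is
the loophole an instance-side declaration would open.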

I'm thinking it might make more sense for the service level to declare
which objects are persistent and shared across instances.

If that's the case, we may not need a two-component interface. Just
the ability for an intermediate cgroup to say: "This object's future
memory is to be charged to me, not the instantiating cgroup."

Can we require a process in the intermediate cgroup to set up the file
or object, and use madvise/fadvise to say "charge me", before any
instances are launched?
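
Roughly, the semantics I have in mind could be modelled like this.
Again a toy sketch, not a proposed implementation: SharedObject,
claim() and allocate() are hypothetical names, and the two rules baked
in (a claim is only accepted before anything has been charged, and
charges are only redirected to an ancestor of the cgroup doing the
allocation) are my assumptions about how "future memory" and "before
any instances are launched" would be enforced.

```python
# Toy model only: hypothetical names, not kernel interfaces.
class Cgroup:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.usage = 0

    def is_ancestor_of(self, other):
        node = other.parent
        while node is not None:
            if node is self:
                return True
            node = node.parent
        return False

    def charge(self, nbytes):
        # Hierarchical charge: this cgroup and every ancestor.
        node = self
        while node is not None:
            node.usage += nbytes
            node = node.parent


class SharedObject:
    """Stand-in for a tmpfs file or bpf map."""

    def __init__(self, name):
        self.name = name
        self.owner = None   # cgroup that claimed future charges, if any
        self.charged = 0

    def claim(self, cgroup):
        # "Charge me": assumed to be valid only before any memory exists.
        if self.charged:
            raise RuntimeError("object already has charged memory")
        self.owner = cgroup

    def allocate(self, allocating_cgroup, nbytes):
        target = allocating_cgroup
        # Assumed rule: only redirect upward, to an ancestor of the allocator.
        if self.owner is not None and self.owner.is_ancestor_of(allocating_cgroup):
            target = self.owner
        target.charge(nbytes)
        self.charged += nbytes
        return target


service = Cgroup("service")
inst_a = Cgroup("service/a", parent=service)
inst_b = Cgroup("service/b", parent=service)

# The service level sets up the object and claims it before instances run.
cache = SharedObject("cache")
cache.claim(service)

# Instances touch the object later; the charges land on the service level.
cache.allocate(inst_a, 50)
cache.allocate(inst_b, 30)

# An unclaimed object is still charged to whoever instantiates it.
scratch = SharedObject("scratch")
scratch.allocate(inst_a, 10)

print(service.usage, inst_a.usage, inst_b.usage)
```

With this shape the instances never get a say in where the shared
object is charged, and their private allocations still count against
their own limits.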

