From: Tejun Heo <tj@kernel.org>
To: Mina Almasry <almasrymina@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>,
Johannes Weiner <hannes@cmpxchg.org>,
Yafang Shao <laoar.shao@gmail.com>,
Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>, Martin Lau <kafai@fb.com>,
Song Liu <songliubraving@fb.com>, Yonghong Song <yhs@fb.com>,
john fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
Stanislav Fomichev <sdf@google.com>, Hao Luo <haoluo@google.com>,
jolsa@kernel.org, Michal Hocko <mhocko@kernel.org>,
Shakeel Butt <shakeelb@google.com>,
Muchun Song <songmuchun@bytedance.com>,
Andrew Morton <akpm@linux-foundation.org>,
Zefan Li <lizefan.x@bytedance.com>,
Cgroups <cgroups@vger.kernel.org>,
netdev <netdev@vger.kernel.org>, bpf <bpf@vger.kernel.org>,
Linux MM <linux-mm@kvack.org>,
Yosry Ahmed <yosryahmed@google.com>,
Dan Schatzberg <schatzberg.dan@gmail.com>,
Lennart Poettering <lennart@poettering.net>
Subject: Re: [RFD RESEND] cgroup: Persistent memory usage tracking
Date: Thu, 25 Aug 2022 07:59:42 -1000
Message-ID: <Ywe4jlSvu5rC44+1@slm.duckdns.org>
In-Reply-To: <CAHS8izMFMtM5ry12iEo72nwkynDpgycETn6QoXLGj=O6b8z1jg@mail.gmail.com>
Hello,
On Wed, Aug 24, 2022 at 12:02:04PM -0700, Mina Almasry wrote:
> > If we can express all the resource constraints and structures in the cgroup
> > side and configured by the management agent, the application can simply e.g.
> > madvise whatever memory region or flag bpf maps as "these are persistent"
> > and the rest can be handled by the system. If the agent set up the
> > environment for that, it gets accounted accordingly; otherwise, it'd behave
> > as if that tagging didn't exist. Asking the application to set up all its
> > resources in separate steps might require significant restructuring
> > and knowledge of how the hierarchy is set up in many cases.
>
> I don't know if this level of granularity is needed with a madvise()
> or such. The kernel knows whether resources are persistent due to the
> nature of the resource. For example a shared tmpfs file is a resource
> that is persistent and not cleaned up after the process using it dies,
> but private memory is cleaned up. madvise(PERSISTENT) on private memory
> would not make sense, and I don't think madvise(NOT_PERSISTENT) on a
> tmpfs-backed memory region would make sense. Also, this requires adding madvise()
> hints in userspace code to leverage this.
I haven't thought hard about what the hinting interface should be like. The
default assumption would be that page cache belongs to the persistent
domain and anon belongs to the instance (mm folks, please correct me if I'm
off the rails here), but I can imagine situations where that doesn't
necessarily hold - like temp files which get unlinked on instance shutdown.
In terms of hint granularity, coarser-grained hints (e.g. per file, mount,
whatever) seem to make sense, but again I haven't thought too hard about it.
That said, as long as the default behavior is reasonable, I think adding
some hinting calls in the application is acceptable. It doesn't require any
structural changes and the additions would be for the application's own
benefit of more accurate accounting and control. That makes sense to me.
One unfortunate effect this will have is that we'll be actively putting
resources into intermediate cgroups. This already happens today but if we
support persistent domains, it's gonna be a lot more prevalent and we'll
need to update e.g. iocost to support IOs coming out of intermediate
cgroups. This kinda sucks because we don't even have knobs to control self
vs. children distributions. Oh well...
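A minimal sketch of what charging an intermediate cgroup means for hierarchical accounting, assuming a hypothetical in-memory tree rather than any kernel structure (`struct cg` and `cg_total()` are illustrative names only). As with cgroup2's memory.current, a node's total covers its own ("self") charges, e.g. persistent page cache left behind by past instances, plus everything charged to its descendants.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cgroup node. */
struct cg {
    const char *name;
    long self;              /* pages charged directly to this cgroup */
    struct cg **children;
    size_t nr_children;
};

/* Hierarchical usage: self charges plus those of all descendants. */
long cg_total(const struct cg *cg)
{
    long sum = cg->self;

    for (size_t i = 0; i < cg->nr_children; i++)
        sum += cg_total(cg->children[i]);
    return sum;
}
```

Nothing in such a tree says how a controller should split the parent's share between `self` and its children, which is the missing self vs. children knob.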
Thanks.
--
tejun