Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: David Finkel <davidf@vimeo.com>
To: Tejun Heo <tj@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>,
	Muchun Song <muchun.song@linux.dev>,
	 Andrew Morton <akpm@linux-foundation.org>,
	core-services@vimeo.com,  Jonathan Corbet <corbet@lwn.net>,
	Roman Gushchin <roman.gushchin@linux.dev>,
	 Shakeel Butt <shakeelb@google.com>,
	Shuah Khan <shuah@kernel.org>,
	 Johannes Weiner <hannes@cmpxchg.org>,
	Zefan Li <lizefan.x@bytedance.com>,
	cgroups@vger.kernel.org,  linux-doc@vger.kernel.org,
	linux-mm@kvack.org,  linux-kselftest@vger.kernel.org
Subject: Re: [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers
Date: Tue, 16 Jul 2024 13:10:14 -0400	[thread overview]
Message-ID: <CAFUnj5M9CTYPcEM3=4i4rTfiU4sY4Qq8V1DXHJ00YYD2xFBvew@mail.gmail.com> (raw)
In-Reply-To: <ZpajW9BKCFcCCTr-@slm.duckdns.org>

On Tue, Jul 16, 2024 at 12:44 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, Jul 16, 2024 at 03:48:17PM +0200, Michal Hocko wrote:
> ...
> > > This behavior is particularly useful for work scheduling systems that
> > > need to track memory usage of worker processes/cgroups per-work-item.
> > > Since memory can't be squeezed like CPU can (the OOM-killer has
> > > opinions), these systems need to track the peak memory usage to compute
> > > system/container fullness when binpacking workitems.
>
> Swap still has bad reps but there's nothing drastically worse about it than
> page cache. ie. If you're under memory pressure, you get thrashing one way
> or another. If there's no swap, the system is just memlocking anon memory
> even when they are a lot colder than page cache, so I'm skeptical that no
> swap + mostly anon + kernel OOM kills is a good strategy in general
> especially given that the system behavior is not very predictable under OOM
> conditions.

The reason we need peak memory information is to let us schedule work in a way
that we generally avoid OOM conditions.
For the workloads I work on, we generally have very little in the
page-cache, since
the data isn't stored locally most of the time, but streamed from
other storage/database
systems. For those cases, demand-paging will cause large variations in
servicing time,
and we'd rather restart the process than have unpredictable latency.
The same is true for the batch/queue-work system I wrote this patch to support.
We keep very little data on the local disk, so the page cache is
relatively small.

>
> > As mentioned down the email thread, I consider usefulness of peak value
> > rather limited. It is misleading when memory is reclaimed. But
> > fundamentally I do not oppose to unifying the write behavior to reset
> > values.
>
> The removal of resets was intentional. The problem was that it wasn't clear
> who owned those counters and there's no way of telling who reset what when.
> It was easy to accidentally end up with multiple entities that think they
> can get timed measurement by resetting.
>
> So, in general, I don't think this is a great idea. There are shortcomings
> to how memory.peak behaves in that its meaningfulness quickly declines over
> time. This is expected and the rationale behind adding memory.peak, IIRC,
> was that it was difficult to tell the memory usage of a short-lived cgroup.
>
> If we want to allow peak measurement of time periods, I wonder whether we
> could do something similar to pressure triggers - ie. let users register
> watchers so that each user can define their own watch periods. This is more
> involved but more useful and less error-inducing than adding reset to a
> single counter.

I appreciate the ownership issues with the current resetting interface
in the other locations.
However, this peak RSS data is not used by all that many applications
(as evidenced by
the fact that the memory.peak file was only added a bit over a year ago).
I think there are enough cases where ownership is enforced externally
that mirroring the
existing interface to cgroup2 is sufficient.

I do think a more stateful interface would be nice, but I don't know
whether I have enough
knowledge of memcg to implement that in a reasonable amount of time.

Ownership aside, I think being able to reset the high watermark of a
process makes it
significantly more useful. Creating new cgroups and moving processes
around is significantly
heavier-weight.

Thanks,

>
> Johannes, what do you think?
>
> Thanks.
>
> --
> tejun

-- 
David Finkel
Senior Principal Software Engineer, Core Services

next prev parent reply	other threads:[~2024-07-16 17:10 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-15 20:36 David Finkel
2024-07-15 20:36 ` David Finkel
2024-07-15 20:42   ` David Finkel
2024-07-15 20:46     ` David Finkel
2024-07-16  7:20       ` Michal Hocko
2024-07-16 12:47         ` David Finkel
2024-07-16 13:19           ` Michal Hocko
2024-07-16 13:39             ` David Finkel
2024-07-16 13:48   ` Michal Hocko
2024-07-16 13:54     ` David Finkel
2024-07-16 16:44     ` Tejun Heo
2024-07-16 17:01       ` Roman Gushchin
2024-07-16 17:20         ` David Finkel
2024-07-16 19:53         ` Tejun Heo
2024-07-16 17:10       ` David Finkel [this message]
2024-07-16 19:48         ` Tejun Heo
2024-07-16 20:18           ` David Finkel
2024-07-16 18:00       ` Michal Hocko
2024-07-16 20:00         ` Tejun Heo
2024-07-16 22:06           ` David Finkel
2024-07-17  6:26             ` Michal Hocko
2024-07-17 14:24               ` David Finkel
2024-07-17 15:46                 ` Michal Hocko
2024-07-17  6:23           ` Michal Hocko
2024-07-17 17:04       ` Johannes Weiner
2024-07-17 20:14         ` David Finkel
2024-07-17 20:44           ` Johannes Weiner
2024-07-17 21:13             ` David Finkel
2024-07-17 23:48               ` Waiman Long
2024-07-18  1:24                 ` Tejun Heo
2024-07-18  2:17                   ` Roman Gushchin
2024-07-18  2:22                   ` Waiman Long
2024-07-18  7:21             ` Michal Hocko
2024-07-18 21:49         ` David Finkel
2024-07-19  3:23           ` Waiman Long
2024-07-22 15:18             ` David Finkel
  -- strict thread matches above, loose matches on Subject: below --
2024-07-22 15:17 [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers (fd-local edition) David Finkel
2024-07-22 15:17 ` [PATCH] mm, memcg: cg2 memory{.swap,}.peak write handlers David Finkel
2024-07-22 18:22   ` Roman Gushchin
2024-07-22 19:30     ` David Finkel
2024-07-22 19:47       ` Waiman Long
2024-07-22 23:06         ` David Finkel
2023-12-04 19:41 David Finkel
2023-12-04 23:33 ` Shakeel Butt
2023-12-05  9:07 ` Michal Hocko
2023-12-05 16:00   ` David Finkel
2023-12-06  8:45     ` Michal Hocko
2024-02-07 21:06 ` David Finkel

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAFUnj5M9CTYPcEM3=4i4rTfiU4sY4Qq8V1DXHJ00YYD2xFBvew@mail.gmail.com' \
    --to=davidf@vimeo.com \
    --cc=akpm@linux-foundation.org \
    --cc=cgroups@vger.kernel.org \
    --cc=corbet@lwn.net \
    --cc=core-services@vimeo.com \
    --cc=hannes@cmpxchg.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mhocko@suse.com \
    --cc=muchun.song@linux.dev \
    --cc=roman.gushchin@linux.dev \
    --cc=shakeelb@google.com \
    --cc=shuah@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox