From: Johannes Weiner <hannes@cmpxchg.org>
To: Vladimir Davydov <vdavydov@virtuozzo.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	Michal Hocko <mhocko@suse.cz>,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH 0/2] mm: memcontrol: cgroup2 memory statistics
Date: Fri, 15 Jan 2016 15:30:59 -0500
Message-ID: <20160115203059.GA25092@cmpxchg.org>
In-Reply-To: <20160115095834.GP30160@esperanza>

On Fri, Jan 15, 2016 at 12:58:34PM +0300, Vladimir Davydov wrote:
> With the follow-up it looks good to me. All the exported counters
> look well justified, and the format follows that of other cgroup2
> controllers (cpu, blkio). Thanks!
> 
> Acked-by: Vladimir Davydov <vdavydov@virtuozzo.com>

Thanks Vladimir.

> One addition, though. Maybe we could add a 'total' field which would
> show memory.current? Yes, this would result in a little redundancy,
> but I think that from the userspace POV it's much more convenient to
> read a single file and get all the stat counters than to have them
> spread across several files.

I am not fully convinced that a total value or even memory.current
will be looked at that often in practice, because in all but a few
corner cases that value will be pegged to the configured limit. In
those instances I think it should be okay to check another file.
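
For illustration, checking one more file is cheap from userspace. A
minimal C sketch, assuming a cgroup2 hierarchy mounted at
/sys/fs/cgroup; the group name "mygroup" is hypothetical:

    /* Read a single-value cgroup2 control file such as memory.current.
     * Sketch only; error handling is minimal.
     */
    #include <stdio.h>

    static long long read_cgroup_value(const char *path)
    {
            long long val;
            FILE *f = fopen(path, "r");

            if (!f)
                    return -1;
            if (fscanf(f, "%lld", &val) != 1)
                    val = -1;
            fclose(f);
            return val;
    }

    int main(void)
    {
            /* "mygroup" is a hypothetical cgroup name. */
            long long cur =
                    read_cgroup_value("/sys/fs/cgroup/mygroup/memory.current");

            if (cur >= 0)
                    printf("memory.current: %lld bytes\n", cur);
            return 0;
    }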

> Come to think of it, do we really need a separate memory.events file?
> Couldn't these counters live in memory.stat as well?

I think it sits at a different level of the interface. The events
file indicates cgroup-specific dynamics between configuration and
memory footprint, and so it sits on the same level as low, high, max,
and current. These are the parts involved in the most basic control
loop between the kernel and the job scheduler: monitor, then adjust
the configuration or notify the admin. It's for the entity that
allocates and manages the system's resources.

The memory.stat file, on the other hand, is geared toward analyzing
and understanding workload-specific performance (whether by humans or
by automated heuristics) and, if necessary, correcting the config
file that describes the application's requirements to the job
scheduler.

I think it makes sense to not conflate these two interfaces.
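
Concretely, both files use cgroup2's flat-keyed "key value" format,
so a userspace consumer for either looks the same. A minimal C sketch;
the key and path in the usage comment are only illustrative:

    /*
     * Look up one counter in a flat-keyed cgroup2 file ("key value"
     * per line), e.g. memory.stat or memory.events. Sketch only.
     *
     * Usage: read_flat_key("/sys/fs/cgroup/mygroup/memory.stat",
     *                      "anon", &bytes);
     */
    #include <stdio.h>
    #include <string.h>

    static int read_flat_key(const char *path, const char *key,
                             unsigned long long *out)
    {
            char name[64];
            unsigned long long val;
            FILE *f = fopen(path, "r");

            if (!f)
                    return -1;
            while (fscanf(f, "%63s %llu", name, &val) == 2) {
                    if (!strcmp(name, key)) {
                            *out = val;
                            fclose(f);
                            return 0;
                    }
            }
            fclose(f);
            return -1;
    }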

> Yeah, this file generates events, but IMHO it's not very useful the
> way it is currently implemented:
> 
> Suppose a user wants to receive notifications about OOM or LOW
> events, which are normally rather rare and might require immediate
> action. The only way to do that is to listen to memory.events, but
> this file can generate tons of MAX/HIGH events while the cgroup is
> performing normally. The userspace app will have to wake up every
> time the cgroup performs reclaim and check memory.events just to make
> sure no OOM happened, all of which wastes CPU time.

Under optimal system load there is no limit reclaim, and memory
pressure comes exclusively from a shortage of physical pages, which
global reclaim balances based on memory.low. If groups run into their
own limits, it means that idle resources are being left on the table.

So events only happen when the machine is over- or under-utilized,
and, as noted above, the events file is mainly meant for something
like a job scheduler tasked with allocating the machine's resources.
It's hard to imagine a job scheduler scenario where the long-term
goal is anything other than optimal utilization.

There are reasonable cases in which memory could be temporarily left
idle, say to keep the startup latency of new jobs low. In those cases
it's true that the max and high notifications might become annoying.
But do you really think that could become problematic in practice? If
so, it should be enough to rate-limit the file-changed notifications.
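
For reference, a rough C sketch of the notification loop being
discussed, assuming that value changes in memory.events generate a
file-modified event visible to inotify; the cgroup path is
hypothetical. The consumer wakes up on every change and has to
re-read the file to filter for the counters it cares about, which is
exactly the churn that ratelimiting would dampen:

    #include <stdio.h>
    #include <string.h>
    #include <sys/inotify.h>
    #include <unistd.h>

    int main(void)
    {
            /* Hypothetical cgroup path. */
            const char *path = "/sys/fs/cgroup/mygroup/memory.events";
            char buf[4096], name[64];
            unsigned long long val, last_oom = 0;
            int fd = inotify_init();

            if (fd < 0 || inotify_add_watch(fd, path, IN_MODIFY) < 0)
                    return 1;

            for (;;) {
                    FILE *f;

                    /* Block until the kernel modifies memory.events. */
                    if (read(fd, buf, sizeof(buf)) <= 0)
                            break;

                    f = fopen(path, "r");
                    if (!f)
                            break;
                    while (fscanf(f, "%63s %llu", name, &val) == 2) {
                            /* Ignore low/high/max churn; act on oom only. */
                            if (!strcmp(name, "oom") && val != last_oom) {
                                    fprintf(stderr, "oom count now %llu\n",
                                            val);
                                    last_oom = val;
                            }
                    }
                    fclose(f);
            }
            return 0;
    }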

> Maybe we could generate LOW/HIGH/MAX events on memory.low/high/max?
> That would look natural IMO. I don't know where OOM events would go
> in that case, though.

Without a natural place for OOM notifications, it probably makes sense
to stick with memory.events.

Thanks,
Johannes

