linux-mm.kvack.org archive mirror
From: Hillf Danton <hdanton@sina.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Hillf Danton <hdanton@sina.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>, Shakeel Butt <shakeelb@google.com>,
	Roman Gushchin <guro@fb.com>,
	Matthew Wilcox <willy@infradead.org>
Subject: Re: [RFC] mm: memcg: add priority for soft limit reclaiming
Date: Tue, 24 Sep 2019 15:36:42 +0800
Message-ID: <20190924073642.3224-1-hdanton@sina.com>


On Mon, 23 Sep 2019 21:28:34 Michal Hocko wrote:
> 
> On Mon 23-09-19 21:04:59, Hillf Danton wrote:
> >
> > On Thu, 19 Sep 2019 21:32:31 +0800 Michal Hocko wrote:
> > >
> > > On Thu 19-09-19 21:13:32, Hillf Danton wrote:
> > > >
> > > > > Currently the memory controller is playing an increasingly important
> > > > > role in how memory is used and how pages are reclaimed under memory
> > > > > pressure.
> > > > >
> > > > > In daily work a memcg is often created for critical tasks, and their
> > > > > preconfigured memory usage is supposed to be met even under memory
> > > > > pressure. An administrator wants to make it configurable that the pages
> > > > > consumed by memcg-B can be reclaimed by page allocations invoked not by
> > > > > memcg-A but by memcg-C.
> > >
> > > I am not really sure I understand the usecase well but this sounds like
> > > what memory reclaim protection in v2 is aiming at.
> > >
> Please describe the usecase.
> 
For quite a while task-A has been able to preempt task-B for cpu
cycles. IOW the physical resource, cpu cycles, is preemptible.

Are physical pages preemptible in the same manner?
No, because currently no priority is defined for pages (say, a link
between page->nice and task->nice).

The slrp is added to memcg instead of using nice because 1) it is only
used in the page reclaiming context (in memcg that is soft limit
reclaiming), and 2) it is difficult to compare the task->nice of the
reclaimer and the reclaimee directly in that context, as only info about
the reclaimer and the lru page is available.

Here task->nice is replaced with memcg->slrp in order to do page
preemption, PP. There is no way for task-A to PP task-B, but the
group containing task-A can PP the group containing task-B.
As you can see, that preemption needs fewer than 100 lines of code on
top of the current memory controller framework.
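
For illustration only, the group-level check described above can be
sketched in user space as below. This is not the code from the patch:
struct memcg_sketch and slrp_may_reclaim() are hypothetical names, and
letting groups of equal slrp reclaim from each other is an assumption
read from the rule that pages are never taken from a higher-slrp group
in favor of a lower-slrp one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct mem_cgroup. */
struct memcg_sketch {
	int slrp;	/* soft limit reclaiming priority, 0..32 */
};

/*
 * Return true if pages charged to @victim may be reclaimed on behalf
 * of @reclaimer. A victim with a higher slrp is never preempted by a
 * reclaimer with a lower slrp; equal slrp is allowed (assumption).
 */
static bool slrp_may_reclaim(const struct memcg_sketch *reclaimer,
			     const struct memcg_sketch *victim)
{
	assert(reclaimer != NULL && victim != NULL);

	return victim->slrp <= reclaimer->slrp;
}
```

With group-A at slrp 8 and group-B at slrp 2, slrp_may_reclaim(&A, &B)
is true and slrp_may_reclaim(&B, &A) is false, so A can PP B but not the
other way around.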

The user-visible things that follow from PP include
1) an increase in system-wide configurability,

Combined with and/or in parallel to memcg.high, PP helps the admin
configure and maintain 100 mm groups on systems with 100GB RAM. With
every group's high boundary set to 10MB, he only needs to fiddle with
the slrps of a handful of groups containing critical tasks.

2) an increase in system-wide responsibility,

because critical groups can be configured not to be page preempted.

3) a gradient field that grows with priority in a long-running system,

just like rivers running all the way from the mountains to
the seas.

Adding PP to background reclaiming is on the way:
1> define page->nice and link it to task->nice
2> when isolating lru pages, check reclaimer->nice against page->nice
   and skip the page if the reclaimer is lower in priority
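
The two steps above could look roughly like the user-space sketch below.
Nothing here exists in the kernel today: struct page_sketch, its nice
field and isolate_pages_sketch() are hypothetical, and the sketch
assumes the usual nice convention in which a lower value means a higher
priority.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical page descriptor carrying a priority. */
struct page_sketch {
	int nice;	/* step 1: copied from the owning task's nice */
};

/*
 * Step 2: walk @nr pages of an lru list and collect in @out those the
 * reclaimer is allowed to take. A page is skipped when the reclaimer
 * is lower in priority, i.e. has a larger nice value than the page.
 * Returns the number of pages isolated.
 */
static size_t isolate_pages_sketch(const struct page_sketch *lru, size_t nr,
				   int reclaimer_nice,
				   const struct page_sketch **out)
{
	size_t taken = 0;
	size_t i;

	assert(lru != NULL && out != NULL);

	for (i = 0; i < nr; i++) {
		if (reclaimer_nice > lru[i].nice)
			continue;	/* reclaimer is lower in priority */
		out[taken++] = &lru[i];
	}
	return taken;
}
```

For an lru list whose pages have nice -5, 0 and 10, a reclaimer running
at nice 0 skips the first page and isolates the other two.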

> > A pointer to the v2 stuff please.
> 
> Documentation/admin-guide/cgroup-v2.rst
> 
Thanks Michal.

To my surprise, slrp happens to fit in line with cgroup-v2.

--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -1108,6 +1108,17 @@ PAGE_SIZE multiple when read back.
        Going over the high limit never invokes the OOM killer and
        under extreme conditions the limit may be breached.

+  memory.slrp
+       A read-write single value [0-32] file which exists on non-root
+       cgroups.  The default is "0".
+
+       Soft limit reclaiming priority.  This is the mechanism to control
+       how physical pages are reclaimed when a group's memory usage goes
+       over its high boundary.
+
+       It makes sure that no pages will be reclaimed from any group of
+       higher slrp in favor of a lower-slrp group.
+
   memory.max
        A read-write single value file which exists on non-root
        cgroups.  The default is "max".
--

Hillf



