From mboxrd@z Thu Jan  1 00:00:00 1970
MIME-Version: 1.0
References: <1573106889-4939-1-git-send-email-laoar.shao@gmail.com> <20191108132603.GJ15658@dhcp22.suse.cz>
In-Reply-To: <20191108132603.GJ15658@dhcp22.suse.cz>
From: Yafang Shao <laoar.shao@gmail.com>
Date: Sat, 9 Nov 2019 11:32:59 +0800
Message-ID: <CALOAHbCLD9hA0mJm+D+A9gCd=jPoZwOPyv+k7MaJ_frjD4gB+Q@mail.gmail.com>
Subject: Re: [PATCH 1/2] mm, memcg: introduce multiple levels memory low protection
To: Michal Hocko <mhocko@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>, Vladimir Davydov <vdavydov.dev@gmail.com>, 
	Andrew Morton <akpm@linux-foundation.org>, Linux MM <linux-mm@kvack.org>
Content-Type: text/plain; charset="UTF-8"
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, Nov 8, 2019 at 9:26 PM Michal Hocko <mhocko@kernel.org> wrote:
>
> On Thu 07-11-19 01:08:08, Yafang Shao wrote:
> > This patch introduces a new memory controller file memory.low.level,
> > which is used to set multiple levels memory.low protection.
> > The valid value of memory.low.level is [0..3], meaning we support four
> > levels protection now. This new controller file takes effect only when
> > memory.low is set. With this new file, we can do page reclaim QoS on
> > different memcgs. For example, when the system is under memory pressure, it
> > will reclaim pages from the memcg with lower priority first and then higher
> > priority.
> >
> >   - What is the problem in the current memory low protection?
> >   Currently we can set bigger memory.low protection on memcg with higher
> >   priority, and smaller memory.low protection on memcg with lower priority.
> >   But once there's no available unprotected memory to reclaim, the
> >   reclaimers will reclaim the protected memory from all the memcgs.
> >   While we really want the reclaimers to reclaim the protected memory from
> >   the lower-priority memcgs first, and if it still can't meet the page
> >   allocation it will then reclaim the protected memory from higher-priority
> >   memcgs. The logic can be displayed as below,
> >       under_memory_pressure
> >               reclaim_unprotected_memory
> >               if (meet_the_request)
> >                       exit
> >               reclaim_protected_memory_from_lowest_priority_memcgs
> >               if (meet_the_request)
> >                       exit
> >               reclaim_protected_memory_from_higher_priority_memcgs
> >               if (meet_the_request)
> >                       exit
> >               reclaim_protected_memory_from_highest_priority_memcgs
>
> Could you expand a bit more on the usecase please? Do you overcommit on
> the memory protection?
>

Hi Michal,

It doesn't matter whether it is overcommitted or not, because there is
an effective low protection when it is overcommitted.
Also, the real low level is the effective low level, which makes it
work in the cgroup hierarchy.
Let's expand the example in the comment above mem_cgroup_protected()
with memory.low.level.

 *
 *       A        A/memory.low = 2G   A/memory.current = 6G  A/memory.low.level = 1
 *    / /  \  \
 *   BC  DE       B/memory.low = 3G   B/memory.current = 2G  B/memory.low.level = 2
 *                C/memory.low = 1G   C/memory.current = 2G  C/memory.low.level = 1
 *                D/memory.low = 0    D/memory.current = 2G  D/memory.low.level = 3
 *                E/memory.low = 10G  E/memory.current = 0   E/memory.low.level = 3

Suppose A is the targeted memory cgroup. The following memory
distribution is then expected:
A/memory.current = 2G
B/memory.current = 2G (because it has a higher low.level than C)
C/memory.current = 0
D/memory.current = 0
E/memory.current = 0

If instead C/memory.low.level = 2, then the result will be:
A/memory.current = 2G
B/memory.current = 1.3G
C/memory.current = 0.6G
D/memory.current = 0
E/memory.current = 0
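
As a rough illustration only (this is not the code in the patch), the
two distributions above can be reproduced with a small model. The
model assumes that the parent's effective protection is handed out to
children in descending memory.low.level order, and that children at
the same level share it in proportion to min(memory.low,
memory.current), as in the existing comment above
mem_cgroup_protected(). The function name distribute() and the tuple
layout are hypothetical.

```python
from fractions import Fraction

def distribute(parent_elow, children):
    """Hypothetical model of level-aware effective low protection.

    parent_elow: protection available from the parent (in G).
    children:    name -> (memory.low, memory.current, memory.low.level).
    Returns name -> amount of the child's usage that stays protected.
    """
    remaining = Fraction(parent_elow)
    result = {name: Fraction(0) for name in children}

    # Assumption: a higher memory.low.level is served first.
    levels = sorted({lvl for (_, _, lvl) in children.values()}, reverse=True)
    for level in levels:
        # Only usage that is actually covered by memory.low counts.
        low_usage = {
            name: min(Fraction(low), Fraction(cur))
            for name, (low, cur, lvl) in children.items()
            if lvl == level
        }
        total = sum(low_usage.values())
        if total == 0 or remaining == 0:
            continue
        # Siblings at the same level share the budget proportionally.
        budget = min(remaining, total)
        for name, usage in low_usage.items():
            result[name] = budget * usage / total
        remaining -= budget
    return result

# The example from this mail: A is the target with 2G of protection.
first = distribute(2, {"B": (3, 2, 2), "C": (1, 2, 1),
                       "D": (0, 2, 3), "E": (10, 0, 3)})
# Same tree, but with C/memory.low.level = 2.
second = distribute(2, {"B": (3, 2, 2), "C": (1, 2, 2),
                        "D": (0, 2, 3), "E": (10, 0, 3)})

for label, res in (("C level 1", first), ("C level 2", second)):
    print(label, {name: round(float(v), 2) for name, v in res.items()})
```

In this model D and E receive nothing despite their higher level,
because D has memory.low = 0 and E has no usage to protect.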


> Also why is this needed only for the reclaim protection? In other words
> let's say that you have more memcgs that are above their protection
> thresholds why should reclaim behave differently for them from the
> situation when all of them reach the protection? Also what about min
> reclaim protection?
>

If you don't set memory.low (and memory.min), then you don't care
about page reclaim.
If you really care about page reclaim, then you should set
memory.low (and memory.min) first,
and if you then want to distinguish between the protected memory of
different memcgs, you can use memory.low.level.

The reason we don't want to use min reclaim protection is that it will
cause more frequent OOMs,
and you have to do more work to prevent random OOMs.

> >   - Why does it support four-level memory low protection ?
> >   Low priority, medium priority and high priority, that is the most common
> >   usecases in the real life. So four-level memory low protection should be
> >   enough. The more levels it is, the higher overhead page reclaim will
> >   take. So four-level protection is really a trade-off.
>
> Well, is this really the case? Isn't that just a matter of a proper
> implementation? Starting an API with a very restricted input values
> usually tends to outdate very quickly and it is found unsuitable. It
> would help to describe how do you envision using those priorities. E.g.
> do you have examples of what kind of services workloads fall into which
> priority.
>
> That being said there are quite some gaps in the usecase description as
> well the interface design.

We have different workloads running on a single server.
Some workloads are latency sensitive, while the others are not.
When there's resource pressure on this server, we expect this pressure
to impact the latency-sensitive workloads as little as possible,
and to throttle the latency-insensitive workloads.
memory.{low, min} can separate the page cache into 'unevictable'
memory, protected memory and non-protected memory, but it can't
distinguish between different containers (workloads).
In the latency-sensitive containers there is also some unimportant
page cache, e.g. page cache for the log files,
so it would be better to reclaim the log file pages first (unprotected
memory), then reclaim protected pages from the latency-insensitive
containers, and only then reclaim protected pages from the
latency-sensitive containers.
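
The ordering described above can be sketched as follows. This is only
a toy model, not the reclaim code in the patch: the function
plan_reclaim(), the dictionary keys and the container names are
hypothetical, and sizes are plain numbers rather than pages.

```python
def plan_reclaim(memcgs, target):
    """Return (plan, shortfall) for reclaiming `target` units.

    memcgs: list of dicts with keys "name", "level" (memory.low.level),
            "unprotected" and "protected" (reclaimable amounts).
    plan:   list of (name, kind, amount) in the order pages are taken.
    """
    plan = []
    need = target

    # Step 1: unprotected memory (e.g. log file page cache) from everyone.
    for cg in memcgs:
        if need <= 0:
            break
        take = min(cg["unprotected"], need)
        if take > 0:
            plan.append((cg["name"], "unprotected", take))
            need -= take

    # Step 2: protected memory, lowest memory.low.level first, so the
    # highest-priority memcgs are touched last.
    for cg in sorted(memcgs, key=lambda c: c["level"]):
        if need <= 0:
            break
        take = min(cg["protected"], need)
        if take > 0:
            plan.append((cg["name"], "protected", take))
            need -= take

    return plan, max(need, 0)

containers = [
    {"name": "latency-sensitive", "level": 3, "unprotected": 1, "protected": 4},
    {"name": "batch", "level": 0, "unprotected": 2, "protected": 3},
]
plan, shortfall = plan_reclaim(containers, 5)
for step in plan:
    print(step)
print("shortfall:", shortfall)
```

With these made-up numbers the request is met from the unprotected
memory of both containers plus part of the batch container's protected
memory, and the latency-sensitive container's protected pages are left
alone.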

Sometimes we have to restrict the user behavior.
The current memory.{min, low} design separates the page cache into min
protected, low protected and non-protected; that seems outdated now,
but we can improve it.
So I don't think the four-level low protection will be a big problem.

Thanks
Yafang