From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <SRS0=X7gQ=6B=kvack.org=owner-linux-mm@kernel.org>
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on
	aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level: 
X-Spam-Status: No, score=-8.4 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED,
	DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,
	SPF_HELO_NONE,SPF_PASS,USER_IN_DEF_DKIM_WL autolearn=no autolearn_force=no
	version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99])
	by smtp.lore.kernel.org (Postfix) with ESMTP id 69F0FC2BB1D
	for <linux-mm@archiver.kernel.org>; Fri, 17 Apr 2020 21:51:26 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
	by mail.kernel.org (Postfix) with ESMTP id 0324620B1F
	for <linux-mm@archiver.kernel.org>; Fri, 17 Apr 2020 21:51:25 +0000 (UTC)
Authentication-Results: mail.kernel.org;
	dkim=pass (2048-bit key) header.d=google.com header.i=@google.com header.b="VVM8cusG"
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0324620B1F
Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix)
	id 794638E0003; Fri, 17 Apr 2020 17:51:24 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40)
	id 71E338E0001; Fri, 17 Apr 2020 17:51:24 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
	id 5E5298E0003; Fri, 17 Apr 2020 17:51:24 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org
Received: from forelay.hostedemail.com (smtprelay0167.hostedemail.com [216.40.44.167])
	by kanga.kvack.org (Postfix) with ESMTP id 3DF418E0001
	for <linux-mm@kvack.org>; Fri, 17 Apr 2020 17:51:24 -0400 (EDT)
Received: from smtpin12.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251])
	by forelay03.hostedemail.com (Postfix) with ESMTP id E7CEF824556B
	for <linux-mm@kvack.org>; Fri, 17 Apr 2020 21:51:23 +0000 (UTC)
X-FDA: 76718693646.12.store29_29472eb546055
X-HE-Tag: store29_29472eb546055
X-Filterd-Recvd-Size: 7309
Received: from mail-lj1-f193.google.com (mail-lj1-f193.google.com [209.85.208.193])
	by imf33.hostedemail.com (Postfix) with ESMTP
	for <linux-mm@kvack.org>; Fri, 17 Apr 2020 21:51:23 +0000 (UTC)
Received: by mail-lj1-f193.google.com with SMTP id e25so3595168ljg.5
        for <linux-mm@kvack.org>; Fri, 17 Apr 2020 14:51:23 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=google.com; s=20161025;
        h=mime-version:references:in-reply-to:from:date:message-id:subject:to
         :cc;
        bh=tmTmDDUdpiLDV9LVGQCn0/OLUoBHllDUrwiB8bPf7PE=;
        b=VVM8cusG0MQVqrRDrUMvM8EhIwHOn0sQjHBjQhOSHLGEZCqJMCc4gROw7t4V/VvMaM
         rTksXe0M88FFeD+d+4/bU9Umfz4tE0rDCFqJOsAP3yaEDjpWmUR9eMq3AzzsiqYBQZNi
         Qi5TmhCZLqxmx93vn63TQb3vbwPVpF0H+JIhVikUm85foDo7F5Xb6KvZFwLmyFrNIkSL
         z0wQyU2jNtR4QvbOlI3OHXgzEbM+bT53bjqp3l7bkCopGgWyws5tk2hxrzjT4QynMgT6
         dD6JSbo1qnWGSX02bZGkk10xSJ+aXhLrECxwYZ4spLR0m5HatbRaYOnwAgY37gWpKmxN
         IdYQ==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:mime-version:references:in-reply-to:from:date
         :message-id:subject:to:cc;
        bh=tmTmDDUdpiLDV9LVGQCn0/OLUoBHllDUrwiB8bPf7PE=;
        b=hM+14EhBdNHTWuOiOIDXqVhy93jre40ABgl4NA3JT8cU5QcGi1dAh2feDBYCorU7Wr
         v4R28RgAKyBxH6lnUs29Jn/Ap2g3dcI4+zv/g5+rO8cKAyTFxIGkcJQnp6aij0ddZVjT
         yeU6jhNyVq9WcUHrZuTOpmQ5vErsJXZVuXrh6I+Bdg2Fm7VMfA2+DWlGfCSZHEgi10bu
         nNdmcwh+NzzpHkReOuNQ6FLMgqDQxug3SXBvpecmaOV4X2L6J9YDJhkmh64cS7KsTsrn
         feRKxs1xfM19TPFFbCY01fK1B499a5qkXfoTAb/ryJYeaSZxJelCqMvkJEI5Q4L0AcK7
         n84g==
X-Gm-Message-State: AGi0PuaoVRLIcEJmOOWFhtwBYy2xohSrwcH4d8Kii/yLKizL475Iknz8
	wLwfe73N4LVkpZF+qpt+7ZpVIQZ8wVDIKZd77AeeLw==
X-Google-Smtp-Source: APiQypK8Y9XOmeXokupsh1wQ9/dgy9//zGMwaI7Km7M6TBcUB/VFbedYG7nmlzeeQlUJIJ1cPSuBGxYo1kBc46PpLxg=
X-Received: by 2002:a2e:45c3:: with SMTP id s186mr2894597lja.270.1587160281667;
 Fri, 17 Apr 2020 14:51:21 -0700 (PDT)
MIME-Version: 1.0
References: <20200417010617.927266-1-kuba@kernel.org> <CALvZod78ZUhU+yr2x1h_gv+VgVGTPnSSGKh_+fd+MeiAKreJvg@mail.gmail.com>
 <20200417162355.GA43469@mtj.thefacebook.com> <CALvZod4ftvXCu8SbQUXwTGVvx5K2+at9h30r28chZLXEB1JdfQ@mail.gmail.com>
 <20200417173615.GB43469@mtj.thefacebook.com> <CALvZod7-r0OrJ+-_uCy_p3BU3348ve2+YatiSdLvFaVqcqCs=w@mail.gmail.com>
 <20200417193539.GC43469@mtj.thefacebook.com>
In-Reply-To: <20200417193539.GC43469@mtj.thefacebook.com>
From: Shakeel Butt <shakeelb@google.com>
Date: Fri, 17 Apr 2020 14:51:09 -0700
Message-ID: <CALvZod6LT25t9aAA1KHmf1U4-L8zSjUXQ4VQvX4cMT1A+R_g+w@mail.gmail.com>
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available
 space gets depleted
To: Tejun Heo <tj@kernel.org>
Cc: Jakub Kicinski <kuba@kernel.org>, Andrew Morton <akpm@linux-foundation.org>, 
	Linux MM <linux-mm@kvack.org>, Kernel Team <kernel-team@fb.com>, 
	Johannes Weiner <hannes@cmpxchg.org>, Chris Down <chris@chrisdown.name>, 
	Cgroups <cgroups@vger.kernel.org>
Content-Type: text/plain; charset="UTF-8"
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000001, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

On Fri, Apr 17, 2020 at 12:35 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Fri, Apr 17, 2020 at 10:51:10AM -0700, Shakeel Butt wrote:
> > > Can you please elaborate concrete scenarios? I'm having a hard time seeing
> > > differences from page cache.
> >
> > Oh I was talking about the global reclaim here. In global reclaim, any
> > task can be throttled (throttle_direct_reclaim()). Memory freed by
> > using the CPU of high priority low latency jobs can be stolen by low
> > priority batch jobs.
>
> I'm still having a hard time following this thread of discussion, most
> likely because my knoweldge of mm is fleeting at best. Can you please ELI5
> why the above is specifically relevant to this discussion?
>

No, it is not relevant to this discussion "now". The mention of
performance isolation in my first email was mostly due to my lack of
understanding about what problem this patch series is trying to solve.
So, let's skip this topic.

> I'm gonna list two things that come to my mind just in case that'd help
> reducing the back and forth.
>
> * With protection based configurations, protected cgroups wouldn't usually
>   go into direct reclaim themselves all that much.
>
> * We do have holes in accounting CPU cycles used by reclaim to the orgins,
>   which, for example, prevents making memory.high reclaim async and lets
>   memory pressure contaminate cpu isolation possibly to a significant degree
>   on lower core count machines in some scenarios, but that's a separate
>   issue we need to address in the future.
>

I have an opinion on the above but I will restrain as those are not
relevant to the patch series.

> > > cgroup A has memory.low protection and no other restrictions. cgroup B has
> > > no protection and has access to swap. When B's memory starts bloating and
> > > gets the system under memory contention, it'll start consuming swap until it
> > > can't. When swap becomes depleted for B, there's nothing holding it back and
> > > B will start eating into A's protection.
> > >
> >
> > In this example does 'B' have memory.high and memory.max set and by A
>
> B doesn't have anything set.
>
> > having no other restrictions, I am assuming you meant unlimited high
> > and max for A? Can 'A' use memory.min?
>
> Sure, it can but 1. the purpose of the example is illustrating the
> imcompleteness of the existing mechanism

I understand but is this a real world configuration people use and do
we want to support the scenario where without setting high/max, the
kernel still guarantees the isolation.

> 2. there's a big difference between
> letting the machine hit the wall and waiting for the kernel OOM to trigger
> and being able to monitor the situation as it gradually develops and respond
> to it, which is the whole point of the low/high mechanisms.
>

I am not really against the proposed solution. What I am trying to see
is if this problem is more general than an anon/swap-full problem and
if a more general solution is possible. To me it seems like, whenever
a large portion of reclaimable memory (anon, file or kmem) becomes
non-reclaimable abruptly, the memory isolation can be broken. You gave
the anon/swap-full example, let me see if I can come up with file and
kmem examples (with similar A & B).

1) B has a lot of page cache but temporarily gets pinned for rdma or
something and the system gets low on memory. B can attack A's low
protected memory as B's page cache is not reclaimable temporarily.

2) B has a lot of dentries/inodes but someone has taken a write lock
on shrinker_rwsem and got stuck in allocation/reclaim or CPU
preempted. B can attack A's low protected memory as B's slabs are not
reclaimable temporarily.

I think the aim is to slow down B enough to give the PSI monitor a
chance to act before either B targets A's protected memory or the
kernel triggers oom-kill.

My question is do we really want to solve the issue without limiting B
through high/max? Also isn't fine grained PSI monitoring along with
limiting B through memory.[high|max] general enough to solve all three
example scenarios?

thanks,
Shakeel