From mboxrd@z Thu Jan  1 00:00:00 1970
Date: Fri, 17 Apr 2020 18:59:41 -0400
From: Tejun Heo <tj@kernel.org>
To: Shakeel Butt <shakeelb@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	Linux MM <linux-mm@kvack.org>, Kernel Team <kernel-team@fb.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Chris Down <chris@chrisdown.name>,
	Cgroups <cgroups@vger.kernel.org>
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available
 space gets depleted
Message-ID: <20200417225941.GE43469@mtj.thefacebook.com>
References: <20200417010617.927266-1-kuba@kernel.org>
 <CALvZod78ZUhU+yr2x1h_gv+VgVGTPnSSGKh_+fd+MeiAKreJvg@mail.gmail.com>
 <20200417162355.GA43469@mtj.thefacebook.com>
 <CALvZod4ftvXCu8SbQUXwTGVvx5K2+at9h30r28chZLXEB1JdfQ@mail.gmail.com>
 <20200417173615.GB43469@mtj.thefacebook.com>
 <CALvZod7-r0OrJ+-_uCy_p3BU3348ve2+YatiSdLvFaVqcqCs=w@mail.gmail.com>
 <20200417193539.GC43469@mtj.thefacebook.com>
 <CALvZod6LT25t9aAA1KHmf1U4-L8zSjUXQ4VQvX4cMT1A+R_g+w@mail.gmail.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <CALvZod6LT25t9aAA1KHmf1U4-L8zSjUXQ4VQvX4cMT1A+R_g+w@mail.gmail.com>
List-ID: <linux-mm.kvack.org>

Hello, Shakeel.

On Fri, Apr 17, 2020 at 02:51:09PM -0700, Shakeel Butt wrote:
> > > In this example does 'B' have memory.high and memory.max set and by A
> >
> > B doesn't have anything set.
> >
> > > having no other restrictions, I am assuming you meant unlimited high
> > > and max for A? Can 'A' use memory.min?
> >
> > Sure, it can but 1. the purpose of the example is illustrating the
> > incompleteness of the existing mechanism
> 
> I understand but is this a real world configuration people use and do
> we want to support the scenario where without setting high/max, the
> kernel still guarantees the isolation?

Yes, that's the configuration we're deploying fleet-wide and at least the
direction I'm gonna be pushing towards for reasons of generality and ease of
use.

Here's an example to illustrate the point - consider distros or upstream
desktop environments wanting to provide basic resource configuration to
protect user sessions and critical system services needed for user
interaction by default. That is something which is clearly and immediately
useful but also is extremely challenging to achieve with limits.

There are no universally good enough upper limits. Any one number is gonna
be both too high to guarantee protection and too low for use cases which
legitimately need that much memory. That's because the upper limits aren't
work-conserving and have a high chance of doing harm when misconfigured,
which makes figuring out the correct configuration almost impossible without
per-use-case manual tuning.

The whole idea behind memory.low and related efforts is resolving that
problem by making memory control more work-conserving and forgiving, so that
users can say something like "I want the user session to have at least 25%
memory protected if needed and possible" and get most of the benefits of
carefully crafted configuration. We're already deploying such configuration
and it works well enough for a wide variety of workloads.
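
As a rough illustration of the "at least 25% protected" idea, here is a
minimal sketch of computing such a value and writing it to a cgroup's
memory.low file. It assumes cgroup v2; the helper names, the 4096-byte page
size and the example cgroup path are hypothetical and are not taken from any
actual deployment.

```python
import os

PAGE_SIZE = 4096  # assumed; a real system can query os.sysconf("SC_PAGE_SIZE")


def mem_total_bytes(meminfo_text):
    """Return MemTotal in bytes from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like "MemTotal:       16384000 kB"
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def low_protection_bytes(total_bytes, percent, page_size=PAGE_SIZE):
    """Return percent of total_bytes, rounded down to a page boundary."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return (total_bytes * percent // 100) // page_size * page_size


def apply_memory_low(cgroup_dir, value_bytes):
    """Write value_bytes to memory.low under cgroup_dir (hypothetical path)."""
    with open(os.path.join(cgroup_dir, "memory.low"), "w") as f:
        f.write(str(value_bytes))
```

On a live system this might be used as
apply_memory_low("/sys/fs/cgroup/user.slice", low_protection_bytes(total, 25)),
where the path is only an example. Nothing else needs to be capped for this
to take effect, which is the work-conserving property described above.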

> > 2. there's a big difference between
> > letting the machine hit the wall and waiting for the kernel OOM to trigger
> > and being able to monitor the situation as it gradually develops and respond
> > to it, which is the whole point of the low/high mechanisms.
> 
> I am not really against the proposed solution. What I am trying to see
> is if this problem is more general than an anon/swap-full problem and
> if a more general solution is possible. To me it seems like, whenever
> a large portion of reclaimable memory (anon, file or kmem) becomes
> non-reclaimable abruptly, the memory isolation can be broken. You gave
> the anon/swap-full example, let me see if I can come up with file and
> kmem examples (with similar A & B).
> 
> 1) B has a lot of page cache but temporarily gets pinned for rdma or
> something and the system gets low on memory. B can attack A's low
> protected memory as B's page cache is not reclaimable temporarily.
> 
> 2) B has a lot of dentries/inodes but someone has taken a write lock
> on shrinker_rwsem and got stuck in allocation/reclaim or CPU
> preempted. B can attack A's low protected memory as B's slabs are not
> reclaimable temporarily.
> 
> I think the aim is to slow down B enough to give the PSI monitor a
> chance to act before either B targets A's protected memory or the
> kernel triggers oom-kill.
> 
> My question is do we really want to solve the issue without limiting B
> through high/max? Also isn't fine grained PSI monitoring along with
> limiting B through memory.[high|max] general enough to solve all three
> example scenarios?

Yes, we definitely want to solve the issue without involving high and max. I
hope that part is clear now. As for whether we want to cover niche cases
such as RDMA pinning a large swath of page cache, I don't know, maybe? But I
don't think that's a problem of comparable importance, especially given
that in both cases you listed the problem is temporary and the workload
wouldn't have the ability to keep expanding undeterred.
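
For reference, the kind of monitoring discussed in the quoted text (watching
the situation as it gradually develops) could be sketched roughly as below.
It parses the text format of a cgroup v2 memory.pressure file. The function
names and the 10% threshold are hypothetical, chosen only for illustration.

```python
def parse_pressure(text):
    """Parse PSI text into a dict such as {"some": {...}, "full": {...}}."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # Line looks like "some avg10=0.00 avg60=0.00 avg300=0.00 total=0"
        kind, pairs = fields[0], fields[1:]
        stats = {}
        for pair in pairs:
            key, _, value = pair.partition("=")
            # "total" is a microsecond counter; the averages are percentages
            stats[key] = int(value) if key == "total" else float(value)
        result[kind] = stats
    return result


def under_pressure(text, threshold_pct=10.0):
    """True if the 10-second "full" average reaches threshold_pct."""
    full = parse_pressure(text).get("full", {})
    return full.get("avg10", 0.0) >= threshold_pct
```

A monitor could read memory.pressure for the cgroup periodically and respond
once under_pressure() returns True, well before the kernel OOM killer gets
involved.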

Thanks.

-- 
tejun