Date: Fri, 17 Apr 2020 18:59:41 -0400
From: Tejun Heo
To: Shakeel Butt
Cc: Jakub Kicinski, Andrew Morton, Linux MM, Kernel Team, Johannes Weiner,
 Chris Down, Cgroups
Subject: Re: [PATCH 0/3] memcg: Slow down swap allocation as the available
 space gets depleted
Message-ID: <20200417225941.GE43469@mtj.thefacebook.com>
References: <20200417010617.927266-1-kuba@kernel.org>
 <20200417162355.GA43469@mtj.thefacebook.com>
 <20200417173615.GB43469@mtj.thefacebook.com>
 <20200417193539.GC43469@mtj.thefacebook.com>

Hello, Shakeel.

On Fri, Apr 17, 2020 at 02:51:09PM -0700, Shakeel Butt wrote:
> > > In this example does 'B' have memory.high and memory.max set and by A
> >
> > B doesn't have anything set.
> >
> > > having no other restrictions, I am assuming you meant unlimited high
> > > and max for A? Can 'A' use memory.min?
> >
> > Sure, it can but 1. the purpose of the example is illustrating the
> > incompleteness of the existing mechanism
>
> I understand but is this a real world configuration people use and do
> we want to support the scenario where without setting high/max, the
> kernel still guarantees the isolation.

Yes, that's the configuration we're deploying fleet-wide and at least the
direction I'm gonna be pushing towards for reasons of generality and ease of
use.

Here's an example to illustrate the point - consider distros or upstream
desktop environments wanting to provide, by default, basic resource
configuration to protect user sessions and critical system services needed
for user interaction. That is something which is clearly and immediately
useful but is also extremely challenging to achieve with limits.

There are no universally good enough upper limits.
Any one number is gonna be both too high to guarantee protection and too low
for use cases which legitimately need that much memory. That's because upper
limits aren't work-conserving and have a high chance of doing harm when
misconfigured, which makes figuring out the correct configuration almost
impossible without per-use-case manual tuning.

The whole idea behind memory.low and related efforts is resolving that
problem by making memory control more work-conserving and forgiving, so that
users can say something like "I want the user session to have at least 25%
memory protected if needed and possible" and get most of the benefits of
carefully crafted configuration. We're already deploying such configuration
and it works well enough for a wide variety of workloads.

> > 2. there's a big difference between
> > letting the machine hit the wall and waiting for the kernel OOM to trigger
> > and being able to monitor the situation as it gradually develops and respond
> > to it, which is the whole point of the low/high mechanisms.
>
> I am not really against the proposed solution. What I am trying to see
> is if this problem is more general than an anon/swap-full problem and
> if a more general solution is possible. To me it seems like, whenever
> a large portion of reclaimable memory (anon, file or kmem) becomes
> non-reclaimable abruptly, the memory isolation can be broken. You gave
> the anon/swap-full example, let me see if I can come up with file and
> kmem examples (with similar A & B).
>
> 1) B has a lot of page cache but temporarily gets pinned for rdma or
> something and the system gets low on memory. B can attack A's low
> protected memory as B's page cache is not reclaimable temporarily.
>
> 2) B has a lot of dentries/inodes but someone has taken a write lock
> on shrinker_rwsem and got stuck in allocation/reclaim or CPU
> preempted. B can attack A's low protected memory as B's slabs are not
> reclaimable temporarily.
>
> I think the aim is to slow down B enough to give the PSI monitor a
> chance to act before either B targets A's protected memory or the
> kernel triggers oom-kill.
>
> My question is do we really want to solve the issue without limiting B
> through high/max? Also isn't fine grained PSI monitoring along with
> limiting B through memory.[high|max] general enough to solve all three
> example scenarios?

Yes, we definitely want to solve the issue without involving high and max. I
hope that part is clear now. As for whether we want to cover niche cases
such as RDMA pinning a large swath of page cache, I don't know, maybe? But I
don't think that's a problem of comparable importance, especially given that
in both cases you listed the problem is temporary and the workload wouldn't
have the ability to keep expanding undeterred.

Thanks.

-- 
tejun
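
For illustration only, the "at least 25% memory protected" request from the
message above could be turned into a memory.low value roughly like this. It
is a minimal sketch, not something from the thread: it assumes cgroup v2 is
mounted at /sys/fs/cgroup, and the helper names and the "user.slice" target
mentioned afterwards are hypothetical. memory.low takes a byte count and is
a best-effort protection, not a limit.

```python
from pathlib import Path

# Assumption: cgroup v2 unified hierarchy mounted at the usual location.
CGROUP_ROOT = Path("/sys/fs/cgroup")


def mem_total_bytes(meminfo_text: str) -> int:
    """Return MemTotal, in bytes, from /proc/meminfo-formatted text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemTotal:"):
            # Line looks like "MemTotal:       16384256 kB".
            return int(line.split()[1]) * 1024
    raise ValueError("MemTotal not found")


def low_protection_bytes(total_bytes: int, percent: int) -> int:
    """Return `percent` of total memory, rounded down to whole bytes."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return total_bytes * percent // 100


def set_memory_low(cgroup: str, value: int, root: Path = CGROUP_ROOT) -> None:
    """Write `value` (bytes) to <root>/<cgroup>/memory.low.

    Hypothetical helper; writing under the real cgroup root usually
    requires root privileges.
    """
    (root / cgroup / "memory.low").write_text(f"{value}\n")
```

On a real system one might feed Path("/proc/meminfo").read_text() into
mem_total_bytes() and then call something like
set_memory_low("user.slice", low_protection_bytes(total, 25)), with the
cgroup name adjusted to wherever the user session actually lives.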