From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Wed, 26 Feb 2020 10:05:48 -0500
From: Johannes Weiner
To: Michal Koutný
Cc: Andrew Morton, Roman Gushchin, Michal Hocko, Tejun Heo,
	linux-mm@kvack.org, cgroups@vger.kernel.org,
	linux-kernel@vger.kernel.org, kernel-team@fb.com
Subject: Re: [PATCH v2 3/3] mm: memcontrol: recursive memory.low protection
Message-ID: <20200226150548.GD10257@cmpxchg.org>
References: <20191219200718.15696-1-hannes@cmpxchg.org>
	<20191219200718.15696-4-hannes@cmpxchg.org>
	<20200221171256.GB23476@blackbody.suse.cz>
	<20200221185839.GB70967@cmpxchg.org>
	<20200225133720.GA6709@blackbody.suse.cz>
	<20200225150304.GA10257@cmpxchg.org>
	<20200226132237.GA16746@blackbody.suse.cz>
In-Reply-To: <20200226132237.GA16746@blackbody.suse.cz>

Hello,

On Wed, Feb 26, 2020 at 02:22:37PM +0100, Michal Koutný wrote:
> On Tue, Feb 25, 2020 at 10:03:04AM -0500, Johannes Weiner wrote:
> > Can you explain why you think protection is different from a weight?
> - weights are dimension-less, they represent no real resource

They still ultimately translate to real resources. The concrete value
depends on what the parent's weight translates to, and it depends on
sibling configurations and their current consumption. (All of this is
already true for memory protection as well, btw).
But eventually, a weight specification translates to actual time on a
CPU, bandwidth on an IO device etc.

> - sum of sibling weights is meaningless (and independent from parent
>   weight)

Technically true for overcommitted memory.low values as well.

> - to me this protection is closer to limits (actually I like your simile
>   that they're very lazily enforced limits)

But weights are also lazily enforced limits. Without competition, you
can get 100% regardless of your weight; under contention, you get
throttled/limited back to an assigned share, however that's specified.

Once you peel away the superficial layer of how resources, or shares
of resources, are being referred to (time, bytes, relative shares)
weights/guarantees/protections are all the same thing: they are lazily
enforced* partitioning rules of a resource under contention.

I don't see a fundamental difference between them. And that in turn
makes it hard for me to accept that hierarchical inheritance rules
should be different.

* We also refer to this as work-conserving

> > Now you apply memory pressure, what happens?. D isn't reclaimed, C is
> > somewhat reclaimed, E is reclaimed hard. D will not page, C will page
> > a little bit, E will page hard *with the higher IO priority of B*.
> >
> > Now C is stuck behind E. This is a priority inversion.
> This is how I understand the weights to work.
>
> A
> `- B io.weight=200
>    `- D io.weight=100 (e.g.)
>    `- E io.weight=100 (e.g.)
> `- C io.weight=50
>
> Whatever weights I assign to D and E, when only E and C compete, E will
> have higher weight (200 to 50, work-conservacy of weights).

Yes, exactly. I'm saying the same should be true for memory.

> I don't think this inversion is wrong because E's work is still on
> behalf of B.

"Wrong" isn't the right term. Is it what you wanted to express in your
configuration?
What's the point of designating E a memory donor group that needs to
relinquish memory to C under pressure, but when it actually gives up
that memory it beats C in competition over a different resource?

You are talking about a mathematical truth on a per-controller
basis. What I'm saying is that I don't see how this is useful for real
workloads, their relative priorities, and the performance expectations
users have from these priorities.

With a priority inversion like this, there is no actual performance
isolation or containerization going on here - which is the whole point
of cgroups and resource control.

> Or did you mean that if protections were transformed (via effective
> calculation) to have ratios only in the same range as io.weights
> (1e-4..1e4 instead of 0..inf), then it'd prevent the inversion? (By
> setting D,E weights in same ratios as D,E protections.)

No, the inversion would be prevented if E could consume all resources
assigned to B that aren't consumed by D.

This is true for IO and CPU, but before my patch not for memory.

> > 1. Can you please make a practical use case for having scape goats or
> >    donor groups to justify retaining what I consider to be an
> >    unimportant artifact in the memory.low semantics?
> A.low=10G
> `- B.low=X   u=6G
> `- C.low=X   u=4G
> `- D.low=0G  u=5G
>
> B,C run the workload which should be protected
> D runs job that doesn't need any protection
> u denotes usage
> (I made the example with more than one important sibling to illustrate
> usefulness of some implicit distribution X.)
>
> When outer reclaim comes, reclaiming from B,C would be detrimental to
> their performance, while impact on D is unimportant. (And induced IO
> load on the rest (out of A) too.)

Okay, but this is a different usecase than we were talking about.

My objection is to opting out of protection against cousins (thus
overriding parental resource assignment), not against siblings.
Expressing priorities between siblings like this is fine. And I
absolutely see practical value in your specific example.

> It's not possible to move D to the A's level, since only A is all what a
> given user can control.

Correct, but you can change the tree to this:

  A.low=10G
  `- A1.low=10G
     `- B.low=0G
     `- C.low=0G
  `- D.low=0G

to express

  A1 > D
  B = C

That priority order can be matched by CPU and IO controls as well:

  A.weight=100
  `- A1.weight=100
     `- B.weight=100
     `- C.weight=100
  `- D.weight=1

My objection is purely about opting out of resources relative to (and
assuming a lower memory priority than) an outside cousin that may have
a lower priority on other resources.

That is, I would like to see an argument for this setup:

  A
  `- B io.weight=200 memory.low=10G
     `- D io.weight=100 (e.g.) memory.low=10G
     `- E io.weight=100 (e.g.) memory.low=0
  `- C io.weight=50 memory.low=5G

Where E has no memory protection against C, but E has IO priority over
C. That's the configuration that cannot be expressed with a recursive
memory.low, but since it involves priority inversions it's not useful
to actually isolate and containerize workloads. That's why I'm saying
it's an artifact, not an actual feature.

> > 2. If you think opting out of hierarchically assigned resources is a
> >    fundamentally important usecase, can you please either make an
> >    argument why it should also apply to CPU and IO, or alternatively
> >    explain in detail why they are meaningfully different?
> I'd say that protected memory is a disposable resource in contrast with
> CPU/IO. If you don't have latter, you don't progress; if you lack the
> former, you are refaulting but can make progress. Even more, you should
> be able to give up memory.min.

Eh, I'm not buying that. You cannot run without memory either. If
somebody reclaims a page between you faulting it in and you resuming
to userspace, there is no forward progress.
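
For reference, the io.weight tree from this thread can be checked with
a small toy model. This is only a sketch and not kernel code: the TREE
table, the weight of 100 on A, and the function names are assumptions
made up for illustration. It computes work-conserving shares: each
active child gets weight / sum(weights of active siblings) of its
parent's share, and idle siblings drop out of the sum.

```python
from fractions import Fraction

# Hypothetical toy model of the io.weight tree discussed in the thread.
# A's own weight (100) is an assumption; it has no siblings here, so it
# does not affect the result.
TREE = {
    "A": {"weight": 100, "children": ["B", "C"]},
    "B": {"weight": 200, "children": ["D", "E"]},
    "C": {"weight": 50, "children": []},
    "D": {"weight": 100, "children": []},
    "E": {"weight": 100, "children": []},
}


def is_active(name, active_leaves):
    """A leaf is active if it is competing; a parent if any child is."""
    children = TREE[name]["children"]
    if not children:
        return name in active_leaves
    return any(is_active(child, active_leaves) for child in children)


def effective_shares(root, active_leaves):
    """Return the fraction of the root's resource each active leaf gets."""
    shares = {}

    def walk(name, share):
        active = [c for c in TREE[name]["children"]
                  if is_active(c, active_leaves)]
        if not active:
            if name in active_leaves:
                shares[name] = share
            return
        # Work-conserving: only active siblings take part in the split.
        total = sum(TREE[c]["weight"] for c in active)
        for child in active:
            walk(child, share * Fraction(TREE[child]["weight"], total))

    walk(root, Fraction(1))
    return shares


if __name__ == "__main__":
    print(effective_shares("A", {"E", "C"}))
    print(effective_shares("A", {"D", "E", "C"}))
```

With only E and C competing, E inherits all of B's share (200 against
50), regardless of the weights set on D and E. With D active as well,
B's share is split evenly between D and E while C's share is unchanged.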