From: Yosry Ahmed
Date: Mon, 27 Jun 2022 02:39:49 -0700
Subject: Re: [PATCH] mm: vmpressure: don't count userspace-induced reclaim as memory pressure
To: Michal Hocko
Cc: Shakeel Butt, Johannes Weiner, Roman Gushchin, Muchun Song,
 Andrew Morton, Matthew Wilcox, Vlastimil Babka, David Hildenbrand,
 Miaohe Lin, NeilBrown, Alistair Popple, Suren Baghdasaryan, Peter Xu,
 Linux Kernel Mailing List, Cgroups, Linux-MM

On Mon, Jun 27, 2022 at 2:20 AM Michal Hocko wrote:
>
> On Mon 27-06-22 01:39:46, Yosry Ahmed wrote:
> > On Mon, Jun 27, 2022 at 1:25 AM Michal Hocko wrote:
> > >
> > > On Thu 23-06-22 10:26:11, Yosry Ahmed wrote:
> > > > On Thu, Jun 23, 2022 at 10:04 AM Michal Hocko wrote:
> > > > >
> > > > > On Thu 23-06-22 09:42:43, Shakeel Butt wrote:
> > > > > > On Thu, Jun 23, 2022 at 9:37 AM Michal Hocko wrote:
> > > > > > >
> > > > > > > On Thu 23-06-22 09:22:35, Yosry Ahmed wrote:
> > > > > > > > On Thu, Jun 23, 2022 at 2:43 AM Michal Hocko wrote:
> > > > > > > > >
> > > > > > > > > On Thu 23-06-22 01:35:59, Yosry Ahmed wrote:
> > > > > > > [...]
> > > > > > > > > > In our internal version of memory.reclaim that we recently upstreamed,
> > > > > > > > > > we do not account vmpressure during proactive reclaim (similar to how
> > > > > > > > > > psi is handled upstream). We want to make sure this behavior also
> > > > > > > > > > exists in the upstream version so that consolidating them does not
> > > > > > > > > > break our users who rely on vmpressure and will start seeing increased
> > > > > > > > > > pressure due to proactive reclaim.
> > > > > > > > >
> > > > > > > > > These are good reasons to have this patch in your tree. But why is this
> > > > > > > > > patch beneficial for the upstream kernel? It clearly adds some code and
> > > > > > > > > some special casing which will add a maintenance overhead.
> > > > > > > >
> > > > > > > > It is not just Google, any existing vmpressure users will start seeing
> > > > > > > > false pressure notifications with memory.reclaim. The main goal of the
> > > > > > > > patch is to make sure memory.reclaim does not break pre-existing users
> > > > > > > > of vmpressure, and doing it in a way that is consistent with psi makes
> > > > > > > > sense.
> > > > > > >
> > > > > > > memory.reclaim is a v2-only feature, and v2 doesn't have a vmpressure
> > > > > > > interface. So I do not see how pre-existing users of the upstream kernel
> > > > > > > can see any breakage.
> > > > > > >
> > > > > > >
> > > > > > Please note that vmpressure is still being used in v2 by the
> > > > > > networking layer (see mem_cgroup_under_socket_pressure()) for
> > > > > > detecting memory pressure.
> > > > >
> > > > > I have missed this. It is hidden quite well. I thought that v2 is
> > > > > completely vmpressure free. I have to admit that the effect of
> > > > > mem_cgroup_under_socket_pressure() is not really clear to me. Not to
> > > > > mention whether it should or shouldn't be triggered for the user
> > > > > triggered memory reclaim. So this would really need some explanation.
> > > >
> > > > vmpressure was tied into socket pressure by 8e8ae645249b ("mm:
> > > > memcontrol: hook up vmpressure to socket pressure"). A quick look at
> > > > the commit log and the code suggests that this is used all over the
> > > > socket and tcp code to throttle the memory consumption of the
> > > > networking layer if we are under pressure.
> > > >
> > > > However, for proactive reclaim like memory.reclaim, the target is to
> > > > probe the memcg for cold memory. Reclaiming such memory should not
> > > > have a visible effect on the workload performance. I don't think that
> > > > any network throttling side effects are correct here.
> > >
> > > Please describe the user visible effects of this change. IIUC this is
> > > changing the vmpressure semantic for pre-existing users (v1 when setting
> > > the hard limit for example) and it really should be explained why
> > > this is good for them after those years. I do not see any actual bug
> > > being described explicitly so please make sure this is all properly
> > > documented.
> >
> > In cgroup v1, user-induced reclaim that is caused by limit-setting (or
> > memory.reclaim for systems that choose to expose it in cgroup v1) will
> > no longer cause vmpressure notifications, which makes the vmpressure
> > behavior consistent with the current psi behavior.
>
> Yes it makes the behavior consistent with PSI. But is this what existing
> users really want or need? This is a user visible long term behavior
> change for a legacy interface and there should be a very good reason to
> change that.
>
> > In cgroup v2, user-induced reclaim (limit-setting, memory.reclaim, ..)
> > would currently cause the networking layer to perceive the memcg as
> > being under memory pressure, reducing memory consumption and possibly
> > causing throttling. This patch makes the networking layer only
> > perceive the memcg as being under pressure when the "pressure" is
> > caused by increased memory usage, not limit-setting or proactive
> > reclaim, which also makes the definition of memcg memory pressure
> > consistent with psi today.
>
> I do understand the argument about the pro-active reclaim.
> memory.reclaim is a new interface and a) it makes sense to exclude it
> from different memory pressure notification interfaces and b) there are
> unlikely to be many user applications depending on the exact behavior so
> changes are still rather low on the risk scale.
>
> > In short, the purpose of this patch is to unify the definition of
> > memcg memory pressure across psi and vmpressure (which indirectly also
> > determines the definition of memcg memory pressure for the networking
> > layer). If this sounds good to you, I can add this explanation to the
> > commit log, and possibly anywhere you see appropriate in the
> > code/docs.
>
> The consistency on its own sounds like a very weak argument to change a
> long term behavior. I do not really see any serious arguments or
> evaluation of what kind of fallout this change can have on old applications
> that are still sticking with v1.
>
> After it has been made clear that the vmpressure is still used for the
> pro-active reclaim in v2 I do agree that this is likely something we
> want to have addressed. But I wouldn't touch v1 semantics as this
> doesn't really buy much and it can potentially break existing users.
>

Understood, and fair enough. There are 3 behavioral changes in this patch.

(a) Do not count vmpressure for mem_cgroup_resize_max() and
mem_cgroup_force_empty() in v1.

(b) Do not count vmpressure (consequently,
mem_cgroup_under_socket_pressure()) in v2 where psi is not counted
(writing to memory.max, memory.high, and memory.reclaim).

Do you want us to drop (a) and keep (b)?
Or do you want to further break down (b) to only limit the change to
proactive reclaim through memory.reclaim (IOW keep socket pressure on
limit-setting although it is not considered pressure in terms of psi)?

> --
> Michal Hocko
> SUSE Labs
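
For reference, the options being discussed can be laid out as a small toy model. This is only an illustrative sketch and not kernel code: the `Source` labels, the three exemption sets, and the helper names are hypothetical, chosen to mirror (a) and (b) as described in this thread. It assumes that "not counted" simply means the reclaim source is skipped when vmpressure (and therefore socket pressure in v2) is accounted.

```python
from enum import Enum


class Source(Enum):
    """Hypothetical labels for what started a reclaim pass."""
    USAGE = "reclaim caused by increased memory usage"
    V1_LIMIT = "v1: mem_cgroup_resize_max() / mem_cgroup_force_empty()"
    V2_LIMIT = "v2: write to memory.max or memory.high"
    V2_PROACTIVE = "v2: write to memory.reclaim"


# Patch as posted: (a) and (b) both applied.
AS_POSTED = frozenset({Source.V1_LIMIT, Source.V2_LIMIT, Source.V2_PROACTIVE})
# Drop (a), keep (b): v1 semantics are left untouched.
DROP_A = frozenset({Source.V2_LIMIT, Source.V2_PROACTIVE})
# Drop (a) and narrow (b) to proactive reclaim only.
PROACTIVE_ONLY = frozenset({Source.V2_PROACTIVE})


def counts_toward_vmpressure(source, exempt):
    """Return True if reclaim from `source` still feeds vmpressure."""
    return source not in exempt


def summarize(exempt):
    """Map each source name to whether it is still counted under `exempt`."""
    return {s.name: counts_toward_vmpressure(s, exempt) for s in Source}


if __name__ == "__main__":
    for label, exempt in (
        ("as posted", AS_POSTED),
        ("drop (a), keep (b)", DROP_A),
        ("memory.reclaim only", PROACTIVE_ONLY),
    ):
        print(label, summarize(exempt))
```

In every variant, reclaim driven by memory usage growth is still counted; the variants differ only in which user-induced sources are skipped.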