Date: Mon, 12 Dec 2022 09:55:54 +0100
From: Michal Hocko
To: Mina Almasry
Cc: Tejun Heo, Zefan Li, Johannes Weiner, Jonathan Corbet, Roman Gushchin,
    Shakeel Butt, Muchun Song, Andrew Morton, Huang Ying, Yang Shi,
    Yosry Ahmed, weixugc@google.com, fvdl@google.com, bagasdotme@gmail.com,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [PATCH v3] mm: Add nodes= arg to memory.reclaim
In-Reply-To: <20221202223533.1785418-1-almasrymina@google.com>
On Fri 02-12-22 14:35:31, Mina Almasry wrote:
> The nodes= arg instructs the kernel to only scan the given nodes for
> proactive reclaim. For example use cases, consider a 2 tier memory system:
>
> nodes 0,1 -> top tier
> nodes 2,3 -> second tier
>
> $ echo "1m nodes=0" > memory.reclaim
>
> This instructs the kernel to attempt to reclaim 1m memory from node 0.
> Since node 0 is a top tier node, demotion will be attempted first. This
> is useful to direct proactive reclaim to specific nodes that are under
> pressure.
>
> $ echo "1m nodes=2,3" > memory.reclaim
>
> This instructs the kernel to attempt to reclaim 1m memory in the second tier,
> since this tier of memory has no demotion targets the memory will be
> reclaimed.
>
> $ echo "1m nodes=0,1" > memory.reclaim
>
> Instructs the kernel to reclaim memory from the top tier nodes, which can
> be desirable according to the userspace policy if there is pressure on
> the top tiers. Since these nodes have demotion targets, the kernel will
> attempt demotion first.
>
> Since commit 3f1509c57b1b ("Revert "mm/vmscan: never demote for memcg
> reclaim""), the proactive reclaim interface memory.reclaim does both
> reclaim and demotion. Reclaim and demotion incur different latency costs
> to the jobs in the cgroup. Demoted memory would still be addressable
> by the userspace at a higher latency, but reclaimed memory would need to
> incur a pagefault.
>
> The 'nodes' arg is useful to allow the userspace to control demotion
> and reclaim independently according to its policy: if the memory.reclaim
> is called on a node with demotion targets, it will attempt demotion first;
> if it is called on a node without demotion targets, it will only attempt
> reclaim.
>
> Acked-by: Michal Hocko
> Signed-off-by: Mina Almasry

After discussion in [1] I have realized that I haven't really thought
through all the consequences of this patch and therefore I am retracting
my ack here.
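The request format and per-node behavior described in the quoted patch can be modeled roughly as follows. This is a minimal Python sketch, not kernel code: the topology map `DEMOTION_TARGETS` and the helpers `parse_size`, `parse_nodelist`, `parse_request` and `first_action` are hypothetical, size suffixes are limited to k/m/g, the "2-3" range syntax is assumed from the usual kernel nodelist format, and a request without nodes= is assumed to cover all nodes. It encodes only what the commit message states: a node with demotion targets is demoted first, a node without them is reclaimed.

```python
from __future__ import annotations

# Hypothetical model of the v3 semantics described in the quoted patch;
# it does not touch any real cgroup file.

# 2-tier example: nodes 0,1 are the top tier and demote into nodes 2,3;
# the second tier has no demotion targets.
DEMOTION_TARGETS = {0: {2, 3}, 1: {2, 3}, 2: set(), 3: set()}

SUFFIXES = {"k": 1 << 10, "m": 1 << 20, "g": 1 << 30}


def parse_size(token: str) -> int:
    """Parse a size such as '1m' into bytes (only k/m/g suffixes here)."""
    token = token.strip().lower()
    if token and token[-1] in SUFFIXES:
        return int(token[:-1]) * SUFFIXES[token[-1]]
    return int(token)


def parse_nodelist(spec: str) -> set[int]:
    """Parse a node list such as '0,1' or '2-3' into a set of node ids."""
    nodes: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            lo, hi = (int(x) for x in part.split("-", 1))
            nodes.update(range(lo, hi + 1))
        else:
            nodes.add(int(part))
    return nodes


def parse_request(line: str) -> tuple[int, set[int]]:
    """Split 'SIZE [nodes=LIST]' into (bytes, nodemask).

    Without nodes= this model assumes every node is eligible.
    """
    fields = line.split()
    size = parse_size(fields[0])
    mask = set(DEMOTION_TARGETS)
    for opt in fields[1:]:
        key, _, value = opt.partition("=")
        if key != "nodes":
            raise ValueError(f"unknown option: {opt}")
        mask = parse_nodelist(value)
    return size, mask


def first_action(node: int) -> str:
    """Per the quoted description: demote first if the node has targets."""
    return "demote first" if DEMOTION_TARGETS[node] else "reclaim only"


if __name__ == "__main__":
    for req in ("1m nodes=0", "1m nodes=2,3", "1m nodes=0,1"):
        size, mask = parse_request(req)
        plan = {n: first_action(n) for n in sorted(mask)}
        print(req, size, plan)
```

Note that the model says nothing about where demoted pages land; in this reading the nodemask selects source nodes only.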
I am not nacking the patch at this stage but I also think this shouldn't
be merged now and we should really consider all the consequences.

Let me summarize my main concerns here as well. The proposed
implementation doesn't apply the provided nodemask to the whole reclaim
process. This means that demotion can happen outside of the mask, so the
user request cannot really control demotion targets, which limits the
interface should there be any need for finer-grained control in the
future (see an example in [2]).

Another problem is that this can limit future reclaim extensions because
of existing assumptions about the interface [3]: specify only a top-tier
node to force aging without actually reclaiming any charges, i.e. (ab)use
the interface only for aging on a multi-tier system. A change to reclaim
so that it does not demote in some cases could break this use case.

My counter proposal would be to define the nodemask for memory.reclaim
as a domain that constrains the charge reclaim. That means both aging and
reclaim, including demotion, which is a part of aging. This would allow
controlling where to demote for balancing purposes (e.g. demote to node 2
rather than 3), which is impossible with the proposed scheme.

[1] http://lkml.kernel.org/r/20221206023406.3182800-1-almasrymina@google.com
[2] http://lkml.kernel.org/r/Y5bnRtJ6sojtjgVD@dhcp22.suse.cz
[3] http://lkml.kernel.org/r/CAAPL-u8rgW-JACKUT5ChmGSJiTDABcDRjNzW_QxMjCTk9zO4sg@mail.gmail.com

-- 
Michal Hocko
SUSE Labs
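The difference between the v3 semantics and the counter proposal above comes down to which nodes demotion is allowed to target. Below is a minimal Python sketch, assuming the same hypothetical 2-tier topology (nodes 0,1 demoting into nodes 2,3). `allowed_targets_v3` and `allowed_targets_domain` are illustrative names, not kernel functions, and the model ignores how the kernel actually picks among several allowed targets.

```python
from __future__ import annotations

# Hypothetical 2-tier topology from the quoted example.
DEMOTION_TARGETS = {0: {2, 3}, 1: {2, 3}, 2: set(), 3: set()}


def allowed_targets_v3(node: int, mask: set[int]) -> set[int]:
    """v3 patch as described: the mask only selects source nodes, so
    demotion may land on any target of the node, even outside the mask."""
    if node not in mask:
        return set()
    return set(DEMOTION_TARGETS[node])


def allowed_targets_domain(node: int, mask: set[int]) -> set[int]:
    """Counter proposal: the mask is a domain for the whole operation,
    so demotion targets are restricted to nodes inside the mask too."""
    if node not in mask:
        return set()
    return DEMOTION_TARGETS[node] & mask


if __name__ == "__main__":
    for mask in ({0, 2}, {0}):
        print(sorted(mask),
              sorted(allowed_targets_v3(0, mask)),
              sorted(allowed_targets_domain(0, mask)))
```

With nodes=0,2 the v3 model still allows demotion from node 0 into node 3, while the domain model restricts it to node 2, matching the balancing example in the counter proposal. With nodes=0 alone the domain model leaves no in-domain demotion target; how reclaim should behave in that case is exactly the kind of consequence raised around [3], and the sketch deliberately does not model it.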