From: "Huang, Ying"
To: Michal Hocko, Mina Almasry, Johannes Weiner, Yang Shi, Yosry Ahmed,
    weixugc@google.com, Tim Chen
Cc: Andrew Morton, Tejun Heo, Zefan Li, Jonathan Corbet, Roman Gushchin,
    Shakeel Butt, Muchun Song, fvdl@google.com, bagasdotme@gmail.com,
    cgroups@vger.kernel.org, linux-doc@vger.kernel.org,
    linux-kernel@vger.kernel.org, linux-mm@kvack.org
Subject: Proactive reclaim/demote discussion (was Re: [PATCH] Revert "mm: add nodes= arg to memory.reclaim")
In-Reply-To: (Michal Hocko's message of "Tue, 3 Jan 2023 09:37:54 +0100")
References: <20221202223533.1785418-1-almasrymina@google.com>
    <20221216101820.3f4a370af2c93d3c2e78ed8a@linux-foundation.org>
    <20221219144252.f3da256e75e176905346b4d1@linux-foundation.org>
Date: Wed, 04 Jan 2023 16:41:50 +0800
Message-ID: <87lemiitdd.fsf_-_@yhuang6-desk2.ccr.corp.intel.com>

Michal Hocko writes:

[snip]

> This really requires more discussion.

Let's start the discussion with a summary.

Requirements:

- Proactive reclaim.  The accounting in the current per-memcg proactive
  reclaim (memory.reclaim) isn't correct: pages that are demoted but
  not reclaimed are counted as reclaimed.  So
  "echo XXM > memory.reclaim" may exit prematurely, before the
  specified amount of memory has been reclaimed.

- Proactive demote.  We need an interface to do per-memcg proactive
  demotion.  We may reuse memory.reclaim by extending the concept of
  reclaiming to include demoting.  Or we can add a new interface for
  that (for example, memory.demote).  In addition to demoting from a
  fast tier to a slow tier, in theory we may need to demote from one
  set of nodes to another set of nodes for something like general node
  balancing.

- Proactive promote.  In theory this is possible, but there are no
  real-life requirements yet.  It should also use a separate interface,
  so I don't think we need to discuss it here.

Open questions:

- Use memory.reclaim or memory.demote for proactive demote?  In the
  current memcg context, reclaiming and demoting are quite different,
  because reclaiming uncharges the memory while demoting does not.  But
  if we eventually add per-memory-tier charging, the difference
  disappears.  So the question becomes whether we will add
  per-memory-tier charging.

- Should we demote from faster-tier nodes to slower-tier nodes during
  proactive reclaim?  Choice A is to keep as much fast memory as
  possible.  That is, reclaim from the lowest-tier nodes first, then
  from the second-lowest-tier nodes, and so on.  Choice B is to demote
  at the same time as reclaiming.  In this way, if we proactively
  reclaim XX MB of memory, we may free XX MB of memory on the fastest
  memory nodes.

- When we proactively demote some memory from a fast memory tier,
  should we trigger memory competition in the slower memory tiers?
  That is, should we wake up kswapd on the nodes of the slower memory
  tiers?  If we want per-memcg proactive demotion to be strictly
  per-memcg, we should avoid triggering global behavior such as memory
  competition in the slower memory tiers.  Instead, we can add a global
  proactive demote interface for that (such as per-memory-tier or
  per-node).

Best Regards,
Huang, Ying