References: <20221202223533.1785418-1-almasrymina@google.com> <87k02volwe.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Mina Almasry
Date: Tue, 13 Dec 2022 11:29:45 -0800
Subject: Re: [PATCH v3] mm: Add nodes= arg to memory.reclaim
To: Michal Hocko
Cc: Johannes Weiner, "Huang, Ying", Tejun Heo, Zefan Li, Jonathan Corbet, Roman Gushchin, Shakeel Butt, Muchun Song, Andrew Morton, Yang Shi, Yosry Ahmed, weixugc@google.com, fvdl@google.com, bagasdotme@gmail.com, cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org

On Tue, Dec 13, 2022 at 6:03 AM Michal Hocko wrote:
>
> On Tue 13-12-22 14:30:40, Johannes Weiner wrote:
> > On Tue, Dec 13, 2022 at 02:30:57PM +0800, Huang, Ying wrote:
> [...]
> > > After these discussion, I think the solution maybe use different
> > > interfaces for "proactive demote" and "proactive reclaim". That is,
> > > reconsider "memory.demote". In this way, we will always uncharge the
> > > cgroup for "memory.reclaim". This avoid the possible confusion there.
> > > And, because demotion is considered aging, we don't need to disable
> > > demotion for "memory.reclaim", just don't count it.
> >
> > Hm, so in summary:
> >
> > 1) memory.reclaim would demote and reclaim like today, but it would
> > change to only count reclaimed pages against the goal.
> >
> > 2) memory.demote would only demote.
> >

If the above 2 points are agreeable, then yes, this sounds good to me and does address our use case.

> > a) What if the demotion targets are full? Would it reclaim or fail?
> >

Wei will chime in if he disagrees, but I think we _require_ that it fails rather than falling back to reclaim. The interface is asking for demotion, and is called memory.demote. For such an interface to fall back to reclaim would be very confusing to userspace, and it may trigger reclaim on a high-priority job that we want to shield from proactive reclaim.

> > 3) Would memory.reclaim and memory.demote still need nodemasks?

memory.demote will need a nodemask, for sure. Today the nodemask would be useful if there is a specific node in the top tier that is overloaded and we want to reduce the pressure on it by demoting. In the future there will be N tiers, and the nodemask says which tier to demote from.

I don't think memory.reclaim would need a nodemask anymore? At least I no longer see a use for it for us.

> > Would
> > they return -EINVAL if a) memory.reclaim gets passed only toptier
> > nodes or b) memory.demote gets passed any lasttier nodes?
>

Honestly, it would be great if memory.reclaim could force reclaim from top-tier nodes. It breaks the aging pipeline, yes, but if the user is specifically asking for that because they decided it's a good idea for their use case, then the kernel should comply IMO. Not a strict requirement for us. Wei will chime in if he disagrees.

memory.demote returning -EINVAL for last-tier nodes makes sense to me.

> I would also add
> 4) Do we want to allow to control the demotion path (e.g. which node to
> demote from and to) and how to achieve that?

We care deeply about specifying which node to demote _from_. That would be a node that is approaching memory pressure and that we want to relieve proactively.

So far I haven't seen any reason to control which nodes to demote _to_. The kernel deciding that based on the aging pipeline and the node distances sounds good to me.
Obviously, someone else may find that useful.

> 5) Is the demotion api restricted to multi-tier systems or any numa
> configuration allowed as well?
>

Demotion will of course not work on single-tier systems. The interface may return an error on such systems, or not be available at all.

> --
> Michal Hocko
> SUSE Labs
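For illustration only, the semantics discussed above can be modelled in a few lines of userspace C. This is a hypothetical sketch, not kernel code: memory.demote is only a proposal in this thread, and struct topology, demote_check_nodes(), reclaim_progress(), demote_finish(), the single 64-bit nodemask and the use of -EAGAIN for "demotion targets full" are all assumptions made for the example. (-EAGAIN is borrowed from the existing memory.reclaim file, which returns it when less than the requested amount is reclaimed.)

```c
#include <errno.h>
#include <stdint.h>

#define MAX_NODES 64 /* hypothetical: the nodemask fits in one 64-bit word */

/* Hypothetical topology: tier[n] is the memory tier of node n
 * (0 = top tier); last_tier is the lowest tier on the machine. */
struct topology {
	int nr_nodes;
	const int *tier;
	int last_tier;
};

/*
 * Hypothetical check for a nodemask written to the proposed memory.demote:
 * a nodemask is required, every node must exist, and any last-tier node is
 * rejected with -EINVAL because there is nowhere to demote it to.
 * On a single-tier system every node is a last-tier node, so every
 * request is rejected without any special case.
 */
static int demote_check_nodes(const struct topology *t, uint64_t mask)
{
	if (mask == 0)
		return -EINVAL;
	for (int n = 0; n < MAX_NODES; n++) {
		if (!(mask & (UINT64_C(1) << n)))
			continue;
		if (n >= t->nr_nodes || t->tier[n] == t->last_tier)
			return -EINVAL;
	}
	return 0;
}

/*
 * Point 1): memory.reclaim may still demote as part of aging, but only
 * reclaimed (uncharged) pages count toward the requested amount.
 */
static unsigned long reclaim_progress(unsigned long reclaimed,
				      unsigned long demoted)
{
	(void)demoted; /* demotion is aging, not reclaim: not counted */
	return reclaimed;
}

/*
 * Points 2) and a): memory.demote only demotes. If the demotion targets
 * fill up before the request is met, it fails instead of falling back to
 * reclaim. -EAGAIN here is an assumption, not an agreed error code.
 */
static int demote_finish(unsigned long requested, unsigned long demoted)
{
	return demoted >= requested ? 0 : -EAGAIN;
}
```

With two top-tier nodes (0, 1) and one last-tier node (2), a mask of nodes 0 and 1 is accepted, while any mask that includes node 2 gets -EINVAL.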