From: "Huang, Ying"
To: Johannes Weiner
Cc: Mina Almasry, Yang Shi, Yosry Ahmed, Tim Chen, weixugc@google.com,
 shakeelb@google.com, gthelen@google.com, fvdl@google.com, Michal Hocko,
 Roman Gushchin, Muchun Song, Andrew Morton, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim
References: <20221122203850.2765015-1-almasrymina@google.com>
Date: Thu, 24 Nov 2022 13:51:20 +0800
In-Reply-To: (Johannes Weiner's message of "Wed, 23 Nov 2022 16:58:10 -0500")
Message-ID: <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com>

Hi, Johannes,

Johannes Weiner writes:
[...]
>
> The fallback to reclaim actually strikes me as wrong.
>
> Think of reclaim as 'demoting' the pages to the storage tier. If we
> have a RAM -> CXL -> storage hierarchy, we should demote from RAM to
> CXL and from CXL to storage. If we reclaim a page from RAM, it means
> we 'demote' it directly from RAM to storage, bypassing potentially a
> huge amount of pages colder than it in CXL. That doesn't seem right.
>
> If demotion fails, IMO it shouldn't satisfy the reclaim request by
> breaking the layering. Rather it should deflect that pressure to the
> lower layers to make room. This makes sure we maintain an aging
> pipeline that honors the memory tier hierarchy.

Yes.  I think that we should avoid falling back to reclaim as much as
possible too.  Now, when we allocate memory for demotion
(alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used.  So, we will
trigger kswapd reclaim on the lower tier node to free some memory,
which helps to avoid falling back to reclaim on the current (higher
tier) node.  This may not be good enough.  For example, the following
patch from Hasan may help by waking up kswapd earlier.

https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/

Do you know what the next step is for this patch?

Should we do even more?

From another point of view, I still think that we can use falling back
to reclaim as the last resort to avoid OOM in some special situations,
for example, when most pages in the lowest tier node are mlock()ed or
too hot to be reclaimed.

> So I'm hesitant to design cgroup controls around the current behavior.
>

Best Regards,
Huang, Ying
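
To make the policy discussed above concrete, here is a rough userspace
sketch in C.  It is a toy model only, not kernel code: struct tier,
make_room() and demote_page() are made-up names for illustration, and
the deflection to lower tiers is done synchronously here, whereas in
the kernel __GFP_KSWAPD_RECLAIM only wakes kswapd in the background.
The sketch assumes tiers are ordered from fastest to slowest, with
storage sitting implicitly below the last one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of one memory tier (e.g. RAM or CXL). */
struct tier {
	long free_pages;	/* pages currently free in this tier */
	long cold_pages;	/* pages that may be pushed one level down */
};

enum demote_result { NOTHING_TO_DEMOTE, DEMOTED, RECLAIMED_FALLBACK };

/*
 * Try to get at least one free page in tiers[idx].  If the tier is
 * full, deflect the pressure downwards: push one of its cold pages to
 * the next lower tier (or to storage for the lowest tier).  Returns 0
 * if nothing can be moved, e.g. all pages are mlock()ed or hot.
 */
static int make_room(struct tier *tiers, size_t ntiers, size_t idx)
{
	if (tiers[idx].free_pages > 0)
		return 1;
	if (tiers[idx].cold_pages == 0)
		return 0;
	if (idx + 1 < ntiers) {
		if (!make_room(tiers, ntiers, idx + 1))
			return 0;
		/* The cold page lands in the next lower tier. */
		tiers[idx + 1].free_pages--;
		tiers[idx + 1].cold_pages++;
	}
	/* For the lowest tier the page simply goes to storage. */
	tiers[idx].cold_pages--;
	tiers[idx].free_pages++;
	return 1;
}

/*
 * Demote one cold page from tiers[from] to the next lower tier.
 * Reclaiming directly from the current tier is only the last resort,
 * used when the lower tiers cannot make room at all.
 */
static enum demote_result demote_page(struct tier *tiers, size_t ntiers,
				      size_t from)
{
	assert(from < ntiers);

	if (tiers[from].cold_pages == 0)
		return NOTHING_TO_DEMOTE;

	if (from + 1 < ntiers && make_room(tiers, ntiers, from + 1)) {
		tiers[from + 1].free_pages--;
		tiers[from + 1].cold_pages++;
		tiers[from].cold_pages--;
		tiers[from].free_pages++;
		return DEMOTED;
	}

	/* Last resort: bypass the hierarchy to avoid OOM. */
	tiers[from].cold_pages--;
	tiers[from].free_pages++;
	return RECLAIMED_FALLBACK;
}
```

With a two-tier array (RAM at index 0, CXL at index 1), calling
demote_page(tiers, 2, 0) while CXL is full but still has cold pages
first pushes one CXL page to storage and then demotes the RAM page
into the freed slot.  Only when CXL has neither free nor cold pages
does the call fall back to reclaiming from RAM directly.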