From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Huang, Ying" <ying.huang@intel.com>
To: Yang Shi <shy828301@gmail.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>,  Mina Almasry
 <almasrymina@google.com>,  Yang Shi <yang.shi@linux.alibaba.com>,  Yosry
 Ahmed <yosryahmed@google.com>,  Tim Chen <tim.c.chen@linux.intel.com>,
  weixugc@google.com,  shakeelb@google.com,  gthelen@google.com,
  fvdl@google.com,  Michal Hocko <mhocko@kernel.org>,  Roman Gushchin
 <roman.gushchin@linux.dev>,  Muchun Song <songmuchun@bytedance.com>,
  Andrew Morton <akpm@linux-foundation.org>,  linux-kernel@vger.kernel.org,
  cgroups@vger.kernel.org,  linux-mm@kvack.org
Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim
References: <20221122203850.2765015-1-almasrymina@google.com>
	<Y35fw2JSAeAddONg@cmpxchg.org>
	<CAHS8izN+xqM67XLT4y5qyYnGQMUWRQCJrdvf2gjTHd8nZ_=0sw@mail.gmail.com>
	<Y36XchdgTCsMP4jT@cmpxchg.org>
	<874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<CAHbLzkrmxyzH4R7a9sJQavrUyKCEiNYeA543+sdJLsgRPrwBwQ@mail.gmail.com>
	<87a64ad1iz.fsf@yhuang6-desk2.ccr.corp.intel.com>
	<CAHbLzkpVZf-3K0Ys8HG8x6D_XpPChB-H2XMYar7UwnNDeMiw8w@mail.gmail.com>
Date: Wed, 30 Nov 2022 13:31:51 +0800
In-Reply-To: <CAHbLzkpVZf-3K0Ys8HG8x6D_XpPChB-H2XMYar7UwnNDeMiw8w@mail.gmail.com>
	(Yang Shi's message of "Tue, 29 Nov 2022 09:27:33 -0800")
Message-ID: <87ilixatyw.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain; charset=ascii
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID: <linux-mm.kvack.org>

Yang Shi <shy828301@gmail.com> writes:

> On Mon, Nov 28, 2022 at 4:54 PM Huang, Ying <ying.huang@intel.com> wrote:
>>
>> Yang Shi <shy828301@gmail.com> writes:
>>
>> > On Wed, Nov 23, 2022 at 9:52 PM Huang, Ying <ying.huang@intel.com> wrote:
>> >>
>> >> Hi, Johannes,
>> >>
>> >> Johannes Weiner <hannes@cmpxchg.org> writes:
>> >> [...]
>> >> >
>> >> > The fallback to reclaim actually strikes me as wrong.
>> >> >
>> >> > Think of reclaim as 'demoting' the pages to the storage tier. If we
>> >> > have a RAM -> CXL -> storage hierarchy, we should demote from RAM to
>> >> > CXL and from CXL to storage. If we reclaim a page from RAM, it means
>> >> > we 'demote' it directly from RAM to storage, bypassing potentially a
>> >> > huge amount of pages colder than it in CXL. That doesn't seem right.
>> >> >
>> >> > If demotion fails, IMO it shouldn't satisfy the reclaim request by
>> >> > breaking the layering. Rather it should deflect that pressure to the
>> >> > lower layers to make room. This makes sure we maintain an aging
>> >> > pipeline that honors the memory tier hierarchy.
>> >>
>> >> Yes.  I think that we should avoid falling back to reclaim as much as
>> >> possible too.  Now, when we allocate memory for demotion
>> >> (alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used.  So, we will trigger
>> >> kswapd reclaim on the lower tier node to free some memory, to avoid
>> >> falling back to reclaim on the current (higher tier) node.  This may not
>> >> be good enough; for example, the following patch from Hasan may help by
>> >> waking up kswapd earlier.
>> >
>> > For the ideal case, I do agree with Johannes to demote the page tier
>> > by tier rather than reclaiming them from the higher tiers. But I also
>> > agree with your premature OOM concern.
>> >
>> >>
>> >> https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/
>> >>
>> >> Do you know what the next step is for this patch?
>> >>
>> >> Should we do even more?
>> >
>> > In my initial implementation I had simple throttle logic for the case
>> > where demotion is not going to succeed because the demotion target does
>> > not have enough free memory (just a watermark check) for migration to
>> > succeed without doing any reclamation. Shall we resurrect that?
>>
>> Can you share the link to your throttle patch?  Or paste it here?
>
> I just found this on the mailing list.
> https://lore.kernel.org/linux-mm/1560468577-101178-8-git-send-email-yang.shi@linux.alibaba.com/

As I understand it, this patch avoids demoting when there is no free
space on the demotion target?  If so, I think that we should trigger
kswapd reclaim on the demotion target before that.  As a first step, we
can simply avoid falling back to reclaim; then, as a later improvement,
we can avoid scanning as well, as your patch above does.
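
To make the ordering concrete, here is a rough userspace model of what I
mean.  It is only a sketch, not kernel code: struct tier_node,
try_demote_page() and kswapd_step() are names made up for illustration,
and the two watermarks only stand in for the real zone watermarks.  The
point is that kswapd on the demotion target is woken before the target
runs out of space, and that a full target makes the demoter wait instead
of reclaiming from the higher tier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one memory tier node; not a kernel structure. */
struct tier_node {
	long free_pages;         /* pages currently free on this node */
	long low_watermark;      /* at or below this, wake background reclaim */
	long min_watermark;      /* at or below this, demotion cannot allocate */
	bool kswapd_awake;       /* models a woken kswapd for this node */
	struct tier_node *lower; /* next lower tier, NULL for the lowest tier */
};

enum demote_result {
	DEMOTE_OK,        /* page moved to the lower tier */
	DEMOTE_THROTTLE,  /* target full: wait for kswapd, no reclaim on src */
	DEMOTE_NO_TARGET, /* lowest tier: ordinary reclaim is the only option */
};

/* Try to demote one page from src to the tier directly below it. */
enum demote_result try_demote_page(struct tier_node *src)
{
	assert(src != NULL);

	struct tier_node *dst = src->lower;

	if (dst == NULL)
		return DEMOTE_NO_TARGET;

	/* Wake background reclaim on the target before it is exhausted. */
	if (dst->free_pages <= dst->low_watermark)
		dst->kswapd_awake = true;

	/* No room: push the pressure down rather than reclaim from src. */
	if (dst->free_pages <= dst->min_watermark)
		return DEMOTE_THROTTLE;

	dst->free_pages--; /* the demoted page now occupies the target */
	src->free_pages++; /* and its old frame on src becomes free */
	return DEMOTE_OK;
}

/* One step of the modelled kswapd: free a batch, sleep once above low. */
void kswapd_step(struct tier_node *node, long batch)
{
	assert(node != NULL && batch > 0);

	if (!node->kswapd_awake)
		return;

	node->free_pages += batch;
	if (node->free_pages > node->low_watermark)
		node->kswapd_awake = false;
}
```

In this model a caller that gets DEMOTE_THROTTLE runs kswapd_step() on
the lower node (or sleeps until it has run) and then retries, so the page
is never dropped straight from the higher tier to storage.  Falling back
to reclaim as a last resort to avoid OOM is deliberately left out of the
sketch.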

Best Regards,
Huang, Ying

> But it didn't have the throttling logic. I may not have submitted that
> version to the mailing list, since we decided to drop this and merge mine
> and Dave's.
>
> Anyway it is not hard to add the throttling logic, we already have a
> few throttling cases in vmscan, for example, "mm/vmscan: throttle
> reclaim until some writeback completes if congested".
>>
>> > Waking kswapd sooner is fine with me, but it may not be enough; for
>> > example, kswapd may not keep up, so premature OOM may happen on
>> > higher tiers or reclaim may still happen. I think throttling the
>> > reclaimer/demoter until kswapd makes progress could avoid both. And
>> > since lower-tier memory is typically much larger than the higher
>> > tiers, the throttle should happen very rarely IMHO.
>> >
>> >>
>> >> From another point of view, I still think that we can use falling back
>> >> to reclaim as the last resort to avoid OOM in some special situations,
>> >> for example, most pages in the lowest tier node are mlock()ed or too hot
>> >> to be reclaimed.
>> >>
>> >> > So I'm hesitant to design cgroup controls around the current behavior.
>>
>> Best Regards,
>> Huang, Ying