References: <20221122203850.2765015-1-almasrymina@google.com> <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com> <87a64ad1iz.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87a64ad1iz.fsf@yhuang6-desk2.ccr.corp.intel.com>
From: Yang Shi
Date: Tue, 29 Nov 2022 09:27:33 -0800
Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim
To: "Huang, Ying"
Cc: Johannes Weiner, Mina Almasry, Yang Shi, Yosry Ahmed, Tim Chen,
    weixugc@google.com, shakeelb@google.com, gthelen@google.com,
    fvdl@google.com, Michal Hocko, Roman Gushchin, Muchun Song,
    Andrew Morton, linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
    linux-mm@kvack.org

On Mon, Nov 28, 2022 at 4:54 PM Huang, Ying wrote:
>
> Yang Shi writes:
>
> > On Wed, Nov 23, 2022 at 9:52 PM Huang, Ying wrote:
> >>
> >> Hi, Johannes,
> >>
> >> Johannes Weiner writes:
> >> [...]
> >> >
> >> > The fallback to reclaim actually strikes me as wrong.
> >> >
> >> > Think of reclaim as 'demoting' the pages to the storage tier.
> >> > If we have a RAM -> CXL -> storage hierarchy, we should demote from
> >> > RAM to CXL and from CXL to storage. If we reclaim a page from RAM, it
> >> > means we 'demote' it directly from RAM to storage, potentially
> >> > bypassing a huge number of pages in CXL that are colder than it.
> >> > That doesn't seem right.
> >> >
> >> > If demotion fails, IMO it shouldn't satisfy the reclaim request by
> >> > breaking the layering. Rather, it should deflect that pressure to the
> >> > lower layers to make room. This makes sure we maintain an aging
> >> > pipeline that honors the memory tier hierarchy.
> >>
> >> Yes. I think that we should avoid falling back to reclaim as much as
> >> possible too. Currently, when we allocate memory for demotion
> >> (alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used. So we will trigger
> >> kswapd reclaim on the lower tier node to free some memory, to avoid
> >> falling back to reclaim on the current (higher tier) node. This may not
> >> be good enough; for example, the following patch from Hasan may help by
> >> waking up kswapd earlier.
> >
> > For the ideal case, I do agree with Johannes that we should demote pages
> > tier by tier rather than reclaiming them from the higher tiers. But I
> > also agree with your premature OOM concern.
> >
> >>
> >> https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/
> >>
> >> Do you know what the next step is for this patch?
> >>
> >> Should we do even more?
> >
> > In my initial implementation I added simple throttling logic for when
> > demotion is not going to succeed, i.e. when the demotion target does
> > not have enough free memory (just checking the watermark) for the
> > migration to succeed without doing any reclamation. Shall we resurrect
> > that?
>
> Can you share the link to your throttle patch? Or paste it here?

I just found this on the mailing list.
https://lore.kernel.org/linux-mm/1560468577-101178-8-git-send-email-yang.shi@linux.alibaba.com/

But it didn't have the throttling logic. I may not have submitted that
version to the mailing list, since we decided to drop it and merge mine
and Dave's. Anyway, it is not hard to add the throttling logic; we
already have a few throttling cases in vmscan, for example, "mm/vmscan:
throttle reclaim until some writeback completes if congested".

>
> > Waking kswapd sooner is fine to me, but it may not be enough. For
> > example, kswapd may not keep up, so premature OOM may happen on
> > higher tiers or reclaim may still happen. I think throttling the
> > reclaimer/demoter until kswapd makes progress could avoid both. And
> > since lower-tier memory is typically much larger than the higher
> > tiers, the throttling should happen very rarely IMHO.
> >
> >>
> >> From another point of view, I still think that we can use falling back
> >> to reclaim as the last resort to avoid OOM in some special situations,
> >> for example, when most pages in the lowest tier node are mlock()ed or
> >> too hot to be reclaimed.
> >>
> >> > So I'm hesitant to design cgroup controls around the current behavior.
>
> Best Regards,
> Huang, Ying