From: Mina Almasry
Date: Thu, 1 Dec 2022 12:40:16 -0800
Subject: Re: [RFC PATCH V1] mm: Disable demotion from proactive reclaim
To: "Huang, Ying"
Cc: Johannes Weiner, Yang Shi, Yosry Ahmed, Tim Chen, weixugc@google.com,
 shakeelb@google.com, gthelen@google.com, fvdl@google.com, Michal Hocko,
 Roman Gushchin, Muchun Song, Andrew Morton, linux-kernel@vger.kernel.org,
 cgroups@vger.kernel.org, linux-mm@kvack.org
References: <20221122203850.2765015-1-almasrymina@google.com>
 <874juonbmv.fsf@yhuang6-desk2.ccr.corp.intel.com>
 <87wn7dayfz.fsf@yhuang6-desk2.ccr.corp.intel.com>
In-Reply-To: <87wn7dayfz.fsf@yhuang6-desk2.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="UTF-8"

On Tue, Nov 29, 2022 at 7:56 PM Huang, Ying wrote:
>
> Johannes Weiner writes:
>
> > Hello Ying,
> >
> > On Thu, Nov 24, 2022 at 01:51:20PM +0800, Huang, Ying wrote:
> >> Johannes Weiner writes:
> >> > The fallback to reclaim actually strikes me as wrong.
> >> >
> >> > Think of reclaim as 'demoting' the pages to the storage tier. If we
> >> > have a RAM -> CXL -> storage hierarchy, we should demote from RAM to
> >> > CXL and from CXL to storage. If we reclaim a page from RAM, it means
> >> > we 'demote' it directly from RAM to storage, bypassing potentially a
> >> > huge amount of pages colder than it in CXL. That doesn't seem right.
> >> >
> >> > If demotion fails, IMO it shouldn't satisfy the reclaim request by
> >> > breaking the layering. Rather it should deflect that pressure to the
> >> > lower layers to make room. This makes sure we maintain an aging
> >> > pipeline that honors the memory tier hierarchy.
> >>
> >> Yes. I think that we should avoid to fall back to reclaim as much as
> >> possible too. Now, when we allocate memory for demotion
> >> (alloc_demote_page()), __GFP_KSWAPD_RECLAIM is used. So, we will trigger
> >> kswapd reclaim on lower tier node to free some memory to avoid fall back
> >> to reclaim on current (higher tier) node. This may be not good enough,
> >> for example, the following patch from Hasan may help via waking up
> >> kswapd earlier.
> >>
> >> https://lore.kernel.org/linux-mm/b45b9bf7cd3e21bca61d82dcd1eb692cd32c122c.1637778851.git.hasanalmaruf@fb.com/
> >>
> >> Do you know what is the next step plan for this patch?
> >>
> >> Should we do even more?
> >>
> >> From another point of view, I still think that we can use falling back
> >> to reclaim as the last resort to avoid OOM in some special situations,
> >> for example, most pages in the lowest tier node are mlock() or too hot
> >> to be reclaimed.
> >
> > If they're hotter than reclaim candidates on the toptier, shouldn't
> > they get promoted instead and make room that way? We may have to tweak
> > the watermark logic a bit to facilitate that (allow promotions where
> > regular allocations already fail?). But this sort of resorting would
> > be preferable to age inversions.
>
> Now it's legal to enable demotion and disable promotion. Yes, this is
> wrong configuration in general. But should we trigger OOM for these
> users?
>
> And now promotion only works for default NUMA policy (and MPOL_BIND to
> both promotion source and target nodes with MPOL_F_NUMA_BALANCING). If
> we use some other NUMA policy, the pages cannot be promoted too.
>
> > The mlock scenario sounds possible. In that case, it wouldn't be an
> > aging inversion, since there is nothing colder on the CXL node.
> >
> > Maybe a bypass check should explicitly consult the demotion target
> > watermarks against its evictable pages (similar to the file_is_tiny
> > check in prepare_scan_count)?
>
> Yes. This sounds doable.
>
> > Because in any other scenario, if there is a bug in the promo/demo
> > coordination, I think we'd rather have the OOM than deal with age
> > inversions causing intermittent performance issues that are incredibly
> > hard to track down.
>
> Previously, I thought that people will always prefer performance
> regression than OOM. Apparently, I am wrong.
>
> Anyway, I think that we need to reduce the possibility of OOM or falling
> back to reclaim as much as possible firstly. Do you agree?
>

I've been discussing this with a few folks here.
FWIW, the general feeling here is that demoting from top tier nodes is
preferred, but in extreme circumstances we would indeed rather run with
a performance issue than OOM a customer VM. I wonder if there is another
way to debug mis-tiered pages than triggering an OOM.

One thing I think/hope we can trivially agree on is that proactive
reclaim/demotion is _not_ an extreme circumstance. I, or someone from
the team, would like to follow up with a patch that disables fallback to
reclaim on proactive reclaim/demotion (sc->proactive).

> One possibility, can we fall back to reclaim only if the sc->priority is
> small enough (even 0)?
>

This makes sense to me.

> Best Regards,
> Huang, Ying
>
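
To make the two ideas above concrete (no fallback for proactive
reclaim, and fallback only at a very small sc->priority), here is a
rough userspace sketch of the policy. It is not kernel code and not a
proposed patch: struct scan_ctx, FALLBACK_MAX_PRIORITY and
may_fallback_to_reclaim() are hypothetical names for illustration, and
only the two fields modelled on sc->proactive and sc->priority come
from this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical userspace model of the two scan_control fields discussed
 * in this thread. This is NOT the kernel's struct scan_control.
 */
struct scan_ctx {
	bool proactive;	/* models sc->proactive: proactive reclaim request */
	int priority;	/* models sc->priority: lower means more pressure */
};

/* Hypothetical threshold: allow fallback only at the last priority. */
#define FALLBACK_MAX_PRIORITY 0

/*
 * Decide whether pages that failed demotion may instead be reclaimed
 * from the current (higher tier) node.
 */
static bool may_fallback_to_reclaim(const struct scan_ctx *sc)
{
	assert(sc->priority >= 0);

	/* Proactive reclaim/demotion is never an extreme circumstance. */
	if (sc->proactive)
		return false;

	/* Otherwise fall back only as a last resort before OOM. */
	return sc->priority <= FALLBACK_MAX_PRIORITY;
}
```

With this shape, a proactive request never breaks the tier layering,
while reactive reclaim still has an escape hatch right before OOM. Where
exactly the threshold should sit is an open question; 0 is only the
most conservative choice mentioned above.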