Date: Tue, 14 Sep 2021 17:30:03 -0700 (PDT)
From: David Rientjes
To: Feng Tang
cc: Andrew Morton, Michal Hocko, Tejun Heo, Zefan Li, Johannes Weiner,
    Mel Gorman, Vlastimil Babka, linux-mm@kvack.org,
    cgroups@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH v3] mm/page_alloc: detect allocation forbidden by cpuset and bail out early
In-Reply-To: <1631590828-25565-1-git-send-email-feng.tang@intel.com>
Message-ID: <3bd87d8a-d09e-ac7-1d1d-25ad1b9d5ed9@google.com>
References: <1631590828-25565-1-git-send-email-feng.tang@intel.com>

On Tue, 14 Sep 2021, Feng Tang wrote:

> diff --git a/include/linux/cpuset.h b/include/linux/cpuset.h
> index d2b9c41..d58e047 100644
> --- a/include/linux/cpuset.h
> +++ b/include/linux/cpuset.h
> @@ -34,6 +34,8 @@
>   */
>  extern struct static_key_false cpusets_pre_enable_key;
>  extern struct static_key_false cpusets_enabled_key;
> +extern struct static_key_false cpusets_insane_config_key;
> +
>  static inline bool cpusets_enabled(void)
>  {
>  	return static_branch_unlikely(&cpusets_enabled_key);
> @@ -51,6 +53,19 @@ static inline void cpuset_dec(void)
>  	static_branch_dec_cpuslocked(&cpusets_pre_enable_key);
>  }
>  
> +/*
> + * This will get enabled whenever a cpuset configuration is considered
> + * unsupportable in general. E.g. movable only node which cannot satisfy
> + * any non movable allocations (see update_nodemask). Page allocator
> + * needs to make additional checks for those configurations and this
> + * check is meant to guard those checks without any overhead for sane
> + * configurations.
> + */
> +static inline bool cpusets_insane_config(void)
> +{
> +	return static_branch_unlikely(&cpusets_insane_config_key);
> +}
> +
>  extern int cpuset_init(void);
>  extern void cpuset_init_smp(void);
>  extern void cpuset_force_rebuild(void);
> @@ -167,6 +182,8 @@ static inline void set_mems_allowed(nodemask_t nodemask)
>  
>  static inline bool cpusets_enabled(void) { return false; }
>  
> +static inline bool cpusets_insane_config(void) { return false; }
> +
>  static inline int cpuset_init(void) { return 0; }
>  static inline void cpuset_init_smp(void) {}
>  
> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> index 6a1d79d..a455333 100644
> --- a/include/linux/mmzone.h
> +++ b/include/linux/mmzone.h
> @@ -1220,6 +1220,22 @@ static inline struct zoneref *first_zones_zonelist(struct zonelist *zonelist,
>  #define for_each_zone_zonelist(zone, z, zlist, highidx) \
>  	for_each_zone_zonelist_nodemask(zone, z, zlist, highidx, NULL)
>  
> +/* Whether the 'nodes' are all movable nodes */
> +static inline bool movable_only_nodes(nodemask_t *nodes)
> +{
> +	struct zonelist *zonelist;
> +	struct zoneref *z;
> +
> +	if (nodes_empty(*nodes))
> +		return false;
> +
> +	zonelist =
> +	    &NODE_DATA(first_node(*nodes))->node_zonelists[ZONELIST_FALLBACK];
> +	z = first_zones_zonelist(zonelist, ZONE_NORMAL, nodes);
> +	return (!z->zone) ? true : false;
> +}
> +
> +
>  #ifdef CONFIG_SPARSEMEM
>  #include
>  #endif
> diff --git a/kernel/cgroup/cpuset.c b/kernel/cgroup/cpuset.c
> index df1ccf4..7fa633e 100644
> --- a/kernel/cgroup/cpuset.c
> +++ b/kernel/cgroup/cpuset.c
> @@ -69,6 +69,13 @@
>  DEFINE_STATIC_KEY_FALSE(cpusets_pre_enable_key);
>  DEFINE_STATIC_KEY_FALSE(cpusets_enabled_key);
>  
> +/*
> + * There could be abnormal cpuset configurations for cpu or memory
> + * node binding, add this key to provide a quick low-cost judgement
> + * of the situation.
> + */
> +DEFINE_STATIC_KEY_FALSE(cpusets_insane_config_key);
> +
>  /* See "Frequency meter" comments, below. */
>  
>  struct fmeter {
> @@ -1868,6 +1875,14 @@ static int update_nodemask(struct cpuset *cs, struct cpuset *trialcs,
>  	if (retval < 0)
>  		goto done;
>  
> +	if (!cpusets_insane_config() &&
> +		movable_only_nodes(&trialcs->mems_allowed)) {
> +		static_branch_enable(&cpusets_insane_config_key);
> +		pr_info("Unsupported (movable nodes only) cpuset configuration detected (nmask=%*pbl)! "
> +			"Cpuset allocations might fail even with a lot of memory available.\n",
> +			nodemask_pr_args(&trialcs->mems_allowed));
> +	}
> +
>  	spin_lock_irq(&callback_lock);
>  	cs->mems_allowed = trialcs->mems_allowed;
>  	spin_unlock_irq(&callback_lock);

Is this the only time that the state of the nodemask may change?  I'm 
wondering about a single node nodemask, for example, where all ZONE_NORMAL 
memory is hot-removed.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index b37435c..a7e0854 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -4914,6 +4914,19 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int order,
>  	if (!ac->preferred_zoneref->zone)
>  		goto nopage;
>  
> +	/*
> +	 * Check for insane configurations where the cpuset doesn't contain
> +	 * any suitable zone to satisfy the request - e.g. non-movable
> +	 * GFP_HIGHUSER allocations from MOVABLE nodes only.
> +	 */
> +	if (cpusets_insane_config() && (gfp_mask & __GFP_HARDWALL)) {
> +		struct zoneref *z = first_zones_zonelist(ac->zonelist,
> +					ac->highest_zoneidx,
> +					&cpuset_current_mems_allowed);
> +		if (!z->zone)
> +			goto nopage;
> +	}
> +
>  	if (alloc_flags & ALLOC_KSWAPD)
>  		wake_all_kswapds(order, gfp_mask, ac);
>  
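
To make the hot-remove question about the update_nodemask() hunk more concrete, here is a small user-space toy model. It is not kernel code and nothing in it is taken from the patch: every toy_* name is hypothetical, a node is reduced to a single "has non-movable memory" flag, and the model simply assumes that the key is only evaluated when the nodemask is written. Whether the real kernel has some other path that re-evaluates it is exactly what I am asking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in the kernel. */
#define TOY_MAX_NODES 4

struct toy_node {
	bool has_normal;	/* node still has ZONE_NORMAL (or lower) memory */
};

struct toy_node toy_nodes[TOY_MAX_NODES];
bool toy_insane_config_key;	/* stands in for cpusets_insane_config_key */

/* Put the model back into its initial state. */
void toy_reset(void)
{
	for (int nid = 0; nid < TOY_MAX_NODES; nid++)
		toy_nodes[nid].has_normal = false;
	toy_insane_config_key = false;
}

/*
 * Same intent as movable_only_nodes() in the patch: true when the mask is
 * non-empty and no node in it has non-movable memory.
 */
bool toy_movable_only_nodes(unsigned int mask)
{
	if (mask == 0)
		return false;

	for (int nid = 0; nid < TOY_MAX_NODES; nid++)
		if ((mask & (1u << nid)) && toy_nodes[nid].has_normal)
			return false;

	return true;
}

/*
 * Stands in for the new check in update_nodemask(): the key is only
 * evaluated here, i.e. when the nodemask itself is written.
 */
void toy_update_nodemask(unsigned int *mems_allowed, unsigned int new_mask)
{
	if (!toy_insane_config_key && toy_movable_only_nodes(new_mask))
		toy_insane_config_key = true;

	*mems_allowed = new_mask;
}

/*
 * Hypothetical hot-remove of all ZONE_NORMAL memory on one node.
 * ASSUMPTION of this model: nothing re-runs the check at this point.
 */
void toy_hot_remove_normal(int nid)
{
	assert(nid >= 0 && nid < TOY_MAX_NODES);
	toy_nodes[nid].has_normal = false;
}
```

In this model, give node 1 normal memory, set the mask to that single node with toy_update_nodemask(), then call toy_hot_remove_normal(1). Afterwards toy_movable_only_nodes() reports true for the mask while toy_insane_config_key is still false, so the guarded check in the slowpath would never run. The key only flips if the nodemask happens to be written again.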