Date: Wed, 3 Aug 2022 09:36:41 +0200
From: Michal Hocko
To: Feng Tang
Cc: Muchun Song, "akpm@linux-foundation.org", "bwidawsk@kernel.org", "dave.hansen@linux.intel.com", "linux-mm@kvack.org", "linux-kernel@vger.kernel.org", Mike Kravetz
Subject: Re: [PATCH] mm: mempolicy: fix policy_nodemask() for MPOL_PREFERRED_MANY case
References: <20220801084207.39086-1-songmuchun@bytedance.com>

On Wed 03-08-22 14:41:20, Feng Tang wrote:
> On Tue, Aug 02, 2022 at 05:02:37PM +0800, Michal Hocko wrote:
> > Please make sure to CC Mike on hugetlb related changes.
>
> OK.
>
> > I didn't really get to grasp your proposed solution but it feels like
> > it is going sideways. The real issue is that hugetlb uses a dedicated
> > allocation scheme which is not fully MPOL_PREFERRED_MANY aware AFAICS.
> > I do not think we should be tricking that by providing some fake
> > nodemasks and what not.
> >
> > The good news is that allocation from the pool is MPOL_PREFERRED_MANY
> > aware because it first tries to allocate from the preferred node mask
> > and then falls back to the full nodemask (dequeue_huge_page_vma).
> > If the existing pools cannot really satisfy that allocation then it
> > tries to allocate a new hugetlb page (alloc_fresh_huge_page) which also
> > performs a 2 stage allocation with the node mask and no node masks. But
> > both of them might fail.
> >
> > The bad news is that other allocation functions - including those that
> > allocate to the pool - are not fully MPOL_PREFERRED_MANY aware. E.g.
> > __nr_hugepages_store_common paths which use the allocating process
> > policy to fill up the pool so the pool could be under provisioned if
> > that context is using MPOL_PREFERRED_MANY.
>
> Thanks for the check!
>
> So you mean if the preferred nodes don't have enough pages, we should
> also fall back to all nodes like dequeue_huge_page_vma() does?
>
> Or we can use a policy API which returns the nodemask for MPOL_BIND and
> NULL for all other policies, like allowed_mems_nr() needs.
>
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -158,6 +158,18 @@ static inline nodemask_t *policy_nodemask_current(gfp_t gfp)
>  	return policy_nodemask(gfp, mpol);
>  }
>  
> +#ifdef CONFIG_HUGETLB_FS
> +static inline nodemask_t *strict_policy_nodemask_current(void)
> +{
> +	struct mempolicy *mpol = get_task_policy(current);
> +
> +	if (mpol->mode == MPOL_BIND)
> +		return &mpol->nodes;
> +
> +	return NULL;
> +}
> +#endif

Yes, something like this, except that I would also move this into hugetlb
proper because this doesn't seem generally useful.

> > Wrt. allowed_mems_nr (i.e. hugetlb_acct_memory) this is reservation
> > code and I have to admit I do not really remember details there. This is
> > subtle code and my best guess would be that policy_nodemask_current
> > should be hugetlb specific and only care about MPOL_BIND.
>
> The API needed by allowed_mems_nr() is a little different as it has gfp
> flag and cpuset config to consider.

Why would gfp mask matter?

-- 
Michal Hocko
SUSE Labs
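
For readers following the thread, the difference between a strict mask
(MPOL_BIND) and a preference with fallback (MPOL_PREFERRED_MANY) can be
sketched in plain user-space C. This is only an illustration under stated
assumptions, not kernel code and not the proposed patch: every sk_* name,
the bitmask representation of a nodemask and the per-node free page array
are hypothetical stand-ins for the real hugetlb structures.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_NODES 4	/* hypothetical node count for the sketch */

/* Hypothetical stand-ins for the kernel's mempolicy modes. */
enum sk_mode { SK_MPOL_DEFAULT, SK_MPOL_BIND, SK_MPOL_PREFERRED_MANY };

struct sk_policy {
	enum sk_mode mode;
	unsigned long nodes;	/* bit n set => node n is in the policy mask */
};

/* Strict mask: only a bind policy limits where pages may come from. */
static const unsigned long *sk_strict_nodemask(const struct sk_policy *pol)
{
	if (pol->mode == SK_MPOL_BIND)
		return &pol->nodes;
	return NULL;	/* NULL means "no restriction, all nodes allowed" */
}

/* Take one page from the first allowed node that has one; -1 if none. */
static int sk_dequeue(int free_pages[SK_MAX_NODES], const unsigned long *mask)
{
	assert(free_pages != NULL);
	for (int nid = 0; nid < SK_MAX_NODES; nid++) {
		if (mask && !(*mask & (1UL << nid)))
			continue;
		if (free_pages[nid] > 0) {
			free_pages[nid]--;
			return nid;
		}
	}
	return -1;
}

/* Two stage allocation: policy mask first, then all nodes if only preferred. */
static int sk_alloc(int free_pages[SK_MAX_NODES], const struct sk_policy *pol)
{
	const unsigned long *mask = NULL;
	int nid;

	if (pol->mode == SK_MPOL_BIND || pol->mode == SK_MPOL_PREFERRED_MANY)
		mask = &pol->nodes;

	nid = sk_dequeue(free_pages, mask);
	if (nid < 0 && pol->mode == SK_MPOL_PREFERRED_MANY)
		nid = sk_dequeue(free_pages, NULL);	/* a preference, not a limit */
	return nid;
}

/* Pages a reservation could count on: restricted only by the strict mask. */
static int sk_allowed_pages(const int free_pages[SK_MAX_NODES],
			    const struct sk_policy *pol)
{
	const unsigned long *mask = sk_strict_nodemask(pol);
	int total = 0;

	for (int nid = 0; nid < SK_MAX_NODES; nid++)
		if (!mask || (*mask & (1UL << nid)))
			total += free_pages[nid];
	return total;
}
```

In this model a task bound to nodes 0-1 fails once those nodes are empty,
while a task that merely prefers nodes 0-1 still gets a page from another
node, and the accounting helper counts all nodes for it.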