Date: Thu, 25 Jun 2020 08:28:27 +0200
From: Michal Hocko
To: Ben Widawsky
Cc: linux-mm, Andi Kleen, Andrew Morton, Christoph Lameter, Dan Williams,
	Dave Hansen, David Hildenbrand, David Rientjes, Jason Gunthorpe,
	Johannes Weiner, Jonathan Corbet, Kuppuswamy Sathyanarayanan,
	Lee Schermerhorn, Li Xinhai, Mel Gorman, Mike Kravetz, Mina Almasry,
	Tejun Heo, Vlastimil Babka, linux-api@vger.kernel.org
Subject: Re: [PATCH 00/18] multiple preferred nodes
Message-ID: <20200625062827.GB1320@dhcp22.suse.cz>
References: <20200624075216.GC1320@dhcp22.suse.cz>
	<20200624161643.75fkkvsxlmp3bf2e@intel.com>
	<20200624183917.GW1320@dhcp22.suse.cz>
	<20200624193733.tqeligjd3pdvrsmi@intel.com>
	<20200624195158.GX1320@dhcp22.suse.cz>
	<20200624200140.dypw6snshshzlbwa@intel.com>
	<20200624200750.GY1320@dhcp22.suse.cz>
	<20200624202344.woogq4n3bqkuejty@intel.com>
	<20200624204232.GZ1320@dhcp22.suse.cz>
	<20200624205518.tzcvjayntez4ueqw@intel.com>
In-Reply-To: <20200624205518.tzcvjayntez4ueqw@intel.com>

On Wed 24-06-20 13:55:18, Ben Widawsky wrote:
> On 20-06-24 22:42:32, Michal Hocko wrote:
> > On Wed 24-06-20 13:23:44, Ben Widawsky wrote:
> > > On 20-06-24 22:07:50, Michal Hocko wrote:
> > > > On Wed 24-06-20 13:01:40, Ben Widawsky wrote:
> > > > > On 20-06-24 21:51:58, Michal Hocko wrote:
> > > > > > On Wed 24-06-20 12:37:33, Ben Widawsky wrote:
> > > > > > > On 20-06-24 20:39:17, Michal Hocko wrote:
> > > > > > > > On Wed 24-06-20 09:16:43,
Ben Widawsky wrote:
> > > > [...]
> > > > > > > > > > Or do I miss something that really requires more involved approach like
> > > > > > > > > > building custom zonelists and other larger changes to the allocator?
> > > > > > > > >
> > > > > > > > > I think I'm missing how this allows selecting from multiple preferred nodes. In
> > > > > > > > > this case when you try to get the page from the freelist, you'll get the
> > > > > > > > > zonelist of the preferred node, and when you actually scan through on page
> > > > > > > > > allocation, you have no way to filter out the non-preferred nodes. I think the
> > > > > > > > > plumbing of multiple nodes has to go all the way through
> > > > > > > > > __alloc_pages_nodemask(). But it's possible I've missed the point.
> > > > > > > >
> > > > > > > > policy_nodemask() will provide the nodemask which will be used as a
> > > > > > > > filter on the policy_node.
> > > > > > >
> > > > > > > Ah, gotcha. Enabling independent masks seemed useful. Some bad decisions got me
> > > > > > > to that point. UAPI cannot get independent masks, and callers of these functions
> > > > > > > don't yet use them.
> > > > > > >
> > > > > > > So let me ask before I actually type it up and find it's much much simpler, is
> > > > > > > there not some perceived benefit to having both masks being independent?
> > > > > >
> > > > > > I am not sure I follow. Which two masks do you have in mind? zonelist
> > > > > > and user provided nodemask?
> > > > >
> > > > > Internally, a nodemask_t for preferred node, and a nodemask_t for bound nodes.
> > > >
> > > > Each mask is a local to its policy object.
> > >
> > > I mean for __alloc_pages_nodemask as an internal API. That is irrespective of
> > > policy. Policy decisions are all made beforehand. The question from a few mails
> > > ago was whether there is any use in keeping that change to
> > > __alloc_pages_nodemask accepting two nodemasks.
> >
> > It is probably too late for me because I am still not following you
> > mean. Maybe it would be better to provide a pseudo code what you have in
> > mind. Anyway all that I am saying is that for the functionality that you
> > propose and _if_ the fallback strategy is fixed then all you should need
> > is to use the preferred nodemask for the __alloc_pages_nodemask and a
> > fallback allocation to the full (NULL nodemask). So you first try what
> > the userspace prefers - __GFP_RETRY_MAYFAIL will give you try hard but
> > do not OOM if the memory is depleted semantic and the fallback
> > allocation goes all the way to OOM on the complete memory depletion.
> > So I do not see much point in a custom zonelist for the policy. Maybe as
> > a micro-optimization to save some branches here and there.
> >
> > If you envision usecases which might want to control the fallback
> > allocation strategy then this would get more complex because you
> > would need a sorted list of zones to try but this would really require
> > some solid usecase and it should build on top of a trivial
> > implementation which really is BIND with the fallback.
> >
>
> I will implement what you suggest. I think it's a good suggestion. Here is what
> I mean though:
> -struct page *
> -__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> -                       nodemask_t *nodemask);
> +struct page *
> +__alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, nodemask_t *prefmask,
> +                       nodemask_t *nodemask);
>
> Is there any value in keeping two nodemasks as part of the interface?

I do not see any advantage. The first thing you would have to do is
either intersect the two or special-case the code to use one over the
other, and then you would need a clear criterion for how to do that.
-- 
Michal Hocko
SUSE Labs
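
To illustrate the two-step scheme described above (try the preferred
nodemask first, then retry with a NULL nodemask), here is a small
userspace toy model in C. It is only a sketch under stated assumptions,
not kernel code: toy_nodemask_t, struct toy_machine,
toy_alloc_nodemask(), toy_alloc_preferred_many() and toy_combine_masks()
are all hypothetical names made up for this example. The first attempt
simply returning -1 stands in for the "try hard but do not OOM"
semantic attributed to __GFP_RETRY_MAYFAIL above.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NODES 8

/* Hypothetical stand-in for nodemask_t: one bit per NUMA node. */
typedef unsigned int toy_nodemask_t;

/* Hypothetical per-node free page counters; not a kernel structure. */
struct toy_machine {
	unsigned long free_pages[TOY_MAX_NODES];
};

/*
 * Toy analogue of an allocation filtered by a nodemask: take one page
 * from the lowest-numbered allowed node that still has one. A NULL
 * mask means "any node". Returns the node id, or -1 if no allowed
 * node has free memory.
 */
int toy_alloc_nodemask(struct toy_machine *m, const toy_nodemask_t *mask)
{
	assert(m != NULL);

	for (int nid = 0; nid < TOY_MAX_NODES; nid++) {
		if (mask && !(*mask & (1u << nid)))
			continue;	/* node filtered out by the mask */
		if (m->free_pages[nid] > 0) {
			m->free_pages[nid]--;
			return nid;
		}
	}
	return -1;
}

/*
 * Two-step allocation: first restrict to the preferred nodes and give
 * up quietly if they are depleted, then fall back to all nodes.
 */
int toy_alloc_preferred_many(struct toy_machine *m, toy_nodemask_t prefmask)
{
	int nid = toy_alloc_nodemask(m, &prefmask);

	if (nid >= 0)
		return nid;
	return toy_alloc_nodemask(m, NULL);	/* full fallback */
}

/*
 * What a two-mask interface would have to do before anything else:
 * intersect the masks, and pick a winner when they are disjoint.
 * Preferring "bound" here is an arbitrary choice for this sketch only.
 */
toy_nodemask_t toy_combine_masks(toy_nodemask_t pref, toy_nodemask_t bound)
{
	toy_nodemask_t both = pref & bound;

	return both ? both : bound;
}
```

With free pages only on nodes 1 and 3 and a preferred mask covering
nodes 1 and 2, the model serves the first request from node 1 and
falls back to node 3 once node 1 is empty. toy_combine_masks() hints at
why a two-mask interface needs an explicit rule: some criterion has to
be chosen for the disjoint case, and the one used here is made up.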