Date: Fri, 6 Aug 2021 15:28:12 +0200
From: Michal Hocko
To: Feng Tang
Cc: linux-mm@kvack.org, Andrew Morton, David Rientjes, Dave Hansen, Ben Widawsky, linux-kernel@vger.kernel.org, linux-api@vger.kernel.org, Andrea Arcangeli, Mel Gorman, Mike Kravetz, Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, ying.huang@intel.com, Dave Hansen
Subject: Re: [PATCH v7 1/5] mm/mempolicy: Add MPOL_PREFERRED_MANY for multiple preferred nodes
References: <1627970362-61305-1-git-send-email-feng.tang@intel.com> <1627970362-61305-2-git-send-email-feng.tang@intel.com>
In-Reply-To: <1627970362-61305-2-git-send-email-feng.tang@intel.com>

On Tue 03-08-21 13:59:18, Feng Tang wrote:
> From: Dave Hansen
>
> The NUMA APIs currently allow passing in a "preferred node" as a
> single bit set in a nodemask. If more than one bit is set, bits
> after the first are ignored.
>
> This single node is generally OK for location-based NUMA where
> memory being allocated will eventually be operated on by a single
> CPU. However, in systems with multiple memory types, folks want
> to target a *type* of memory instead of a location. For instance,
> someone might want some high-bandwidth memory but does not care about
> the CPU next to which it is allocated. Or, they want a cheap,
> high capacity allocation and want to target all NUMA nodes which
> have persistent memory in volatile mode. In both of these cases,
> the application wants to target a *set* of nodes, but does not
> want strict MPOL_BIND behavior as that could lead to OOM killer or
> SIGSEGV.
>
> So add MPOL_PREFERRED_MANY policy to support the multiple preferred
> nodes requirement. This is not a pie-in-the-sky dream for an API.
> This was a response to a specific ask of more than one group at Intel.
> Specifically:
>
> 1. There are existing libraries that target memory types such as
>    https://github.com/memkind/memkind. These are known to suffer
>    from SIGSEGVs when memory is low on targeted memory "kinds" that
>    span more than one node. The MCDRAM on a Xeon Phi in "Cluster on
>    Die" mode is an example of this.
> 2. Volatile-use persistent memory users want to have a memory policy
>    which is targeted at either "cheap and slow" (PMEM) or "expensive and
>    fast" (DRAM). However, they do not want to experience allocation
>    failures when the targeted type is unavailable.
> 3. Allocate-then-run.
>    Generally, we let the process scheduler decide
>    on which physical CPU to run a task. That location provides a
>    default allocation policy, and memory availability is not generally
>    considered when placing tasks. For situations where memory is
>    valuable and constrained, some users want to allocate memory first,
>    *then* allocate close compute resources to the allocation. This is
>    the reverse of the normal (CPU) model. Accelerators such as GPUs
>    that operate on core-mm-managed memory are interested in this model.
>
> A check is added in sanitize_mpol_flags() to not permit the 'prefer_many'
> policy to be used for now; it will be removed in a later patch after all
> implementations for 'prefer_many' are ready, as suggested by Michal Hocko.
>
> [Michal Hocko: suggest to refine policy_node/policy_nodemask handling]
> Link: https://lore.kernel.org/r/20200630212517.308045-4-ben.widawsky@intel.com
> Co-developed-by: Ben Widawsky
> Signed-off-by: Ben Widawsky
> Signed-off-by: Dave Hansen
> Signed-off-by: Feng Tang

Acked-by: Michal Hocko

Thanks!
-- 
Michal Hocko
SUSE Labs
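
For readers following the thread, here is a rough userspace sketch of what "target a *set* of nodes" from the quoted changelog could look like. It is an illustration only and not part of the patch. build_nodemask() and prefer_nodes() are hypothetical helper names, and the fallback value 5 for MPOL_PREFERRED_MANY is an assumption taken from later mainline uapi headers. As the changelog notes, with this patch alone sanitize_mpol_flags() still rejects the policy, so the call would be expected to fail until the rest of the series is applied.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Assumption: value as in later mainline <linux/mempolicy.h>;
 * headers that predate the series do not define it. */
#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5
#endif

/* This sketch only handles node IDs that fit in one unsigned long. */
#define NODEMASK_BITS (8 * sizeof(unsigned long))

/* Hypothetical helper: set one bit per node ID.
 * Returns 0 on success, -1 if an ID does not fit in the mask. */
int build_nodemask(const int *nodes, size_t count, unsigned long *mask)
{
	assert(mask != NULL);
	*mask = 0;
	for (size_t i = 0; i < count; i++) {
		if (nodes[i] < 0 || (size_t)nodes[i] >= NODEMASK_BITS)
			return -1;
		*mask |= 1UL << nodes[i];
	}
	return 0;
}

/* Hypothetical wrapper: ask the kernel to prefer a set of nodes for the
 * calling thread, without the strictness of MPOL_BIND.
 * Returns the raw set_mempolicy() result (0, or -1 with errno set). */
long prefer_nodes(const int *nodes, size_t count)
{
	unsigned long mask;

	if (build_nodemask(nodes, count, &mask) != 0) {
		errno = EINVAL;
		return -1;
	}
	/* maxnode is passed as one more than the number of mask bits. */
	return syscall(SYS_set_mempolicy, MPOL_PREFERRED_MANY, &mask,
		       NODEMASK_BITS + 1);
}
```

A caller that wants, say, nodes 1 and 3 would pass an array holding { 1, 3 } and a count of 2 to prefer_nodes(), then check the return value and errno, since kernels without the complete series are expected to refuse the mode.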