From: "Aneesh Kumar K.V"
Date: Wed, 13 Oct 2021 18:28:40 +0530
Subject: Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy
To: Michal Hocko
Cc: linux-mm@kvack.org, akpm@linux-foundation.org, Ben Widawsky,
 Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman, Mike Kravetz,
 Randy Dunlap, Vlastimil Babka, Andi Kleen, Dan Williams, Huang Ying,
 linux-api@vger.kernel.org
Message-ID: <9a0baa59-f316-103f-3030-990cd91d1813@linux.ibm.com>
References: <20211013094539.962357-1-aneesh.kumar@linux.ibm.com>
 <4399a215-296f-e880-c5f4-8065ab13d210@linux.ibm.com>

On 10/13/21 18:20, Michal Hocko wrote:
> On Wed 13-10-21 18:05:49, Aneesh Kumar K.V wrote:
>> On 10/13/21 16:18, Michal Hocko wrote:
>>> On Wed 13-10-21 12:42:34, Michal Hocko wrote:
>>>> [Cc linux-api]
>>>>
>>>> On Wed 13-10-21 15:15:39, Aneesh Kumar K.V wrote:
>>>>> This mempolicy mode can be used with either the set_mempolicy(2)
>>>>> or mbind(2) interfaces. Like the MPOL_PREFERRED interface, it
>>>>> allows an application to set a preference node from which the kernel
>>>>> will fulfill memory allocation requests. Unlike the MPOL_PREFERRED mode,
>>>>> it takes a set of nodes. The nodes in the nodemask are used as fallback
>>>>> allocation nodes if memory is not available on the preferred node.
>>>>> Unlike MPOL_PREFERRED_MANY, it will not fall back memory allocations
>>>>> to all nodes in the system. Like the MPOL_BIND interface, it works over a
>>>>> set of nodes and will cause a SIGSEGV or invoke the OOM killer if
>>>>> memory is not available on those preferred nodes.
>>>>>
>>>>> This patch helps applications to hint a memory allocation preference node
>>>>> and fallback to _only_ a set of nodes if the memory is not available
>>>>> on the preferred node. Fallback allocation is attempted from the node which is
>>>>> nearest to the preferred node.
>>>>>
>>>>> This new memory policy helps applications to have explicit control on slow
>>>>> memory allocation and avoids default fallback to slow memory NUMA nodes.
>>>>> The difference with MPOL_BIND is the ability to specify a preferred node
>>>>> which is the first node in the nodemask argument passed.
>>>
>>> I am sorry but I do not understand the semantic difference from
>>> MPOL_BIND. Could you be more specific please?
>>>
>>
>> MPOL_BIND
>>     This mode specifies that memory must come from the set of
>>     nodes specified by the policy. Memory will be allocated from
>>     the node in the set with sufficient free memory that is
>>     closest to the node where the allocation takes place.
>>
>> MPOL_PREFERRED_STRICT
>>     This mode specifies that the allocation should be attempted
>>     from the first node specified in the nodemask of the policy.
>>     If that allocation fails, the kernel will search other nodes
>>     in the nodemask, in order of increasing distance from the
>>     preferred node based on information provided by the platform firmware.
>>
>> The difference is the ability to specify the preferred node as the first
>> node in the nodemask and all fallback allocations are based on the distance
>> from the preferred node. With MPOL_BIND they are based on the node where
>> the allocation takes place.
>
> OK, this makes it more clear. Thanks!
>
> I am still not sure the semantic makes sense though. Why should
> the lowest node in the nodemask have any special meaning? What if it is
> a node with a higher number that somebody prefers to start with?
>

That is true. I haven't been able to find an easy way to specify the
preferred node other than expressing it as the first node in the
nodemask. Yes, it limits the usage of the policy. Any alternate
suggestion?

We could do

set_mempolicy(MPOL_PREFERRED, nodemask(nodeX))
set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for above PREFERRED policy))

But that really complicates the interface?

-aneesh
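
To make the ordering difference discussed above concrete, here is a
minimal userspace sketch. It is not kernel code and does not come from the
patch. It only models the two fallback orders as they are described in
this thread. MAX_NODES, the bitmask representation, the distance table
layout, and every function name below are assumptions made for
illustration.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NODES 8 /* hypothetical upper bound, for illustration only */

/* Return the lowest-numbered node set in mask, or -1 if mask is empty. */
int first_node_in_mask(unsigned int mask)
{
    for (int n = 0; n < MAX_NODES; n++)
        if (mask & (1u << n))
            return n;
    return -1;
}

/*
 * Fill order[] with the nodes in mask sorted by increasing distance from
 * 'origin'. Equal distances keep ascending node order. Returns the number
 * of nodes written.
 */
size_t fallback_order(const int dist[MAX_NODES][MAX_NODES],
                      unsigned int mask, int origin, int order[MAX_NODES])
{
    size_t count = 0;

    assert(origin >= 0 && origin < MAX_NODES);
    for (int n = 0; n < MAX_NODES; n++) {
        if (!(mask & (1u << n)))
            continue;
        /* Insertion sort keyed on the distance from origin. */
        size_t i = count++;
        while (i > 0 && dist[origin][order[i - 1]] > dist[origin][n]) {
            order[i] = order[i - 1];
            i--;
        }
        order[i] = n;
    }
    return count;
}

/* MPOL_BIND as described above: distances are measured from the node
 * where the allocation takes place. */
size_t bind_order(const int dist[MAX_NODES][MAX_NODES], unsigned int mask,
                  int local_node, int order[MAX_NODES])
{
    return fallback_order(dist, mask, local_node, order);
}

/* MPOL_PREFERRED_STRICT as proposed: distances are measured from the
 * first (lowest-numbered) node in the nodemask. */
size_t preferred_strict_order(const int dist[MAX_NODES][MAX_NODES],
                              unsigned int mask, int order[MAX_NODES])
{
    int preferred = first_node_in_mask(mask);

    if (preferred < 0)
        return 0;
    return fallback_order(dist, mask, preferred, order);
}
```

With a hypothetical four-node table (distance 10 to self, 20 between
nodes 0 and 1 and between nodes 2 and 3, 40 otherwise) and a nodemask of
{1, 2, 3}, preferred_strict_order() yields 1, 2, 3 no matter where the
task runs, while bind_order() for a task running on node 3 yields 3, 2, 1.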