From: "Aneesh Kumar K.V"
To: Michal Hocko
Cc: Andi Kleen, linux-mm@kvack.org, akpm@linux-foundation.org,
    Ben Widawsky, Dave Hansen, Feng Tang, Andrea Arcangeli, Mel Gorman,
    Mike Kravetz, Randy Dunlap, Vlastimil Babka, Dan Williams, Huang Ying
Subject: Re: [RFC PATCH] mm/mempolicy: add MPOL_PREFERRED_STRICT memory policy
Date: Thu, 14 Oct 2021 21:20:51 +0530
Message-ID: <249414f6-1bb7-b76c-5b5b-2b3ed8937d7b@linux.ibm.com>
References: <20211013094539.962357-1-aneesh.kumar@linux.ibm.com>
    <83483424-e617-51c4-d55c-6106e66e2659@linux.intel.com>
    <87pms8ymvl.fsf@linux.ibm.com>
    <49514c97-c540-48ee-0b2f-3cd7bd3dfcf9@linux.ibm.com>

On 10/14/21 20:26, Michal Hocko wrote:
> On Thu 14-10-21 18:59:14, Aneesh Kumar K.V wrote:
>> On 10/14/21 17:11, Michal Hocko wrote:
>>> On Thu 14-10-21 15:58:29, Aneesh Kumar K.V wrote:
>>>> On 10/14/21 15:08, Michal Hocko wrote:
>>> [...]
>>>>> Besides that it would be really great to finish the discussion about the
>>>>> usecase before suggesting a new userspace API.
>>>>>
>>>>
>>>> Application would like to hint a preferred node for allocating memory
>>>> backing a va range and at the same time wants to avoid fallback to some set
>>>> of nodes (in the use case I am interested in, don't fall back to slow
>>>> memory nodes).
>>>
>>> We do have means for that, right? You can set your memory policy and
>>> then set the cpu affinity to the node you want to allocate from
>>> initially. You can migrate to a different cpu/node if this is not the
>>> preferred affinity. Why is that not usable?
>>
>> For the same reason you mentioned earlier, these nodes can be cpu less
>> nodes.
>
> It would have been easier if you were explicit about the usecase rather
> than let others guess.
>
>>> Also think about extensibility. Say I want to allocate from a set of
>>> nodes first before falling back to the rest of the nodemask? If you want
>>> to add a new API then think of other potential usecases.
>>>
>>
>> Describing the specific allocation details becomes hard with preferred node
>> being a nodemask.
>> With the below interface
>>
>> SYSCALL_DEFINE5(preferred_mbind, unsigned long, start, unsigned long, len,
>>                 const unsigned long __user *, preferred_nmask,
>>                 const unsigned long __user *, fallback_nmask,
>>                 unsigned long, maxnode)
>> {
>>
>>
>> 1. The preferred node is the first node in the preferred node mask
>> 2. Then we try to allocate from nodes present in the preferred node mask
>> which are closer to the first node in the preferred node mask
>> 3. If the above fails, we try to allocate from nodes in the fallback node
>> mask which are closer to the first node in the preferred nodemask.
>>
>> Isn't that too complicated? Do we have a real usecase for that?
>
> No, I think this is a suboptimal interface. AFAIU you really want to
> define a "home" node(s) rather than any policy. Home node would
> effectively override the default local node whatever policy you have as
> it makes sense whether you have MPOL_PREFERRED_MANY or MPOL_BIND.
>

Yes. I did describe it as below in an earlier email:

"We could do

set_mempolicy(MPOL_PREFERRED, nodemask(nodeX))
set_mempolicy(MPOL_PREFERRED_EXTEND, nodemask(fallback nodemask for above
PREFERRED policy))
"

But I agree that restricting this to a virtual address range is much
better. Now I am wondering whether a nodemask is any better than a node
id. A set of home nodes is a more confusing concept than a single home
node. What would multiple nodes mean in a home nodes concept?

Should we do

SYSCALL_DEFINE4(home_node_mbind, unsigned long, start, unsigned long, len,
                unsigned long, home_node, unsigned long, flags)

The flags argument is kept for future extensions, if any. I guess this
home node will only apply to the MPOL_BIND and MPOL_PREFERRED_MANY
policies for now?

> Another potential interface would be set_nodeorder which would
> explicitly set the allocation fallback ordering. Again agnostic of the
> underlying memory policy.
> This would be more generic but the question is
> whether this is not too generic and whether there are usecases for that.
>

I would suggest we wait until applications actually need a fallback order
other than the distance-based one before adding this. A distance-based
fallback order from a preferred node is well understood from the
application's point of view.

-aneesh
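
To make the distance-based order concrete, here is a rough userspace
sketch of how a fallback order could be derived from a single home node
and an allowed nodemask. This is not kernel code and not what the patch
does. NR_NODES, the node_distance[][] table and build_fallback_order()
are all made-up names and values for illustration, and the mask is a
plain unsigned long rather than a real nodemask.

```c
#include <assert.h>
#include <stddef.h>

#define NR_NODES 4 /* hypothetical node count for this sketch */

/*
 * Hypothetical SLIT-style distance table: node_distance[a][b], where 10
 * means local. Node 3 plays the role of a far, slow memory node.
 */
static const int node_distance[NR_NODES][NR_NODES] = {
	{ 10, 20, 40, 80 },
	{ 20, 10, 40, 80 },
	{ 40, 40, 10, 80 },
	{ 80, 80, 80, 10 },
};

/*
 * Fill order[] with the nodes set in allowed_mask, nearest to the home
 * node first. Nodes at equal distance keep ascending node-id order.
 * Bits at or above NR_NODES are ignored. Returns the number of entries
 * written to order[].
 */
static size_t build_fallback_order(unsigned int home,
				   unsigned long allowed_mask,
				   unsigned int order[NR_NODES])
{
	size_t n = 0;

	assert(home < NR_NODES);

	for (unsigned int node = 0; node < NR_NODES; node++) {
		size_t pos = n;

		if (!(allowed_mask & (1UL << node)))
			continue;

		/* Insertion sort keyed on distance from the home node. */
		while (pos > 0 &&
		       node_distance[home][order[pos - 1]] >
		       node_distance[home][node]) {
			order[pos] = order[pos - 1];
			pos--;
		}
		order[pos] = node;
		n++;
	}

	return n;
}
```

With home node 1 and nodes 0-2 allowed (mask 0x7), the resulting order is
1, 0, 2, and the slow node 3 is never considered because it is not in the
mask. That is the behaviour an application would expect from "prefer this
node, fall back by distance, but only within this set".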