From: "Huang, Ying"
To: Gregory Price
Cc: Gregory Price, Johannes Weiner, Hasan Al Maruf, Hao Wang,
    Dan Williams, "Michal Hocko", Zhongkun He, "Frank van der Linden",
    John Groves, Jonathan Cameron
Subject: Re: [PATCH v4 00/11] mempolicy2, mbind2, and weighted interleave
In-Reply-To: (Gregory Price's message of "Tue, 19 Dec 2023 13:09:02 -0500")
References: <20231218194631.21667-1-gregory.price@memverge.com>
    <87wmtanba2.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Wed, 20 Dec 2023 10:27:06 +0800
Message-ID: <87zfy5libp.fsf@yhuang6-desk2.ccr.corp.intel.com>
User-Agent: Gnus/5.13 (Gnus v5.13)

Gregory Price writes:

> On Tue, Dec 19, 2023 at 11:04:05AM +0800, Huang, Ying wrote:
>> Gregory Price writes:
>>
>> > This patch set extends the mempolicy interface to enable new
>> > mempolicies which may require extended data to operate.
>> >
>> > MPOL_WEIGHTED_INTERLEAVE is included as an example extension.
>>
>> Per my understanding, it's better to describe why we need this patchset
>> at the beginning.  Per my understanding, weighted interleave is used to
>> expand DRAM bandwidth for workloads with real high memory bandwidth
>> requirements.  Without it, DRAM bandwidth will be saturated, which leads
>> to poor performance.
>>
>
> Will add more details, thanks.
>
>> > struct mempolicy_args {
>> >     unsigned short mode;            /* policy mode */
>> >     unsigned short mode_flags;      /* policy mode flags */
>> >     int home_node;                  /* mbind: use MPOL_MF_HOME_NODE */
>> >     nodemask_t *policy_nodes;       /* get/set/mbind */
>> >     unsigned char *il_weights;      /* for mode MPOL_WEIGHTED_INTERLEAVE */
>> >     int policy_node;                /* get: policy node information */
>> > };
>>
>> Because we use more and more parameters to describe the mempolicy, I
>> think it's a good idea to replace some parameters with struct.  But I
>> don't think it's a good idea to put unrelated stuff into the struct.
>> For example,
>>
>> struct mempolicy_param {
>>     unsigned short mode;            /* policy mode */
>>     unsigned short mode_flags;      /* policy mode flags */
>>     int home_node;                  /* mbind: use MPOL_MF_HOME_NODE */
>>     nodemask_t *policy_nodes;
>>     unsigned char *il_weights;      /* for mode MPOL_WEIGHTED_INTERLEAVE */
>> };
>>
>> describe the parameters to create the mempolicy.  It can be used by
>> set/get_mempolicy() and mbind().  So, I think that it's a good
>> abstraction.  But "policy_node" has nothing to do with set_mempolicy()
>> and mbind().  So I think that we shouldn't add it into the struct.  It's
>> totally OK to use different parameters for different functions.  For
>> example,
>>
>> long do_set_mempolicy(struct mempolicy_param *mparam);
>> long do_mbind(unsigned long start, unsigned long len,
>>               struct mempolicy_param *mparam, unsigned long flags);
>> long do_get_task_mempolicy(struct mempolicy_param *mparam,
>>                            int *policy_node);
>>
>> This isn't the full list.  My point is to use separate parameter for
>> something specific for some function.
>>
>
> this is the internal structure, but i get the point, we can drop it from
> the structure and extend the arg list internally.
>
> I'd originally thought to just remove the policy_node stuff all
> together from get_mempolicy2().  Do you prefer to have a separate struct
> for set/get interfaces so that the get interface struct can be extended?
>
> All the MPOL_F_NODE "alternate data fetch" mechanisms from
> get_mempolicy() feel like more of a wart than a feature.  And presently
> the only data returned in policy_node is the next allocation node for
> interleave.  That's not even particularly useful, so I'm of a mind to
> remove it.
>
> Assuming we remove policy_node altogether... do we still break up the
> set/get interface into separate structures to avoid this in the future?

I don't have much experience with ABI definition, so I'd like to get
guidance from more experienced people on this.
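To make the "separate parameter" suggestion quoted above concrete, here is
a minimal userspace sketch.  It is only an illustration, not the kernel
implementation: the nodemask_t stand-in, the SKETCH_MPOL_INTERLEAVE
constant, the static task_policy variable, and the function bodies are all
assumptions made up for this example.  Only the struct layout and the
function signatures follow the quoted proposal.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's nodemask_t. */
typedef struct { unsigned long bits; } nodemask_t;

/* Illustrative mode value for this sketch only. */
enum { SKETCH_MPOL_INTERLEAVE = 3 };

/* Parameters that describe a policy; shared by set/get/mbind. */
struct mempolicy_param {
    unsigned short mode;        /* policy mode */
    unsigned short mode_flags;  /* policy mode flags */
    int home_node;              /* mbind: use MPOL_MF_HOME_NODE */
    nodemask_t *policy_nodes;
    unsigned char *il_weights;  /* for mode MPOL_WEIGHTED_INTERLEAVE */
};

/* Assumed stand-ins for per-task state (made up for the sketch). */
static struct mempolicy_param task_policy;
static int next_il_node;

/* Setting a policy needs only the shared description. */
long do_set_mempolicy(struct mempolicy_param *mparam)
{
    if (!mparam)
        return -EINVAL;
    task_policy = *mparam;      /* shallow copy is enough for the sketch */
    return 0;
}

/*
 * Getting a policy returns the shared description and, through a
 * separate optional out-parameter, the get-only policy_node value.
 */
long do_get_task_mempolicy(struct mempolicy_param *mparam, int *policy_node)
{
    if (!mparam)
        return -EINVAL;
    *mparam = task_policy;
    if (policy_node)
        *policy_node = next_il_node;
    return 0;
}
```

With this shape, callers of do_set_mempolicy() never see a field that has
no meaning for them, and a caller of do_get_task_mempolicy() that does not
care about policy_node can pass NULL.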

Is it a good idea to implement all the functionality of get_mempolicy()
in get_mempolicy2(), so that we can deprecate get_mempolicy() and
eventually remove it?  Then users wouldn't need to use two similar
syscalls.

And, IIUC, we will not get policy_node, addr_node, and the policy config
at the same time.  Would it be better to use a union instead of a struct
in get_mempolicy2()?

>> > struct mpol_args {
>> >     /* Basic mempolicy settings */
>> >     __u16 mode;
>> >     __u16 mode_flags;
>> >     __s32 home_node;
>> >     __aligned_u64 pol_nodes;
>> >     __aligned_u64 *il_weights;      /* of size pol_maxnodes */
>> >     __u64 pol_maxnodes;
>> >     __s32 policy_node;
>> > };
>>
>> Same as my idea above.  I think we shouldn't add policy_node for
>> set_mempolicy2()/mbind2().  That will make users confusing.  We can use
>> a different struct for get_mempolicy2().
>>
>
> See above.

--
Best Regards,
Huang, Ying
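
As a rough illustration of the union question raised above, the sketch
below shows one way a get-side result could hold either the policy config,
the policy node, or the node of an address, but never more than one at a
time.  Everything here is hypothetical: the sketch_ names, the selector
enum, and the hard-coded return values are assumptions for the example and
are not part of the posted patch set or of any existing syscall.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical selector: which single piece of data the caller wants. */
enum sketch_get_what {
    SKETCH_GET_POLICY,          /* full policy configuration */
    SKETCH_GET_POLICY_NODE,     /* next interleave node */
    SKETCH_GET_ADDR_NODE,       /* node backing a given address */
};

/* Policy configuration, modeled loosely on the quoted mpol_args. */
struct sketch_policy_cfg {
    uint16_t mode;
    uint16_t mode_flags;
    int32_t  home_node;
    uint64_t pol_nodes;         /* user pointer to the nodemask */
    uint64_t il_weights;        /* user pointer to the weights */
    uint64_t pol_maxnodes;
};

/* Only one member is valid per call, chosen by the selector. */
union sketch_get_result {
    struct sketch_policy_cfg policy;
    int32_t policy_node;
    int32_t addr_node;
};

/* Mock implementation with made-up values, for illustration only. */
int sketch_get_mempolicy2(enum sketch_get_what what,
                          union sketch_get_result *res)
{
    if (!res)
        return -EINVAL;
    memset(res, 0, sizeof(*res));
    switch (what) {
    case SKETCH_GET_POLICY:
        res->policy.mode = 3;           /* arbitrary example mode */
        res->policy.home_node = -1;     /* no home node in this mock */
        return 0;
    case SKETCH_GET_POLICY_NODE:
        res->policy_node = 1;           /* arbitrary example node */
        return 0;
    case SKETCH_GET_ADDR_NODE:
        res->addr_node = 0;             /* arbitrary example node */
        return 0;
    default:
        return -EINVAL;
    }
}
```

The trade-off is that a union keeps the result small and makes the "one
thing at a time" rule explicit, but the caller must remember which member
the selector made valid.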