From: "Huang, Ying"
To: Gregory Price
Cc: Gregory Price, Johannes Weiner, "Hasan Al Maruf", Hao Wang,
    Dan Williams, Michal Hocko, Zhongkun He, Frank van der Linden,
    "John Groves", Jonathan Cameron
Subject: Re: [PATCH v2 00/11] mempolicy2, mbind2, and weighted interleave
In-Reply-To: (Gregory Price's message of "Mon, 11 Dec 2023 11:42:11 -0500")
References: <20231209065931.3458-1-gregory.price@memverge.com>
    <87r0jtxp23.fsf@yhuang6-desk2.ccr.corp.intel.com>
Date: Tue, 12 Dec 2023 15:08:24 +0800
Message-ID: <87plzbx5hz.fsf@yhuang6-desk2.ccr.corp.intel.com>

Gregory Price writes:

> On Mon, Dec 11, 2023 at 01:53:40PM +0800, Huang, Ying wrote:
>> Hi, Gregory,
>>
>> Thanks for the updated version!
>>
>> Gregory Price writes:
>>
>> > v2:
>> > changes / adds:
>> > - flattened weight matrix to an array at the request of Ying Huang
>> > - Updated ABI docs per Davidlohr Bueso request
>> > - change uapi structure to use aligned/fixed-length members as
>> >   Suggested-by: Arnd Bergmann
>> > - Implemented weight fetch logic in get_mempolicy2
>> > - mbind2 was changed to take (iovec,len) as function arguments
>> >   rather than add them to the uapi structure, since they describe
>> >   where to apply the mempolicy - as opposed to being part of it.
>> >
>> > The sysfs structure is designed as follows.
>> >
>> >   $ tree /sys/kernel/mm/mempolicy/
>> >   /sys/kernel/mm/mempolicy/
>> >   ├── possible_nodes
>> >   └── weighted_interleave
>> >       ├── nodeN
>> >       │   └── weight
>> >       └── nodeN+X
>> >           └── weight
>> >
>> > 'mempolicy' is added to '/sys/kernel/mm/' as a control group for
>> > the mempolicy subsystem.
>>
>> Is it good to add 'mempolicy' in '/sys/kernel/mm/numa'?  The advantage
>> is that 'mempolicy' here is in fact "NUMA mempolicy".  The disadvantage
>> is one more directory nesting.  I have no strong opinion here.
>>
>
> I don't have a strong opinion here.
>
>> > 'possible_nodes' is added to 'mm/mempolicy' to help describe the
>> > expected structures under mempolicy directories.  For example,
>> > possible_nodes describes what nodeN directories will exist under
>> > the weighted_interleave directory.
>>
>> We have '/sys/devices/system/node/possible' already.  Is this just a
>> duplication?  If so, why?  And, the possible nodes can be gotten via
>> contents of 'weighted_interleave' too.
>>
>
> I'll remove it
>
>> And it appears not necessary to make 'weighted_interleave/nodeN'
>> directory.  Why not just make it a file.
>>
>
> Originally I wasn't sure whether there would be more attributes, but
> this is probably fine.  I'll change it.
>
>> And, can we add a way to reset weight to the default value?  For example
>> `echo > nodeN/weight` or `echo > nodeN`.
>>
>
> Seems reasonable.
>
>> > =====================================================================
>> > (Patches 7-10) set_mempolicy2, get_mempolicy2, mbind2
>> >
>> > These interfaces are the 'extended' counterpart to their relatives.
>> > They use the userland 'struct mpol_args' structure to communicate a
>> > complete mempolicy configuration to the kernel.  This structure
>> > looks very much like the kernel-internal 'struct mempolicy_args':
>> >
>> > struct mpol_args {
>> >         /* Basic mempolicy settings */
>> >         __u16 mode;
>> >         __u16 mode_flags;
>> >         __s32 home_node;
>> >         __aligned_u64 pol_nodes;
>> >         __u64 pol_maxnodes;
>> >         __u64 addr;
>> >         __s32 policy_node;
>> >         __s32 addr_node;
>> >         __aligned_u64 *il_weights;      /* of size pol_maxnodes */
>> > };
>>
>> This looks unnecessarily complex.  I don't think that it's a good idea
>> to use exact same parameter for all 3 syscalls.
>>
>
> It is exactly as complex as mempolicy is.  Everything here is already
> described in the existing interfaces (except il_weights).
>
>> For example, can we use something as below?
>>
>>   long set_mempolicy2(int mode, const unsigned long *nodemask, unsigned int *il_weights,
>>                       unsigned long maxnode, unsigned long home_node,
>>                       unsigned long flags);
>>
>>   long mbind2(unsigned long start, unsigned long len,
>>               int mode, const unsigned long *nodemask, unsigned int *il_weights,
>>               unsigned long maxnode, unsigned long home_node,
>>               unsigned long flags);
>>
>
> Your definition of mbind2 is impossible.
>
> Neither of these interfaces solves the extensibility issue.  If a new
> policy which requires a new format of data arrives, we can look forward
> to set_mempolicy3 and mbind3.

IIUC, we will not over-engineer too much.  It's hard to predict the
requirements in the future.

>> A struct may be defined to hold mempolicy itself.
>>
>>   struct mpol {
>>           int mode;
>>           unsigned int home_node;
>>           const unsigned long *nodemask;
>>           unsigned int *il_weights;
>>           unsigned int maxnode;
>>   };
>>
>
> addr could be pulled out for get_mempolicy2, so I will do that
>
> 'addr_node' and 'policy_node' are warts that came from the original
> get_mempolicy.  Removing them increases the complexity of handling
> arguments in the common get_mempolicy code.
>
> I could probably just drop support for retrieving the addr_node from
> get_mempolicy2, since it's already possible with get_mempolicy.  So I
> will do that.

If it's necessary, we can add another struct for get_mempolicy2().  But
I don't think that it's necessary to add get_mempolicy2() specific
parameters for set_mempolicy2() or mbind2().

--
Best Regards,
Huang, Ying
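
For readers unfamiliar with what the flattened per-node weight array is
meant to drive, here is a minimal user-space sketch of weighted
round-robin node selection.  It is only an illustration, not the patch
set's kernel code: the names 'wi_state', 'wi_init', 'wi_next_node' and
'WI_MAX_NODES' are hypothetical, the nodemask is modelled as a single
unsigned long, and treating a weight of 0 as 1 is an assumption made
here so that the loop always terminates.

```c
#include <assert.h>
#include <string.h>

#define WI_MAX_NODES 8  /* hypothetical limit for this model only */

/* Hypothetical state for one weighted-interleave walk. */
struct wi_state {
	unsigned long nodemask;             /* bit N set => node N allowed */
	unsigned int weights[WI_MAX_NODES]; /* flattened per-node weights  */
	unsigned int node;                  /* node currently being used   */
	unsigned int left;                  /* allocations left on 'node'  */
};

/* 'weights' must point to WI_MAX_NODES entries. */
void wi_init(struct wi_state *s, unsigned long nodemask,
	     const unsigned int *weights)
{
	/* Only bits below WI_MAX_NODES are meaningful in this model. */
	nodemask &= (1UL << WI_MAX_NODES) - 1;
	assert(nodemask != 0);

	s->nodemask = nodemask;
	memcpy(s->weights, weights, sizeof(s->weights));
	/* Start "before" node 0 so the first call advances to it. */
	s->node = WI_MAX_NODES - 1;
	s->left = 0;
}

/* Return the node for the next allocation. */
int wi_next_node(struct wi_state *s)
{
	/* Move to the next allowed node once the current quota is spent. */
	while (s->left == 0) {
		s->node = (s->node + 1) % WI_MAX_NODES;
		if (s->nodemask & (1UL << s->node)) {
			unsigned int w = s->weights[s->node];

			/* Assumption: weight 0 behaves like weight 1. */
			s->left = w ? w : 1;
		}
	}
	s->left--;
	return (int)s->node;
}
```

With nodes 0 and 1 allowed and weights {3, 1}, repeated calls to
wi_next_node() return three allocations on node 0 followed by one on
node 1, and then the pattern repeats.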