Date: Thu, 2 Nov 2023 14:13:59 +0000
From: Jonathan Cameron
To: Ravi Jonnalagadda
Subject: Re: [RFC PATCH v3 0/4] Node Weights and Weighted Interleave
Message-ID: <20231102141359.00000aa6@Huawei.com>
In-Reply-To: <20231102093542.70-1-ravis.opensrc@micron.com>
References: <87a5rw1wu8.fsf@yhuang6-desk2.ccr.corp.intel.com> <20231102093542.70-1-ravis.opensrc@micron.com>
Organization: Huawei Technologies Research and Development (UK) Ltd.

icable.
> >
> >You mean the different memory ranges of a NUMA node may have different
> >performance?  I don't think that we can deal with this.
>
> Example Configuration: On a server that we are using now, four different
> CXL cards are combined to form a single NUMA node and two other cards are
> exposed as two individual numa nodes.
> So if we have the ability to combine multiple CXL memory ranges to a
> single NUMA node the number of NUMA nodes in the system would potentially
> decrease even if we can't combine the entire range to form a single node.
>

If it's under the kernel's control, then today NUMA nodes for CXL memory
are defined by CXL Fixed Memory Windows (CFMWS) rather than by the
individual characteristics of the devices that might be accessed through
those windows. That's a useful simplification to get things going, and
it's not yet clear how the QoS aspects of CFMWS will be used. So will we
always have enough windows, with fine enough granularity coming from the
_DSM QTG magic, that no window ends up with devices (or topologies) of
different performance within it? No idea. It's a bunch of trade-offs
around where the complexity lies, and how much memory is being provided
over CXL vs physical address space exhaustion.

Long term, my guess is we'll need to support something more sophisticated,
with dynamic 'creation' of NUMA nodes (or something that looks like that
anyway), so we can always have a separate node for each significantly
different set of memory access characteristics. If the nodes are coming
from ACPI, that's already required by the specification. This space is
going to continue getting more complex.

Upshot is that I wouldn't focus too much on the possibility of a NUMA node
having devices with very different memory access characteristics in it.
That's a quirk of today's world that we can and should look to fix.

If your BIOS is setting this up for you and presenting the nodes in
SRAT / HMAT etc. then it's not complying with the ACPI spec.

Jonathan