From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-ot0-f200.google.com (mail-ot0-f200.google.com [74.125.82.200])
    by kanga.kvack.org (Postfix) with ESMTP id C0AEB6B0266
    for ; Fri, 22 Dec 2017 20:14:43 -0500 (EST)
Received: by mail-ot0-f200.google.com with SMTP id o43so11353317otd.12
    for ; Fri, 22 Dec 2017 17:14:43 -0800 (PST)
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
    by mx.google.com with SMTPS id d46sor7038466otf.133.2017.12.22.17.14.42
    for (Google Transport Security); Fri, 22 Dec 2017 17:14:42 -0800 (PST)
MIME-Version: 1.0
In-Reply-To:
References: <20171214130032.GK16951@dhcp22.suse.cz>
    <20171218203547.GA2366@linux.intel.com>
    <20171220181937.GB12236@bombadil.infradead.org>
    <2da89d31-27a3-34ab-2dbb-92403c8215ec@intel.com>
    <20171220211649.GA32200@bombadil.infradead.org>
    <20171220212408.GA8308@linux.intel.com>
    <20171220224105.GA27258@linux.intel.com>
    <39cbe02a-d309-443d-54c9-678a0799342d@gmail.com>
    <20171222232231.GA26715@linux.intel.com>
From: "Rafael J. Wysocki"
Date: Sat, 23 Dec 2017 02:14:41 +0100
Message-ID:
Subject: Re: [PATCH v3 0/3] create sysfs representation of ACPI HMAT
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
Sender: owner-linux-mm@kvack.org
List-ID:
To: Dan Williams
Cc: Ross Zwisler, Brice Goglin, Matthew Wilcox, Dave Hansen, Michal Hocko,
    "linux-kernel@vger.kernel.org", "Anaczkowski, Lukasz", "Box, David E",
    "Kogut, Jaroslaw", "Koss, Marcin", "Koziej, Artur", "Lahtinen, Joonas",
    "Moore, Robert", "Nachimuthu, Murugasamy", "Odzioba, Lukasz",
    "Rafael J. Wysocki", "Rafael J. Wysocki", "Schmauss, Erik",
    "Verma, Vishal L", "Zheng, Lv", Andrew Morton, Balbir Singh,
    Jerome Glisse, John Hubbard, Len Brown, Tim Chen, devel@acpica.org,
    Linux ACPI, Linux MM, "linux-nvdimm@lists.01.org", Linux API,
    linuxppc-dev

On Sat, Dec 23, 2017 at 12:57 AM, Dan Williams wrote:
> On Fri, Dec 22, 2017 at 3:22 PM, Ross Zwisler wrote:
>> On Fri, Dec 22, 2017 at 02:53:42PM -0800, Dan Williams wrote:
>>> On Thu, Dec 21, 2017 at 12:31 PM, Brice Goglin wrote:
>>> > Le 20/12/2017 à 23:41, Ross Zwisler a écrit :
>>> [..]
>>> > Hello
>>> >
>>> > I can confirm that HPC runtimes are going to use these patches (at least
>>> > all runtimes that use hwloc for topology discovery, but that's the vast
>>> > majority of HPC anyway).
>>> >
>>> > We really didn't like KNL exposing a hacky SLIT table [1]. We had to
>>> > explicitly detect that specific crazy table to find out which NUMA nodes
>>> > were local to which cores, and to find out which NUMA nodes were
>>> > HBM/MCDRAM or DDR. And then we had to hide the SLIT values to the
>>> > application because the reported latencies didn't match reality. Quite
>>> > annoying.
>>> >
>>> > With Ross' patches, we can easily get what we need:
>>> > * which NUMA nodes are local to which CPUs? /sys/devices/system/node/
>>> > can only report a single local node per CPU (doesn't work for KNL and
>>> > upcoming architectures with HBM+DDR+...)
>>> > * which NUMA nodes are slow/fast (for both bandwidth and latency)
>>> > And we can still look at SLIT under /sys/devices/system/node if really
>>> > needed.
>>> >
>>> > And of course having this in sysfs is much better than parsing ACPI
>>> > tables that are only accessible to root :)
>>>
>>> On this point, it's not clear to me that we should allow these sysfs
>>> entries to be world readable. Given /proc/iomem now hides physical
>>> address information from non-root we at least need to be careful not
>>> to undo that with new sysfs HMAT attributes.
>>
>> This enabling does not expose any physical addresses to userspace. It only
>> provides performance numbers from the HMAT and associates them with existing
>> NUMA nodes. Are you worried that exposing performance numbers to non-root
>> users via sysfs poses a security risk?
>
> It's an information disclosure that's not clear we need to make to
> non-root processes.
>
> I'm more worried about userspace growing dependencies on the absolute
> numbers when those numbers can change from platform to platform.
> Differentiated memory on one platform may be the common memory pool on
> another.
>
> To me this has parallels with storage device hinting where
> specifications like T10 have a complex enumeration of all the
> performance hints that can be passed to the device, but the Linux
> enabling effort aims for a sanitized set of relative hints that make
> sense. It's more flexible if userspace specifies a relative intent
> rather than an absolute performance target. Putting all the HMAT
> information into sysfs gives userspace more information than it could
> possibly do anything reasonable with, at least outside of specialized apps
> that are hand tuned for a given hardware platform.

That's a valid point IMO.

It is sort of tempting to expose everything to user space verbatim,
especially early in the enabling process when the kernel has not yet
found suitable ways to utilize the given information, but the very act
of exposing it may affect what can be done with it in the future.

User space interfaces need to stay around and be supported forever, at
least potentially, so adding every one of them is a serious
commitment.

Thanks,
Rafael

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org