From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197])
 by kanga.kvack.org (Postfix) with ESMTP id 621196B7CF7
 for ; Thu, 6 Dec 2018 18:09:24 -0500 (EST)
Received: by mail-pf1-f197.google.com with SMTP id q64so1590742pfa.18
 for ; Thu, 06 Dec 2018 15:09:24 -0800 (PST)
Received: from mga04.intel.com (mga04.intel.com. [192.55.52.120])
 by mx.google.com with ESMTPS id g69si1394306pfg.225.2018.12.06.15.09.23
 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128);
 Thu, 06 Dec 2018 15:09:23 -0800 (PST)
Subject: Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()
References: <20181205001544.GR2937@redhat.com>
 <42006749-7912-1e97-8ccd-945e82cebdde@intel.com>
 <20181205021334.GB3045@redhat.com>
 <20181205175357.GG3536@redhat.com>
 <20181206192050.GC3544@redhat.com>
 <20181206223935.GG3544@redhat.com>
From: Dave Hansen
Message-ID:
Date: Thu, 6 Dec 2018 15:09:21 -0800
MIME-Version: 1.0
In-Reply-To: <20181206223935.GG3544@redhat.com>
Content-Type: text/plain; charset=utf-8
Content-Language: en-US
Content-Transfer-Encoding: 8bit
Sender: owner-linux-mm@kvack.org
List-ID:
To: Jerome Glisse
Cc: Logan Gunthorpe, linux-mm@kvack.org, Andrew Morton,
 linux-kernel@vger.kernel.org, "Rafael J . Wysocki", Matthew Wilcox,
 Ross Zwisler, Keith Busch, Dan Williams, Haggai Eran, Balbir Singh,
 "Aneesh Kumar K .
V", Benjamin Herrenschmidt, Felix Kuehling, Philip Yang,
 Christian König, Paul Blinzer, John Hubbard, Ralph Campbell,
 Michal Hocko, Jonathan Cameron, Mark Hairgrove, Vivek Kini,
 Mel Gorman, Dave Airlie, Ben Skeggs, Andrea Arcangeli, Rik van Riel,
 Ben Woodard, linux-acpi@vger.kernel.org

On 12/6/18 2:39 PM, Jerome Glisse wrote:
> No if the 4 sockets are connect in a ring fashion ie:
>     Socket0 - Socket1
>        |         |
>     Socket3 - Socket2
>
> Then you have 4 links:
> link0: socket0 socket1
> link1: socket1 socket2
> link3: socket2 socket3
> link4: socket3 socket0
>
> I do not see how their can be an explosion of link directory, worse
> case is as many link directories as they are bus for a CPU/device/
> target.

This looks great.  But, we don't _have_ this kind of information for any
system that I know about or any system available in the near future.

We basically have two different world views:
1. The system is described point-to-point.  A connects to B @
   100GB/s.  B connects to C @ 50GB/s.  Thus, C->A should be
   50GB/s.
   * Less information to convey
   * Potentially less precise if the properties are not perfectly
     additive.  If A->B=10ns and B->C=20ns, A->C might be >30ns.
   * Costs must be calculated instead of being explicitly specified
2. The system is described endpoint-to-endpoint.  A->B @
   100GB/s, B->C @ 50GB/s, A->C @ 50GB/s.
   * A *lot* more information to convey: O(N^2)?
   * Potentially more precise.
   * Costs are explicitly specified, not calculated

These patches are really tied to world view #1.  But, the HMAT is really
tied to world view #2.

I know you're not a fan of the HMAT.  But it is the firmware reality
that we are stuck with, until something better shows up.  I just don't
see a way to convert it into what you have described here.

I'm starting to think that, whether the HMAT or some other approach
gets adopted, we shouldn't be exposing this level of gunk to userspace
at *all*, since it requires adopting one of the world views.
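
To make the two world views concrete, here is a rough Python sketch.
It is not taken from the patches or from the HMAT; the names LINKS,
neighbors(), derive_pair() and endpoint_table() are made up for
illustration.  It assumes that bandwidth is the minimum over the links
on a path and that latency is the sum, which is exactly the "perfectly
additive" assumption that may not hold on real hardware.  It takes a
point-to-point description (view #1) and calculates the
endpoint-to-endpoint table (view #2) from it.

```python
from itertools import permutations

# World view #1: one entry per physical link (numbers from the A/B/C example).
LINKS = {
    ("A", "B"): {"bw_gbs": 100, "lat_ns": 10},
    ("B", "C"): {"bw_gbs": 50, "lat_ns": 20},
}


def neighbors(links):
    """Build an undirected adjacency list from the link table."""
    adj = {}
    for (a, b), props in links.items():
        adj.setdefault(a, []).append((b, props))
        adj.setdefault(b, []).append((a, props))
    return adj


def derive_pair(adj, src, dst):
    """Walk every simple path from src to dst and return (bandwidth, latency)
    for the path with the highest bottleneck bandwidth, breaking ties on the
    lowest latency.  Returns None if dst is unreachable.
    Assumption: bandwidth = min over links, latency = sum over links."""
    best = None
    stack = [(src, {src}, float("inf"), 0)]
    while stack:
        node, seen, bw, lat = stack.pop()
        if node == dst:
            cand = (bw, -lat)
            if best is None or cand > best:
                best = cand
            continue
        for nxt, props in adj.get(node, []):
            if nxt not in seen:
                stack.append((nxt, seen | {nxt},
                              min(bw, props["bw_gbs"]),
                              lat + props["lat_ns"]))
    if best is None:
        return None
    return best[0], -best[1]


def endpoint_table(links):
    """World view #2: one calculated entry per ordered pair of endpoints."""
    adj = neighbors(links)
    nodes = sorted(adj)
    return {(s, d): derive_pair(adj, s, d) for s, d in permutations(nodes, 2)}


table = endpoint_table(LINKS)
print(len(LINKS), "links ->", len(table), "endpoint pairs")
for pair in sorted(table):
    print(pair, table[pair])

# The 4-socket ring from the quoted mail: 4 links, but 4 * 3 ordered pairs.
RING = [(0, 1), (1, 2), (2, 3), (3, 0)]
ring_pairs = list(permutations(range(4), 2))
print(len(RING), "ring links ->", len(ring_pairs), "endpoint pairs")
```

Going from view #1 to view #2 like this is mechanical, although the
calculated latency is only an estimate.  Going the other way is the hard
part: an endpoint-to-endpoint table does not say which links exist or
how they are shared.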