From: Dave Hansen <dave.hansen@intel.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: linux-mm@kvack.org, "Andrew Morton" <akpm@linux-foundation.org>,
linux-kernel@vger.kernel.org,
"Rafael J . Wysocki" <rafael@kernel.org>,
"Matthew Wilcox" <willy@infradead.org>,
"Ross Zwisler" <ross.zwisler@linux.intel.com>,
"Keith Busch" <keith.busch@intel.com>,
"Dan Williams" <dan.j.williams@intel.com>,
"Haggai Eran" <haggaie@mellanox.com>,
"Balbir Singh" <bsingharora@gmail.com>,
"Aneesh Kumar K . V" <aneesh.kumar@linux.ibm.com>,
"Benjamin Herrenschmidt" <benh@kernel.crashing.org>,
"Felix Kuehling" <felix.kuehling@amd.com>,
"Philip Yang" <Philip.Yang@amd.com>,
"Christian König" <christian.koenig@amd.com>,
"Paul Blinzer" <Paul.Blinzer@amd.com>,
"Logan Gunthorpe" <logang@deltatee.com>,
"John Hubbard" <jhubbard@nvidia.com>,
"Ralph Campbell" <rcampbell@nvidia.com>,
"Michal Hocko" <mhocko@kernel.org>,
"Jonathan Cameron" <jonathan.cameron@huawei.com>,
"Mark Hairgrove" <mhairgrove@nvidia.com>,
"Vivek Kini" <vkini@nvidia.com>,
"Mel Gorman" <mgorman@techsingularity.net>,
"Dave Airlie" <airlied@redhat.com>,
"Ben Skeggs" <bskeggs@redhat.com>,
"Andrea Arcangeli" <aarcange@redhat.com>,
"Rik van Riel" <riel@surriel.com>,
"Ben Woodard" <woodard@redhat.com>,
linux-acpi@vger.kernel.org
Subject: Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()
Date: Tue, 4 Dec 2018 17:06:49 -0800
Message-ID: <42006749-7912-1e97-8ccd-945e82cebdde@intel.com>
In-Reply-To: <20181205001544.GR2937@redhat.com>

On 12/4/18 4:15 PM, Jerome Glisse wrote:
> On Tue, Dec 04, 2018 at 03:54:22PM -0800, Dave Hansen wrote:
>> Basically, is sysfs the right place to even expose this much data?
>
> I definitely want to avoid the memoryX mistake. So I do not want to
> see one link directory per device. Taking my simple laptop as an
> example with 4 CPUs, a wifi card and 2 GPUs (the integrated one and a
> discrete one):
>
> link0: cpu0 cpu1 cpu2 cpu3
> link1: wifi (2 PCIe lanes)
> link2: gpu0 (unknown number of lanes, but I believe it has higher
>        bandwidth to main memory)
> link3: gpu1 (16 PCIe lanes)
> link4: gpu1 and GPU memory
>
> So there is one link directory per number of PCIe lanes a device
> has, so that you can differentiate on bandwidth. The main memory is
> symlinked inside all the link directories except link4. The discrete
> GPU memory is only in the link4 directory, as it is only accessible
> by the GPU (we could also add it under link3 with a non-cache-coherent
> property attached to it).
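
For concreteness, the layout being described would look something like
this (the path and entry names here are illustrative, not the exact
ones from the patch series):

    /sys/bus/hms/
        link0/   cpu0  cpu1  cpu2  cpu3  main-memory
        link1/   wifi  main-memory
        link2/   gpu0  main-memory
        link3/   gpu1  main-memory
        link4/   gpu1  gpu1-memory    (GPU-only, not CPU-accessible)

where each entry under a linkN/ directory is a symlink to the
initiator or target device that the link connects.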
I'm actually really interested in how this proposal scales. It's quite
easy to represent a laptop, but can this scale to the largest systems
that we expect to encounter over the next 20 years that this ABI will
have to live?
> The issue then becomes how to boil the overly verbose HMAT
> information down into some reasonable layout for HMS. For that
> I would create a link directory for each distinct matrix cell.
> As an example, say each entry in the matrix has a bandwidth and
> a latency; then we create a link directory for each combination
> of bandwidth and latency. On a simple system that should boil
> down to a handful of combinations, roughly mirroring the example
> above of one link directory per number of PCIe lanes.
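
A minimal userspace sketch of that collapsing step (the matrix values
and all names are invented for illustration; this is not code from the
patch series):

    /* Collapse an initiator x target performance matrix into one
     * "link" per distinct (bandwidth, latency) pair. */
    #include <stdio.h>

    #define NODES 4

    struct cell { unsigned bw_mbps; unsigned lat_ns; };

    int main(void)
    {
        /* perf[i][t]: performance of initiator i accessing target t */
        struct cell perf[NODES][NODES] = {
            { {100000,  80}, {100000,  80}, {30000, 200}, {30000, 200} },
            { {100000,  80}, {100000,  80}, {30000, 200}, {30000, 200} },
            { {30000, 200}, {30000, 200}, {100000,  80}, {100000,  80} },
            { {30000, 200}, {30000, 200}, {100000,  80}, {100000,  80} },
        };
        struct cell links[NODES * NODES];
        int nlinks = 0;

        for (int i = 0; i < NODES; i++) {
            for (int t = 0; t < NODES; t++) {
                int l;

                /* Reuse an existing link with the same numbers... */
                for (l = 0; l < nlinks; l++)
                    if (links[l].bw_mbps == perf[i][t].bw_mbps &&
                        links[l].lat_ns == perf[i][t].lat_ns)
                        break;
                /* ...or create one for a new combination. */
                if (l == nlinks)
                    links[nlinks++] = perf[i][t];
                printf("initiator%d -> target%d : link%d\n", i, t, l);
            }
        }
        printf("%d matrix cells collapsed into %d links\n",
               NODES * NODES, nlinks);
        return 0;
    }

On this invented 4-node matrix the 16 cells collapse into 2 links; the
question below is what happens when the matrix stops being this regular.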
OK, but there are 1024*1024 matrix cells on a system with 1024
proximity domains (the ACPI term for a NUMA node). So it sounds like
you are proposing a million-directory approach.

We also can't simply say that two CPUs with the same connection to two
other CPUs (think of a 4-socket QPI-connected system) share the same
"link" just because they share the same combination of bandwidth and
latency. We need to know that *each* has its own, unique link and that
they do not share link resources.

> I don't think I have a system with an HMAT table. If you have an
> HMAT table to provide, I could show you the end result.
It is new enough (ACPI 6.2) that no publicly-available hardware exists
that implements one (that I know of). Keith Busch can probably extract
one and send it to you, or show you how we're faking them with QEMU.
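
For reference, QEMU later grew first-class HMAT emulation (in QEMU
5.0, well after this thread); an invocation along these lines
synthesizes an HMAT for a two-node guest, with all of the performance
numbers invented:

    qemu-system-x86_64 -machine hmat=on -m 2G -smp 2,sockets=2 \
        -object memory-backend-ram,size=1G,id=m0 \
        -object memory-backend-ram,size=1G,id=m1 \
        -numa node,nodeid=0,memdev=m0 \
        -numa node,nodeid=1,memdev=m1,initiator=0 \
        -numa cpu,node-id=0,socket-id=0 \
        -numa cpu,node-id=0,socket-id=1 \
        -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-latency,latency=5 \
        -numa hmat-lb,initiator=0,target=0,hierarchy=memory,data-type=access-bandwidth,bandwidth=200M \
        -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-latency,latency=10 \
        -numa hmat-lb,initiator=0,target=1,hierarchy=memory,data-type=access-bandwidth,bandwidth=100M

The resulting table can then be dumped from inside the guest with
acpidump and inspected with acpixtract/iasl.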
> Note I believe the ACPI HMAT matrix is a bad design for those
> reasons, i.e. there is a lot of commonality among the matrix
> entries, and many entries also do not make sense (e.g. an
> initiator not being able to access all the targets). I feel that
> link/bridge is much more compact and allows representing any
> directed graph, including multiple arrows from one node to the
> same other node.

I don't disagree. But folks are building systems with them, and we
need to either deal with it or make its data manageable. You saw our
approach: we cull the data and only expose the bare minimum in sysfs.
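
(The culled representation referred to here is the per-node "access
class" layout that later landed upstream; see
Documentation/admin-guide/mm/numaperf.rst. Roughly, each memory node
exposes only its best-performing initiators and their numbers:

    /sys/devices/system/node/node1/access0/initiators/node0
    /sys/devices/system/node/node1/access0/initiators/read_bandwidth
    /sys/devices/system/node/node1/access0/initiators/read_latency
    /sys/devices/system/node/node1/access0/initiators/write_bandwidth
    /sys/devices/system/node/node1/access0/initiators/write_latency

rather than the full initiator-by-target matrix.)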