Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Jerome Glisse <jglisse@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Huang Ying <ying.huang@intel.com>,
	Zhang Yi <yi.z.zhang@linux.intel.com>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@intel.com>,
	Liu Jingqi <jingqi.liu@intel.com>, Yao Yuan <yuan.yao@intel.com>,
	Fan Du <fan.du@intel.com>, Dong Eddie <eddie.dong@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Peng Dong <dongx.peng@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-accelerators@lists.ozlabs.org, Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration
Date: Thu, 10 Jan 2019 12:42:00 -0500	[thread overview]
Message-ID: <20190110174159.GD4394@redhat.com> (raw)
In-Reply-To: <20190110164248.GO31793@dhcp22.suse.cz>

On Thu, Jan 10, 2019 at 05:42:48PM +0100, Michal Hocko wrote:
> On Thu 10-01-19 10:53:17, Jerome Glisse wrote:
> > On Tue, Jan 08, 2019 at 03:52:56PM +0100, Michal Hocko wrote:
> > > On Wed 02-01-19 12:21:10, Jonathan Cameron wrote:
> > > [...]
> > > > So ideally I'd love this set to head in a direction that helps me tick off
> > > > at least some of the above usecases and hopefully have some visibility on
> > > > how to address the others moving forwards,
> > > 
> > > Is it sufficient to have such a memory marked as movable (aka only have
> > > ZONE_MOVABLE)? That should rule out most of the kernel allocations and
> > > it fits the "balance by migration" concept.
> > 
> > This would not work for GPU, GPU driver really want to be in total
> > control of their memory yet sometimes they want to migrate some part
> > of the process to their memory.
> 
> But that also means that GPU doesn't really fit the model discussed
> here, right? I thought HMM is the way to manage such a memory.

HMM provides the plumbing and tools to manage but right now the patchset
for nouveau expose API through nouveau device file as nouveau ioctl. This
is not a good long term solution when you want to mix and match multiple
GPUs memory (possibly from different vendors). Then you get each device
driver implementing their own mem policy infrastructure and without any
coordination between devices/drivers. While it is _mostly_ ok for single
GPU case, it is seriously crippling for the multi-GPUs or multi-devices
cases (for instance when you chain network and GPU together or GPU and
storage).

People have been asking for a single common API to manage both regular
memory and device memory. As anyway the common case is you move things
around depending on which devices/CPUs is working on the dataset.

Cheers,
Jï¿½rï¿½me

WARNING: multiple messages have this Message-ID

From: Jerome Glisse <jglisse@redhat.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: Andrea Arcangeli <aarcange@redhat.com>,
	Huang Ying <ying.huang@intel.com>,
	Zhang Yi <yi.z.zhang@linux.intel.com>,
	kvm@vger.kernel.org, Dave Hansen <dave.hansen@intel.com>,
	Liu Jingqi <jingqi.liu@intel.com>, Yao Yuan <yuan.yao@intel.com>,
	Fan Du <fan.du@intel.com>, Dong Eddie <eddie.dong@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linux Memory Management List <linux-mm@kvack.org>,
	Peng Dong <dongx.peng@intel.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	Fengguang Wu <fengguang.wu@intel.com>,
	Dan Williams <dan.j.williams@intel.com>,
	linux-accelerators@lists.ozlabs.org, Mel Gorman <mgorman@suse.de>
Subject: Re: [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration
Date: Thu, 10 Jan 2019 12:42:00 -0500	[thread overview]
Message-ID: <20190110174159.GD4394@redhat.com> (raw)
Message-ID: <20190110174200.zqFodphxD1BbKrTb-uA7IfGbNa69XtI4J_5lF_2IAIw@z> (raw)
In-Reply-To: <20190110164248.GO31793@dhcp22.suse.cz>

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="UTF-8", Size: 1813 bytes --]

On Thu, Jan 10, 2019 at 05:42:48PM +0100, Michal Hocko wrote:
> On Thu 10-01-19 10:53:17, Jerome Glisse wrote:
> > On Tue, Jan 08, 2019 at 03:52:56PM +0100, Michal Hocko wrote:
> > > On Wed 02-01-19 12:21:10, Jonathan Cameron wrote:
> > > [...]
> > > > So ideally I'd love this set to head in a direction that helps me tick off
> > > > at least some of the above usecases and hopefully have some visibility on
> > > > how to address the others moving forwards,
> > > 
> > > Is it sufficient to have such a memory marked as movable (aka only have
> > > ZONE_MOVABLE)? That should rule out most of the kernel allocations and
> > > it fits the "balance by migration" concept.
> > 
> > This would not work for GPU, GPU driver really want to be in total
> > control of their memory yet sometimes they want to migrate some part
> > of the process to their memory.
> 
> But that also means that GPU doesn't really fit the model discussed
> here, right? I thought HMM is the way to manage such a memory.

HMM provides the plumbing and tools to manage but right now the patchset
for nouveau expose API through nouveau device file as nouveau ioctl. This
is not a good long term solution when you want to mix and match multiple
GPUs memory (possibly from different vendors). Then you get each device
driver implementing their own mem policy infrastructure and without any
coordination between devices/drivers. While it is _mostly_ ok for single
GPU case, it is seriously crippling for the multi-GPUs or multi-devices
cases (for instance when you chain network and GPU together or GPU and
storage).

People have been asking for a single common API to manage both regular
memory and device memory. As anyway the common case is you move things
around depending on which devices/CPUs is working on the dataset.

Cheers,
Jérôme

next prev parent reply	other threads:[~2019-01-10 17:42 UTC|newest]

Thread overview: 99+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-12-26 13:14 Fengguang Wu
2018-12-26 13:14 ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 01/21] e820: cheat PMEM as DRAM Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-27  3:41   ` Matthew Wilcox
2018-12-27  4:11     ` Fengguang Wu
2018-12-27  5:13       ` Dan Williams
2018-12-27  5:13         ` Dan Williams
2018-12-27 19:32         ` Yang Shi
2018-12-27 19:32           ` Yang Shi
2018-12-28  3:27           ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 02/21] acpi/numa: memorize NUMA node type from SRAT table Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 03/21] x86/numa_emulation: fix fake NUMA in uniform case Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 04/21] x86/numa_emulation: pass numa node type to fake nodes Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 05/21] mmzone: new pgdat flags for DRAM and PMEM Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 06/21] x86,numa: update numa node type Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 07/21] mm: export node type {pmem|dram} under /sys/bus/node Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 08/21] mm: introduce and export pgdat peer_node Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-27 20:07   ` Christopher Lameter
2018-12-27 20:07     ` Christopher Lameter
2018-12-28  2:31     ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 09/21] mm: avoid duplicate peer target node Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 10/21] mm: build separate zonelist for PMEM and DRAM node Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2019-01-01  9:14   ` Aneesh Kumar K.V
2019-01-01  9:14     ` Aneesh Kumar K.V
2019-01-07  9:57     ` Fengguang Wu
2019-01-07 14:09       ` Aneesh Kumar K.V
2018-12-26 13:14 ` [RFC][PATCH v2 11/21] kvm: allocate page table pages from DRAM Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2019-01-01  9:23   ` Aneesh Kumar K.V
2019-01-01  9:23     ` Aneesh Kumar K.V
2019-01-02  0:59     ` Yuan Yao
2019-01-02 16:47   ` Dave Hansen
2019-01-07 10:21     ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 12/21] x86/pgtable: " Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:14 ` [RFC][PATCH v2 13/21] x86/pgtable: dont check PMD accessed bit Fengguang Wu
2018-12-26 13:14   ` Fengguang Wu
2018-12-26 13:15 ` [RFC][PATCH v2 14/21] kvm: register in mm_struct Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2019-02-02  6:57   ` Peter Xu
2019-02-02 10:50     ` Fengguang Wu
2019-02-04 10:46     ` Paolo Bonzini
2018-12-26 13:15 ` [RFC][PATCH v2 15/21] ept-idle: EPT walk for virtual machine Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2018-12-26 13:15 ` [RFC][PATCH v2 16/21] mm-idle: mm_walk for normal task Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2018-12-26 13:15 ` [RFC][PATCH v2 17/21] proc: introduce /proc/PID/idle_pages Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2018-12-26 13:15 ` [RFC][PATCH v2 18/21] kvm-ept-idle: enable module Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2018-12-26 13:15 ` [RFC][PATCH v2 19/21] mm/migrate.c: add move_pages(MPOL_MF_SW_YOUNG) flag Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2018-12-26 13:15 ` [RFC][PATCH v2 20/21] mm/vmscan.c: migrate anon DRAM pages to PMEM node Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2018-12-26 13:15 ` [RFC][PATCH v2 21/21] mm/vmscan.c: shrink anon list if can migrate to PMEM Fengguang Wu
2018-12-26 13:15   ` Fengguang Wu
2018-12-27 20:31 ` [RFC][PATCH v2 00/21] PMEM NUMA node and hotness accounting/migration Michal Hocko
2018-12-28  5:08   ` Fengguang Wu
2018-12-28  8:41     ` Michal Hocko
2018-12-28  9:42       ` Fengguang Wu
2018-12-28 12:15         ` Michal Hocko
2018-12-28 13:15           ` Fengguang Wu
2018-12-28 13:15             ` Fengguang Wu
2018-12-28 19:46             ` Michal Hocko
2018-12-28 13:31           ` Fengguang Wu
2018-12-28 18:28             ` Yang Shi
2018-12-28 18:28               ` Yang Shi
2018-12-28 19:52             ` Michal Hocko
2019-01-02 12:21               ` Jonathan Cameron
2019-01-02 12:21                 ` Jonathan Cameron
2019-01-08 14:52                 ` Michal Hocko
2019-01-10 15:53                   ` Jerome Glisse
2019-01-10 15:53                     ` Jerome Glisse
2019-01-10 16:42                     ` Michal Hocko
2019-01-10 17:42                       ` Jerome Glisse [this message]
2019-01-10 17:42                         ` Jerome Glisse
2019-01-10 18:26                   ` Jonathan Cameron
2019-01-10 18:26                     ` Jonathan Cameron
2019-01-28 17:42                 ` Jonathan Cameron
2019-01-28 17:42                   ` Jonathan Cameron
2019-01-29  2:00                   ` Fengguang Wu
2019-01-03 10:57               ` Mel Gorman
2019-01-10 16:25               ` Jerome Glisse
2019-01-10 16:25                 ` Jerome Glisse
2019-01-10 16:50                 ` Michal Hocko
2019-01-10 18:02                   ` Jerome Glisse
2019-01-10 18:02                     ` Jerome Glisse
2019-01-02 18:12       ` Dave Hansen
2019-01-08 14:53         ` Michal Hocko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190110174159.GD4394@redhat.com \
    --to=jglisse@redhat.com \
    --cc=aarcange@redhat.com \
    --cc=akpm@linux-foundation.org \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dongx.peng@intel.com \
    --cc=eddie.dong@intel.com \
    --cc=fan.du@intel.com \
    --cc=fengguang.wu@intel.com \
    --cc=jingqi.liu@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-accelerators@lists.ozlabs.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@suse.de \
    --cc=mhocko@kernel.org \
    --cc=yi.z.zhang@linux.intel.com \
    --cc=ying.huang@intel.com \
    --cc=yuan.yao@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox