linux-mm.kvack.org archive mirror
From: Jerome Glisse <j.glisse@gmail.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Anshuman Khandual <khandual@linux.vnet.ibm.com>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	mhocko@suse.com, js1304@gmail.com, vbabka@suse.cz,
	mgorman@suse.de, minchan@kernel.org, akpm@linux-foundation.org,
	bsingharora@gmail.com
Subject: Re: [RFC 0/8] Define coherent device memory node
Date: Tue, 25 Oct 2016 11:32:56 -0400
Message-ID: <20161025153256.GB6131@gmail.com>
In-Reply-To: <877f8xaurp.fsf@linux.vnet.ibm.com>

On Tue, Oct 25, 2016 at 10:29:38AM +0530, Aneesh Kumar K.V wrote:
> Jerome Glisse <j.glisse@gmail.com> writes:
> > On Mon, Oct 24, 2016 at 10:01:49AM +0530, Anshuman Khandual wrote:

[...]

> > You can take a look at hmm-v13 if you want to see how I do non-LRU page
> > migration. While I put most of the migration code inside hmm_migrate.c, it
> > could easily be moved to migrate.c without the hmm_ prefix.
> >
> > There are two pieces missing from the existing migrate code. The first is to
> > put memory allocation for the destination under the control of whoever calls
> > the migrate code. The second is to allow offloading the copy operation to the
> > device (i.e. not using the CPU to copy data).
> >
> > I believe the same requirements also make sense for the platform you are
> > targeting, so the same code can be used.
> >
> > hmm-v13 https://cgit.freedesktop.org/~glisse/linux/log/?h=hmm-v13
> >
> > I haven't posted this patchset yet because we are making some modifications
> > to the device driver API to accommodate some new features. But the ZONE_DEVICE
> > changes and the overall migration code will stay more or less the same (I have
> > patches that move it to migrate.c and share more code with the existing
> > migrate code).
> >
> > If you think I missed anything about the LRU and page cache, please point it
> > out to me. When I audited the code for that, I didn't see any roadblock with
> > the few filesystems I was looking at (ext4, xfs and the core page cache code).
> >
> 
> The other restriction around ZONE_DEVICE is that it is not a managed zone.
> That prevents any direct allocation from the coherent device by an application.
> We would like to be able to force allocation from the coherent device using an
> interface like mbind(MPOL_BIND..). Is that possible with ZONE_DEVICE?
 
To achieve this we rely on the device fault code path: when the device takes a page
fault, with the help of HMM it will use the existing memory, if any, for the fault
address. If the CPU page table is empty (and it is not a file-backed vma, because of
readback), the device can directly allocate device memory, and HMM will update the
CPU page table to point to the newly allocated device memory.

So in fact I am not using an existing kernel API to achieve this. The whole policy
of where and what to allocate is the device driver's responsibility, and the device
driver leverages its existing userspace API to get proper hints/direction from the
application.

Device memory is really a special case in my view. It only makes sense to use it if
the memory is actively accessed by the device, and the only way the device accesses
memory is when it is programmed to do so through the device driver API. There is no
such thing as GPU threads in the kernel, and there is no way to spawn or move a work
thread to the GPU. These are specialized devices and they require a special per-device
API. So in my view using an existing kernel API such as mbind() is counterproductive.
You might have buggy software that mbinds its memory to the device and never uses the
device, which leads to device memory being wasted on a process that never uses it.

So my opinion is that you should not try to use an existing kernel API to get policy
information from userspace, but should let the device driver gather such policy
through its own private API.

Cheers,
Jerome

Thread overview: 68+ messages
2016-10-24  4:31 Anshuman Khandual
2016-10-24  4:31 ` [RFC 1/8] mm: " Anshuman Khandual
2016-10-24 17:09   ` Dave Hansen
2016-10-25  1:22     ` Anshuman Khandual
2016-10-25 15:47       ` Dave Hansen
2016-10-24  4:31 ` [RFC 2/8] mm: Add specialized fallback zonelist for coherent device memory nodes Anshuman Khandual
2016-10-24 17:10   ` Dave Hansen
2016-10-25  1:27     ` Anshuman Khandual
2016-11-17  7:40   ` Anshuman Khandual
2016-11-17  7:59     ` [DRAFT 1/2] mm/cpuset: Exclude CDM nodes from each task's mems_allowed node mask Anshuman Khandual
2016-11-17  7:59       ` [DRAFT 2/2] mm/hugetlb: Restrict HugeTLB allocations only to the system RAM nodes Anshuman Khandual
2016-11-17  8:28       ` [DRAFT 1/2] mm/cpuset: Exclude CDM nodes from each task's mems_allowed node mask kbuild test robot
2016-10-24  4:31 ` [RFC 3/8] mm: Isolate coherent device memory nodes from HugeTLB allocation paths Anshuman Khandual
2016-10-24 17:16   ` Dave Hansen
2016-10-25  4:15     ` Aneesh Kumar K.V
2016-10-25  7:17       ` Balbir Singh
2016-10-25  7:25         ` Balbir Singh
2016-10-24  4:31 ` [RFC 4/8] mm: Accommodate coherent device memory nodes in MPOL_BIND implementation Anshuman Khandual
2016-10-24  4:31 ` [RFC 5/8] mm: Add new flag VM_CDM for coherent device memory Anshuman Khandual
2016-10-24 17:38   ` Dave Hansen
2016-10-24 18:00     ` Dave Hansen
2016-10-25 12:36     ` Balbir Singh
2016-10-25 19:20     ` Aneesh Kumar K.V
2016-10-25 20:01       ` Dave Hansen
2016-10-24  4:31 ` [RFC 6/8] mm: Make VM_CDM marked VMAs non migratable Anshuman Khandual
2016-10-24  4:31 ` [RFC 7/8] mm: Add a new migration function migrate_virtual_range() Anshuman Khandual
2016-10-24  4:31 ` [RFC 8/8] mm: Add N_COHERENT_DEVICE node type into node_states[] Anshuman Khandual
2016-10-25  7:22   ` Balbir Singh
2016-10-26  4:52     ` Anshuman Khandual
2016-10-24  4:42 ` [DEBUG 00/10] Test and debug patches for coherent device memory Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 01/10] dt-bindings: Add doc for ibm,hotplug-aperture Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 02/10] powerpc/mm: Create numa nodes for hotplug memory Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 03/10] powerpc/mm: Allow memory hotplug into a memory less node Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 04/10] mm: Enable CONFIG_MOVABLE_NODE on powerpc Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 05/10] powerpc/mm: Identify isolation seeking coherent memory nodes during boot Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 06/10] mm: Export definition of 'zone_names' array through mmzone.h Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 07/10] mm: Add debugfs interface to dump each node's zonelist information Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 08/10] powerpc: Enable CONFIG_MOVABLE_NODE for PPC64 platform Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 09/10] drivers: Add two drivers for coherent device memory tests Anshuman Khandual
2016-10-24  4:42   ` [DEBUG 10/10] test: Add a script to perform random VMA migrations across nodes Anshuman Khandual
2016-10-24 17:09 ` [RFC 0/8] Define coherent device memory node Jerome Glisse
2016-10-25  4:26   ` Aneesh Kumar K.V
2016-10-25 15:16     ` Jerome Glisse
2016-10-26 11:09       ` Aneesh Kumar K.V
2016-10-26 16:07         ` Jerome Glisse
2016-10-28  5:29           ` Aneesh Kumar K.V
2016-10-28 16:16             ` Jerome Glisse
2016-11-05  5:21     ` Anshuman Khandual
2016-11-05 18:02       ` Jerome Glisse
2016-10-25  4:59   ` Aneesh Kumar K.V
2016-10-25 15:32     ` Jerome Glisse [this message]
2016-10-25 17:31       ` Aneesh Kumar K.V
2016-10-25 18:52         ` Jerome Glisse
2016-10-26 11:13           ` Anshuman Khandual
2016-10-26 16:02             ` Jerome Glisse
2016-10-27  4:38               ` Anshuman Khandual
2016-10-27  7:03                 ` Anshuman Khandual
2016-10-27 15:05                   ` Jerome Glisse
2016-10-28  5:47                     ` Anshuman Khandual
2016-10-28 16:08                       ` Jerome Glisse
2016-10-26 12:56           ` Anshuman Khandual
2016-10-26 16:28             ` Jerome Glisse
2016-10-27 10:23               ` Balbir Singh
2016-10-25 12:07   ` Balbir Singh
2016-10-25 15:21     ` Jerome Glisse
2016-10-24 18:04 ` Dave Hansen
2016-10-24 18:32   ` David Nellans
2016-10-24 19:36     ` Dave Hansen