From: Dan Williams <dan.j.williams@intel.com>
To: Haggai Eran <haggaie@mellanox.com>
Cc: Christoph Hellwig <hch@infradead.org>,
	davide rossetti <davide.rossetti@gmail.com>,
	Jason Gunthorpe <jgunthorpe@obsidianresearch.com>,
	Kovalyov Artemy <artemyko@mellanox.com>,
	"dledford@redhat.com" <dledford@redhat.com>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	Leon Romanovsky <leonro@mellanox.com>,
	Sagi Grimberg <sagig@mellanox.com>
Subject: Re: [RFC 0/7] Peer-direct memory
Date: Fri, 19 Feb 2016 10:54:58 -0800
Message-ID: <CAA9_cmfSixFu6roxjQ7z6N7tgDKgK2oEYrwb=7=MmjgnxOhEkA@mail.gmail.com>
In-Reply-To: <56C490DF.1090100@mellanox.com>

On Wed, Feb 17, 2016 at 7:25 AM, Haggai Eran <haggaie@mellanox.com> wrote:
> On 17/02/2016 10:44, Christoph Hellwig wrote:
>> That doesn't change how they are managed.  We've always supported mapping
>> BARs to userspace in various drivers, and the only real news with things
>> like the pmem driver with DAX or some of the things people want to do
>> with the NVMe controller memory buffer is that there are much bigger
>> quantities of it, and:
>>
>>  a) people want to be able to have cacheable mappings of various kinds
>>     instead of the old uncacheable default.
> What if we do want an uncacheable mapping for our device's BAR? Can we still
> expose it under ZONE_DEVICE?
>
>>  b) we want to be able to DMA (including RDMA) to the regions in the
>>     BARs.
>>
>> a) is something that needs smaller amounts in all kinds of areas to be
>> done properly, but in principle GPU drivers have been doing this forever
>> using all kinds of hacks.
>>
>> b) is the real issue.  The Linux DMA support code doesn't really operate
>> on just physical addresses, but on page structures, and we don't
>> allocate those for BARs.  We investigated two ways to address this:  1) allow
>> DMA operations without struct page and 2) create struct page structures
>> for BARs that we want to be able to use DMA operations on.  For various
>> reasons version 2) was favored and this is how we ended up with
>> ZONE_DEVICE.  Read the linux-mm and linux-nvdimm lists for the lengthy
>> discussions of how we ended up here.
>
> I was wondering what your thoughts are regarding the other questions we raised
> about ZONE_DEVICE.
>
> How can we overcome the section-alignment requirement in the current code? Our
> HCA's BARs are usually smaller than 128MB.

This may not help, but note that the section-alignment only bites when
trying to have 2 mappings with different lifetimes in a single
section.  It's otherwise fine to map a full section for a smaller
single range; you'll just end up with pages that won't be used.
However, this assumes that you are fine with everything in that
section being mapped cacheable; you can't mix uncacheable mappings
into that same range.
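
To put rough numbers on that trade-off, here is a small sketch of the
arithmetic.  It assumes 128MB sections (the figure quoted above) and 4KB
base pages; the BAR addresses and the helper names are made up for
illustration and are not kernel interfaces.

```python
SECTION_SIZE = 128 << 20  # assumption: 128MB memory sections
PAGE_SIZE = 4096          # assumption: 4KB base pages


def section_span(bar_start, bar_len):
    """Return (start, end) of the whole sections that cover a BAR."""
    mask = SECTION_SIZE - 1
    start = bar_start & ~mask                     # round start down
    end = (bar_start + bar_len + mask) & ~mask    # round end up
    return start, end


def unused_pages(bar_start, bar_len):
    """Pages in the covering sections that the BAR itself never uses."""
    start, end = section_span(bar_start, bar_len)
    return (end - start - bar_len) // PAGE_SIZE


def share_section(bar_a, bar_b):
    """True if two (start, len) ranges would land in a common section."""
    a_start, a_end = section_span(*bar_a)
    b_start, b_end = section_span(*bar_b)
    return a_start < b_end and b_start < a_end


# Hypothetical 32MB BAR at a section-aligned address.
hca_bar = (0xF0000000, 32 << 20)
print(unused_pages(*hca_bar))                          # pages mapped but unused
print(share_section(hca_bar, (0xF4000000, 16 << 20)))  # same section: conflict
print(share_section(hca_bar, (0xF8000000, 16 << 20)))  # next section: no conflict
```

Under these assumptions the single 32MB range simply leaves the rest of
its section unused, while a second range only becomes a problem when it
falls inside the same section and has a different lifetime.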

> Sagi also asked how a peer device that got a ZONE_DEVICE page should know
> to stop using it (the CMB example).

ZONE_DEVICE pages come with a per-cpu reference counter via
page->pgmap.  See get_dev_pagemap(), get_zone_device_page(), and
put_zone_device_page().
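
As a rough illustration of how a reference counter can answer the "when
should I stop" question, here is a toy model.  It is not the kernel
implementation: the class and method names are hypothetical, and it only
assumes the general pattern that new references are refused once
teardown has started and that teardown completes when the existing
references are dropped.

```python
class ToyPagemap:
    """Toy stand-in for a device page map with a reference counter."""

    def __init__(self):
        self.refs = 0
        self.dying = False

    def tryget(self):
        """Take a reference unless teardown has already started."""
        if self.dying:
            return False
        self.refs += 1
        return True

    def put(self):
        """Drop a reference taken earlier with tryget()."""
        assert self.refs > 0, "unbalanced put"
        self.refs -= 1

    def kill(self):
        """Start teardown: no new references from here on."""
        self.dying = True

    def can_release(self):
        """Teardown may finish once every outstanding reference is gone."""
        return self.dying and self.refs == 0


pagemap = ToyPagemap()
peer_has_ref = pagemap.tryget()   # peer pins the memory before using it
pagemap.kill()                    # owner begins tearing the mapping down
late_ref = pagemap.tryget()       # a later user is refused
pagemap.put()                     # peer finishes and drops its reference
print(peer_has_ref, late_ref, pagemap.can_release())
```

In this model the peer is never told to stop asynchronously; it just
cannot take a new reference, and the owner waits for the old ones to
drain.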

However this gets confusing quickly when a 'pfn' and a 'page' start
referencing mmio space instead of host memory.  It seems like we need
new data types because a dma_addr_t does not necessarily reflect the
peer-to-peer address as seen by the device.

