From: Jerome Glisse <jglisse@redhat.com>
To: Jason Gunthorpe <jgg@mellanox.com>
Cc: Ralph Campbell <rcampbell@nvidia.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Christoph Hellwig <hch@lst.de>,
	"linux-rdma@vger.kernel.org" <linux-rdma@vger.kernel.org>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>
Subject: Re: [PATCH v2 2/3] mm/hmm: allow snapshot of the special zero page
Date: Tue, 22 Oct 2019 13:06:31 -0400
Message-ID: <20191022170631.GA4805@redhat.com>
In-Reply-To: <20191022150514.GH22766@mellanox.com>

On Tue, Oct 22, 2019 at 03:05:18PM +0000, Jason Gunthorpe wrote:
> On Mon, Oct 21, 2019 at 10:45:49PM -0400, Jerome Glisse wrote:
> > On Mon, Oct 21, 2019 at 01:54:15PM -0700, Ralph Campbell wrote:
> > > 
> > > On 10/21/19 11:49 AM, Jerome Glisse wrote:
> > > > On Tue, Oct 15, 2019 at 01:48:13PM -0700, Ralph Campbell wrote:
> > > > > Allow hmm_range_fault() to return success (0) when the CPU pagetable
> > > > > entry points to the special shared zero page.
> > > > > The caller can then handle the zero page by possibly clearing device
> > > > > private memory instead of DMAing a zero page.
> > > > 
> > > > I do not understand why you are talking about DMA. The GPU can work
> > > > on main memory, and migrating to GPU memory is optional and should
> > > > not involve this function at all.
> > > 
> > > Good point. This is the device accessing the zero page over PCIe
> > > or another bus, not migrating a zero page to device private memory.
> > > I'll update the wording.
> > > 
> > > > > 
> > > > > Signed-off-by: Ralph Campbell <rcampbell@nvidia.com>
> > > > > Reviewed-by: Christoph Hellwig <hch@lst.de>
> > > > > Cc: "Jérôme Glisse" <jglisse@redhat.com>
> > > > > Cc: Jason Gunthorpe <jgg@mellanox.com>
> > > > 
> > > > NAK, please keep the semantics or change them fully. See the
> > > > alternative below.
> > > > 
> > > > >   mm/hmm.c | 4 +++-
> > > > >   1 file changed, 3 insertions(+), 1 deletion(-)
> > > > > 
> > > > > diff --git a/mm/hmm.c b/mm/hmm.c
> > > > > index 5df0dbf77e89..f62b119722a3 100644
> > > > > --- a/mm/hmm.c
> > > > > +++ b/mm/hmm.c
> > > > > @@ -530,7 +530,9 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
> > > > >   			return -EBUSY;
> > > > >   	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> > > > >   		*pfn = range->values[HMM_PFN_SPECIAL];
> > > > > -		return -EFAULT;
> > > > > +		if (!is_zero_pfn(pte_pfn(pte)))
> > > > > +			return -EFAULT;
> > > > > +		return 0;
> > > > 
> > > > An acceptable change would be to turn the branch into:
> > > > 	} else if (IS_ENABLED(CONFIG_ARCH_HAS_PTE_SPECIAL) && pte_special(pte)) {
> > > > 		if (!is_zero_pfn(pte_pfn(pte))) {
> > > > 			*pfn = range->values[HMM_PFN_SPECIAL];
> > > > 			return -EFAULT;
> > > > 		}
> > > > 		/* Fall through for the zero pfn (if a write was needed, the
> > > > 		 * earlier hmm_pte_need_fault() would have caught it).
> > > > 		 */
> > > > 	}
> > > > 
> > > 
> > > Except this will return the zero pfn with no indication that it is special
> > > (i.e., doesn't have a struct page).
> > 
> > That is fine, the device driver should not do anything with it, i.e.
> > if the device driver wanted to write, then the write fault test
> > would return true and it would fault.
> > 
> > Note that the driver should not dereference the struct page.
> 
> Can this thing be dma mapped for read?
> 

Yes it can, the zero page is just a regular page (AFAIK on all
architectures), so a device can DMA map it for read only. There is
no reason to treat it any differently.
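
To make that concrete, here is a rough driver-side sketch (hypothetical
code, not part of this patch; "dev" stands in for the driver's struct
device, and it needs <linux/dma-mapping.h> plus <asm/pgtable.h> for
ZERO_PAGE()):

	struct page *zpage = ZERO_PAGE(0);
	dma_addr_t dma;

	/* Map the shared zero page for device reads only. */
	dma = dma_map_page(dev, zpage, 0, PAGE_SIZE, DMA_TO_DEVICE);
	if (dma_mapping_error(dev, dma))
		return -ENOMEM;

	/* Program the device to read from "dma"; the device must never
	 * write to it. */

The read-only direction (DMA_TO_DEVICE) matches the point above: a
device-side write is never allowed to hit the zero page.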

HMM_PFN_SPECIAL is only (as documented in the header) for ptes
inserted with insert_pfn() or insert_page(), i.e. ptes inserted in
a vma with the MIXEDMAP or PFNMAP flag. While HMM catches those
vmas early on and backs off, it can still race with a driver
setting the vma flag and installing a special pte afterward, which
is why special ptes go through this special path.
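
(For context, the early back-off mentioned above is a vma-flags test
in hmm_range_fault(); the following is a paraphrased sketch of that
check, not a verbatim excerpt from mm/hmm.c:)

	/* Back off from vmas that may contain special ptes. */
	if (!vma || (vma->vm_flags & (VM_IO | VM_PFNMAP | VM_MIXEDMAP)))
		return -EFAULT;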

The zero page being a special pte is just an exception, i.e. it
is the only special pte allowed in vmas that do not have the
MIXEDMAP or PFNMAP flag set.

Cheers,
Jérôme




Thread overview: 18+ messages
2019-10-15 20:48 [PATCH v2 0/3] HMM tests and minor fixes Ralph Campbell
2019-10-15 20:48 ` [PATCH v2 1/3] mm/hmm: make full use of walk_page_range() Ralph Campbell
2019-10-21 18:32   ` Jason Gunthorpe
2019-10-21 20:32     ` Ralph Campbell
2019-10-15 20:48 ` [PATCH v2 2/3] mm/hmm: allow snapshot of the special zero page Ralph Campbell
2019-10-21 18:08   ` Jason Gunthorpe
2019-10-21 20:08     ` Ralph Campbell
2019-10-21 18:49   ` Jerome Glisse
2019-10-21 20:54     ` Ralph Campbell
2019-10-22  2:45       ` Jerome Glisse
2019-10-22 15:05         ` Jason Gunthorpe
2019-10-22 17:06           ` Jerome Glisse [this message]
2019-10-22 17:09             ` Jason Gunthorpe
2019-10-22 17:30               ` Jerome Glisse
2019-10-22 17:41                 ` Jason Gunthorpe
2019-10-22 17:52                   ` Jerome Glisse
2019-10-15 20:48 ` [PATCH v2 3/3] mm/hmm/test: add self tests for HMM Ralph Campbell
2019-10-21 18:50   ` Jerome Glisse
