From: Dan Williams <dan.j.williams@intel.com>
To: Jason Gunthorpe <jgg@mellanox.com>
Cc: "Christoph Hellwig" <hch@lst.de>,
	"Jérôme Glisse" <jglisse@redhat.com>,
	"Ben Skeggs" <bskeggs@redhat.com>,
	"Felix Kuehling" <Felix.Kuehling@amd.com>,
	"Ralph Campbell" <rcampbell@nvidia.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	"nouveau@lists.freedesktop.org" <nouveau@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 04/15] mm: remove the pgmap field from struct hmm_vma_walk
Date: Wed, 7 Aug 2019 11:47:22 -0700
Message-ID: <CAPcyv4hPCuHBLhSJgZZEh0CbuuJNPLFDA3f-79FX5uVOO0yubA@mail.gmail.com>
In-Reply-To: <20190807174548.GJ1571@mellanox.com>

On Wed, Aug 7, 2019 at 10:45 AM Jason Gunthorpe <jgg@mellanox.com> wrote:
>
> On Tue, Aug 06, 2019 at 07:05:42PM +0300, Christoph Hellwig wrote:
> > There is only a single place where the pgmap is passed over a function
> > call, so replace it with local variables in the places where we deal
> > with the pgmap.
> >
> > Signed-off-by: Christoph Hellwig <hch@lst.de>
> >  mm/hmm.c | 62 ++++++++++++++++++++++++--------------------------------
> >  1 file changed, 27 insertions(+), 35 deletions(-)
> >
> > diff --git a/mm/hmm.c b/mm/hmm.c
> > index 9a908902e4cc..d66fa29b42e0 100644
> > --- a/mm/hmm.c
> > +++ b/mm/hmm.c
> > @@ -278,7 +278,6 @@ EXPORT_SYMBOL(hmm_mirror_unregister);
> >
> >  struct hmm_vma_walk {
> >       struct hmm_range        *range;
> > -     struct dev_pagemap      *pgmap;
> >       unsigned long           last;
> >       unsigned int            flags;
> >  };
> > @@ -475,6 +474,7 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> >  #ifdef CONFIG_TRANSPARENT_HUGEPAGE
> >       struct hmm_vma_walk *hmm_vma_walk = walk->private;
> >       struct hmm_range *range = hmm_vma_walk->range;
> > +     struct dev_pagemap *pgmap = NULL;
> >       unsigned long pfn, npages, i;
> >       bool fault, write_fault;
> >       uint64_t cpu_flags;
> > @@ -490,17 +490,14 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
> >       pfn = pmd_pfn(pmd) + pte_index(addr);
> >       for (i = 0; addr < end; addr += PAGE_SIZE, i++, pfn++) {
> >               if (pmd_devmap(pmd)) {
> > -                     hmm_vma_walk->pgmap = get_dev_pagemap(pfn,
> > -                                           hmm_vma_walk->pgmap);
> > -                     if (unlikely(!hmm_vma_walk->pgmap))
> > +                     pgmap = get_dev_pagemap(pfn, pgmap);
> > +                     if (unlikely(!pgmap))
> >                               return -EBUSY;
>
> Unrelated to this patch, but what is the point of checking
> that the pgmap exists for the page and then immediately releasing it?
> This code has this pattern in several places.
>
> It feels racy

Agreed, I'm not sure what the intent is here. The only other reason to
call get_dev_pagemap() is to check in general whether the pfn is indeed
owned by some ZONE_DEVICE instance, but if the intent is to make sure
the device is still attached/enabled, that check is invalidated at
put_dev_pagemap().
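
To make the concern concrete, here is a minimal userspace sketch of the
check-then-release pattern. It is not kernel code: struct model_pgmap
and every model_* helper are hypothetical stand-ins, and the teardown
rule (teardown completes only when no references remain) is an
assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct dev_pagemap. */
struct model_pgmap {
	unsigned long start_pfn;	/* first pfn covered */
	unsigned long end_pfn;		/* one past the last pfn covered */
	int refs;			/* live references */
	bool dying;			/* teardown has been requested */
};

/* Take a reference if pfn is covered and teardown has not started. */
static struct model_pgmap *model_get(struct model_pgmap *map,
				     unsigned long pfn)
{
	if (map->dying || pfn < map->start_pfn || pfn >= map->end_pfn)
		return NULL;
	map->refs++;
	return map;
}

/* Drop a reference taken by model_get(). */
static void model_put(struct model_pgmap *map)
{
	assert(map->refs > 0);
	map->refs--;
}

/* Request teardown; it can only complete once no references remain. */
static bool model_try_teardown(struct model_pgmap *map)
{
	map->dying = true;
	return map->refs == 0;
}

/*
 * The pattern under discussion: look up, check, drop immediately.
 * A true result says the device was live at lookup time, nothing more.
 */
static bool model_check_then_put(struct model_pgmap *map,
				 unsigned long pfn)
{
	struct model_pgmap *ref = model_get(map, pfn);

	if (!ref)
		return false;
	model_put(ref);
	return true;
}
```

In this model, model_try_teardown() succeeds immediately after
model_check_then_put() returns true, so the caller's "device is
present" answer is already stale. Holding the reference from
model_get() until the pfn is no longer in use is what blocks teardown.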

If it's the former case, validating ZONE_DEVICE pfns, I imagine we can
do something cheaper with a helper that is on the order of the same
cost as pfn_valid(). I.e. replace PTE_DEVMAP with a mem_section flag
or something similar.
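
A rough userspace sketch of what such a helper could look like follows.
Everything in it is hypothetical: the MODEL_* constants, the flag
layout, and the assumption that device memory is tracked at section
granularity are invented for illustration and do not describe the
kernel's actual mem_section layout.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_SECTION_SHIFT	15	/* assumed: 1 << 15 pfns per section */
#define MODEL_NR_SECTIONS	64	/* assumed: tiny table for the sketch */
#define MODEL_SECTION_PRESENT	(1u << 0)
#define MODEL_SECTION_DEVICE	(1u << 1)	/* hypothetical device flag */

/* One flags word per section, indexed by section number. */
static unsigned int model_section_flags[MODEL_NR_SECTIONS];

static unsigned long model_pfn_to_section(unsigned long pfn)
{
	return pfn >> MODEL_SECTION_SHIFT;
}

/* Analogue of pfn_valid(): a shift, a bounds check and one load. */
static bool model_pfn_valid(unsigned long pfn)
{
	unsigned long nr = model_pfn_to_section(pfn);

	if (nr >= MODEL_NR_SECTIONS)
		return false;
	return model_section_flags[nr] & MODEL_SECTION_PRESENT;
}

/* Same cost as model_pfn_valid(), but also requires the device flag. */
static bool model_pfn_is_device(unsigned long pfn)
{
	const unsigned int want = MODEL_SECTION_PRESENT | MODEL_SECTION_DEVICE;
	unsigned long nr = model_pfn_to_section(pfn);

	if (nr >= MODEL_NR_SECTIONS)
		return false;
	return (model_section_flags[nr] & want) == want;
}

/* Mark a whole section as present device memory. */
static void model_mark_device_section(unsigned long nr)
{
	if (nr < MODEL_NR_SECTIONS)
		model_section_flags[nr] |= MODEL_SECTION_PRESENT |
					   MODEL_SECTION_DEVICE;
}
```

The lookup takes no reference and touches no lookup structure beyond
one array slot, which is why it stays on the order of a pfn_valid()
check. The trade-off is that it only answers "is this pfn device
memory?", not "is the device still attached?".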

>
> >               }
> >               pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
> >       }
> > -     if (hmm_vma_walk->pgmap) {
> > -             put_dev_pagemap(hmm_vma_walk->pgmap);
> > -             hmm_vma_walk->pgmap = NULL;
>
> Putting the value in the hmm_vma_walk would have made some sense to me
> if the pgmap was not set to NULL all over the place. Then most
> xa_loads would be eliminated, as I would expect the pgmap tends to be
> mostly uniform for these use cases.
>
> Is there some reason the pgmap ref can't be held across
> faulting/sleeping? ie like below.

No restriction on holding refs over faulting / sleeping.

>
> Anyhow, I looked over this pretty carefully and the change looks
> functionally OK, I just don't know why the code is like this in the
> first place.
>
> Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
>
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 9a908902e4cc38..4e30128c23a505 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -497,10 +497,6 @@ static int hmm_vma_handle_pmd(struct mm_walk *walk,
>                 }
>                 pfns[i] = hmm_device_entry_from_pfn(range, pfn) | cpu_flags;
>         }
> -       if (hmm_vma_walk->pgmap) {
> -               put_dev_pagemap(hmm_vma_walk->pgmap);
> -               hmm_vma_walk->pgmap = NULL;
> -       }
>         hmm_vma_walk->last = end;
>         return 0;
>  #else
> @@ -604,10 +600,6 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr,
>         return 0;
>
>  fault:
> -       if (hmm_vma_walk->pgmap) {
> -               put_dev_pagemap(hmm_vma_walk->pgmap);
> -               hmm_vma_walk->pgmap = NULL;
> -       }
>         pte_unmap(ptep);
>         /* Fault any virtual address we were asked to fault */
>         return hmm_vma_walk_hole_(addr, end, fault, write_fault, walk);
> @@ -690,16 +682,6 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>                         return r;
>                 }
>         }
> -       if (hmm_vma_walk->pgmap) {
> -               /*
> -                * We do put_dev_pagemap() here and not in hmm_vma_handle_pte()
> -                * so that we can leverage get_dev_pagemap() optimization which
> -                * will not re-take a reference on a pgmap if we already have
> -                * one.
> -                */
> -               put_dev_pagemap(hmm_vma_walk->pgmap);
> -               hmm_vma_walk->pgmap = NULL;
> -       }
>         pte_unmap(ptep - 1);
>
>         hmm_vma_walk->last = addr;
> @@ -751,10 +733,6 @@ static int hmm_vma_walk_pud(pud_t *pudp,
>                         pfns[i] = hmm_device_entry_from_pfn(range, pfn) |
>                                   cpu_flags;
>                 }
> -               if (hmm_vma_walk->pgmap) {
> -                       put_dev_pagemap(hmm_vma_walk->pgmap);
> -                       hmm_vma_walk->pgmap = NULL;
> -               }
>                 hmm_vma_walk->last = end;
>                 return 0;
>         }
> @@ -1026,6 +1004,14 @@ long hmm_range_fault(struct hmm_range *range, unsigned int flags)
>                         /* Keep trying while the range is valid. */
>                 } while (ret == -EBUSY && range->valid);
>
> +               /*
> +                * We do put_dev_pagemap() here so that we can leverage
> +                * get_dev_pagemap() optimization which will not re-take a
> +                * reference on a pgmap if we already have one.
> +                */
> +               if (hmm_vma_walk->pgmap)
> +                       put_dev_pagemap(hmm_vma_walk->pgmap);
> +

Seems ok, but only if the caller is guaranteeing that the range does
not span outside of a single pagemap instance. If that guarantee is
met why not just have the caller pass in a pinned pagemap? If that
guarantee is not met, then I think we're back to your race concern.
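
The single-pagemap condition can be illustrated with a small userspace
model of a cached lookup. This is a sketch under stated assumptions,
not the kernel implementation: the model_* names, the two fixed
pagemaps, and the linear search standing in for the slow lookup are all
hypothetical. The only behaviour borrowed from the discussion is that a
lookup reuses the cached reference when the pfn is still covered and
otherwise drops it before looking up a new one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct dev_pagemap. */
struct model_pgmap {
	unsigned long start_pfn;	/* first pfn covered */
	unsigned long end_pfn;		/* one past the last pfn covered */
	int refs;			/* live references */
};

/* Two adjacent pagemaps, invented for the example. */
static struct model_pgmap model_maps[2] = {
	{ .start_pfn = 0,   .end_pfn = 100 },
	{ .start_pfn = 100, .end_pfn = 200 },
};

/* Counts how often the slow path (the real lookup) is taken. */
static int model_slow_lookups;

/* Reuse the cached reference when it covers pfn, else replace it. */
static struct model_pgmap *model_get_cached(unsigned long pfn,
					    struct model_pgmap *cached)
{
	size_t i;

	if (cached) {
		if (pfn >= cached->start_pfn && pfn < cached->end_pfn)
			return cached;
		cached->refs--;		/* the old pagemap is unpinned here */
	}

	model_slow_lookups++;
	for (i = 0; i < 2; i++) {
		struct model_pgmap *map = &model_maps[i];

		if (pfn >= map->start_pfn && pfn < map->end_pfn) {
			map->refs++;
			return map;
		}
	}
	return NULL;
}

static void model_put(struct model_pgmap *map)
{
	if (map)
		map->refs--;
}

/*
 * Walk [start, end) carrying one cached reference and drop it once at
 * the end. Returns the number of slow lookups, or -1 on an unowned pfn.
 */
static int model_walk(unsigned long start, unsigned long end)
{
	struct model_pgmap *cached = NULL;
	int before = model_slow_lookups;
	unsigned long pfn;

	for (pfn = start; pfn < end; pfn++) {
		cached = model_get_cached(pfn, cached);
		if (!cached)
			return -1;
	}
	model_put(cached);
	return model_slow_lookups - before;
}
```

In this model, model_walk(10, 90) stays inside one pagemap and takes
the slow path once, while model_walk(50, 150) crosses a boundary and
takes it twice. In the second case the reference to the first pagemap
is dropped in the middle of the walk, so the pfns already collected
from it are no longer pinned by the time the walk returns.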


