Re: [PATCH v5 1/2] mm, hugepages: add mremap() support for hugepage backed vma

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Mina Almasry <almasrymina@google.com>
To: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org,  linux-kernel@vger.kernel.org,
	Ken Chen <kenchen@google.com>,
	 Chris Kennelly <ckennelly@google.com>,
	Michal Hocko <mhocko@suse.com>, Vlastimil Babka <vbabka@suse.cz>,
	 Kirill Shutemov <kirill@shutemov.name>
Subject: Re: [PATCH v5 1/2] mm, hugepages: add mremap() support for hugepage backed vma
Date: Mon, 11 Oct 2021 18:21:31 -0700	[thread overview]
Message-ID: <CAHS8izPjCJf=qotNMKdpzrRtR78nUux1mAUUY8SPW+ydj56QyA@mail.gmail.com> (raw)
In-Reply-To: <b889652d-7050-a721-367c-dcf3ff084daa@oracle.com>

On Mon, Oct 11, 2021 at 5:18 PM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> On 10/8/21 11:32 AM, Mina Almasry wrote:
> > Support mremap() for hugepage backed vma segment by simply repositioning
> > page table entries. The page table entries are repositioned to the new
> > virtual address on mremap().
> >
> > Hugetlb mremap() support is of course generic; my motivating use case
> > is a library (hugepage_text), which reloads the ELF text of executables
> > in hugepages. This significantly increases the execution performance of
> > said executables.
> >
> > Restricts the mremap operation on hugepages to up to the size of the
> > original mapping as the underlying hugetlb reservation is not yet
> > capable of handling remapping to a larger size.
> >
> > During the mremap() operation we detect pmd_share'd mappings and we
> > unshare those during the mremap(). On access and fault the sharing is
> > established again.
> >
> > Signed-off-by: Mina Almasry <almasrymina@google.com>
> >
> ...
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6d2f4c25dd9fb..8200b4c8d09d8 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1015,6 +1015,35 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
> >               vma->vm_private_data = (void *)0;
> >  }
> >
> > +/*
> > + * Reset and decrement one ref on hugepage private reservation.
> > + * Called with mm->mmap_sem writer semaphore held.
> > + * This function should be only used by move_vma() and operate on
> > + * same sized vma. It should never come here with last ref on the
> > + * reservation.
> > + */
> > +void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
> > +{
> > +     /*
> > +      * Clear the old hugetlb private page reservation.
> > +      * It has already been transferred to new_vma.
> > +      *
> > +      * During a mremap() operation of a hugetlb vma we call move_vma()
> > +      * which copies *vma* into *new_vma* and unmaps *vma*. After the copy
> > +      * operation both *new_vma* and *vma* share a reference to the resv_map
> > +      * struct, and at that point *vma* is about to be unmapped. We don't
> > +      * want to return the reservation to the pool at unmap of *vma* because
> > +      * the reservation still lives on in new_vma, so simply decrement the
> > +      * ref here and remove the resv_map reference from this vma.
> > +      */
>
> Are the *...* for special formatting of the words somewhere?  Or, just
> for simple added emphasis?  This convention is not used anywhere else in
> the file.  Unless there is a good reason for doing so, I would prefer to
> to drop the *...* convention here.
>

It was just emphasis, removed!

> > +     struct resv_map *reservations = vma_resv_map(vma);
> > +
> > +     if (reservations && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
> > +             kref_put(&reservations->refs, resv_map_release);
> > +
> > +     reset_vma_resv_huge_pages(vma);
> > +}
> ...
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c0b6c41b7b78f..6a3f7d38b7539 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -489,6 +489,10 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >       old_end = old_addr + len;
> >       flush_cache_range(vma, old_addr, old_end);
> >
> > +     if (is_vm_hugetlb_page(vma))
> > +             return move_hugetlb_page_tables(vma, new_vma, old_addr,
> > +                                             new_addr, len);
> > +
> >       mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> >                               old_addr, old_end);
> >       mmu_notifier_invalidate_range_start(&range);
> > @@ -646,6 +650,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> >               mremap_userfaultfd_prep(new_vma, uf);
> >       }
> >
> > +     if (is_vm_hugetlb_page(vma)) {
> > +             clear_vma_resv_huge_pages(vma);
> > +     }
> > +
> >       /* Conceal VM_ACCOUNT so old reservation is not undone */
> >       if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
> >               vma->vm_flags &= ~VM_ACCOUNT;
> > @@ -739,9 +747,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> >                       (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
> >               return ERR_PTR(-EINVAL);
> >
> > -     if (is_vm_hugetlb_page(vma))
> > -             return ERR_PTR(-EINVAL);
> > -
> >       /* We can't remap across vm area boundaries */
> >       if (old_len > vma->vm_end - addr)
> >               return ERR_PTR(-EFAULT);
> > @@ -937,6 +942,27 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> >
> >       if (mmap_write_lock_killable(current->mm))
> >               return -EINTR;
> > +     vma = find_vma(mm, addr);
> > +     if (!vma || vma->vm_start > addr) {
> > +             ret = EFAULT;
> > +             goto out;
> > +     }
> > +
> > +     if (is_vm_hugetlb_page(vma)) {
> > +             struct hstate *h __maybe_unused = hstate_vma(vma);
> > +
> > +             old_len = ALIGN(old_len, huge_page_size(h));
> > +             new_len = ALIGN(new_len, huge_page_size(h));
> > +             addr = ALIGN(addr, huge_page_size(h));
> > +             new_addr = ALIGN(new_addr, huge_page_size(h));
>
> Instead of aligning addr and new_addr, we should be checking for huge
> page alignment and returning error.  This makes it consistent with the
> requirement that they be PAGE aligned in the non-hugetlb case.  Sorry if
> that was unclear in previous comments.
>
>                 /* addrs must be huge page aligned */
>                 if (addr & ~huge_page_mask(h))
>                         goto out;
>                 if (new_addr & ~huge_page_mask(h))
>                         goto out;
>

Sorry I misunderstood. Added!

     prev parent reply	other threads:[~2021-10-12  1:21 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-10-08 18:32 Mina Almasry
2021-10-08 18:32 ` [PATCH v5 2/2] mm, hugepages: Add hugetlb vma mremap() test Mina Almasry
2021-10-09 17:21 ` [PATCH v5 1/2] mm, hugepages: add mremap() support for hugepage backed vma kernel test robot
2021-10-12  0:18 ` Mike Kravetz
2021-10-12  1:21   ` Mina Almasry [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAHS8izPjCJf=qotNMKdpzrRtR78nUux1mAUUY8SPW+ydj56QyA@mail.gmail.com' \
    --to=almasrymina@google.com \
    --cc=akpm@linux-foundation.org \
    --cc=ckennelly@google.com \
    --cc=kenchen@google.com \
    --cc=kirill@shutemov.name \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mhocko@suse.com \
    --cc=mike.kravetz@oracle.com \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox