From: Mina Almasry <almasrymina@google.com>
Date: Mon, 11 Oct 2021 18:21:31 -0700
Subject: Re: [PATCH v5 1/2] mm, hugepages: add mremap() support for hugepage backed vma
To: Mike Kravetz
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ken Chen, Chris Kennelly, Michal Hocko, Vlastimil Babka, Kirill Shutemov
References: <20211008183256.1558105-1-almasrymina@google.com>
Content-Type: text/plain; charset="UTF-8"
On Mon, Oct 11, 2021 at 5:18 PM Mike Kravetz wrote:
>
> On 10/8/21 11:32 AM, Mina Almasry wrote:
> > Support mremap() for hugepage backed vma segment by simply repositioning
> > page table entries. The page table entries are repositioned to the new
> > virtual address on mremap().
> >
> > Hugetlb mremap() support is of course generic; my motivating use case
> > is a library (hugepage_text), which reloads the ELF text of executables
> > in hugepages. This significantly increases the execution performance of
> > said executables.
> >
> > Restricts the mremap operation on hugepages to up to the size of the
> > original mapping as the underlying hugetlb reservation is not yet
> > capable of handling remapping to a larger size.
> >
> > During the mremap() operation we detect pmd_share'd mappings and we
> > unshare those during the mremap(). On access and fault the sharing is
> > established again.
> >
> > Signed-off-by: Mina Almasry
> >
> ...
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6d2f4c25dd9fb..8200b4c8d09d8 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1015,6 +1015,35 @@ void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
> >  	vma->vm_private_data = (void *)0;
> >  }
> >
> > +/*
> > + * Reset and decrement one ref on hugepage private reservation.
> > + * Called with mm->mmap_sem writer semaphore held.
> > + * This function should be only used by move_vma() and operate on
> > + * same sized vma. It should never come here with last ref on the
> > + * reservation.
> > + */
> > +void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
> > +{
> > +	/*
> > +	 * Clear the old hugetlb private page reservation.
> > +	 * It has already been transferred to new_vma.
> > +	 *
> > +	 * During a mremap() operation of a hugetlb vma we call move_vma()
> > +	 * which copies *vma* into *new_vma* and unmaps *vma*. After the copy
> > +	 * operation both *new_vma* and *vma* share a reference to the resv_map
> > +	 * struct, and at that point *vma* is about to be unmapped. We don't
> > +	 * want to return the reservation to the pool at unmap of *vma* because
> > +	 * the reservation still lives on in new_vma, so simply decrement the
> > +	 * ref here and remove the resv_map reference from this vma.
> > +	 */
>
> Are the *...* for special formatting of the words somewhere? Or just
> for simple added emphasis? This convention is not used anywhere else in
> the file. Unless there is a good reason for doing so, I would prefer to
> drop the *...* convention here.
>

It was just emphasis, removed!

> > +	struct resv_map *reservations = vma_resv_map(vma);
> > +
> > +	if (reservations && is_vma_resv_set(vma, HPAGE_RESV_OWNER))
> > +		kref_put(&reservations->refs, resv_map_release);
> > +
> > +	reset_vma_resv_huge_pages(vma);
> > +}
> ...
> > diff --git a/mm/mremap.c b/mm/mremap.c
> > index c0b6c41b7b78f..6a3f7d38b7539 100644
> > --- a/mm/mremap.c
> > +++ b/mm/mremap.c
> > @@ -489,6 +489,10 @@ unsigned long move_page_tables(struct vm_area_struct *vma,
> >  	old_end = old_addr + len;
> >  	flush_cache_range(vma, old_addr, old_end);
> >
> > +	if (is_vm_hugetlb_page(vma))
> > +		return move_hugetlb_page_tables(vma, new_vma, old_addr,
> > +						new_addr, len);
> > +
> >  	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
> >  				old_addr, old_end);
> >  	mmu_notifier_invalidate_range_start(&range);
> > @@ -646,6 +650,10 @@ static unsigned long move_vma(struct vm_area_struct *vma,
> >  		mremap_userfaultfd_prep(new_vma, uf);
> >  	}
> >
> > +	if (is_vm_hugetlb_page(vma)) {
> > +		clear_vma_resv_huge_pages(vma);
> > +	}
> > +
> >  	/* Conceal VM_ACCOUNT so old reservation is not undone */
> >  	if (vm_flags & VM_ACCOUNT && !(flags & MREMAP_DONTUNMAP)) {
> >  		vma->vm_flags &= ~VM_ACCOUNT;
> > @@ -739,9 +747,6 @@ static struct vm_area_struct *vma_to_resize(unsigned long addr,
> >  	    (vma->vm_flags & (VM_DONTEXPAND | VM_PFNMAP)))
> >  		return ERR_PTR(-EINVAL);
> >
> > -	if (is_vm_hugetlb_page(vma))
> > -		return ERR_PTR(-EINVAL);
> > -
> >  	/* We can't remap across vm area boundaries */
> >  	if (old_len > vma->vm_end - addr)
> >  		return ERR_PTR(-EFAULT);
> > @@ -937,6 +942,27 @@ SYSCALL_DEFINE5(mremap, unsigned long, addr, unsigned long, old_len,
> >
> >  	if (mmap_write_lock_killable(current->mm))
> >  		return -EINTR;
> > +	vma = find_vma(mm, addr);
> > +	if (!vma || vma->vm_start > addr) {
> > +		ret = EFAULT;
> > +		goto out;
> > +	}
> > +
> > +	if (is_vm_hugetlb_page(vma)) {
> > +		struct hstate *h __maybe_unused = hstate_vma(vma);
> > +
> > +		old_len = ALIGN(old_len, huge_page_size(h));
> > +		new_len = ALIGN(new_len, huge_page_size(h));
> > +		addr = ALIGN(addr, huge_page_size(h));
> > +		new_addr = ALIGN(new_addr, huge_page_size(h));
>
> Instead of aligning addr and new_addr, we should be checking for huge
> page alignment and returning an error. This makes it consistent with the
> requirement that they be PAGE aligned in the non-hugetlb case. Sorry if
> that was unclear in previous comments.
>
> 	/* addrs must be huge page aligned */
> 	if (addr & ~huge_page_mask(h))
> 		goto out;
> 	if (new_addr & ~huge_page_mask(h))
> 		goto out;
>

Sorry I misunderstood. Added!