From: Mina Almasry <almasrymina@google.com>
Date: Wed, 13 Oct 2021 12:55:51 -0700
Subject: Re: [PATCH v6 1/2] mm, hugepages: add mremap() support for hugepage backed vma
To: Mike Kravetz
Cc: Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Ken Chen, Chris Kennelly, Michal Hocko, Vlastimil Babka, Kirill Shutemov

On Tue, Oct 12, 2021 at 4:51 PM Mike Kravetz wrote:
>
> On 10/11/21 6:17 PM, Mina Almasry wrote:
> > Support mremap() for hugepage backed vma segment by simply repositioning
> > page table entries. The page table entries are repositioned to the new
> > virtual address on mremap().
> >
> > Hugetlb mremap() support is of course generic; my motivating use case
> > is a library (hugepage_text), which reloads the ELF text of executables
> > in hugepages. This significantly increases the execution performance of
> > said executables.
> >
> > Restricts the mremap operation on hugepages to up to the size of the
> > original mapping as the underlying hugetlb reservation is not yet
> > capable of handling remapping to a larger size.
> >
> > During the mremap() operation we detect pmd_share'd mappings and we
> > unshare those during the mremap(). On access and fault the sharing is
> > established again.
> >
> > Signed-off-by: Mina Almasry
>
> Thanks!
>
> Just some minor nits below. If you agree with the suggestions and make
> the changes, you can add:
>
> Reviewed-by: Mike Kravetz
>

Thank you as always for your review and patience. I've applied the
changes and uploaded v7 with the Reviewed-by. Any chance I can get an
Ack at least on the associated test patch? Or should I send these
somewhere else for review?
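For anyone who wants to see the call sequence end to end, here is a
minimal userspace sketch of the remap this series accelerates. It is
not taken from hugepage_text; the 2MB hugepage size, the anonymous
MAP_HUGETLB setup, and the second mapping used only to obtain an
aligned target address are illustrative assumptions, and it assumes a
couple of free hugepages are reserved on the system:

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

#define LEN (2UL << 20)	/* one 2MB hugepage; size is illustrative */

int main(void)
{
	/* Hugepage backed source mapping; fails if no hugepages are free. */
	char *old = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	/* Second MAP_HUGETLB mapping, used only as an aligned target. */
	void *target = mmap(NULL, LEN, PROT_NONE,
			    MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	if (old == MAP_FAILED || target == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	old[0] = 1;	/* fault in the hugepage */

	/*
	 * With this series the huge PTEs are repositioned to the new
	 * virtual address; the page itself is not copied.
	 */
	void *new = mremap(old, LEN, LEN,
			   MREMAP_MAYMOVE | MREMAP_FIXED, target);
	if (new == MAP_FAILED) {
		perror("mremap");
		return 1;
	}
	printf("remapped %p -> %p\n", (void *)old, new);
	return 0;
}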
> > diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> > index ebaba02706c87..c6b70f1ede6bf 100644
> > --- a/include/linux/hugetlb.h
> > +++ b/include/linux/hugetlb.h
> > @@ -124,6 +124,7 @@ struct hugepage_subpool *hugepage_new_subpool(struct hstate *h, long max_hpages,
> >  void hugepage_put_subpool(struct hugepage_subpool *spool);
> >
> >  void reset_vma_resv_huge_pages(struct vm_area_struct *vma);
> > +void clear_vma_resv_huge_pages(struct vm_area_struct *vma);
> >  int hugetlb_sysctl_handler(struct ctl_table *, int, void *, size_t *, loff_t *);
> >  int hugetlb_overcommit_handler(struct ctl_table *, int, void *, size_t *,
> >  		loff_t *);
> > @@ -132,6 +133,10 @@ int hugetlb_treat_movable_handler(struct ctl_table *, int, void *, size_t *,
> >  int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int, void *, size_t *,
> >  		loff_t *);
> >
> > +int move_hugetlb_page_tables(struct vm_area_struct *vma,
> > +			     struct vm_area_struct *new_vma,
> > +			     unsigned long old_addr, unsigned long new_addr,
> > +			     unsigned long len);
> >  int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
> >  long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
> >  			 struct page **, struct vm_area_struct **,
> > @@ -215,6 +220,10 @@ static inline void reset_vma_resv_huge_pages(struct vm_area_struct *vma)
> >  {
> >  }
> >
> > +static inline void clear_vma_resv_huge_pages(struct vm_area_struct *vma)
> > +{
> > +}
> > +
> >  static inline unsigned long hugetlb_total_pages(void)
> >  {
> >  	return 0;
> > @@ -262,6 +271,12 @@ static inline int copy_hugetlb_page_range(struct mm_struct *dst,
> >  	return 0;
> >  }
> >
> > +#define move_hugetlb_page_tables(vma, new_vma, old_addr, new_addr, len) \
> > +	({ \
> > +		BUG(); \
> > +		0; \
> > +	})
> > +
>
> Any reason why you did not make this a static inline? Trying to save
> code in the !CONFIG_HUGETLB case? macros seem to end up causing more
> issues down the line. I would suggest making this a static inline
> unless there is a good reason to keep as a macro.
>

No good reason. I converted to static inline.
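(For reference, the !CONFIG_HUGETLB_PAGE stub in v7 ends up looking
roughly like this -- my sketch of the conversion rather than the exact
v7 hunk:)

static inline int move_hugetlb_page_tables(struct vm_area_struct *vma,
					   struct vm_area_struct *new_vma,
					   unsigned long old_addr,
					   unsigned long new_addr,
					   unsigned long len)
{
	/* Should never be reached when hugetlb is compiled out. */
	BUG();
	return 0;
}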
> >  static inline void hugetlb_report_meminfo(struct seq_file *m)
> >  {
> >  }
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index 6d2f4c25dd9fb..6e91cd3905e73 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> ...
> > +int move_hugetlb_page_tables(struct vm_area_struct *vma,
> > +			     struct vm_area_struct *new_vma,
> > +			     unsigned long old_addr, unsigned long new_addr,
> > +			     unsigned long len)
> > +{
> > +	struct hstate *h = hstate_vma(vma);
> > +	struct address_space *mapping = vma->vm_file->f_mapping;
> > +	unsigned long sz = huge_page_size(h);
> > +	struct mm_struct *mm = vma->vm_mm;
> > +	unsigned long old_end = old_addr + len;
> > +	unsigned long old_addr_copy;
> > +	pte_t *src_pte, *dst_pte;
> > +	struct mmu_notifier_range range;
> > +
> > +	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
> > +				old_end);
> > +	adjust_range_if_pmd_sharing_possible(vma, &range.start, &range.end);
> > +	mmu_notifier_invalidate_range_start(&range);
> > +	/* Prevent race with file truncation */
> > +	i_mmap_lock_write(mapping);
> > +	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
> > +		src_pte = huge_pte_offset(mm, old_addr, sz);
> > +		if (!src_pte)
> > +			continue;
> > +		if (huge_pte_none(huge_ptep_get(src_pte)))
> > +			continue;
> > +
> > +		/* old_addr arg to huge_pmd_unshare() is a pointer and so the
> > +		 * arg may be modified. Pass a copy instead to preserve the
> > +		 * value in old_arg.
> >  		 * value in old_addr.
>

Fixed.

> > +		 */
> > +		old_addr_copy = old_addr;
> > +
> > +		if (huge_pmd_unshare(mm, vma, &old_addr_copy, src_pte))
> > +			continue;
> > +
> > +		dst_pte = huge_pte_alloc(mm, new_vma, new_addr, sz);
> > +		if (!dst_pte)
> > +			break;
> > +
> > +		move_huge_pte(vma, old_addr, new_addr, src_pte);
> > +	}
> > +	i_mmap_unlock_write(mapping);
> > +	flush_tlb_range(vma, old_end - len, old_end);
> > +	mmu_notifier_invalidate_range_end(&range);
> > +
> > +	return len + old_addr - old_end;
> > +}
> > +
> >  static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
> >  				   unsigned long start, unsigned long end,
> >  				   struct page *ref_page)
> > @@ -6280,7 +6385,8 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
> >  	return saddr;
> >  }
> >
> > -static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
> > +static bool hugetlb_vma_shareable(struct vm_area_struct *vma,
> > +				  unsigned long addr)
> >  {
> >  	unsigned long base = addr & PUD_MASK;
> >  	unsigned long end = base + PUD_SIZE;
> > @@ -6299,7 +6405,7 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
> >  	if (uffd_disable_huge_pmd_share(vma))
> >  		return false;
> >  #endif
> > -	return vma_shareable(vma, addr);
> > +	return hugetlb_vma_shareable(vma, addr);
> >  }
> >
> >  /*
>
> In an earlier version of the patch, vma_shareable was renamed
> hugetlb_vma_shareable because it was going to be used outside hugetlb.c.
> Therefore the hugetlb_* name would provide some context. That is no
> longer the case. So, there really is no need to change the name. In
> fact, none of the remap code even calls this routine. Suggest you
> drop the name change.

Fixed.

> --
> Mike Kravetz