From: "Zhang, Wei" <wzam@amazon.com>
To: Peter Xu <peterx@redhat.com>,
"linux-mm@kvack.org" <linux-mm@kvack.org>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Cc: Kirill Shutemov <kirill@shutemov.name>,
Mike Rapoport <rppt@linux.vnet.ibm.com>,
Matthew Wilcox <willy@infradead.org>,
Miaohe Lin <linmiaohe@huawei.com>,
Andrea Arcangeli <aarcange@redhat.com>,
"Pressman, Gal" <galpress@amazon.com>, Jan Kara <jack@suse.cz>,
Jann Horn <jannh@google.com>,
Andrew Morton <akpm@linux-foundation.org>,
Kirill Tkhai <ktkhai@virtuozzo.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
"Mike Kravetz" <mike.kravetz@oracle.com>,
Jason Gunthorpe <jgg@ziepe.ca>,
"David Gibson" <david@gibson.dropbear.id.au>,
Christoph Hellwig <hch@lst.de>
Subject: Re: [PATCH v2 4/4] hugetlb: Do early cow when page pinned on src mm
Date: Fri, 5 Feb 2021 14:58:33 +0000
Message-ID: <329ADC08-552E-423B-9230-99643B81C14A@amazon.com>
In-Reply-To: <20210204145033.136755-5-peterx@redhat.com>

Hi Peter,
Gal and I worked together to test v2 of the patch, and we can confirm it works as intended.
Thank you very much for your quick response!
Sincerely,
Wei Zhang
On 2/4/21, 6:51 AM, "Peter Xu" <peterx@redhat.com> wrote:
This is the last missing piece of the COW-during-fork effort for the case where
pinned pages are found. See commit 70e806e4e645 ("mm: Do early cow for
pinned pages during fork() for ptes", 2020-09-27) for more information, since
we do similar things here, but this time for hugetlb rather than pte.
Signed-off-by: Peter Xu <peterx@redhat.com>
---
mm/hugetlb.c | 61 +++++++++++++++++++++++++++++++++++++++++++++++-----
1 file changed, 56 insertions(+), 5 deletions(-)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 9e6ea96bf33b..5793936e00ef 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3734,11 +3734,27 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
return false;
}
+static void
+hugetlb_copy_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr,
+ struct page *old_page, struct page *new_page)
+{
+ struct hstate *h = hstate_vma(vma);
+ unsigned int psize = pages_per_huge_page(h);
+
+ copy_user_huge_page(new_page, old_page, addr, vma, psize);
+ __SetPageUptodate(new_page);
+ ClearPagePrivate(new_page);
+ set_page_huge_active(new_page);
+ set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1));
+ hugepage_add_new_anon_rmap(new_page, vma, addr);
+ hugetlb_count_add(psize, vma->vm_mm);
+}
+
int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
struct vm_area_struct *vma)
{
pte_t *src_pte, *dst_pte, entry, dst_entry;
- struct page *ptepage;
+ struct page *ptepage, *prealloc = NULL;
unsigned long addr;
int cow;
struct hstate *h = hstate_vma(vma);
@@ -3787,7 +3803,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
dst_entry = huge_ptep_get(dst_pte);
if ((dst_pte == src_pte) || !huge_pte_none(dst_entry))
continue;
-
+again:
dst_ptl = huge_pte_lock(h, dst, dst_pte);
src_ptl = huge_pte_lockptr(h, src, src_pte);
spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
@@ -3816,6 +3832,39 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
}
set_huge_swap_pte_at(dst, addr, dst_pte, entry, sz);
} else {
+ entry = huge_ptep_get(src_pte);
+ ptepage = pte_page(entry);
+ get_page(ptepage);
+
+ /*
+ * This is a rare case where we see pinned hugetlb
+ * pages while they're prone to COW. We need to do the
+ * COW earlier during fork.
+ *
+             * When pre-allocating the page, we need to drop
+             * all the locks since the allocation may sleep.
+ */
+ if (unlikely(page_needs_cow_for_dma(vma, ptepage))) {
+ if (!prealloc) {
+ put_page(ptepage);
+ spin_unlock(src_ptl);
+ spin_unlock(dst_ptl);
+ prealloc = alloc_huge_page(vma, addr, 1);
+ if (!prealloc) {
+ ret = -ENOMEM;
+ break;
+ }
+ goto again;
+ }
+ hugetlb_copy_page(vma, dst_pte, addr, ptepage,
+ prealloc);
+ put_page(ptepage);
+ spin_unlock(src_ptl);
+ spin_unlock(dst_ptl);
+ prealloc = NULL;
+ continue;
+ }
+
if (cow) {
/*
* No need to notify as we are downgrading page
@@ -3826,9 +3875,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
*/
huge_ptep_set_wrprotect(src, addr, src_pte);
}
- entry = huge_ptep_get(src_pte);
- ptepage = pte_page(entry);
- get_page(ptepage);
+
page_dup_rmap(ptepage, true);
set_huge_pte_at(dst, addr, dst_pte, entry);
hugetlb_count_add(pages_per_huge_page(h), dst);
@@ -3842,6 +3889,10 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
else
i_mmap_unlock_read(mapping);
+    /* Free the preallocated page if it was not used */
+ if (prealloc)
+ put_page(prealloc);
+
return ret;
}
--
2.26.2
Thread overview: 19+ messages
2021-02-04 14:50 [PATCH v2 0/4] mm/hugetlb: Early cow on fork, and a few cleanups Peter Xu
2021-02-04 14:50 ` [PATCH v2 1/4] hugetlb: Dedup the code to add a new file_region Peter Xu
2021-02-04 14:50 ` [PATCH v2 2/4] hugetlb: Break earlier in add_reservation_in_range() when we can Peter Xu
2021-02-04 14:50 ` [PATCH v2 3/4] mm: Introduce page_needs_cow_for_dma() for deciding whether cow Peter Xu
2021-02-04 17:54 ` Linus Torvalds
2021-02-04 19:25 ` Peter Xu
2021-02-04 23:20 ` Jason Gunthorpe
2021-02-05 0:50 ` Peter Xu
2021-02-04 14:50 ` [PATCH v2 4/4] hugetlb: Do early cow when page pinned on src mm Peter Xu
2021-02-04 23:25 ` Mike Kravetz
2021-02-05 1:43 ` Peter Xu
2021-02-05 5:11 ` Mike Kravetz
2021-02-05 16:05 ` Peter Xu
2021-02-05 14:58 ` Zhang, Wei [this message]
2021-02-05 15:51 ` Peter Xu
2021-02-07 9:09 ` Gal Pressman
2021-02-07 15:31 ` Peter Xu
2021-02-04 20:20 ` [PATCH v2 5/4] mm: Use is_cow_mapping() across tree where proper Peter Xu
2021-02-04 20:26 ` Linus Torvalds