From: Alexander Zhu
To: linux-mm@kvack.org
Subject: [PATCH v6 3/5] mm: do not remap clean subpages when splitting isolated thp
Date: Wed, 2 Nov 2022 23:01:45 -0700
X-Mailer: git-send-email 2.30.2

Avoid remapping zero-filled subpages that are freed in split_huge_page().
Such subpages are not remapped, except in the userfaultfd case: there we
remap to the shared read-only zero page, similar to what is done by KSM.
Signed-off-by: Alexander Zhu
---
 include/linux/rmap.h          |  2 +-
 include/linux/vm_event_item.h |  2 +
 mm/huge_memory.c              |  8 ++--
 mm/migrate.c                  | 73 +++++++++++++++++++++++++++++++----
 mm/migrate_device.c           |  4 +-
 mm/vmstat.c                   |  2 +
 6 files changed, 77 insertions(+), 14 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bd3504d11b15..3f83bbcf1333 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -428,7 +428,7 @@ int folio_mkclean(struct folio *);
 int pfn_mkclean_range(unsigned long pfn, unsigned long nr_pages, pgoff_t pgoff,
 		      struct vm_area_struct *vma);
 
-void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked);
+void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked, bool unmap_clean);
 
 int page_mapped_in_vma(struct page *page, struct vm_area_struct *vma);
 
diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
index f733ffc5f6f3..3618b10ddec9 100644
--- a/include/linux/vm_event_item.h
+++ b/include/linux/vm_event_item.h
@@ -112,6 +112,8 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
 	THP_SPLIT_PUD,
 #endif
 	THP_SPLIT_FREE,
+	THP_SPLIT_UNMAP,
+	THP_SPLIT_REMAP_READONLY_ZERO_PAGE,
 	THP_ZERO_PAGE_ALLOC,
 	THP_ZERO_PAGE_ALLOC_FAILED,
 	THP_SWPOUT,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 6a5c70080c07..cba0bbbb2a93 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -2373,7 +2373,7 @@ static void unmap_folio(struct folio *folio)
 	try_to_unmap(folio, ttu_flags | TTU_IGNORE_MLOCK);
 }
 
-static void remap_page(struct folio *folio, unsigned long nr)
+static void remap_page(struct folio *folio, unsigned long nr, bool unmap_clean)
 {
 	int i = 0;
 
@@ -2381,7 +2381,7 @@ static void remap_page(struct folio *folio, unsigned long nr)
 	if (!folio_test_anon(folio))
 		return;
 	for (;;) {
-		remove_migration_ptes(folio, folio, true);
+		remove_migration_ptes(folio, folio, true, unmap_clean);
 		i += folio_nr_pages(folio);
 		if (i >= nr)
 			break;
@@ -2569,7 +2569,7 @@ static void __split_huge_page(struct page *page, struct list_head *list,
 	}
 	local_irq_enable();
 
-	remap_page(folio, nr);
+	remap_page(folio, nr, PageAnon(head));
 
 	if (PageSwapCache(head)) {
 		swp_entry_t entry = { .val = page_private(head) };
@@ -2798,7 +2798,7 @@ int split_huge_page_to_list(struct page *page, struct list_head *list)
 		if (mapping)
 			xas_unlock(&xas);
 		local_irq_enable();
-		remap_page(folio, folio_nr_pages(folio));
+		remap_page(folio, folio_nr_pages(folio), false);
 		ret = -EBUSY;
 	}
 
diff --git a/mm/migrate.c b/mm/migrate.c
index dff333593a8a..2764b14d3383 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -30,6 +30,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -168,13 +169,62 @@ void putback_movable_pages(struct list_head *l)
 	}
 }
 
+static bool try_to_unmap_clean(struct page_vma_mapped_walk *pvmw, struct page *page)
+{
+	void *addr;
+	bool dirty;
+	pte_t newpte;
+
+	VM_BUG_ON_PAGE(PageCompound(page), page);
+	VM_BUG_ON_PAGE(!PageAnon(page), page);
+	VM_BUG_ON_PAGE(!PageLocked(page), page);
+	VM_BUG_ON_PAGE(pte_present(*pvmw->pte), page);
+
+	if (PageMlocked(page) || (pvmw->vma->vm_flags & VM_LOCKED))
+		return false;
+
+	/*
+	 * The pmd entry mapping the old thp was flushed and the pte mapping
+	 * this subpage has been non present. Therefore, this subpage is
+	 * inaccessible. We don't need to remap it if it contains only zeros.
+	 */
+	addr = kmap_local_page(page);
+	dirty = memchr_inv(addr, 0, PAGE_SIZE);
+	kunmap_local(addr);
+
+	if (dirty)
+		return false;
+
+	pte_clear_not_present_full(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, false);
+
+	if (userfaultfd_armed(pvmw->vma)) {
+		newpte = pte_mkspecial(pfn_pte(page_to_pfn(ZERO_PAGE(pvmw->address)),
+					       pvmw->vma->vm_page_prot));
+		ptep_clear_flush(pvmw->vma, pvmw->address, pvmw->pte);
+		set_pte_at(pvmw->vma->vm_mm, pvmw->address, pvmw->pte, newpte);
+		dec_mm_counter(pvmw->vma->vm_mm, MM_ANONPAGES);
+		count_vm_event(THP_SPLIT_REMAP_READONLY_ZERO_PAGE);
+		return true;
+	}
+
+	dec_mm_counter(pvmw->vma->vm_mm, mm_counter(page));
+	count_vm_event(THP_SPLIT_UNMAP);
+	return true;
+}
+
+struct rmap_walk_arg {
+	struct folio *folio;
+	bool unmap_clean;
+};
+
 /*
  * Restore a potential migration pte to a working pte entry
  */
 static bool remove_migration_pte(struct folio *folio,
-		struct vm_area_struct *vma, unsigned long addr, void *old)
+		struct vm_area_struct *vma, unsigned long addr, void *arg)
 {
-	DEFINE_FOLIO_VMA_WALK(pvmw, old, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
+	struct rmap_walk_arg *rmap_walk_arg = arg;
+	DEFINE_FOLIO_VMA_WALK(pvmw, rmap_walk_arg->folio, vma, addr, PVMW_SYNC | PVMW_MIGRATION);
 
 	while (page_vma_mapped_walk(&pvmw)) {
 		rmap_t rmap_flags = RMAP_NONE;
@@ -197,6 +247,8 @@ static bool remove_migration_pte(struct folio *folio,
 			continue;
 		}
 #endif
+		if (rmap_walk_arg->unmap_clean && try_to_unmap_clean(&pvmw, new))
+			continue;
 
 		folio_get(folio);
 		pte = mk_pte(new, READ_ONCE(vma->vm_page_prot));
@@ -272,13 +324,20 @@ static bool remove_migration_pte(struct folio *folio,
  * Get rid of all migration entries and replace them by
  * references to the indicated page.
  */
-void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked)
+void remove_migration_ptes(struct folio *src, struct folio *dst, bool locked, bool unmap_clean)
 {
+	struct rmap_walk_arg rmap_walk_arg = {
+		.folio = src,
+		.unmap_clean = unmap_clean,
+	};
+
 	struct rmap_walk_control rwc = {
 		.rmap_one = remove_migration_pte,
-		.arg = src,
+		.arg = &rmap_walk_arg,
 	};
 
+	VM_BUG_ON_FOLIO(unmap_clean && src != dst, src);
+
 	if (locked)
 		rmap_walk_locked(dst, &rwc);
 	else
@@ -872,7 +931,7 @@ static int writeout(struct address_space *mapping, struct folio *folio)
 	 * At this point we know that the migration attempt cannot
 	 * be successful.
 	 */
-	remove_migration_ptes(folio, folio, false);
+	remove_migration_ptes(folio, folio, false, false);
 
 	rc = mapping->a_ops->writepage(&folio->page, &wbc);
 
@@ -1128,7 +1187,7 @@ static int __unmap_and_move(struct folio *src, struct folio *dst,
 
 	if (page_was_mapped)
 		remove_migration_ptes(src,
-			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
+			rc == MIGRATEPAGE_SUCCESS ? dst : src, false, false);
 
 out_unlock_both:
 	folio_unlock(dst);
@@ -1338,7 +1397,7 @@ static int unmap_and_move_huge_page(new_page_t get_new_page,
 
 	if (page_was_mapped)
 		remove_migration_ptes(src,
-			rc == MIGRATEPAGE_SUCCESS ? dst : src, false);
+			rc == MIGRATEPAGE_SUCCESS ? dst : src, false, false);
 
 unlock_put_anon:
 	folio_unlock(dst);
diff --git a/mm/migrate_device.c b/mm/migrate_device.c
index 6fa682eef7a0..6508a083d7fd 100644
--- a/mm/migrate_device.c
+++ b/mm/migrate_device.c
@@ -421,7 +421,7 @@ static unsigned long migrate_device_unmap(unsigned long *src_pfns,
 			continue;
 
 		folio = page_folio(page);
-		remove_migration_ptes(folio, folio, false);
+		remove_migration_ptes(folio, folio, false, false);
 
 		src_pfns[i] = 0;
 		folio_unlock(folio);
@@ -847,7 +847,7 @@ void migrate_device_finalize(unsigned long *src_pfns,
 
 		src = page_folio(page);
 		dst = page_folio(newpage);
-		remove_migration_ptes(src, dst, false);
+		remove_migration_ptes(src, dst, false, false);
 		folio_unlock(src);
 
 		if (is_zone_device_page(page))
diff --git a/mm/vmstat.c b/mm/vmstat.c
index a2ba5d7922f4..3d802eb6754d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -1360,6 +1360,8 @@ const char * const vmstat_text[] = {
 	"thp_split_pud",
 #endif
 	"thp_split_free",
+	"thp_split_unmap",
+	"thp_split_remap_readonly_zero_page",
 	"thp_zero_page_alloc",
 	"thp_zero_page_alloc_failed",
 	"thp_swpout",
-- 
2.30.2