From: Zi Yan
To: linux-mm@kvack.org
Cc: "Kirill A. Shutemov", Roman Gushchin, Rik van Riel, Matthew Wilcox,
	Shakeel Butt, Yang Shi, Jason Gunthorpe, Mike Kravetz, Michal Hocko,
	David Hildenbrand, William Kucharski, Andrea Arcangeli, John Hubbard,
	David Nellans, linux-kernel@vger.kernel.org, Zi Yan
Subject: [RFC PATCH v2 02/30] mm: pagewalk: use READ_ONCE when reading the PMD entry unlocked
Date: Mon, 28 Sep 2020 13:54:00 -0400
Message-Id: <20200928175428.4110504-3-zi.yan@sent.com>
X-Mailer: git-send-email 2.28.0
In-Reply-To: <20200928175428.4110504-1-zi.yan@sent.com>
References: <20200928175428.4110504-1-zi.yan@sent.com>
Reply-To: Zi Yan

From: Zi Yan

The pagewalker runs while only holding the mmap_sem for read. The pmd can
be set asynchronously, while also holding the mmap_sem for read. This
follows the same approach as the commit "mm/pagewalk: use READ_ONCE when
reading the PUD entry unlocked".
Signed-off-by: Zi Yan
---
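Note (not part of the patch): every pmd_entry handler is converted to the
same shape -- walk_pmd_range() takes a READ_ONCE() snapshot of the PMD and
passes both the value and the pointer down; under the PTL the handler
re-checks that the entry it locked is still the one it read, and asks the
walker to retry otherwise. A minimal sketch of that callback shape, using
only calls that appear in the diff below (example_pmd_entry is a made-up
name, not something this patch adds):

static int example_pmd_entry(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
			     unsigned long end, struct mm_walk *walk)
{
	spinlock_t *ptl;

	/* "pmd" is the unlocked READ_ONCE() snapshot taken by the walker. */
	ptl = pmd_trans_huge_lock(pmdp, walk->vma);
	if (ptl) {
		/*
		 * The entry may have changed between the unlocked read and
		 * taking the PTL; if so, have the walker re-read it.
		 */
		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
			walk->action = ACTION_AGAIN;
			spin_unlock(ptl);
			return 0;
		}
		/* ... operate on the stable huge PMD through pmdp ... */
		spin_unlock(ptl);
		return 0;
	}

	/* Not a huge PMD: validate the snapshot, then walk the PTE table. */
	if (pmd_trans_unstable(&pmd))
		return 0;
	/* ... pte_offset_map_lock(walk->mm, pmdp, addr, &ptl) ... */
	return 0;
}
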
 fs/proc/task_mmu.c       | 69 ++++++++++++++++++++++++++----------------
 include/linux/pagewalk.h |  2 +-
 mm/madvise.c             | 59 ++++++++++++++++++----------------
 mm/memcontrol.c          | 30 +++++++++++------
 mm/mempolicy.c           | 15 ++++++---
 mm/mincore.c             | 10 +++---
 mm/pagewalk.c            | 21 ++++++------
 7 files changed, 124 insertions(+), 82 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 069978777423..a21484b1414d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -570,28 +570,33 @@ static void smaps_pmd_entry(pmd_t *pmd, unsigned long addr,
 }
 #endif
 
-static int smaps_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-			   struct mm_walk *walk)
+static int smaps_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+			   unsigned long end, struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	ptl = pmd_trans_huge_lock(pmd, vma);
+	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
-		smaps_pmd_entry(pmd, addr, walk);
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			spin_unlock(ptl);
+			return 0;
+		}
+		smaps_pmd_entry(pmdp, addr, walk);
 		spin_unlock(ptl);
 		goto out;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		goto out;
 	/*
 	 * The mmap_lock held all the way back in m_start() is what
 	 * keeps khugepaged out of here and from collapsing things
 	 * in here.
 	 */
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	pte = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		smaps_pte_entry(pte, addr, walk);
 	pte_unmap_unlock(pte - 1, ptl);
@@ -1091,7 +1096,7 @@ static inline void clear_soft_dirty_pmd(struct vm_area_struct *vma,
 }
 #endif
 
-static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
+static int clear_refs_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
 	struct clear_refs_private *cp = walk->private;
@@ -1100,20 +1105,25 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 	spinlock_t *ptl;
 	struct page *page;
 
-	ptl = pmd_trans_huge_lock(pmd, vma);
+	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			spin_unlock(ptl);
+			return 0;
+		}
 		if (cp->type == CLEAR_REFS_SOFT_DIRTY) {
-			clear_soft_dirty_pmd(vma, addr, pmd);
+			clear_soft_dirty_pmd(vma, addr, pmdp);
 			goto out;
 		}
 
-		if (!pmd_present(*pmd))
+		if (!pmd_present(pmd))
 			goto out;
 
-		page = pmd_page(*pmd);
+		page = pmd_page(pmd);
 
 		/* Clear accessed and referenced bits. */
-		pmdp_test_and_clear_young(vma, addr, pmd);
+		pmdp_test_and_clear_young(vma, addr, pmdp);
 		test_and_clear_page_young(page);
 		ClearPageReferenced(page);
 out:
@@ -1121,10 +1131,10 @@ static int clear_refs_pte_range(pmd_t *pmd, unsigned long addr,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
 
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	pte = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		ptent = *pte;
 
@@ -1388,8 +1398,8 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 	return make_pme(frame, flags);
 }
 
-static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
-			     struct mm_walk *walk)
+static int pagemap_pmd_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+			     unsigned long end, struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
 	struct pagemapread *pm = walk->private;
@@ -1401,9 +1411,14 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
 		u64 flags = 0, frame = 0;
-		pmd_t pmd = *pmdp;
 		struct page *page = NULL;
 
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			spin_unlock(ptl);
+			return 0;
+		}
+
 		if (vma->vm_flags & VM_SOFTDIRTY)
 			flags |= PM_SOFT_DIRTY;
 
@@ -1456,7 +1471,7 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end,
 		return err;
 	}
 
-	if (pmd_trans_unstable(pmdp))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE */
 
@@ -1768,7 +1783,7 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd,
 }
 #endif
 
-static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
+static int gather_pte_stats(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 		unsigned long end, struct mm_walk *walk)
 {
 	struct numa_maps *md = walk->private;
@@ -1778,22 +1793,28 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr,
 	pte_t *pte;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	ptl = pmd_trans_huge_lock(pmd, vma);
+	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
 		struct page *page;
 
-		page = can_gather_numa_stats_pmd(*pmd, vma, addr);
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			spin_unlock(ptl);
+			return 0;
+		}
+
+		page = can_gather_numa_stats_pmd(pmd, vma, addr);
 		if (page)
-			gather_stats(page, md, pmd_dirty(*pmd),
+			gather_stats(page, md, pmd_dirty(pmd),
 				     HPAGE_PMD_SIZE/PAGE_SIZE);
 		spin_unlock(ptl);
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
 #endif
-	orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	orig_pte = pte = pte_offset_map_lock(walk->mm, pmdp, addr, &ptl);
 	do {
 		struct page *page = can_gather_numa_stats(*pte, vma, addr);
 		if (!page)
diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index 6caf28aadafb..686b57e94a9f 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -41,7 +41,7 @@ struct mm_walk_ops {
 			 unsigned long next, struct mm_walk *walk);
 	int (*pud_entry)(pud_t pud, pud_t *pudp, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
-	int (*pmd_entry)(pmd_t *pmd, unsigned long addr,
+	int (*pmd_entry)(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
 	int (*pte_entry)(pte_t *pte, unsigned long addr,
 			 unsigned long next, struct mm_walk *walk);
diff --git a/mm/madvise.c b/mm/madvise.c
index ae266dfede8a..16e7b8eadb13 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -183,14 +183,14 @@ static long madvise_behavior(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_SWAP
-static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
+static int swapin_walk_pmd_entry(pmd_t pmd, pmd_t *pmdp, unsigned long start,
 	unsigned long end, struct mm_walk *walk)
 {
 	pte_t *orig_pte;
 	struct vm_area_struct *vma = walk->private;
 	unsigned long index;
 
-	if (pmd_none_or_trans_huge_or_clear_bad(pmd))
+	if (pmd_none_or_trans_huge_or_clear_bad(&pmd))
 		return 0;
 
 	for (index = start; index != end; index += PAGE_SIZE) {
@@ -199,7 +199,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
 		struct page *page;
 		spinlock_t *ptl;
 
-		orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
+		orig_pte = pte_offset_map_lock(vma->vm_mm, pmdp, start, &ptl);
 		pte = *(orig_pte + ((index - start) / PAGE_SIZE));
 		pte_unmap_unlock(orig_pte, ptl);
 
@@ -304,7 +304,7 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
-static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
+static int madvise_cold_or_pageout_pte_range(pmd_t pmd, pmd_t *pmdp,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
 {
@@ -322,26 +322,29 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 		return -EINTR;
 
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
-	if (pmd_trans_huge(*pmd)) {
-		pmd_t orig_pmd;
+	if (pmd_trans_huge(pmd)) {
 		unsigned long next = pmd_addr_end(addr, end);
 
 		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
-		ptl = pmd_trans_huge_lock(pmd, vma);
+		ptl = pmd_trans_huge_lock(pmdp, vma);
 		if (!ptl)
 			return 0;
 
-		orig_pmd = *pmd;
-		if (is_huge_zero_pmd(orig_pmd))
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			goto huge_unlock;
+		}
+
+		if (is_huge_zero_pmd(pmd))
 			goto huge_unlock;
 
-		if (unlikely(!pmd_present(orig_pmd))) {
+		if (unlikely(!pmd_present(pmd))) {
 			VM_BUG_ON(thp_migration_supported() &&
-					!is_pmd_migration_entry(orig_pmd));
+					!is_pmd_migration_entry(pmd));
 			goto huge_unlock;
 		}
 
-		page = pmd_page(orig_pmd);
+		page = pmd_page(pmd);
 
 		/* Do not interfere with other mappings of this page */
 		if (page_mapcount(page) != 1)
@@ -361,12 +364,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			return 0;
 		}
 
-		if (pmd_young(orig_pmd)) {
-			pmdp_invalidate(vma, addr, pmd);
-			orig_pmd = pmd_mkold(orig_pmd);
+		if (pmd_young(pmd)) {
+			pmdp_invalidate(vma, addr, pmdp);
+			pmd = pmd_mkold(pmd);
 
-			set_pmd_at(mm, addr, pmd, orig_pmd);
-			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+			set_pmd_at(mm, addr, pmdp, pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmdp, addr);
 		}
 
 		ClearPageReferenced(page);
@@ -388,11 +391,11 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 	}
 
 regular_page:
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
 #endif
 	tlb_change_page_size(tlb, PAGE_SIZE);
-	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
 	flush_tlb_batched_pending(mm);
 	arch_enter_lazy_mmu_mode();
 	for (; addr < end; pte++, addr += PAGE_SIZE) {
@@ -424,12 +427,12 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
 			if (split_huge_page(page)) {
 				unlock_page(page);
 				put_page(page);
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
+				pte_offset_map_lock(mm, pmdp, addr, &ptl);
 				break;
 			}
 			unlock_page(page);
 			put_page(page);
-			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
 			pte--;
 			addr -= PAGE_SIZE;
 			continue;
@@ -566,7 +569,7 @@ static long madvise_pageout(struct vm_area_struct *vma,
 	return 0;
 }
 
-static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
+static int madvise_free_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
 {
@@ -580,15 +583,15 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	unsigned long next;
 
 	next = pmd_addr_end(addr, end);
-	if (pmd_trans_huge(*pmd))
-		if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
+	if (pmd_trans_huge(pmd))
+		if (madvise_free_huge_pmd(tlb, vma, pmdp, addr, next))
 			goto next;
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
 
 	tlb_change_page_size(tlb, PAGE_SIZE);
-	orig_pte = pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+	orig_pte = pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
 	flush_tlb_batched_pending(mm);
 	arch_enter_lazy_mmu_mode();
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
@@ -634,12 +637,12 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 			if (split_huge_page(page)) {
 				unlock_page(page);
 				put_page(page);
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
+				pte_offset_map_lock(mm, pmdp, addr, &ptl);
 				goto out;
 			}
 			unlock_page(page);
 			put_page(page);
-			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte = pte_offset_map_lock(mm, pmdp, addr, &ptl);
 			pte--;
 			addr -= PAGE_SIZE;
 			continue;
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 9c4a0851348f..b28f620c1c5b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -5827,7 +5827,7 @@ static inline enum mc_target_type get_mctgt_type_thp(struct vm_area_struct *vma,
 }
 #endif
 
-static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
+static int mem_cgroup_count_precharge_pte_range(pmd_t pmd, pmd_t *pmdp,
 					unsigned long addr, unsigned long end,
 					struct mm_walk *walk)
 {
@@ -5835,22 +5835,27 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	ptl = pmd_trans_huge_lock(pmd, vma);
+	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			spin_unlock(ptl);
+			return 0;
+		}
 		/*
		 * Note their can not be MC_TARGET_DEVICE for now as we do not
		 * support transparent huge page with MEMORY_DEVICE_PRIVATE but
		 * this might change.
		 */
-		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
+		if (get_mctgt_type_thp(vma, addr, pmd, NULL) == MC_TARGET_PAGE)
 			mc.precharge += HPAGE_PMD_NR;
 		spin_unlock(ptl);
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	pte = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE)
 		if (get_mctgt_type(vma, addr, *pte, NULL))
 			mc.precharge++;	/* increment precharge temporarily */
@@ -6023,7 +6028,7 @@ static void mem_cgroup_cancel_attach(struct cgroup_taskset *tset)
 	mem_cgroup_clear_mc();
 }
 
-static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
+static int mem_cgroup_move_charge_pte_range(pmd_t pmd, pmd_t *pmdp,
 				unsigned long addr, unsigned long end,
 				struct mm_walk *walk)
 {
@@ -6035,13 +6040,18 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 	union mc_target target;
 	struct page *page;
 
-	ptl = pmd_trans_huge_lock(pmd, vma);
+	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			spin_unlock(ptl);
+			return 0;
+		}
 		if (mc.precharge < HPAGE_PMD_NR) {
 			spin_unlock(ptl);
 			return 0;
 		}
-		target_type = get_mctgt_type_thp(vma, addr, *pmd, &target);
+		target_type = get_mctgt_type_thp(vma, addr, pmd, &target);
 		if (target_type == MC_TARGET_PAGE) {
 			page = target.page;
 			if (!isolate_lru_page(page)) {
@@ -6066,10 +6076,10 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
 		return 0;
 	}
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
 retry:
-	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	pte = pte_offset_map_lock(vma->vm_mm, pmdp, addr, &ptl);
 	for (; addr != end; addr += PAGE_SIZE) {
 		pte_t ptent = *(pte++);
 		bool device = false;
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index eddbe4e56c73..731a7710395f 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -516,7 +516,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
  * -EIO  - only MPOL_MF_STRICT was specified and an existing page was already
  *         on a node that does not follow the policy.
  */
-static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
+static int queue_pages_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
 			unsigned long end, struct mm_walk *walk)
 {
 	struct vm_area_struct *vma = walk->vma;
@@ -528,18 +528,23 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	ptl = pmd_trans_huge_lock(pmd, vma);
+	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
-		ret = queue_pages_pmd(pmd, ptl, addr, end, walk);
+		if (memcmp(pmdp, &pmd, sizeof(pmd)) != 0) {
+			walk->action = ACTION_AGAIN;
+			spin_unlock(ptl);
+			return 0;
+		}
+		ret = queue_pages_pmd(pmdp, ptl, addr, end, walk);
 		if (ret != 2)
 			return ret;
 	}
 	/* THP was split, fall through to pte walk */
 
-	if (pmd_trans_unstable(pmd))
+	if (pmd_trans_unstable(&pmd))
 		return 0;
 
-	pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	pte = pte_offset_map_lock(walk->mm, pmdp, addr, &ptl);
 	for (; addr != end; pte++, addr += PAGE_SIZE) {
 		if (!pte_present(*pte))
 			continue;
diff --git a/mm/mincore.c b/mm/mincore.c
index 02db1a834021..168661f32aaa 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -96,8 +96,8 @@ static int mincore_unmapped_range(unsigned long addr, unsigned long end,
 	return 0;
 }
 
-static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
-			struct mm_walk *walk)
+static int mincore_pte_range(pmd_t pmd, pmd_t *pmdp, unsigned long addr,
+			unsigned long end, struct mm_walk *walk)
 {
 	spinlock_t *ptl;
 	struct vm_area_struct *vma = walk->vma;
@@ -105,19 +105,19 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 	unsigned char *vec = walk->private;
 	int nr = (end - addr) >> PAGE_SHIFT;
 
-	ptl = pmd_trans_huge_lock(pmd, vma);
+	ptl = pmd_trans_huge_lock(pmdp, vma);
 	if (ptl) {
 		memset(vec, 1, nr);
 		spin_unlock(ptl);
 		goto out;
 	}
 
-	if (pmd_trans_unstable(pmd)) {
+	if (pmd_trans_unstable(&pmd)) {
 		__mincore_unmapped_range(addr, end, vma, vec);
 		goto out;
 	}
 
-	ptep = pte_offset_map_lock(walk->mm, pmd, addr, &ptl);
+	ptep = pte_offset_map_lock(walk->mm, pmdp, addr, &ptl);
 	for (; addr != end; ptep++, addr += PAGE_SIZE) {
 		pte_t pte = *ptep;
 
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 15d1e423b4a3..a3752c82a7b2 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -61,17 +61,19 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
 static int walk_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 			  struct mm_walk *walk)
 {
-	pmd_t *pmd;
+	pmd_t *pmdp;
+	pmd_t pmd;
 	unsigned long next;
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
 	int depth = real_depth(3);
 
-	pmd = pmd_offset(&pud, addr);
+	pmdp = pmd_offset(&pud, addr);
 	do {
 again:
+		pmd = READ_ONCE(*pmdp);
 		next = pmd_addr_end(addr, end);
-		if (pmd_none(*pmd) || (!walk->vma && !walk->no_vma)) {
+		if (pmd_none(pmd) || (!walk->vma && !walk->no_vma)) {
 			if (ops->pte_hole)
 				err = ops->pte_hole(addr, next, depth, walk);
 			if (err)
@@ -86,7 +88,7 @@ static int walk_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 		 * needs to know about pmd_trans_huge() pmds
 		 */
 		if (ops->pmd_entry)
-			err = ops->pmd_entry(pmd, addr, next, walk);
+			err = ops->pmd_entry(pmd, pmdp, addr, next, walk);
 		if (err)
 			break;
 
@@ -97,21 +99,22 @@ static int walk_pmd_range(pud_t pud, unsigned long addr, unsigned long end,
 		 * Check this here so we only break down trans_huge
 		 * pages when we _need_ to
 		 */
-		if ((!walk->vma && (pmd_leaf(*pmd) || !pmd_present(*pmd))) ||
+		if ((!walk->vma && (pmd_leaf(pmd) || !pmd_present(pmd))) ||
 		    walk->action == ACTION_CONTINUE ||
 		    !(ops->pte_entry))
 			continue;
 
 		if (walk->vma) {
-			split_huge_pmd(walk->vma, pmd, addr);
-			if (pmd_trans_unstable(pmd))
+			split_huge_pmd(walk->vma, pmdp, addr);
+			pmd = READ_ONCE(*pmdp);
+			if (pmd_trans_unstable(&pmd))
 				goto again;
 		}
 
-		err = walk_pte_range(pmd, addr, next, walk);
+		err = walk_pte_range(pmdp, addr, next, walk);
 		if (err)
 			break;
-	} while (pmdp++, addr = next, addr != end);
+	} while (pmdp++, addr = next, addr != end);
 
 	return err;
 }
-- 
2.28.0