From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0C65FC77B7A for ; Wed, 24 May 2023 22:29:30 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 94624900004; Wed, 24 May 2023 18:29:29 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8F6D8900002; Wed, 24 May 2023 18:29:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 797C2900004; Wed, 24 May 2023 18:29:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0015.hostedemail.com [216.40.44.15]) by kanga.kvack.org (Postfix) with ESMTP id 69B1B900002 for ; Wed, 24 May 2023 18:29:29 -0400 (EDT) Received: from smtpin05.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id 3A179C099C for ; Wed, 24 May 2023 22:29:29 +0000 (UTC) X-FDA: 80826591258.05.C0DEE74 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by imf18.hostedemail.com (Postfix) with ESMTP id 89A231C0012 for ; Wed, 24 May 2023 22:29:26 +0000 (UTC) Authentication-Results: imf18.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="I/uMR/eh"; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf18.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1684967367; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=nCgnShniVMahc4GrXOZIXXB2Y2cxK0vO0SVFBYZqGLA=; b=F0QNZNoBgCONs/q1CrASc4O1+RIAEnxP5BdGW8MRs5cxQbnBorsttXeLOAT9btf0aGwrs2 9CgscOfT3IrpJ2NoFxWY62V4lwcZJy4s7H4iyWj+EngFMYOunvZtHFMSeb1Jwws16o+w1y 3IrdaB4Tk5+H1llepgueDUku69Iiwo0= ARC-Authentication-Results: i=1; imf18.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b="I/uMR/eh"; dmarc=pass (policy=none) header.from=redhat.com; spf=pass (imf18.hostedemail.com: domain of peterx@redhat.com designates 170.10.129.124 as permitted sender) smtp.mailfrom=peterx@redhat.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1684967367; a=rsa-sha256; cv=none; b=WqYYzYJxRbQhE21Ssv36QqdDMIyJiP8Fbcx3Fnnih+n4Nl0CSQbSCtVP7YpeiO7MxYwSSp Z6WxAixYFQHgD0KMLd4IV6iIZc2dnokTwR1Li8Z5FWeVj5M8T9F+fJ0VDrozx9FYGgMo6g mp3nOQEe7SUBG8VvEn1va39wd62BT5s= DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1684967365; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: in-reply-to:in-reply-to:references:references; bh=nCgnShniVMahc4GrXOZIXXB2Y2cxK0vO0SVFBYZqGLA=; b=I/uMR/ehiXLidinGAvWNkeGp2qUaOL0mvQGdlmUPXvzULr0kG9FJOXnkLN40Dl4dnOEJB3 4gaQ57pOVUx1UVEneJ/fNRfTB758dpWKty2ELMRM2jwInYXEtzMZNTopgjkq5tfbd3bnRx m5V+1+MmuRxXf6Z8aKen+nGoO0DgQbs= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.3, cipher=TLS_AES_256_GCM_SHA384) id us-mta-495-rAQwGSteP6iVDMRqJCmeFA-1; Wed, 24 May 2023 18:29:24 -0400 X-MC-Unique: rAQwGSteP6iVDMRqJCmeFA-1 Received: by mail-qv1-f72.google.com with SMTP id 6a1803df08f44-625891d5ad5so2949296d6.0 for ; Wed, 24 May 2023 15:29:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1684967364; x=1687559364; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=nCgnShniVMahc4GrXOZIXXB2Y2cxK0vO0SVFBYZqGLA=; b=kg/avIQ81y7W2zvR60HxgT3gEVe9Eo0bAL/ETvCHrmHsTovDS752UF3QOttbMseSXV +R6itCEN85A4qd+MyeRYUYeZoYBvC3Kad7+DsMxcd6CqbZq6wklWP0QHP55hMDw8ywGy rCylh4UvT8muRGF9FzLrTAlZQbDl81VFq4U34u+Wm4eIRnJJabaZkjvfZRakHXOKHqHK adT4IaCcdJ5wWcNf3IoO7J7zFuqiuoHL2Bh33Rr5MpYwCjCcf9juMu2rMQLSntnvUIh7 hc6cEFFm8GCsA31hxl1UsaFTT9Gmij1et2h89JlVYJvypZGFUomYRuzWPwHHMqHermxG un6Q== X-Gm-Message-State: AC+VfDxOs0WuqqmzFEP2jpaVmpP6DEOFAMA7hQuTn9STkIdrELDyS2Ng 61fO7aauuioeVEVov9De6pmEeVCp4Q85P/4FpZn1oxL0ZuRA/l2Iqip1rBiHcv4kroRMcPtKjBm CvENb4gnpAlU= X-Received: by 2002:a05:6214:5298:b0:532:141d:3750 with SMTP id kj24-20020a056214529800b00532141d3750mr27818621qvb.2.1684967363888; Wed, 24 May 2023 15:29:23 -0700 (PDT) X-Google-Smtp-Source: ACHHUZ46AVxXWHhQZOrlk0C8EsvcDVeirClCDl+ZrOnZVwVblMdxDe4GFkKpoOklWZSCzdrl2bjP2g== X-Received: by 2002:a05:6214:5298:b0:532:141d:3750 with SMTP id kj24-20020a056214529800b00532141d3750mr27818595qvb.2.1684967363515; Wed, 24 May 2023 15:29:23 -0700 (PDT) Received: from x1n (bras-base-aurron9127w-grc-62-70-24-86-62.dsl.bell.ca. [70.24.86.62]) by smtp.gmail.com with ESMTPSA id lw9-20020a05621457c900b005ddd27e2c0asm3928774qvb.36.2023.05.24.15.29.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 24 May 2023 15:29:22 -0700 (PDT) Date: Wed, 24 May 2023 18:29:20 -0400 From: Peter Xu To: Hugh Dickins Cc: Andrew Morton , Mike Kravetz , Mike Rapoport , "Kirill A. Shutemov" , Matthew Wilcox , David Hildenbrand , Suren Baghdasaryan , Qi Zheng , Yang Shi , Mel Gorman , Peter Zijlstra , Will Deacon , Yu Zhao , Alistair Popple , Ralph Campbell , Ira Weiny , Steven Price , SeongJae Park , Naoya Horiguchi , Christophe Leroy , Zack Rusin , Jason Gunthorpe , Axel Rasmussen , Anshuman Khandual , Pasha Tatashin , Miaohe Lin , Minchan Kim , Christoph Hellwig , Song Liu , Thomas Hellstrom , linux-kernel@vger.kernel.org, linux-mm@kvack.org Subject: Re: [PATCH 01/31] mm: use pmdp_get_lockless() without surplus barrier() Message-ID: References: <68a97fbe-5c1e-7ac6-72c-7b9c6290b370@google.com> <34467cca-58b6-3e64-1ee7-e3dc43257a@google.com> MIME-Version: 1.0 In-Reply-To: <34467cca-58b6-3e64-1ee7-e3dc43257a@google.com> X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline X-Rspamd-Queue-Id: 89A231C0012 X-Rspam-User: X-Rspamd-Server: rspam05 X-Stat-Signature: jwsg1qnqph6t5umy59s9zamh69esse8f X-HE-Tag: 1684967366-721081 X-HE-Meta: U2FsdGVkX1/nI2s7aZ/HKOYpjVlqLLF+UkK24x75Uz2fvZi5ve5asZTVeJpnHWFAbn2Bem+pnM7o40+pKLvOvaRbmVUBFHOZkBCoHgVqaFTpIrdU4Ls2yUd8FVVXXaGq3NRsRz9xsqhpw47YWvZ+O8FOIwrZ3g8uyAnz9olaidMUquFI5fYG8+224L2mj6Z1TtWpQDdneEDI2oNzxr5ck+JoaBzguiB2oXZnYrpNEcanD8e6cXBYJCu5zArz37U7Q7sqKwhXQjOEOiseSMOEBEjjOO1Jp9ZFxEGqXk5c321jr281W8qjiHEAzl6TH1HAJlo9lVvAjZ5JWWBalOmnLAlQ5uh+LOO9bUJS/oqc3/N1/hP5XY4ANFXfAO5Z9fEk0LB9vMEHZJsBoavKdHYZ+oB9N097+6qXc0SePN+0ztiVc+suJtD83llL0jdUJgeFYMfUujBgR3Jic2OmY5/JDaMD4rKer9UfCD4hvoJ7Pzf9PZNaW1Bf3tFcHJ+NhewTvf34Ej0fQvSGl27y9BFE6nR/RYRjP8egvqSmzMHeB8zP1Cdm+nDn/LLpy69+nhtQfRf0fKP1lbDWFteyWzXAiqb1iQ0NyPL2aMDdzBl+XjzfkBFhoTl9vWRK0AxOEwKivghCn0DDRk/XY9QblKaGLtpk016DIVpI0IElLOAlnBqUCWnkIlhIguKi/HD33+KpgifmD9Y0mGpNoFbzs3dPWTCKMerY8SiC/LIBLd7WqVfosN4OE8c1qR1AdWCd/8ixe4jkCR55R2LKa4jOqWVmT8Qdn5D1QDeHmrNAak6K3bqRYDtuz1feATgyTOBu1TaXjG6GVaXYK0oekjelkP16iEbMaURBexekF2dhGHUhLHobBQ3jxR+Nzf2BNAjdpk2/g2PkBqFZhSA9vQCMZY+8I1KPTEJ3D3t6y0x3sfvYLDux12HiF/994ntASUKXaP+V4IrpRRXGE9m4Kem+yXQ KWSWRVJH FlT6hEzO4WUyJppmWh1xUcs0BHxvr35gnDclES+AGgWZhdXS9729TMKeXUmBfvtBJAS2q1rkmc2oHR6RM+8PRDLf333X2YrAoUy5FSAfdhbhPDYT0CVOlSVbsbKwxJoL2NGcKPVSkVxdWzPj0F6ZbjYmRdWbwYylI7qqvDLp26N0NfUURp7379RSZLQuFNfvJIy0MXK/3Ots+rYX2AaS4rh0XdlrHM056ZQVbw67zayzgcrkBrCecYmn7c9oQWbVBv0SU/ITQKameaHxsMuOmmcFLR6BPmk6uFs3khTxmpdVCv/2f0sBX0pf5FSpHwi807RXxPfband0wX5AQhxiHj63Ow92xiOsPHc6pe/x3rfy2Q4Pu7YvcG8/uxO/dVneXgG4BUOwkgD/SDv5I5kLAnpUNLScZw16JOwjfVcgs6b7itBuUfsfIU0gEtA== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, May 21, 2023 at 09:49:45PM -0700, Hugh Dickins wrote: > Use pmdp_get_lockless() in preference to READ_ONCE(*pmdp), to get a more > reliable result with PAE (or READ_ONCE as before without PAE); and remove > the unnecessary extra barrier()s which got left behind in its callers. Pure question: does it mean that some of below path (missing barrier() ones) could have problem when CONFIG_PAE, hence this can be seen as a (potential) bug fix? Thanks, > > HOWEVER: Note the small print in linux/pgtable.h, where it was designed > specifically for fast GUP, and depends on interrupts being disabled for > its full guarantee: most callers which have been added (here and before) > do NOT have interrupts disabled, so there is still some need for caution. > > Signed-off-by: Hugh Dickins > --- > fs/userfaultfd.c | 10 +--------- > include/linux/pgtable.h | 17 ----------------- > mm/gup.c | 6 +----- > mm/hmm.c | 2 +- > mm/khugepaged.c | 5 ----- > mm/ksm.c | 3 +-- > mm/memory.c | 14 ++------------ > mm/mprotect.c | 5 ----- > mm/page_vma_mapped.c | 2 +- > 9 files changed, 7 insertions(+), 57 deletions(-) > > diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c > index 0fd96d6e39ce..f7a0817b1ec0 100644 > --- a/fs/userfaultfd.c > +++ b/fs/userfaultfd.c > @@ -349,15 +349,7 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, > if (!pud_present(*pud)) > goto out; > pmd = pmd_offset(pud, address); > - /* > - * READ_ONCE must function as a barrier with narrower scope > - * and it must be equivalent to: > - * _pmd = *pmd; barrier(); > - * > - * This is to deal with the instability (as in > - * pmd_trans_unstable) of the pmd. > - */ > - _pmd = READ_ONCE(*pmd); > + _pmd = pmdp_get_lockless(pmd); > if (pmd_none(_pmd)) > goto out; > > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h > index c5a51481bbb9..8ec27fe69dc8 100644 > --- a/include/linux/pgtable.h > +++ b/include/linux/pgtable.h > @@ -1344,23 +1344,6 @@ static inline int pud_trans_unstable(pud_t *pud) > static inline int pmd_none_or_trans_huge_or_clear_bad(pmd_t *pmd) > { > pmd_t pmdval = pmdp_get_lockless(pmd); > - /* > - * The barrier will stabilize the pmdval in a register or on > - * the stack so that it will stop changing under the code. > - * > - * When CONFIG_TRANSPARENT_HUGEPAGE=y on x86 32bit PAE, > - * pmdp_get_lockless is allowed to return a not atomic pmdval > - * (for example pointing to an hugepage that has never been > - * mapped in the pmd). The below checks will only care about > - * the low part of the pmd with 32bit PAE x86 anyway, with the > - * exception of pmd_none(). So the important thing is that if > - * the low part of the pmd is found null, the high part will > - * be also null or the pmd_none() check below would be > - * confused. > - */ > -#ifdef CONFIG_TRANSPARENT_HUGEPAGE > - barrier(); > -#endif > /* > * !pmd_present() checks for pmd migration entries > * > diff --git a/mm/gup.c b/mm/gup.c > index bbe416236593..3bd5d3854c51 100644 > --- a/mm/gup.c > +++ b/mm/gup.c > @@ -653,11 +653,7 @@ static struct page *follow_pmd_mask(struct vm_area_struct *vma, > struct mm_struct *mm = vma->vm_mm; > > pmd = pmd_offset(pudp, address); > - /* > - * The READ_ONCE() will stabilize the pmdval in a register or > - * on the stack so that it will stop changing under the code. > - */ > - pmdval = READ_ONCE(*pmd); > + pmdval = pmdp_get_lockless(pmd); > if (pmd_none(pmdval)) > return no_page_table(vma, flags); > if (!pmd_present(pmdval)) > diff --git a/mm/hmm.c b/mm/hmm.c > index 6a151c09de5e..e23043345615 100644 > --- a/mm/hmm.c > +++ b/mm/hmm.c > @@ -332,7 +332,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp, > pmd_t pmd; > > again: > - pmd = READ_ONCE(*pmdp); > + pmd = pmdp_get_lockless(pmdp); > if (pmd_none(pmd)) > return hmm_vma_walk_hole(start, end, -1, walk); > > diff --git a/mm/khugepaged.c b/mm/khugepaged.c > index 6b9d39d65b73..732f9ac393fc 100644 > --- a/mm/khugepaged.c > +++ b/mm/khugepaged.c > @@ -961,11 +961,6 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm, > return SCAN_PMD_NULL; > > pmde = pmdp_get_lockless(*pmd); > - > -#ifdef CONFIG_TRANSPARENT_HUGEPAGE > - /* See comments in pmd_none_or_trans_huge_or_clear_bad() */ > - barrier(); > -#endif > if (pmd_none(pmde)) > return SCAN_PMD_NONE; > if (!pmd_present(pmde)) > diff --git a/mm/ksm.c b/mm/ksm.c > index 0156bded3a66..df2aa281d49d 100644 > --- a/mm/ksm.c > +++ b/mm/ksm.c > @@ -1194,8 +1194,7 @@ static int replace_page(struct vm_area_struct *vma, struct page *page, > * without holding anon_vma lock for write. So when looking for a > * genuine pmde (in which to find pte), test present and !THP together. > */ > - pmde = *pmd; > - barrier(); > + pmde = pmdp_get_lockless(pmd); > if (!pmd_present(pmde) || pmd_trans_huge(pmde)) > goto out; > > diff --git a/mm/memory.c b/mm/memory.c > index f69fbc251198..2eb54c0d5d3c 100644 > --- a/mm/memory.c > +++ b/mm/memory.c > @@ -4925,18 +4925,9 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) > * So now it's safe to run pte_offset_map(). > */ > vmf->pte = pte_offset_map(vmf->pmd, vmf->address); > - vmf->orig_pte = *vmf->pte; > + vmf->orig_pte = ptep_get_lockless(vmf->pte); > vmf->flags |= FAULT_FLAG_ORIG_PTE_VALID; > > - /* > - * some architectures can have larger ptes than wordsize, > - * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and > - * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic > - * accesses. The code below just needs a consistent view > - * for the ifs and we later double check anyway with the > - * ptl lock held. So here a barrier will do. > - */ > - barrier(); > if (pte_none(vmf->orig_pte)) { > pte_unmap(vmf->pte); > vmf->pte = NULL; > @@ -5060,9 +5051,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma, > if (!(ret & VM_FAULT_FALLBACK)) > return ret; > } else { > - vmf.orig_pmd = *vmf.pmd; > + vmf.orig_pmd = pmdp_get_lockless(vmf.pmd); > > - barrier(); > if (unlikely(is_swap_pmd(vmf.orig_pmd))) { > VM_BUG_ON(thp_migration_supported() && > !is_pmd_migration_entry(vmf.orig_pmd)); > diff --git a/mm/mprotect.c b/mm/mprotect.c > index 92d3d3ca390a..c5a13c0f1017 100644 > --- a/mm/mprotect.c > +++ b/mm/mprotect.c > @@ -309,11 +309,6 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd) > { > pmd_t pmdval = pmdp_get_lockless(pmd); > > - /* See pmd_none_or_trans_huge_or_clear_bad for info on barrier */ > -#ifdef CONFIG_TRANSPARENT_HUGEPAGE > - barrier(); > -#endif > - > if (pmd_none(pmdval)) > return 1; > if (pmd_trans_huge(pmdval)) > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c > index 4e448cfbc6ef..64aff6718bdb 100644 > --- a/mm/page_vma_mapped.c > +++ b/mm/page_vma_mapped.c > @@ -210,7 +210,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw) > * compiler and used as a stale value after we've observed a > * subsequent update. > */ > - pmde = READ_ONCE(*pvmw->pmd); > + pmde = pmdp_get_lockless(pvmw->pmd); > > if (pmd_trans_huge(pmde) || is_pmd_migration_entry(pmde) || > (pmd_present(pmde) && pmd_devmap(pmde))) { > -- > 2.35.3 > -- Peter Xu