Subject: Re: [PATCH v1 2/2] mm: remove redundant smp_wmb()
To: David Hildenbrand, akpm@linux-foundation.org, tglx@linutronix.de, hannes@cmpxchg.org, mhocko@kernel.org, vdavydov.dev@gmail.com, kirill.shutemov@linux.intel.com, mika.penttila@nextfour.com
Cc: linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-mm@kvack.org, songmuchun@bytedance.com
References: <20210828042306.42886-1-zhengqi.arch@bytedance.com> <20210828042306.42886-3-zhengqi.arch@bytedance.com> <9da807d4-1fcc-72e0-dc9e-91ab9fbeb7c6@redhat.com>
From: Qi Zheng <zhengqi.arch@bytedance.com>
Message-ID: <3f8e9805-b90f-7df3-8514-139afa653671@bytedance.com>
Date: Tue, 31 Aug 2021 20:36:51 +0800
In-Reply-To: <9da807d4-1fcc-72e0-dc9e-91ab9fbeb7c6@redhat.com>

On 2021/8/31 6:02 PM, David Hildenbrand wrote:
> On 28.08.21 06:23, Qi Zheng wrote:
>> The smp_wmb() in __pte_alloc() is used to ensure that all pte setup
>> is visible before the pte is made visible to other CPUs by being put
>> into page tables. We only need this barrier when the pte is actually
>> populated, so move it to pmd_install(). __pte_alloc_kernel(),
>> __p4d_alloc(), __pud_alloc() and __pmd_alloc() are similar cases.
>>
>> We can also defer the smp_wmb() to the place where the pmd entry is
>> actually populated with the preallocated pte. There are two kinds of
>> users of the preallocated pte: one is filemap & finish_fault(), the
>> other is THP. The former does not need another smp_wmb() because
>> pmd_install() has already issued it. Fortunately, the latter does not
>> need another smp_wmb() either, because a smp_wmb() is already issued
>> before the new pte is populated when THP uses a preallocated pte to
>> split a huge pmd.
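
To make the ordering argument above concrete: below is a minimal
userspace C11 sketch of the same publish/consume pattern (illustrative
only -- pte_page and pmd_slot are made-up stand-ins for the real
structures, a release store stands in for smp_wmb() + pmd_populate(),
and memory_order_consume stands in for the data-dependent loads that
only alpha cannot rely on implicitly):

	#include <pthread.h>
	#include <stdatomic.h>
	#include <stdio.h>

	struct pte_page { int entries[4]; };        /* stand-in for a pte page */
	static _Atomic(struct pte_page *) pmd_slot; /* stand-in for the pmd entry */

	static void *writer(void *arg)
	{
		static struct pte_page page;

		(void)arg;
		for (int i = 0; i < 4; i++)
			page.entries[i] = i;        /* "pte setup": page clearing etc. */
		/* Plays the role of smp_wmb() in pmd_install(): the setup
		 * stores become visible before the publishing store. */
		atomic_store_explicit(&pmd_slot, &page, memory_order_release);
		return NULL;
	}

	static void *reader(void *arg)
	{
		struct pte_page *p;

		(void)arg;
		/* The lockless page-table-walker side: a chain of
		 * data-dependent loads. consume models the ordering most
		 * CPUs give for free; alpha needs an explicit smp_rmb(). */
		while (!(p = atomic_load_explicit(&pmd_slot, memory_order_consume)))
			;
		printf("entries[3] = %d\n", p->entries[3]); /* always prints 3 */
		return NULL;
	}

	int main(void)
	{
		pthread_t w, r;

		pthread_create(&r, NULL, reader, NULL);
		pthread_create(&w, NULL, writer, NULL);
		pthread_join(w, NULL);
		pthread_join(r, NULL);
		return 0;
	}
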
>>
>> Signed-off-by: Qi Zheng <zhengqi.arch@bytedance.com>
>> Reviewed-by: Muchun Song <songmuchun@bytedance.com>
>> ---
>>  mm/memory.c         | 47 ++++++++++++++++++++---------------------------
>>  mm/sparse-vmemmap.c |  2 +-
>>  2 files changed, 21 insertions(+), 28 deletions(-)
>>
>> diff --git a/mm/memory.c b/mm/memory.c
>> index ef7b1762e996..9c7534187454 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -439,6 +439,20 @@ void pmd_install(struct mm_struct *mm, pmd_t *pmd, pgtable_t *pte)
>>  	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
>>  		mm_inc_nr_ptes(mm);
>> +		/*
>> +		 * Ensure all pte setup (eg. pte page lock and page clearing) are
>> +		 * visible before the pte is made visible to other CPUs by being
>> +		 * put into page tables.
>> +		 *
>> +		 * The other side of the story is the pointer chasing in the page
>> +		 * table walking code (when walking the page table without locking;
>> +		 * ie. most of the time). Fortunately, these data accesses consist
>> +		 * of a chain of data-dependent loads, meaning most CPUs (alpha
>> +		 * being the notable exception) will already guarantee loads are
>> +		 * seen in-order. See the alpha page table accessors for the
>> +		 * smp_rmb() barriers in page table walking code.
>> +		 */
>> +		smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>>  		pmd_populate(mm, pmd, *pte);
>>  		*pte = NULL;
>>  	}
>> @@ -451,21 +465,6 @@ int __pte_alloc(struct mm_struct *mm, pmd_t *pmd)
>>  	if (!new)
>>  		return -ENOMEM;
>> -	/*
>> -	 * Ensure all pte setup (eg. pte page lock and page clearing) are
>> -	 * visible before the pte is made visible to other CPUs by being
>> -	 * put into page tables.
>> -	 *
>> -	 * The other side of the story is the pointer chasing in the page
>> -	 * table walking code (when walking the page table without locking;
>> -	 * ie. most of the time). Fortunately, these data accesses consist
>> -	 * of a chain of data-dependent loads, meaning most CPUs (alpha
>> -	 * being the notable exception) will already guarantee loads are
>> -	 * seen in-order. See the alpha page table accessors for the
>> -	 * smp_rmb() barriers in page table walking code.
>> -	 */
>> -	smp_wmb(); /* Could be smp_wmb__xxx(before|after)_spin_lock */
>> -
>>  	pmd_install(mm, pmd, &new);
>>  	if (new)
>>  		pte_free(mm, new);
>> @@ -478,10 +477,9 @@ int __pte_alloc_kernel(pmd_t *pmd)
>>  	if (!new)
>>  		return -ENOMEM;
>> -	smp_wmb(); /* See comment in __pte_alloc */
>> -
>>  	spin_lock(&init_mm.page_table_lock);
>>  	if (likely(pmd_none(*pmd))) {	/* Has another populated it ? */
>> +		smp_wmb(); /* See comment in pmd_install() */
>>  		pmd_populate_kernel(&init_mm, pmd, new);
>>  		new = NULL;
>>  	}
>> @@ -3857,7 +3855,6 @@ static vm_fault_t __do_fault(struct vm_fault *vmf)
>>  		vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
>>  		if (!vmf->prealloc_pte)
>>  			return VM_FAULT_OOM;
>> -		smp_wmb(); /* See comment in __pte_alloc() */
>>  	}
>>  	ret = vma->vm_ops->fault(vmf);
>> @@ -3919,7 +3916,6 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page)
>>  		vmf->prealloc_pte = pte_alloc_one(vma->vm_mm);
>>  		if (!vmf->prealloc_pte)
>>  			return VM_FAULT_OOM;
>> -		smp_wmb(); /* See comment in __pte_alloc() */
>>  	}
>>  	vmf->ptl = pmd_lock(vma->vm_mm, vmf->pmd);
>> @@ -4144,7 +4140,6 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf)
>>  		vmf->prealloc_pte = pte_alloc_one(vmf->vma->vm_mm);
>>  		if (!vmf->prealloc_pte)
>>  			return VM_FAULT_OOM;
>> -		smp_wmb(); /* See comment in __pte_alloc() */
>>  	}
>>  	return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff);
>> @@ -4819,13 +4814,13 @@ int __p4d_alloc(struct mm_struct *mm, pgd_t *pgd, unsigned long address)
>>  	if (!new)
>>  		return -ENOMEM;
>> -	smp_wmb(); /* See comment in __pte_alloc */
>> -
>>  	spin_lock(&mm->page_table_lock);
>>  	if (pgd_present(*pgd))		/* Another has populated it */
>>  		p4d_free(mm, new);
>> -	else
>> +	else {
>> +		smp_wmb(); /* See comment in pmd_install() */
>>  		pgd_populate(mm, pgd, new);
>> +	}
>
> Nit:
>
> if () {
>
> } else {
>
> }
>
> see Documentation/process/coding-style.rst
>
> "This does not apply if only one branch of a conditional statement is a
> single statement; in the latter case use braces in both branches:"

Got it.

>
> Apart from that, I think this is fine,
>
> Acked-by: David Hildenbrand
>

Thanks,
Qi
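
P.S. For concreteness, the braced form of that __p4d_alloc() hunk would
end up looking roughly like this (a sketch of the expected v2, not the
actual v2 patch; the spin_lock()/spin_unlock() lines are only shown for
context):

	spin_lock(&mm->page_table_lock);
	if (pgd_present(*pgd)) {	/* Another has populated it */
		p4d_free(mm, new);
	} else {
		smp_wmb();		/* See comment in pmd_install() */
		pgd_populate(mm, pgd, new);
	}
	spin_unlock(&mm->page_table_lock);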