From: Michel Lespinasse
To: Linux-MM, Linux-Kernel
Cc: Laurent Dufour, Peter Zijlstra, Michal Hocko, Matthew Wilcox,
	Rik van Riel, Paul McKenney, Andrew Morton, Suren Baghdasaryan,
	Joel Fernandes, Andy Lutomirski, Michel Lespinasse
Subject: [PATCH 14/29] mm: refactor __handle_mm_fault() / handle_pte_fault()
Date: Fri, 30 Apr 2021 12:52:15 -0700
Message-Id: <20210430195232.30491-15-michel@lespinasse.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210430195232.30491-1-michel@lespinasse.org>
References: <20210430195232.30491-1-michel@lespinasse.org>

Move the code that initializes vmf->pte and vmf->orig_pte from
handle_pte_fault() to its single call site in __handle_mm_fault().

This ensures vmf->pte is now initialized together with the higher levels
of the page table hierarchy. This also prepares for speculative page
fault handling, where the entire page table walk (higher levels down to
ptes) needs special care in the speculative case.
Signed-off-by: Michel Lespinasse
---
 mm/memory.c | 98 ++++++++++++++++++++++++++---------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index b28047765de7..45696166b10f 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3538,7 +3538,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (pte_alloc(vma->vm_mm, vmf->pmd))
 		return VM_FAULT_OOM;
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (unlikely(pmd_trans_unstable(vmf->pmd)))
 		return 0;
 
@@ -3819,7 +3819,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (pmd_devmap_trans_unstable(vmf->pmd))
 		return 0;
 
@@ -4275,53 +4275,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
 	pte_t entry;
 
-	if (unlikely(pmd_none(*vmf->pmd))) {
-		/*
-		 * Leave __pte_alloc() until later: because vm_ops->fault may
-		 * want to allocate huge page, and if we expose page table
-		 * for an instant, it will be difficult to retract from
-		 * concurrent faults and from rmap lookups.
-		 */
-		vmf->pte = NULL;
-	} else {
-		/*
-		 * If a huge pmd materialized under us just retry later. Use
-		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
-		 * of pmd_trans_huge() to ensure the pmd didn't become
-		 * pmd_trans_huge under us and then back to pmd_none, as a
-		 * result of MADV_DONTNEED running immediately after a huge pmd
-		 * fault in a different thread of this mm, in turn leading to a
-		 * misleading pmd_trans_huge() retval. All we have to ensure is
-		 * that it is a regular pmd that we can walk with
-		 * pte_offset_map() and we can do that through an atomic read
-		 * in C, which is what pmd_trans_unstable() provides.
-		 */
-		if (pmd_devmap_trans_unstable(vmf->pmd))
-			return 0;
-		/*
-		 * A regular pmd is established and it can't morph into a huge
-		 * pmd from under us anymore at this point because we hold the
-		 * mmap_lock read mode and khugepaged takes it in write mode.
-		 * So now it's safe to run pte_offset_map().
-		 */
-		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-		vmf->orig_pte = *vmf->pte;
-
-		/*
-		 * some architectures can have larger ptes than wordsize,
-		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
-		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
-		 * accesses. The code below just needs a consistent view
-		 * for the ifs and we later double check anyway with the
-		 * ptl lock held. So here a barrier will do.
-		 */
-		barrier();
-		if (pte_none(vmf->orig_pte)) {
-			pte_unmap(vmf->pte);
-			vmf->pte = NULL;
-		}
-	}
-
 	if (!vmf->pte) {
 		if (vma_is_anonymous(vmf->vma))
 			return do_anonymous_page(vmf);
@@ -4461,6 +4414,53 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		}
 	}
 
+	if (unlikely(pmd_none(*vmf.pmd))) {
+		/*
+		 * Leave __pte_alloc() until later: because vm_ops->fault may
+		 * want to allocate huge page, and if we expose page table
+		 * for an instant, it will be difficult to retract from
+		 * concurrent faults and from rmap lookups.
+		 */
+		vmf.pte = NULL;
+	} else {
+		/*
+		 * If a huge pmd materialized under us just retry later. Use
+		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
+		 * of pmd_trans_huge() to ensure the pmd didn't become
+		 * pmd_trans_huge under us and then back to pmd_none, as a
+		 * result of MADV_DONTNEED running immediately after a huge pmd
+		 * fault in a different thread of this mm, in turn leading to a
+		 * misleading pmd_trans_huge() retval. All we have to ensure is
+		 * that it is a regular pmd that we can walk with
+		 * pte_offset_map() and we can do that through an atomic read
+		 * in C, which is what pmd_trans_unstable() provides.
+		 */
+		if (pmd_devmap_trans_unstable(vmf.pmd))
+			return 0;
+		/*
+		 * A regular pmd is established and it can't morph into a huge
+		 * pmd from under us anymore at this point because we hold the
+		 * mmap_lock read mode and khugepaged takes it in write mode.
+		 * So now it's safe to run pte_offset_map().
+		 */
+		vmf.pte = pte_offset_map(vmf.pmd, vmf.address);
+		vmf.orig_pte = *vmf.pte;
+
+		/*
+		 * some architectures can have larger ptes than wordsize,
+		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
+		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
+		 * accesses. The code below just needs a consistent view
+		 * for the ifs and we later double check anyway with the
+		 * ptl lock held. So here a barrier will do.
+		 */
+		barrier();
+		if (pte_none(vmf.orig_pte)) {
+			pte_unmap(vmf.pte);
+			vmf.pte = NULL;
+		}
+	}
+
 	return handle_pte_fault(&vmf);
 }
 
-- 
2.20.1