From: Michel Lespinasse
To: Linux-MM
Cc: Laurent Dufour, Peter Zijlstra, Michal Hocko, Matthew Wilcox,
	Rik van Riel, Paul McKenney, Andrew Morton, Suren Baghdasaryan,
	Joel Fernandes, Rom Lemarchand, Linux-Kernel, Michel Lespinasse
Subject: [RFC PATCH 12/37] mm: refactor __handle_mm_fault() / handle_pte_fault()
Date: Tue, 6 Apr 2021 18:44:37 -0700
Message-Id: <20210407014502.24091-13-michel@lespinasse.org>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210407014502.24091-1-michel@lespinasse.org>
References: <20210407014502.24091-1-michel@lespinasse.org>
MIME-Version: 1.0

Move the code that initializes vmf->pte and vmf->orig_pte from
handle_pte_fault() to its single call site in __handle_mm_fault().

This ensures vmf->pte is now initialized together with the higher levels
of the page table hierarchy. This also prepares for speculative page fault
handling, where the entire page table walk (higher levels down to ptes)
needs special care in the speculative case.

Signed-off-by: Michel Lespinasse
---
 mm/memory.c | 98 ++++++++++++++++++++++++++---------------------------
 1 file changed, 49 insertions(+), 49 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 3691be1f1319..66e7a4554c54 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3516,7 +3516,7 @@ static vm_fault_t do_anonymous_page(struct vm_fault *vmf)
 	if (pte_alloc(vma->vm_mm, vmf->pmd))
 		return VM_FAULT_OOM;
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (unlikely(pmd_trans_unstable(vmf->pmd)))
 		return 0;
 
@@ -3797,7 +3797,7 @@ vm_fault_t finish_fault(struct vm_fault *vmf)
 			return VM_FAULT_OOM;
 	}
 
-	/* See comment in handle_pte_fault() */
+	/* See comment in __handle_mm_fault() */
 	if (pmd_devmap_trans_unstable(vmf->pmd))
 		return 0;
 
@@ -4253,53 +4253,6 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf)
 {
 	pte_t entry;
 
-	if (unlikely(pmd_none(*vmf->pmd))) {
-		/*
-		 * Leave __pte_alloc() until later: because vm_ops->fault may
-		 * want to allocate huge page, and if we expose page table
-		 * for an instant, it will be difficult to retract from
-		 * concurrent faults and from rmap lookups.
-		 */
-		vmf->pte = NULL;
-	} else {
-		/*
-		 * If a huge pmd materialized under us just retry later. Use
-		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
-		 * of pmd_trans_huge() to ensure the pmd didn't become
-		 * pmd_trans_huge under us and then back to pmd_none, as a
-		 * result of MADV_DONTNEED running immediately after a huge pmd
-		 * fault in a different thread of this mm, in turn leading to a
-		 * misleading pmd_trans_huge() retval. All we have to ensure is
-		 * that it is a regular pmd that we can walk with
-		 * pte_offset_map() and we can do that through an atomic read
-		 * in C, which is what pmd_trans_unstable() provides.
-		 */
-		if (pmd_devmap_trans_unstable(vmf->pmd))
-			return 0;
-		/*
-		 * A regular pmd is established and it can't morph into a huge
-		 * pmd from under us anymore at this point because we hold the
-		 * mmap_lock read mode and khugepaged takes it in write mode.
-		 * So now it's safe to run pte_offset_map().
-		 */
-		vmf->pte = pte_offset_map(vmf->pmd, vmf->address);
-		vmf->orig_pte = *vmf->pte;
-
-		/*
-		 * some architectures can have larger ptes than wordsize,
-		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
-		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
-		 * accesses. The code below just needs a consistent view
-		 * for the ifs and we later double check anyway with the
-		 * ptl lock held. So here a barrier will do.
-		 */
-		barrier();
-		if (pte_none(vmf->orig_pte)) {
-			pte_unmap(vmf->pte);
-			vmf->pte = NULL;
-		}
-	}
-
 	if (!vmf->pte) {
 		if (vma_is_anonymous(vmf->vma))
 			return do_anonymous_page(vmf);
@@ -4439,6 +4392,53 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 		}
 	}
 
+	if (unlikely(pmd_none(*vmf.pmd))) {
+		/*
+		 * Leave __pte_alloc() until later: because vm_ops->fault may
+		 * want to allocate huge page, and if we expose page table
+		 * for an instant, it will be difficult to retract from
+		 * concurrent faults and from rmap lookups.
+		 */
+		vmf.pte = NULL;
+	} else {
+		/*
+		 * If a huge pmd materialized under us just retry later. Use
+		 * pmd_trans_unstable() via pmd_devmap_trans_unstable() instead
+		 * of pmd_trans_huge() to ensure the pmd didn't become
+		 * pmd_trans_huge under us and then back to pmd_none, as a
+		 * result of MADV_DONTNEED running immediately after a huge pmd
+		 * fault in a different thread of this mm, in turn leading to a
+		 * misleading pmd_trans_huge() retval. All we have to ensure is
+		 * that it is a regular pmd that we can walk with
+		 * pte_offset_map() and we can do that through an atomic read
+		 * in C, which is what pmd_trans_unstable() provides.
+		 */
+		if (pmd_devmap_trans_unstable(vmf.pmd))
+			return 0;
+		/*
+		 * A regular pmd is established and it can't morph into a huge
+		 * pmd from under us anymore at this point because we hold the
+		 * mmap_lock read mode and khugepaged takes it in write mode.
+		 * So now it's safe to run pte_offset_map().
+		 */
+		vmf.pte = pte_offset_map(vmf.pmd, vmf.address);
+		vmf.orig_pte = *vmf.pte;
+
+		/*
+		 * some architectures can have larger ptes than wordsize,
+		 * e.g.ppc44x-defconfig has CONFIG_PTE_64BIT=y and
+		 * CONFIG_32BIT=y, so READ_ONCE cannot guarantee atomic
+		 * accesses. The code below just needs a consistent view
+		 * for the ifs and we later double check anyway with the
+		 * ptl lock held. So here a barrier will do.
+		 */
+		barrier();
+		if (pte_none(vmf.orig_pte)) {
+			pte_unmap(vmf.pte);
+			vmf.pte = NULL;
+		}
+	}
+
 	return handle_pte_fault(&vmf);
 }
 
-- 
2.20.1
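
For readers who want the shape of the refactor without the surrounding kernel context, here is a minimal userspace sketch of the control flow after this patch. It is not kernel code: struct toy_fault, toy_handle_mm_fault(), toy_handle_pte_fault() and the TOY_* results are hypothetical names made up for illustration, a NULL pmd stands in for pmd_none(), and a zero entry stands in for pte_none(). Locking, the huge-pmd checks and barrier() are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's pte_t; 0 models pte_none(). */
typedef unsigned long pte_t;

/* Hypothetical, heavily reduced model of struct vm_fault. */
struct toy_fault {
	pte_t *pmd;      /* NULL models pmd_none(): no page table yet */
	size_t index;    /* slot of the faulting address in that table */
	pte_t *pte;      /* set by the caller; NULL if nothing is mapped */
	pte_t orig_pte;  /* snapshot of *pte taken during the walk */
};

enum toy_result { TOY_ALLOCATE, TOY_EXISTING };

/* Callee: only dispatches on the state the caller prepared. */
static enum toy_result toy_handle_pte_fault(const struct toy_fault *f)
{
	return f->pte ? TOY_EXISTING : TOY_ALLOCATE;
}

/* Caller: completes the walk down to pte level, then dispatches. */
static enum toy_result toy_handle_mm_fault(struct toy_fault *f)
{
	assert(f != NULL);

	if (!f->pmd) {
		/* No page table: leave allocation to the fault handlers. */
		f->pte = NULL;
	} else {
		f->pte = &f->pmd[f->index];
		f->orig_pte = *f->pte;
		if (f->orig_pte == 0)
			f->pte = NULL;  /* empty entry: treat as unmapped */
	}
	return toy_handle_pte_fault(f);
}
```

In this model the callee never looks at pmd; everything it needs is already in pte and orig_pte, which is the property the commit message describes as initializing vmf->pte together with the higher levels of the walk.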