Subject: Re: [PATCH v3] mm: pagewalk: Fix walk for hugepage tables
From: Christophe Leroy
To: "Aneesh Kumar K.V", Steven Price, akpm@linux-foundation.org, linux-mm@kvack.org
Cc: Benjamin Herrenschmidt, Paul Mackerras, Michael Ellerman, dja@axtens.net, Oliver O'Halloran, linux-arch@vger.kernel.org, linuxppc-dev@lists.ozlabs.org, linux-kernel@vger.kernel.org
Date: Mon, 28 Jun 2021 08:19:35 +0200
Message-ID: <7bbf9c5e-b81d-d20c-7ba1-d50b10238d32@csgroup.eu>
In-Reply-To: <87bl7qle4o.fsf@linux.ibm.com>
References: <38d04410700c8d02f28ba37e020b62c55d6f3d2c.1624597695.git.christophe.leroy@csgroup.eu> <87bl7qle4o.fsf@linux.ibm.com>
On 28/06/2021 at 08:03, Aneesh Kumar K.V wrote:
> Christophe Leroy writes:
> 
>> Pagewalk ignores hugepd entries and walks down the tables
>> as if they were traditional entries, leading to incorrect results.
> 
> But we do handle hugetlb separately
> 
> 	if (vma && is_vm_hugetlb_page(vma)) {
> 		if (ops->hugetlb_entry)
> 			err = walk_hugetlb_range(start, end, walk);
> 	} else
> 		err = walk_pgd_range(start, end, walk);
> 
> Are we using hugepd format for non hugetlb entries?

Yes, on the 8xx we use hugepd for 8M pages for the linear mapping and for the kasan shadow mapping (see commit bb5f33c06940 ("Merge "Use hugepages to map kernel mem on 8xx" into next")).

And I'm working on implementing huge VMAP with 8M pages, which will also make use of hugepd.

> 
>> 
>> Add walk_hugepd_range() and use it to walk hugepage tables.
>>
>> Signed-off-by: Christophe Leroy
>> Reviewed-by: Steven Price
>> ---
>> v3:
>> - Rebased on next-20210624 (no change since v2)
>> - Added Steven's Reviewed-by
>> - Sent as standalone for merge via mm
>>
>> v2:
>> - Add a guard for NULL ops->pte_entry
>> - Take mm->page_table_lock when walking hugepage table, as suggested by follow_huge_pd()
>> ---
>>  mm/pagewalk.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++-----
>>  1 file changed, 53 insertions(+), 5 deletions(-)
>>
>> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
>> index e81640d9f177..9b3db11a4d1d 100644
>> --- a/mm/pagewalk.c
>> +++ b/mm/pagewalk.c
>> @@ -58,6 +58,45 @@ static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>>  	return err;
>>  }
>>  
>> +#ifdef CONFIG_ARCH_HAS_HUGEPD
>> +static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
>> +			     unsigned long end, struct mm_walk *walk, int pdshift)
>> +{
>> +	int err = 0;
>> +	const struct mm_walk_ops *ops = walk->ops;
>> +	int shift = hugepd_shift(*phpd);
>> +	int page_size = 1 << shift;
>> +
>> +	if (!ops->pte_entry)
>> +		return 0;
>> +
>> +	if (addr & (page_size - 1))
>> +		return 0;
>> +
>> +	for (;;) {
>> +		pte_t *pte;
>> +
>> +		spin_lock(&walk->mm->page_table_lock);
>> +		pte = hugepte_offset(*phpd, addr, pdshift);
>> +		err = ops->pte_entry(pte, addr, addr + page_size, walk);
>> +		spin_unlock(&walk->mm->page_table_lock);
>> +
>> +		if (err)
>> +			break;
>> +		if (addr >= end - page_size)
>> +			break;
>> +		addr += page_size;
>> +	}
>> +	return err;
>> +}
>> +#else
>> +static int walk_hugepd_range(hugepd_t *phpd, unsigned long addr,
>> +			     unsigned long end, struct mm_walk *walk, int pdshift)
>> +{
>> +	return 0;
>> +}
>> +#endif
>> +
>>  static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>>  			  struct mm_walk *walk)
>>  {
>> @@ -108,7 +147,10 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>>  			goto again;
>>  		}
>>  
>> -		err = walk_pte_range(pmd, addr, next, walk);
>> +		if (is_hugepd(__hugepd(pmd_val(*pmd))))
>> +			err = walk_hugepd_range((hugepd_t *)pmd, addr, next, walk, PMD_SHIFT);
>> +		else
>> +			err = walk_pte_range(pmd, addr, next, walk);
>>  		if (err)
>>  			break;
>>  	} while (pmd++, addr = next, addr != end);
>> @@ -157,7 +199,10 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>>  		if (pud_none(*pud))
>>  			goto again;
>>  
>> -		err = walk_pmd_range(pud, addr, next, walk);
>> +		if (is_hugepd(__hugepd(pud_val(*pud))))
>> +			err = walk_hugepd_range((hugepd_t *)pud, addr, next, walk, PUD_SHIFT);
>> +		else
>> +			err = walk_pmd_range(pud, addr, next, walk);
>>  		if (err)
>>  			break;
>>  	} while (pud++, addr = next, addr != end);
>> @@ -189,7 +234,9 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
>>  			if (err)
>>  				break;
>>  		}
>> -		if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
>> +		if (is_hugepd(__hugepd(p4d_val(*p4d))))
>> +			err = walk_hugepd_range((hugepd_t *)p4d, addr, next, walk, P4D_SHIFT);
>> +		else if (ops->pud_entry || ops->pmd_entry || ops->pte_entry)
>>  			err = walk_pud_range(p4d, addr, next, walk);
>>  		if (err)
>>  			break;
>> @@ -224,8 +271,9 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
>>  			if (err)
>>  				break;
>>  		}
>> -		if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry ||
>> -		    ops->pte_entry)
>> +		if (is_hugepd(__hugepd(pgd_val(*pgd))))
>> +			err = walk_hugepd_range((hugepd_t *)pgd, addr, next, walk, PGDIR_SHIFT);
>> +		else if (ops->p4d_entry || ops->pud_entry || ops->pmd_entry || ops->pte_entry)
>>  			err = walk_p4d_range(pgd, addr, next, walk);
>>  		if (err)
>>  			break;
>> -- 
>> 2.25.0