From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.5 required=3.0 tests=BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,NICE_REPLY_A,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9B08C63793 for ; Thu, 22 Jul 2021 07:08:02 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EC0B961279 for ; Thu, 22 Jul 2021 07:08:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EC0B961279 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=arm.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 5A9FB6B006C; Thu, 22 Jul 2021 03:08:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 557F26B0071; Thu, 22 Jul 2021 03:08:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 447296B0072; Thu, 22 Jul 2021 03:08:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0197.hostedemail.com [216.40.44.197]) by kanga.kvack.org (Postfix) with ESMTP id 2565D6B006C for ; Thu, 22 Jul 2021 03:08:01 -0400 (EDT) Received: from smtpin33.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id D07931F379 for ; Thu, 22 Jul 2021 07:08:00 +0000 (UTC) X-FDA: 78389344320.33.3E1091A Received: from foss.arm.com (foss.arm.com [217.140.110.172]) by imf22.hostedemail.com (Postfix) with ESMTP id 3FBE51B396 for ; Thu, 22 Jul 2021 07:08:00 +0000 (UTC) Received: from usa-sjc-imap-foss1.foss.arm.com (unknown [10.121.207.14]) by usa-sjc-mx-foss1.foss.arm.com (Postfix) with ESMTP id 422D0D6E; Thu, 22 Jul 2021 00:07:59 -0700 (PDT) Received: from [10.163.65.134] (unknown [10.163.65.134]) by usa-sjc-imap-foss1.foss.arm.com (Postfix) with ESMTPSA id BF3123F66F; Thu, 22 Jul 2021 00:07:56 -0700 (PDT) Subject: Re: [PATCH v3 01/12] mm/debug_vm_pgtable: Introduce struct pgtable_debug_args To: Gavin Shan , linux-mm@kvack.org Cc: linux-kernel@vger.kernel.org, catalin.marinas@arm.com, will@kernel.org, akpm@linux-foundation.org, chuhu@redhat.com, shan.gavin@gmail.com References: <20210719130613.334901-1-gshan@redhat.com> <20210719130613.334901-2-gshan@redhat.com> <280a5740-b5dc-4b78-3a38-67e5adbb0afd@redhat.com> <04a4618f-9899-1518-cee1-0a48cb4df4c6@arm.com> <65078a0c-c35c-8e3f-d4d3-3090b0c3daaf@redhat.com> From: Anshuman Khandual Message-ID: <4a534102-11ff-c849-781e-ed173e46da56@arm.com> Date: Thu, 22 Jul 2021 12:38:46 +0530 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.10.0 MIME-Version: 1.0 In-Reply-To: <65078a0c-c35c-8e3f-d4d3-3090b0c3daaf@redhat.com> Content-Type: text/plain; charset=utf-8 Content-Language: en-US X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 3FBE51B396 X-Stat-Signature: tam8hqe451ak8xnf1p4qsjj9xqogs5de Authentication-Results: imf22.hostedemail.com; dkim=none; dmarc=pass (policy=none) header.from=arm.com; spf=pass (imf22.hostedemail.com: domain of anshuman.khandual@arm.com designates 217.140.110.172 as permitted sender) smtp.mailfrom=anshuman.khandual@arm.com X-HE-Tag: 1626937680-119852 Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On 7/22/21 11:53 AM, Gavin Shan wrote: > Hi Anshuman, >=20 > On 7/22/21 2:41 PM, Anshuman Khandual wrote: >> On 7/21/21 3:50 PM, Gavin Shan wrote: >>> On 7/21/21 3:44 PM, Anshuman Khandual wrote: >>>> On 7/19/21 6:36 PM, Gavin Shan wrote: >>>>> In debug_vm_pgtable(), there are many local variables introduced to >>>>> track the needed information and they are passed to the functions f= or >>>>> various test cases. It'd better to introduce a struct as place hold= er >>>>> for these information. With it, what the functions for various test >>>>> cases need is the struct, to simplify the code. It also makes code >>>>> easier to be maintained. >>>>> >>>>> Besides, set_xxx_at() could access the data on the corresponding pa= ges >>>>> in the page table modifying tests. So the accessed pages in the tes= ts >>>>> should have been allocated from buddy. Otherwise, we're accessing p= ages >>>>> that aren't owned by us. This causes issues like page flag corrupti= on. >>>>> >>>>> This introduces "struct pgtable_debug_args". The struct is initiali= zed >>>>> and destroyed, but the information in the struct isn't used yet. Th= ey >>>>> will be used in subsequent patches. >>>>> >>>>> Signed-off-by: Gavin Shan >>>>> --- >>>>> =C2=A0=C2=A0 mm/debug_vm_pgtable.c | 197 ++++++++++++++++++++++++++= +++++++++++++++- >>>>> =C2=A0=C2=A0 1 file changed, 196 insertions(+), 1 deletion(-) >>>>> >>> >>> I saw you've finished the review on PATCH[v3 01/12] and PATCH[v3 02/1= 2]. >>> I will wait to integrate your comments to v4 until you finish the rev= iew >>> on all patches in v3 series. >>> >>>>> diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c >>>>> index 1c922691aa61..ea153ff40d23 100644 >>>>> --- a/mm/debug_vm_pgtable.c >>>>> +++ b/mm/debug_vm_pgtable.c >>>>> @@ -58,6 +58,36 @@ >>>>> =C2=A0=C2=A0 #define RANDOM_ORVALUE (GENMASK(BITS_PER_LONG - 1, 0) = & ~ARCH_SKIP_MASK) >>>>> =C2=A0=C2=A0 #define RANDOM_NZVALUE=C2=A0=C2=A0=C2=A0 GENMASK(7, 0) >>>>> =C2=A0=C2=A0 +struct pgtable_debug_args { >>>>> +=C2=A0=C2=A0=C2=A0 struct mm_struct=C2=A0=C2=A0=C2=A0 *mm; >>>>> +=C2=A0=C2=A0=C2=A0 struct vm_area_struct=C2=A0=C2=A0=C2=A0 *vma; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 pgd_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *pgdp; >>>>> +=C2=A0=C2=A0=C2=A0 p4d_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *p4dp; >>>>> +=C2=A0=C2=A0=C2=A0 pud_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *pudp; >>>>> +=C2=A0=C2=A0=C2=A0 pmd_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *pmdp; >>>>> +=C2=A0=C2=A0=C2=A0 pte_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *ptep; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 p4d_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *start_p4dp; >>>>> +=C2=A0=C2=A0=C2=A0 pud_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *start_pudp; >>>>> +=C2=A0=C2=A0=C2=A0 pmd_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0 *start_pmdp; >>>>> +=C2=A0=C2=A0=C2=A0 pgtable_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 start_ptep; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 vaddr; >>>>> +=C2=A0=C2=A0=C2=A0 pgprot_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 page_prot; >>>>> +=C2=A0=C2=A0=C2=A0 pgprot_t=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 page_prot_none; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 pud_pfn; >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 pmd_pfn; >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 pte_pfn; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 fixed_pgd_pfn; >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 fixed_p4d_pfn; >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 fixed_pud_pfn; >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 fixed_pmd_pfn; >>>>> +=C2=A0=C2=A0=C2=A0 unsigned long=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 fixed_pte_pfn; >>>>> +}; >>>>> + >>>>> =C2=A0=C2=A0 static void __init pte_basic_tests(unsigned long pfn, = int idx) >>>>> =C2=A0=C2=A0 { >>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pgprot_t prot =3D protection_m= ap[idx]; >>>>> @@ -955,8 +985,167 @@ static unsigned long __init get_random_vaddr(= void) >>>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 return random_vaddr; >>>>> =C2=A0=C2=A0 } >>>>> =C2=A0=C2=A0 +static void __init destroy_args(struct pgtable_debug_= args *args) >>>>> +{ >>>>> +=C2=A0=C2=A0=C2=A0 struct page *page =3D NULL; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* Free (huge) page */ >>>>> +=C2=A0=C2=A0=C2=A0 if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 IS_ENABLED(CONFIG_HAVE_= ARCH_TRANSPARENT_HUGEPAGE_PUD) && >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 has_transparent_hugepag= e() && >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 args->pud_pfn !=3D ULON= G_MAX) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 page =3D pfn_to_page(ar= gs->pud_pfn); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 __free_pages(page, HPAG= E_PUD_SHIFT - PAGE_SHIFT); >>>>> +=C2=A0=C2=A0=C2=A0 } else if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPA= GE) && >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 has_t= ransparent_hugepage() && >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 args-= >pmd_pfn !=3D ULONG_MAX) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 page =3D pfn_to_page(ar= gs->pmd_pfn); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 __free_pages(page, HPAG= E_PMD_ORDER); >>>>> +=C2=A0=C2=A0=C2=A0 } else if (args->pte_pfn !=3D ULONG_MAX) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 page =3D pfn_to_page(ar= gs->pte_pfn); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 __free_pages(page, 0); >>>>> +=C2=A0=C2=A0=C2=A0 } >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* Free page table */ >>>>> +=C2=A0=C2=A0=C2=A0 if (args->start_ptep) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pte_free(args->mm, args= ->start_ptep); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mm_dec_nr_ptes(args->mm= ); >>>>> +=C2=A0=C2=A0=C2=A0 } >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 if (args->start_pmdp) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pmd_free(args->mm, args= ->start_pmdp); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mm_dec_nr_pmds(args->mm= ); >>>>> +=C2=A0=C2=A0=C2=A0 } >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 if (args->start_pudp) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pud_free(args->mm, args= ->start_pudp); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mm_dec_nr_puds(args->mm= ); >>>>> +=C2=A0=C2=A0=C2=A0 } >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 if (args->start_p4dp) >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 p4d_free(args->mm, args= ->p4dp); >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* Free vma and mm struct */ >>>>> +=C2=A0=C2=A0=C2=A0 if (args->vma) >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 vm_area_free(args->vma)= ; >>>>> +=C2=A0=C2=A0=C2=A0 if (args->mm) >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 mmdrop(args->mm); >>>>> +} >>>>> + >>>>> +static int __init init_args(struct pgtable_debug_args *args) >>>>> +{ >>>>> +=C2=A0=C2=A0=C2=A0 struct page *page =3D NULL; >>>>> +=C2=A0=C2=A0=C2=A0 phys_addr_t phys; >>>>> +=C2=A0=C2=A0=C2=A0 int ret =3D 0; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* Initialize the debugging data */ >>>>> +=C2=A0=C2=A0=C2=A0 memset(args, 0, sizeof(*args)); >>>>> +=C2=A0=C2=A0=C2=A0 args->page_prot=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 =3D= vm_get_page_prot(VMFLAGS); >>>>> +=C2=A0=C2=A0=C2=A0 args->page_prot_none =3D __P000; >>>> >>>> Please preserve the existing comments before this assignment. >>>> >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * __P00= 0 (or even __S000) will help create page table entries with >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * PROT_= NONE permission as required for pxx_protnone_tests(). >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ >>>> >>> >>> Sure. I will combine the comments in v4 as below: >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * Initialize the debugging arguments. >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * __P000 (or even __S000) will help cr= eate page table entries with >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * PROT_NONE pe= rmission as required for pxx_protnone_tests(). >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ >>> >>> >>>>> +=C2=A0=C2=A0=C2=A0 args->pud_pfn=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 =3D ULONG_MAX; >>>>> +=C2=A0=C2=A0=C2=A0 args->pmd_pfn=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 =3D ULONG_MAX; >>>>> +=C2=A0=C2=A0=C2=A0 args->pte_pfn=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 =3D ULONG_MAX; >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pgd_pfn=C2=A0 =3D ULONG_MAX; >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_p4d_pfn=C2=A0 =3D ULONG_MAX; >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pud_pfn=C2=A0 =3D ULONG_MAX; >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pmd_pfn=C2=A0 =3D ULONG_MAX; >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pte_pfn=C2=A0 =3D ULONG_MAX; >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* Allocate mm and vma */ >>>>> +=C2=A0=C2=A0=C2=A0 args->mm =3D mm_alloc(); >>>>> +=C2=A0=C2=A0=C2=A0 if (!args->mm) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to alloc= ate mm struct\n"); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>>>> +=C2=A0=C2=A0=C2=A0 } >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 args->vma =3D vm_area_alloc(args->mm); >>>>> +=C2=A0=C2=A0=C2=A0 if (!args->vma) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to alloc= ate vma\n"); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>>>> +=C2=A0=C2=A0=C2=A0 } >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* Figure out the virtual address and allocate = page table entries */ >>>>> +=C2=A0=C2=A0=C2=A0 args->vaddr =3D get_random_vaddr(); >>>> >>>> Please group args->vaddr's init with page_prot and page_prot_none ab= ove. >>>> >>> >>> Yes, It will make the code tidy. I'll move this line accordingly in v= 4, >>> but the related comments will be dropped as the code is self-explanat= ory. >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 /* Allocate page tab= le entries */ >>> >>>>> +=C2=A0=C2=A0=C2=A0 args->pgdp =3D pgd_offset(args->mm, args->vaddr= ); >>>>> +=C2=A0=C2=A0=C2=A0 args->p4dp =3D p4d_alloc(args->mm, args->pgdp, = args->vaddr); >>>>> +=C2=A0=C2=A0=C2=A0 args->pudp =3D args->p4dp ? >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 pud_alloc(args->mm, args->p4dp, args->vaddr) : NULL; >>>>> +=C2=A0=C2=A0=C2=A0 args->pmdp =3D args->pudp ? >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 pmd_alloc(args->mm, args->pudp, args->vaddr) : NULL; >>>>> +=C2=A0=C2=A0=C2=A0 args->ptep =3D args->pmdp ? >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0 pte_alloc_map(args->mm, args->pmdp, args->vaddr) : NULL; >>>>> +=C2=A0=C2=A0=C2=A0 if (!args->ptep) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to alloc= ate page table\n"); >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>>>> +=C2=A0=C2=A0=C2=A0 } >>>> >>>> Why not just assert that all page table level pointers are allocated >>>> successfully, otherwise bail out the test completely. Something like >>>> this at each level. >>>> >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!args->p4dp) { >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to a= llocate page table\n"); >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >>>> >>>> Is there any value in proceeding with the test when some page table >>>> pointers have not been allocated. Also individual tests do not cross >>>> check these pointers. Also asserting successful allocations will >>>> make the freeing path simpler, as I had mentioned earlier. >>>> >>> >>> There is no tests will be carried out if we fail to allocate any leve= l >>> of page table entries. For other questions, please refer below respon= se. >>> In summary, this snippet needs to be combined with next snippet, as b= elow. >>> >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * The above page table entries will be mo= dified. Lets save the >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * page table entries so that they can be = released when the tests >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * are completed. >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 */ >>>>> +=C2=A0=C2=A0=C2=A0 args->start_p4dp =3D p4d_offset(args->pgdp, 0UL= ); >>>>> +=C2=A0=C2=A0=C2=A0 args->start_pudp =3D pud_offset(args->p4dp, 0UL= ); >>>>> +=C2=A0=C2=A0=C2=A0 args->start_pmdp =3D pmd_offset(args->pudp, 0UL= ); >>>>> +=C2=A0=C2=A0=C2=A0 args->start_ptep =3D pmd_pgtable(READ_ONCE(*(ar= gs->pmdp))); >>>> >>>> If the above page table pointers have been validated to be allocated >>>> successfully, we could add these here. >>>> >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0WARN_ON(!args->start_p4dp) >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0WARN_ON(!args->start_pudp) >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0WARN_ON(!args->start_pmdp) >>>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0WARN_ON(!args->start_ptep) >>>> >>>> Afterwards all those if (args->start_pxdp) checks in the freeing pat= h >>>> will not be required anymore. >>>> >>> >>> The check on @args->start_pxdp is still needed in destroy_args() for >>> couple of cases: (1) destroy_args() is called on failing to allocate >>> @args->mm or @args->vma. That time, no page table entries are allocat= ed. >>> (2) It's possible to fail allocating current level of page table entr= ies >>> even the previous levels of page table entries are allocated successf= ully. >> >> This makes sense as destroy_args() is getting called if any of these >> allocations fails during init_args(). Did not realize that earlier. >> >>> >>> So Lets change these (above) two snippets as below in v4: >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * Allocate page table entries. The all= ocated page table entries >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * will be modified in the tests. Lets = save the page table entries >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * so that they can be released when th= e tests are completed. >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->pgdp =3D pgd_offset(args->mm, arg= s->vaddr); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->p4dp =3D p4d_alloc(args->mm, args= ->pgdp, args->vaddr); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!args->p4dp) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to al= locate p4d entries\n"); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->start_p4dp =3D p4d_offset(args->p= gdp, 0UL); >> >> Dont bring the arg->start_pxdp assignments here. If all page table lev= el >> pointer allocations succeed, they all get assigned together like we ha= ve >> right now. Although a sanity check afterwards like the following, migh= t >> still be better. >> >> WARN_ON(!args->start_p4dp) >> WARN_ON(!args->start_pudp) >> WARN_ON(!args->start_pmdp) >> WARN_ON(!args->start_ptep) >> >=20 > We have to assign arg->start_pxdp here because destroy_args() relies > it to release the corresponding page tables in failing path. For exampl= e, > the args->start_p4dp is going to be release if we fail to populate > args->start_pudp. Okay. >=20 > Ok. I will add WARN_ON() for each level of page table entries right aft= er > they are assigned in v4. >=20 >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->pudp =3D pud_alloc(args->mm, args= ->p4dp, args->vaddr); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!args->pudp) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to al= locate pud entries\n"); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->pmdp =3D pmd_alloc(args->mm, args= ->pudp, args->vaddr); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!args->pmdp) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to al= locate PMD entries\n"); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->start_pmdp =3D pmd_offset(args->p= udp, 0UL); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->ptep =3D pte_alloc_map(args->mm, = args->pmdp, args->vaddr); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (!args->ptep) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 pr_err("Failed to al= locate page table\n"); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 ret =3D -ENOMEM; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 goto error; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0args->start_ptep =3D pmd_pgtable(READ_O= NCE(*(args->pmdp))); >>> >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * Figure out the fixed addresses, which a= re all around the kernel >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * symbol (@start_kernel). The correspondi= ng PFNs might be invalid, >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * but it's fine as the following tests wo= n't access the pages. >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 */ >>>>> +=C2=A0=C2=A0=C2=A0 phys =3D __pa_symbol(&start_kernel); >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pgd_pfn =3D __phys_to_pfn(phys & PG= DIR_MASK); >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_p4d_pfn =3D __phys_to_pfn(phys & P4= D_MASK); >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pud_pfn =3D __phys_to_pfn(phys & PU= D_MASK); >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pmd_pfn =3D __phys_to_pfn(phys & PM= D_MASK); >>>>> +=C2=A0=C2=A0=C2=A0 args->fixed_pte_pfn =3D __phys_to_pfn(phys & PA= GE_MASK); >>>>> + >>>>> +=C2=A0=C2=A0=C2=A0 /* >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * Allocate (huge) pages because some of t= he tests need to access >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * the data in the pages. The correspondin= g tests will be skipped >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 * if we fail to allocate (huge) pages. >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0 */ >>>>> +=C2=A0=C2=A0=C2=A0 if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) && >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 IS_ENABLED(CONFIG_HAVE_= ARCH_TRANSPARENT_HUGEPAGE_PUD) && >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 has_transparent_hugepag= e()) { >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 page =3D alloc_pages(GF= P_KERNEL | __GFP_NOWARN, >>>>> +=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0= =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 HPAGE_PUD_SHIFT - PAGE_SHIFT); >>>> >>>> Please drop __GFP_NOWARN and instead use something like alloc_contig= _pages() >>>> when required allocation order exceed (MAX_ORDER - 1). Else the test= might >>>> not be able to execute on platform configurations, where PUD THP is = enabled. >>>> >>> >>> Yes, It's correct that alloc_contig_pages() should be used here, depe= nding >>> on CONFIG_CONTIG_ALLOC. Otherwise, alloc_pages(...__GFP_NOWARN...) is= still >>> used as we're doing. This snippet will be changed like below in v4: >> >> First 'order > (MAX_ORDER - 1)' needs to be established before calling= into >> alloc_contig_pages() without __GFP_NOWARN and set a new flag indicatin= g that >> there is contig page allocated. But if 'order <=3D (MAX_ORDER - 1)', t= hen call >> alloc_pages(..) without=C2=A0 __GFP_NOWARN. There is no need to add=C2= =A0 __GFP_NOWARN >> in any case. In case CONFIG_CONTIG_ALLOC is not available, directly re= turn a >> NULL as that would have been the case with alloc_pages(...__GFP_NOWARN= ...) as >> well. >> >> Symbol alloc_contig_pages() is not available outside CONFIG_CONTIG_ALL= OC. So >> IS_ENABLED() construct will not work, unless there is an empty stub ad= ded in >> the header. Otherwise #ifdef CONFIG_CONTIG_ALLOC needs to be used inst= ead. >> >> Regardless please do test this on a x86 platform with PUD based THP in= order >> to make sure every thing works as expected. >> >=20 > Thanks, I will change the code accordingly in v4 and test it on x86 > before posting it. >=20 >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0/* >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * Allocate (huge) pages because some o= f the tests need to access >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * the data in the pages. The correspon= ding tests will be skipped >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 * if we fail to allocate (huge) pages. >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 */ >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEP= AGE) && >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 IS_ENABLED(CONFIG_HA= VE_ARCH_TRANSPARENT_HUGEPAGE_PUD) && >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 IS_ENABLED(CONFIG_CO= NTIG_ALLOC)) && >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 has_transparent_huge= page()) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 page =3D alloc_conti= g_pages((1 << (HPAGE_PUD_SHIFT - PAGE_SHIFT)), >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 GFP_KERNE= L | __GFP_NOWARN, >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 first_onl= ine_node, NULL); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (page) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->is_contiguous_pud_page =3D true; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->pud_pfn =3D page_to_pfn(page); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->pmd_pfn =3D args->pud_pfn; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->pte_pfn =3D args->pud_pfn; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 return 0; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0if (IS_ENABLED(CONFIG_TRANSPARENT_HUGEP= AGE) && >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 IS_ENABLED(CONFIG_HA= VE_ARCH_TRANSPARENT_HUGEPAGE_PUD) && >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 has_transparent_huge= page()) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 page =3D alloc_pages= (GFP_KERNEL | __GFP_NOWARN, >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 HPAGE_PUD_SHIFT - PAGE_SHIF= T); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 if (page) { >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->is_contiguous_pud_page =3D false; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->pud_pfn =3D page_to_pfn(page); >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->pmd_pfn =3D args->pud_pfn; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 args->pte_pfn =3D args->pud_pfn; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2= =A0 return 0; >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0 } >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0} >>> >>> =C2=A0=C2=A0=C2=A0=C2=A0=C2=A0[... The logic to allocate PMD huge pag= e or page is kept as of being] >> >> IIRC it is also not guaranteed that PMD_SHIFT <=3D (MAX_ORDER - 1). He= nce >> this same scheme should be followed for PMD level allocation as well. >> >=20 > In theory, it's possible to have PMD_SHIFT <=3D (MAX_ORDER - 1) with mi= sconfigured > kernel. I will apply the similar logic to PMD huge page in v4. >=20 >>> =C2=A0=C2=A0=C2=A0=C2=A0 [... The code to release the PUD huge page n= eeds changes based on @args->is_contiguous_pud_page] >> >> Right, a flag would be needed to call the appropriate free function. >> >=20 > Yes. We need two falgs for PUD and PMD huge pages separately. A single flag should be enough, the order would be dependent on whether args->pud_pfn or args->pmd_pfn is valid.