From: Zong Li
Date: Thu, 24 Oct 2019 21:07:20 +0800
Subject: Re: [PATCH v12 14/22] mm: pagewalk: Add 'depth' parameter to pte_hole
To: Steven Price
Cc: linux-mm@kvack.org, Andy Lutomirski, Ard Biesheuvel, Arnd Bergmann,
 Borislav Petkov, Catalin Marinas, Dave Hansen, Ingo Molnar, James Morse,
 Jérôme Glisse, Peter Zijlstra, Thomas Gleixner, Will Deacon, x86@kernel.org,
 "H. Peter Anvin", linux-arm-kernel@lists.infradead.org,
 Linux Kernel Mailing List, Mark Rutland, "Liang, Kan", Andrew Morton
In-Reply-To: <20191018101248.33727-15-steven.price@arm.com>
References: <20191018101248.33727-1-steven.price@arm.com>
 <20191018101248.33727-15-steven.price@arm.com>

Steven Price wrote on Sat, Oct 19, 2019 at 4:13 PM:
>
> The pte_hole() callback is called at multiple levels of the page tables.
> Code dumping the kernel page tables needs to know at what depth the
> missing entry is. Add this as an extra parameter to pte_hole(). When the
> depth isn't known (e.g. when processing a vma), -1 is passed.
>
> The depth that is reported is the actual level where the entry is
> missing (ignoring any folding that is in place), i.e. any levels where
> PTRS_PER_P?D is set to 1 are ignored.
>
> Note that depth starts at 0 for a PGD so that PUD/PMD/PTE retain their
> natural numbers as levels 2/3/4.
>
> Signed-off-by: Steven Price
> ---
>  fs/proc/task_mmu.c       |  4 ++--
>  include/linux/pagewalk.h |  7 +++++--
>  mm/hmm.c                 |  8 ++++----
>  mm/migrate.c             |  5 +++--
>  mm/mincore.c             |  1 +
>  mm/pagewalk.c            | 31 +++++++++++++++++++++++++------
>  6 files changed, 40 insertions(+), 16 deletions(-)
>
> diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
> index 9442631fd4af..3ba9ae83bff5 100644
> --- a/fs/proc/task_mmu.c
> +++ b/fs/proc/task_mmu.c
> @@ -505,7 +505,7 @@ static void smaps_account(struct mem_size_stats *mss, struct page *page,
>
>  #ifdef CONFIG_SHMEM
>  static int smaps_pte_hole(unsigned long addr, unsigned long end,
> -                          struct mm_walk *walk)
> +                          __always_unused int depth, struct mm_walk *walk)
>  {
>         struct mem_size_stats *mss = walk->private;
>
> @@ -1282,7 +1282,7 @@ static int add_to_pagemap(unsigned long addr, pagemap_entry_t *pme,
>  }
>
>  static int pagemap_pte_hole(unsigned long start, unsigned long end,
> -                            struct mm_walk *walk)
> +                            __always_unused int depth, struct mm_walk *walk)
>  {
>         struct pagemapread *pm = walk->private;
>         unsigned long addr = start;
> diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
> index df424197a25a..90466d60f87a 100644
> --- a/include/linux/pagewalk.h
> +++ b/include/linux/pagewalk.h
> @@ -17,7 +17,10 @@ struct mm_walk;
>   *                     split_huge_page() instead of handling it explicitly.
>   * @pte_entry:         if set, called for each non-empty PTE (lowest-level)
>   *                     entry
>   * @pte_hole:          if set, called for each hole at all levels
> + * @pte_hole:          if set, called for each hole at all levels,
> + *                     depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD
> + *                     4:PTE. Any folded depths (where PTRS_PER_P?D is equal
> + *                     to 1) are skipped.
>   * @hugetlb_entry:     if set, called for each hugetlb entry
>   * @test_walk:         caller specific callback function to determine whether
>   *                     we walk over the current vma or not. Returning 0 means
> @@ -45,7 +48,7 @@ struct mm_walk_ops {
>         int (*pte_entry)(pte_t *pte, unsigned long addr,
>                          unsigned long next, struct mm_walk *walk);
>         int (*pte_hole)(unsigned long addr, unsigned long next,
> -                       struct mm_walk *walk);
> +                       int depth, struct mm_walk *walk);
>         int (*hugetlb_entry)(pte_t *pte, unsigned long hmask,
>                              unsigned long addr, unsigned long next,
>                              struct mm_walk *walk);
> diff --git a/mm/hmm.c b/mm/hmm.c
> index 902f5fa6bf93..df3d531c8f2d 100644
> --- a/mm/hmm.c
> +++ b/mm/hmm.c
> @@ -376,7 +376,7 @@ static void hmm_range_need_fault(const struct hmm_vma_walk *hmm_vma_walk,
>  }
>
>  static int hmm_vma_walk_hole(unsigned long addr, unsigned long end,
> -                            struct mm_walk *walk)
> +                            __always_unused int depth, struct mm_walk *walk)
>  {
>         struct hmm_vma_walk *hmm_vma_walk = walk->private;
>         struct hmm_range *range = hmm_vma_walk->range;
> @@ -564,7 +564,7 @@ static int hmm_vma_walk_pmd(pmd_t *pmdp,
>  again:
>         pmd = READ_ONCE(*pmdp);
>         if (pmd_none(pmd))
> -               return hmm_vma_walk_hole(start, end, walk);
> +               return hmm_vma_walk_hole(start, end, -1, walk);
>
>         if (thp_migration_supported() && is_pmd_migration_entry(pmd)) {
>                 bool fault, write_fault;
> @@ -666,7 +666,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>  again:
>         pud = READ_ONCE(*pudp);
>         if (pud_none(pud))
> -               return hmm_vma_walk_hole(start, end, walk);
> +               return hmm_vma_walk_hole(start, end, -1, walk);
>
>         if (pud_huge(pud) && pud_devmap(pud)) {
>                 unsigned long i, npages, pfn;
> @@ -674,7 +674,7 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
>                 bool fault, write_fault;
>
>                 if (!pud_present(pud))
> -                       return hmm_vma_walk_hole(start, end, walk);
> +                       return hmm_vma_walk_hole(start, end, -1, walk);
>
>                 i = (addr - range->start) >> PAGE_SHIFT;
>                 npages = (end - addr) >> PAGE_SHIFT;
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 4fe45d1428c8..435258df9a36 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2123,6 +2123,7 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
>  #ifdef CONFIG_DEVICE_PRIVATE
>  static int migrate_vma_collect_hole(unsigned long start,
>                                     unsigned long end,
> +                                   __always_unused int depth,
>                                     struct mm_walk *walk)
>  {
>         struct migrate_vma *migrate = walk->private;
> @@ -2167,7 +2168,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>
>  again:
>         if (pmd_none(*pmdp))
> -               return migrate_vma_collect_hole(start, end, walk);
> +               return migrate_vma_collect_hole(start, end, -1, walk);
>
>         if (pmd_trans_huge(*pmdp)) {
>                 struct page *page;
> @@ -2200,7 +2201,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp,
>                         return migrate_vma_collect_skip(start, end,
>                                                         walk);
>                 if (pmd_none(*pmdp))
> -                       return migrate_vma_collect_hole(start, end,
> +                       return migrate_vma_collect_hole(start, end, -1,
>                                                         walk);
>         }
>  }
> diff --git a/mm/mincore.c b/mm/mincore.c
> index 49b6fa2f6aa1..0e6dd9948f1a 100644
> --- a/mm/mincore.c
> +++ b/mm/mincore.c
> @@ -112,6 +112,7 @@ static int __mincore_unmapped_range(unsigned long addr, unsigned long end,
>  }
>
>  static int mincore_unmapped_range(unsigned long addr, unsigned long end,
> +                                  __always_unused int depth,
>                                    struct mm_walk *walk)
>  {
>         walk->private += __mincore_unmapped_range(addr, end,
> diff --git a/mm/pagewalk.c b/mm/pagewalk.c
> index 43acffefd43f..b67400dc1def 100644
> --- a/mm/pagewalk.c
> +++ b/mm/pagewalk.c
> @@ -4,6 +4,22 @@
>  #include
>  #include
>
> +/*
> + * We want to know the real level where an entry is located ignoring any
> + * folding of levels which may be happening. For example if p4d is folded then
> + * a missing entry found at level 1 (p4d) is actually at level 0 (pgd).
> + */
> +static int real_depth(int depth)
> +{
> +       if (depth == 3 && PTRS_PER_PMD == 1)
> +               depth = 2;
> +       if (depth == 2 && PTRS_PER_PUD == 1)
> +               depth = 1;
> +       if (depth == 1 && PTRS_PER_P4D == 1)
> +               depth = 0;
> +       return depth;
> +}
> +
>  static int walk_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end,
>                           struct mm_walk *walk)
>  {
> @@ -33,6 +49,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>         unsigned long next;
>         const struct mm_walk_ops *ops = walk->ops;
>         int err = 0;
> +       int depth = real_depth(3);
>
>         if (ops->test_pmd) {
>                 err = ops->test_pmd(addr, end, pmd_offset(pud, 0UL), walk);
> @@ -48,7 +65,7 @@ static int walk_pmd_range(pud_t *pud, unsigned long addr, unsigned long end,
>                 next = pmd_addr_end(addr, end);
>                 if (pmd_none(*pmd)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, depth, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -92,6 +109,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>         unsigned long next;
>         const struct mm_walk_ops *ops = walk->ops;
>         int err = 0;
> +       int depth = real_depth(2);
>
>         if (ops->test_pud) {
>                 err = ops->test_pud(addr, end, pud_offset(p4d, 0UL), walk);
> @@ -107,7 +125,7 @@ static int walk_pud_range(p4d_t *p4d, unsigned long addr, unsigned long end,
>                 next = pud_addr_end(addr, end);
>                 if (pud_none(*pud)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, depth, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -143,6 +161,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
>         unsigned long next;
>         const struct mm_walk_ops *ops = walk->ops;
>         int err = 0;
> +       int depth = real_depth(1);
>
>         if (ops->test_p4d) {
>                 err = ops->test_p4d(addr, end, p4d_offset(pgd, 0UL), walk);
> @@ -157,7 +176,7 @@ static int walk_p4d_range(pgd_t *pgd, unsigned long addr, unsigned long end,
>                 next = p4d_addr_end(addr, end);
>                 if (p4d_none_or_clear_bad(p4d)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, depth, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -189,7 +208,7 @@ static int walk_pgd_range(unsigned long addr, unsigned long end,
>                 next = pgd_addr_end(addr, end);
>                 if (pgd_none_or_clear_bad(pgd)) {
>                         if (ops->pte_hole)
> -                               err = ops->pte_hole(addr, next, walk);
> +                               err = ops->pte_hole(addr, next, 0, walk);
>                         if (err)
>                                 break;
>                         continue;
> @@ -236,7 +255,7 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
>                 if (pte)
>                         err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
>                 else if (ops->pte_hole)
> -                       err = ops->pte_hole(addr, next, walk);
> +                       err = ops->pte_hole(addr, next, -1, walk);
>
>                 if (err)
>                         break;
> @@ -280,7 +299,7 @@ static int walk_page_test(unsigned long start, unsigned long end,
>         if (vma->vm_flags & VM_PFNMAP) {
>                 int err = 1;
>                 if (ops->pte_hole)
> -                       err = ops->pte_hole(start, end, walk);
> +                       err = ops->pte_hole(start, end, -1, walk);
>                 return err ? err : 1;
>         }
>         return 0;
> --
> 2.20.1
>

Looks good to me.

Tested-by: Zong Li
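
As a usage illustration, here is a minimal sketch of a walker that
consumes the new parameter. Only the pte_hole signature and the depth
encoding (-1 unknown, 0:PGD, 1:P4D, 2:PUD, 3:PMD, 4:PTE) are taken from
the patch above; the callback and ops names (dump_pte_hole, dump_ops,
level_name) are hypothetical:

/*
 * Hypothetical pte_hole callback: report each hole together with the
 * level the walker found it at. Folded levels have already been
 * collapsed by real_depth() before the callback is invoked, so a
 * depth >= 0 always names a real level on this configuration.
 */
static const char * const level_name[] =
        { "PGD", "P4D", "PUD", "PMD", "PTE" };

static int dump_pte_hole(unsigned long addr, unsigned long next,
                         int depth, struct mm_walk *walk)
{
        if (depth < 0)
                pr_info("hole %#lx-%#lx (depth unknown)\n", addr, next);
        else
                pr_info("hole %#lx-%#lx at %s level\n",
                        addr, next, level_name[depth]);
        return 0;       /* a non-zero return would abort the walk */
}

static const struct mm_walk_ops dump_ops = {
        .pte_hole = dump_pte_hole,
};

An ops structure like this is registered the same way as any other
mm_walk_ops user, e.g. via walk_page_range(); callbacks that have no use
for the level simply mark the new argument __always_unused, as the
conversions in this patch do.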